Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(proguard): associate release with proguard mapping files #48511

Merged
merged 27 commits into from
Jul 14, 2023

Conversation

buenaflor
Copy link
Contributor

Ref: getsentry/sentry-android-gradle-plugin#40

This allows to create a weak association between a release and a proguard mapping file

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label May 4, 2023
@github-actions
Copy link
Contributor

github-actions bot commented May 4, 2023

This PR has a migration; here is the generated SQL for src/sentry/migrations/0436_add_proguard_release_association.py ()

--
-- Create model ProguardArtifactRelease
--
CREATE TABLE "sentry_proguardartifactrelease" ("id" bigserial NOT NULL PRIMARY KEY, "organization_id" bigint NOT NULL, "project_id" bigint NOT NULL, "release_name" varchar(250) NOT NULL, "proguard_uuid" uuid NOT NULL, "date_added" timestamp with time zone NOT NULL);
CREATE UNIQUE INDEX CONCURRENTLY "sentry_proguardartifactr_organization_id_project__0a751b1a_uniq" ON "sentry_proguardartifactrelease" ("organization_id", "project_id", "release_name");
ALTER TABLE "sentry_proguardartifactrelease" ADD CONSTRAINT "sentry_proguardartifactr_organization_id_project__0a751b1a_uniq" UNIQUE USING INDEX "sentry_proguardartifactr_organization_id_project__0a751b1a_uniq";
CREATE INDEX CONCURRENTLY "sentry_proguardartifactrelease_organization_id_bb2944ba" ON "sentry_proguardartifactrelease" ("organization_id");
CREATE INDEX CONCURRENTLY "sentry_proguardartifactrelease_project_id_d890d161" ON "sentry_proguardartifactrelease" ("project_id");

@codecov
Copy link

codecov bot commented May 4, 2023

Codecov Report

Merging #48511 (999226d) into master (e295002) will decrease coverage by 0.02%.
The diff coverage is 97.29%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #48511      +/-   ##
==========================================
- Coverage   80.94%   80.93%   -0.02%     
==========================================
  Files        4819     4819              
  Lines      201975   202326     +351     
  Branches    11446    11500      +54     
==========================================
+ Hits       163492   163750     +258     
- Misses      38228    38321      +93     
  Partials      255      255              
Impacted Files Coverage Δ
src/sentry/api/urls.py 100.00% <ø> (ø)
src/sentry/api/endpoints/debug_files.py 86.60% <95.83%> (+1.03%) ⬆️
src/sentry/models/debugfile.py 74.56% <100.00%> (+0.99%) ⬆️

... and 54 files with indirect coverage changes

@github-actions
Copy link
Contributor

github-actions bot commented May 9, 2023

This PR has a migration; here is the generated SQL for src/sentry/migrations/0438_add_proguard_release_association.py ()

--
-- Create model ProguardArtifactRelease
--
CREATE TABLE "sentry_proguardartifactrelease" ("id" bigserial NOT NULL PRIMARY KEY, "organization_id" bigint NOT NULL, "project_id" bigint NOT NULL, "release_name" varchar(250) NOT NULL, "proguard_uuid" uuid NOT NULL, "date_added" timestamp with time zone NOT NULL);
CREATE UNIQUE INDEX CONCURRENTLY "sentry_proguardartifactr_organization_id_project__0a751b1a_uniq" ON "sentry_proguardartifactrelease" ("organization_id", "project_id", "release_name");
ALTER TABLE "sentry_proguardartifactrelease" ADD CONSTRAINT "sentry_proguardartifactr_organization_id_project__0a751b1a_uniq" UNIQUE USING INDEX "sentry_proguardartifactr_organization_id_project__0a751b1a_uniq";
CREATE INDEX CONCURRENTLY "sentry_proguardartifactrelease_organization_id_bb2944ba" ON "sentry_proguardartifactrelease" ("organization_id");
CREATE INDEX CONCURRENTLY "sentry_proguardartifactrelease_project_id_d890d161" ON "sentry_proguardartifactrelease" ("project_id");

@buenaflor buenaflor marked this pull request as ready for review May 10, 2023 09:48
@buenaflor buenaflor requested review from a team as code owners May 10, 2023 09:48
@buenaflor buenaflor requested review from iambriccardo and removed request for a team May 10, 2023 09:48
proguard_uuid = request.data.get("proguard_uuid")
if not all([release_name, proguard_uuid]):
return Response(
data={"error": "Missing required fields"}, status=status.HTTP_400_BAD_REQUEST
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to let the user know which fields are required.

)
releases = releases.values()

return Response(list(releases))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be wary of top level lists in response payloads. It makes API responses much harder to evolve as you can't add new keys for additional metadata. For newer endpoints we have been making the response bodies a dictionary with a single key for the results.

Comment on lines 369 to 370
organization_id = BoundedBigIntegerField(db_index=True)
project_id = BoundedBigIntegerField(db_index=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it can be done in a future pull request. It would be good to have this model added to project deletions so that when customers remove a project proguard metadata is also removed.


organization_id = BoundedBigIntegerField(db_index=True)
project_id = BoundedBigIntegerField(db_index=True)
release_name = models.CharField(max_length=250)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use the release name here vs a link to the actual release?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wanted, the goal here, like with sourcemaps, is to have a weak release association. With this association, you will optimistically tell the user to specify a future release that will be created later on down the pipeline. Ofc this won't result in any database consistency mechanisms helping us to maintain data consistency but we aligned on such an approach.

@github-actions github-actions bot added the Scope: Frontend Automatically applied to PRs that change frontend components label Jul 11, 2023
@buenaflor buenaflor force-pushed the feat/associate-release-with-proguard-mapping-files branch from 4c342fa to 6c08935 Compare July 11, 2023 13:41
],
options={
"db_table": "sentry_proguardartifactrelease",
"unique_together": {("organization_id", "project_id", "release_name")},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The definition here is not uptodate with the one in the model file. In particular, proguard_uuid is missing here.
Also, the project implies the org, so putting it into this uniqueness constraint does not make too much sense.

organization_id = BoundedBigIntegerField(db_index=True)
project_id = BoundedBigIntegerField(db_index=True)
release_name = models.CharField(max_length=250)
proguard_uuid = models.UUIDField()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is "just a uuid", it does not have a link back to the uploaded proguard file.
We are working on eventually expiring and auto-removing expiring debug files, which also applies to proguard files.

So the underlying proguard projectdsymfile may be going away, and you will be left with some "dead" rows here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This detail highly depends on the reading assumptions that the application layer does. Seeing how information is queried, what @Swatinem says it's very good. Either we enforce constraints at the db level or we make sure that the application will do it.

I personally would prefer to link to the file that contains the proguard_uuid and have the db handle the consistency requirements. Of course you would also keep the proguard_uuid, similarly to how the DebugIdArtifactBundle table is designed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would definitely link to the ProjectDebugFile, and also query it on post. Right now you can just post any random uuid / release_name, and there is no validation whatsoever, and no cleanup. So malicious users could just spam random uuids to fill up the database with junk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, makes sense, thx for taking a look

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem still exists, but to a lesser extent with loose release association, also relevant for artifact bundles (@iambriccardo):
malicious users could spam random release names and fill up the database.
However if you have a strong relationship to the ProjectDebugFile, or an ArtifactBundle, that junk is at least being cleaned up once the debug file expires and is being cleaned up.

Copy link
Member

@iambriccardo iambriccardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, I would now try to figure out how to deal with the File object which contains the proguard_uuid.

organization_id = BoundedBigIntegerField(db_index=True)
project_id = BoundedBigIntegerField(db_index=True)
release_name = models.CharField(max_length=250)
proguard_uuid = models.UUIDField()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This detail highly depends on the reading assumptions that the application layer does. Seeing how information is queried, what @Swatinem says it's very good. Either we enforce constraints at the db level or we make sure that the application will do it.

I personally would prefer to link to the file that contains the proguard_uuid and have the db handle the consistency requirements. Of course you would also keep the proguard_uuid, similarly to how the DebugIdArtifactBundle table is designed.

class ProguardArtifactRelease(Model): # type: ignore
__include_in_export__ = False

organization_id = BoundedBigIntegerField(db_index=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding indexing, I would suggest to reason about the size of the result set once an index run is done, that is, if I use the org_id in the index, will I get a very small amount of rows to filter sequentially? This depends on data distribution and I have no idea about that, so you might try to think what is the best approach here.

I would expect that having an index on project and org as you did is fine, unless the proguard_uuid has mostly unique entries, in that case, it would be better to add an index there.

Of course the best solution would be to use a composite index but it would be useful only in case we always query nearly all composite fields and each field has a very high cardinality, in the other cases it will just occupy more space on disk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case proguard uuids are mostly unique. Thx for the insight

@buenaflor buenaflor added Status: In Progress and removed Scope: Frontend Automatically applied to PRs that change frontend components Status: Stale labels Jul 12, 2023
@@ -363,6 +363,24 @@ def _analyze_progard_filename(filename: str) -> Optional[str]:
return None


@region_silo_only_model
class ProguardArtifactRelease(Model): # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the # type: ignore here? mypy should be fine with subclassing Model now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup should not be there, thx

Copy link
Member

@iambriccardo iambriccardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, great work!

@buenaflor buenaflor merged commit 2324f4e into master Jul 14, 2023
@buenaflor buenaflor deleted the feat/associate-release-with-proguard-mapping-files branch July 14, 2023 09:21
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants