-
-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(proguard): associate release with proguard mapping files #48511
feat(proguard): associate release with proguard mapping files #48511
Conversation
This PR has a migration; here is the generated SQL for --
-- Create model ProguardArtifactRelease
--
CREATE TABLE "sentry_proguardartifactrelease" ("id" bigserial NOT NULL PRIMARY KEY, "organization_id" bigint NOT NULL, "project_id" bigint NOT NULL, "release_name" varchar(250) NOT NULL, "proguard_uuid" uuid NOT NULL, "date_added" timestamp with time zone NOT NULL);
CREATE UNIQUE INDEX CONCURRENTLY "sentry_proguardartifactr_organization_id_project__0a751b1a_uniq" ON "sentry_proguardartifactrelease" ("organization_id", "project_id", "release_name");
ALTER TABLE "sentry_proguardartifactrelease" ADD CONSTRAINT "sentry_proguardartifactr_organization_id_project__0a751b1a_uniq" UNIQUE USING INDEX "sentry_proguardartifactr_organization_id_project__0a751b1a_uniq";
CREATE INDEX CONCURRENTLY "sentry_proguardartifactrelease_organization_id_bb2944ba" ON "sentry_proguardartifactrelease" ("organization_id");
CREATE INDEX CONCURRENTLY "sentry_proguardartifactrelease_project_id_d890d161" ON "sentry_proguardartifactrelease" ("project_id"); |
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #48511 +/- ##
==========================================
- Coverage 80.94% 80.93% -0.02%
==========================================
Files 4819 4819
Lines 201975 202326 +351
Branches 11446 11500 +54
==========================================
+ Hits 163492 163750 +258
- Misses 38228 38321 +93
Partials 255 255
|
This PR has a migration; here is the generated SQL for --
-- Create model ProguardArtifactRelease
--
CREATE TABLE "sentry_proguardartifactrelease" ("id" bigserial NOT NULL PRIMARY KEY, "organization_id" bigint NOT NULL, "project_id" bigint NOT NULL, "release_name" varchar(250) NOT NULL, "proguard_uuid" uuid NOT NULL, "date_added" timestamp with time zone NOT NULL);
CREATE UNIQUE INDEX CONCURRENTLY "sentry_proguardartifactr_organization_id_project__0a751b1a_uniq" ON "sentry_proguardartifactrelease" ("organization_id", "project_id", "release_name");
ALTER TABLE "sentry_proguardartifactrelease" ADD CONSTRAINT "sentry_proguardartifactr_organization_id_project__0a751b1a_uniq" UNIQUE USING INDEX "sentry_proguardartifactr_organization_id_project__0a751b1a_uniq";
CREATE INDEX CONCURRENTLY "sentry_proguardartifactrelease_organization_id_bb2944ba" ON "sentry_proguardartifactrelease" ("organization_id");
CREATE INDEX CONCURRENTLY "sentry_proguardartifactrelease_project_id_d890d161" ON "sentry_proguardartifactrelease" ("project_id"); |
proguard_uuid = request.data.get("proguard_uuid") | ||
if not all([release_name, proguard_uuid]): | ||
return Response( | ||
data={"error": "Missing required fields"}, status=status.HTTP_400_BAD_REQUEST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to let the user know which fields are required.
) | ||
releases = releases.values() | ||
|
||
return Response(list(releases)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be wary of top level lists in response payloads. It makes API responses much harder to evolve as you can't add new keys for additional metadata. For newer endpoints we have been making the response bodies a dictionary with a single key for the results.
src/sentry/models/debugfile.py
Outdated
organization_id = BoundedBigIntegerField(db_index=True) | ||
project_id = BoundedBigIntegerField(db_index=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While it can be done in a future pull request. It would be good to have this model added to project deletions so that when customers remove a project proguard metadata is also removed.
|
||
organization_id = BoundedBigIntegerField(db_index=True) | ||
project_id = BoundedBigIntegerField(db_index=True) | ||
release_name = models.CharField(max_length=250) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we use the release name here vs a link to the actual release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wanted, the goal here, like with sourcemaps, is to have a weak release association. With this association, you will optimistically tell the user to specify a future release that will be created later on down the pipeline. Ofc this won't result in any database consistency mechanisms helping us to maintain data consistency but we aligned on such an approach.
4c342fa
to
6c08935
Compare
], | ||
options={ | ||
"db_table": "sentry_proguardartifactrelease", | ||
"unique_together": {("organization_id", "project_id", "release_name")}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The definition here is not uptodate with the one in the model file. In particular, proguard_uuid
is missing here.
Also, the project implies the org, so putting it into this uniqueness constraint does not make too much sense.
src/sentry/models/debugfile.py
Outdated
organization_id = BoundedBigIntegerField(db_index=True) | ||
project_id = BoundedBigIntegerField(db_index=True) | ||
release_name = models.CharField(max_length=250) | ||
proguard_uuid = models.UUIDField() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this is "just a uuid", it does not have a link back to the uploaded proguard file.
We are working on eventually expiring and auto-removing expiring debug files, which also applies to proguard files.
So the underlying proguard projectdsymfile
may be going away, and you will be left with some "dead" rows here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This detail highly depends on the reading assumptions that the application layer does. Seeing how information is queried, what @Swatinem says it's very good. Either we enforce constraints at the db level or we make sure that the application will do it.
I personally would prefer to link to the file that contains the proguard_uuid
and have the db handle the consistency requirements. Of course you would also keep the proguard_uuid
, similarly to how the DebugIdArtifactBundle
table is designed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would definitely link to the ProjectDebugFile
, and also query it on post
. Right now you can just post
any random uuid / release_name, and there is no validation whatsoever, and no cleanup. So malicious users could just spam random uuids to fill up the database with junk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, makes sense, thx for taking a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem still exists, but to a lesser extent with loose release association, also relevant for artifact bundles (@iambriccardo):
malicious users could spam random release names and fill up the database.
However if you have a strong relationship to the ProjectDebugFile
, or an ArtifactBundle
, that junk is at least being cleaned up once the debug file expires and is being cleaned up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, I would now try to figure out how to deal with the File
object which contains the proguard_uuid
.
src/sentry/models/debugfile.py
Outdated
organization_id = BoundedBigIntegerField(db_index=True) | ||
project_id = BoundedBigIntegerField(db_index=True) | ||
release_name = models.CharField(max_length=250) | ||
proguard_uuid = models.UUIDField() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This detail highly depends on the reading assumptions that the application layer does. Seeing how information is queried, what @Swatinem says it's very good. Either we enforce constraints at the db level or we make sure that the application will do it.
I personally would prefer to link to the file that contains the proguard_uuid
and have the db handle the consistency requirements. Of course you would also keep the proguard_uuid
, similarly to how the DebugIdArtifactBundle
table is designed.
src/sentry/models/debugfile.py
Outdated
class ProguardArtifactRelease(Model): # type: ignore | ||
__include_in_export__ = False | ||
|
||
organization_id = BoundedBigIntegerField(db_index=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding indexing, I would suggest to reason about the size of the result set once an index run is done, that is, if I use the org_id
in the index, will I get a very small amount of rows to filter sequentially? This depends on data distribution and I have no idea about that, so you might try to think what is the best approach here.
I would expect that having an index on project and org as you did is fine, unless the proguard_uuid
has mostly unique entries, in that case, it would be better to add an index there.
Of course the best solution would be to use a composite index but it would be useful only in case we always query nearly all composite fields and each field has a very high cardinality, in the other cases it will just occupy more space on disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case proguard uuids are mostly unique. Thx for the insight
src/sentry/models/debugfile.py
Outdated
@@ -363,6 +363,24 @@ def _analyze_progard_filename(filename: str) -> Optional[str]: | |||
return None | |||
|
|||
|
|||
@region_silo_only_model | |||
class ProguardArtifactRelease(Model): # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the # type: ignore
here? mypy should be fine with subclassing Model
now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup should not be there, thx
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great work!
Ref: getsentry/sentry-android-gradle-plugin#40
This allows to create a weak association between a release and a proguard mapping file