Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize model run listing #520

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

zachmullen
Copy link
Contributor

@zachmullen zachmullen commented Oct 16, 2024

  • Remove unnecessary aliases
  • Remove ground truth aggregate computation and rely on ModelRun's ground_truth field instead
  • Short-circuit matching Region lookup (not in hot path)
  • Remove caching infrastructure since this branch should obviate it
  • Add denormalized fields to ModelRun for caching expensive summary statistics
  • Add method to compute the new fields on a single ModelRun
  • Add function to compute the new fields on all ModelRuns (mostly for use during migration)
  • Call computation function in all places where data might change

* Remove unnecessary aliases
* Remove ground truth aggregate computation and rely on ModelRun's ground_truth field instead
* Short-circuit matching Region lookup (not in hot path)
* Remove caching infrastructure since this branch should obviate it
@zachmullen zachmullen force-pushed the optimize-model-run-list-endpoint branch from ac50b12 to 8d72baf Compare October 17, 2024 17:16
@zachmullen
Copy link
Contributor Author

As of 8d72baf, total response time of all queries for the model run listing endpoint is down to tens of ms:

Screenshot 2024-10-17 at 1 17 04 PM

Quite a few further optimizations are possible.

  • We only need to compute the aggregate bounding box and time range when filters change, not when a new page of results is requested.
  • We could also cache the properties of model runs that require joins/subqueries against other model runs, but that would likely require much more complicated invalidation logic. If the cardinality of model runs stays in the hundreds, this isn't worth doing.
  • We could index certain fields of model runs, but again that won't be necessary if the number of model runs is on the order of hundreds.
  • We could switch to cursor-based pagination, but that would have negligible impact until there are far more model runs in the database.

The thing remaining to do on this issue is to add the calls to the stats computation method in any of the places we need to.

@@ -78,6 +78,12 @@ def __init__(self, field):
return super().__init__(json_str, JSONField()) # noqa: B037


class AsGeoJSONDeserialized(Cast):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's an easier way of getting AsGeoJSON's result into a deserialized python object, I'd love to know it.

@zachmullen
Copy link
Contributor Author

This is ready for review. It contains a breaking change for model runs that are created through the REST API rather than processed via the ModelRunUpload pathway. Namely, once all data underneath a model run has been added, it is the clients' responsibility to POST to the new /api/model-runs/{id}/finalization/ endpoint, which will ensure all the relevant stats are computed and saved on the model run.

@mvandenburgh
Copy link
Member

This is ready for review. It contains a breaking change for model runs that are created through the REST API rather than processed via the ModelRunUpload pathway. Namely, once all data underneath a model run has been added, it is the clients' responsibility to POST to the new /api/model-runs/{id}/finalization/ endpoint, which will ensure all the relevant stats are computed and saved on the model run.

Sorry, I missed this message @zachmullen. I'll take a look at this

@mvandenburgh mvandenburgh self-requested a review November 1, 2024 14:49
@zachmullen zachmullen marked this pull request as ready for review November 4, 2024 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants