Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Data sync] Make data sync less error prone and easier to maintain #2840

Closed
11 tasks
gino-m opened this issue Nov 17, 2024 · 5 comments
Closed
11 tasks

[Data sync] Make data sync less error prone and easier to maintain #2840

gino-m opened this issue Nov 17, 2024 · 5 comments
Assignees
Labels
type: code health Improvements to readability or robustness of codebase
Milestone

Comments

@gino-m
Copy link
Collaborator

gino-m commented Nov 17, 2024

The current data sync workflow is complicated, making it hard to understand and prone to errors (#2726, #2377, #2751, #2684). This is arguably most critical part of the system, so improving this should be P0+ priority.

The main components in question are LocalMutationSyncWorker, MediaUploadWorker, and SyncStatusViewModel and related fragment and repositories.

Suggestions:

  • Expose a repository method to return mutations grouped by a new UploadQueueEntry model object, grouping create LOI+submission mutations together, sorted in reverse chronological order (oldest first, FIFO).
  • Update LocalMutationSyncWorker as follows:
    • Use the above repo method to look at the entire queue on each run rather than just for one LOI.
    • Look at all incomplete mutations, not just FAILED. This will capture stuck operations. Since there will now only be at most one LocalMutationSyncWorker running at a time, this should be safe.
    • Treat each Operation as an atomic batch, let mutations in the operation fail or succeed together.
    • If any operation failed, retry() the worker.
  • In addition to triggering LocalMutationSyncWorker on "submit", also trigger periodically (15 min?) to catch any "stuck" mutations, ie in case the worker crashed before it could retry() for some reason.
  • Stop using "delete+insert" instead of "update" throughout and enable cascading on all foreign keys to guarantee consistency of local db and prevent accidental deletion of data.
  • Ensure there's adequate unit test coverage for main use cases as well as all edge cases.
  • Update the sync status screen to use the new UploadQueueEntry s returned by the repo.
  • Simplify Mutation.SyncStatus and split off UploadQueueEntry.Status

See also #2235

@shobhitagarwal1612 @scolsen @sufyanAbbasi @jabramowitz5 @lecrabe @jo-spek @kenstershiro I would suggest prioritizing this over all other feature requests. Let's discuss.

@gino-m gino-m added type: code health Improvements to readability or robustness of codebase type: fr Request for new feature labels Nov 17, 2024
@kenstershiro kenstershiro added this to the GA release milestone Nov 17, 2024
@kenstershiro kenstershiro moved this to Todo in Ground Nov 17, 2024
@gino-m gino-m added the for pm/ux review Product decision needed label Nov 18, 2024
@gino-m
Copy link
Collaborator Author

gino-m commented Nov 20, 2024

@shobhitagarwal1612 Let's clean split this work up and make this airtight.

@shobhitagarwal1612
Copy link
Member

Should we also add a manual button to trigger data sync?

@gino-m
Copy link
Collaborator Author

gino-m commented Nov 26, 2024

Should we also add a manual button to trigger data sync?

The current behavior is to automatically retry whenever a connection is available, and I plan to add logic to periodically check the queue in case the last sync crashed. In that case, what purpose would the button serve? My concern is that by adding a button users will think they need to click it to have data be uploaded.

gino-m added a commit that referenced this issue Nov 26, 2024
* Refactor photo data accessor fun

* Import constants for readability

* Delete obsolete TODO

* Static import

* Tweak ktdoc

* Lift ktdoc into interface
@gino-m
Copy link
Collaborator Author

gino-m commented Dec 2, 2024

A few action items remaining after #2867:

Essential cleanup:

Nice-to-have:

  • [PhotoSync] Track upload/retry status of individual photos #2120, potentially adding entries to a new table in the local db only once LocalMutationSyncWorker is successful. Once we do this, we can then only retry failed photos rather than all photos in a submission. This will be useful in jobs with more than one photo task.

@gino-m
Copy link
Collaborator Author

gino-m commented Dec 3, 2024

A few action items remaining after #2867:

Essential cleanup:

Nice-to-have:

  • [PhotoSync] Track upload/retry status of individual photos #2120, potentially adding entries to a new table in the local db only once LocalMutationSyncWorker is successful. Once we do this, we can then only retry failed photos rather than all photos in a submission. This will be useful in jobs with more than one photo task.

Closing this issue in favor of the above cleanup tickets. W00t!

@gino-m gino-m closed this as completed Dec 3, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in Ground Dec 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: code health Improvements to readability or robustness of codebase
Projects
Status: Done
Development

No branches or pull requests

3 participants