-
Notifications
You must be signed in to change notification settings - Fork 532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Snapshot, summary, Tree entries: Need standardization and reduction of concepts. #4683
Comments
@arinwt - would love you taking a look and providing feedback, |
tagging @agarwal-navin |
Yes, I agree that we need to standardize summaries and snapshots. In my GC work, I came across these and got confused as well. More than that, I had to add the GC data across these formats and make it compatible and make sure there are no bugs. Standardizing this will help making changes across this space easier and less error prone. IMO, there are 2 major chunks of work that need to be cleaned up -
|
This issue has been automatically marked as stale because it has had no activity for 180 days. It will be closed if no further activity occurs within 8 days of this comment. Thank you for your contributions to Fluid Framework! |
@scarlettjlee, since you have been poking in this area (indirectly), I hope you do not mind if I assign this item to you. We can chat more about details, priority, and what else needs to be included into this Epic. |
@scarlettjlee I added #4728 to this list. Please feel free to remove it from here if it seems not relevant to this epic. |
@vladsud & @agarwal-navin, could you review the updated [sub-]issues list in the original comment up top? |
Looks reasonable to me (though it does feel similar to what it used to be when I opened it, maybe minus sub-items). The opposite of that would be something like commits on ISnapshotTree. It's not used, and will not be used in runtime, so it's the cleanup that we would love to do, but it's not becoming more expensive over time. I'd also optimize towards consistency. We should consistently use "summary" anywhere we mean upload, and snapshot anywhere we mean download format. That should be consistent in comments, names of functions, interfaces. |
@vladsud, I reorganized the task list to try to categorize them. A bunch of the desired work has already happened since this item was first opened. I'd focus any concerted effort toward getting rid of ITree, especially #9227. And then unifying snapshot and summary (which I haven't thought much about yet). The cleanup tasks can be done whenever and aren't causing much trouble (maybe some "good first issue"s that we could do opportunistically). There's some other minor misusage of terms in the codebase that I don't think are serious but would be nice to fix, (like "snapshot" in matrix DDS to get DDS content that is not a snapshot type). |
Related: #8979 |
Thanks for clarification! If you feel like new tasks are very clear and Ok for new hires, please mark them as first issue. |
Our File I/O types are very confusing - it's very hard to follow code on where we use certain types and why:
ISnapshotTree - used when loading from storage, could have name collisions between blobs & trees.
ITree / ITreeEntry - entries, could have name collisions; used when attaching DDS / data store, also used on legacy path
git.ITree - somewhat similar shape as above, mostly used in R11s, but leaks to various helper functions
ISummaryTree - summarization process, no collisions, supports symbolic reuse of content (through handle + path used as a reference)
We have a bunch of helper functions to convert from one type to another:
convertSnapshotTreeToSummaryTree: ISnapshotTree -> ISummaryTreeWithStats
convertToSummaryTree: ITree -> ISummarizeResult
convertSummaryTreeToITree : ISummaryTree -> ITree
buildSnapshotTree: ITree -> ISnapshotTree
While we can't completely remove all these types from runtime (as some are tied to file format), I believe we could reduce it to a minimum, by always converting any time to summary type and all the code working only with summaries.
This would substantially reduce mental workload when working with all these layers.
This work can be done incrementally, but adding more converter functions to support any to any conversions, and thus allowing us to address layer by layer, keeping system functional during this process.
Somewhat good example why not acting causes more types used even on same path:
We summarize using summaries. But because we load using snapshots, IRuntime.stop() method uses snapshot() path to get snapshot of container, forcing us to have both types of serialization. Ideally both save & load would use same format, but if they can't, we should serialize only in one format, and use helper functions (here - at Container level) to convert to write format.
Completed
Independent cleanup tasks
These can be done at any time.
Eliminate ITree usages
This gets us close to being able to remove ITree from runtime. There might still be some usages in tests and utils.
We could consider renaming since "snapshot" and "summary" indicate nothing about the directional distinction.
The text was updated successfully, but these errors were encountered: