Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doc grows without any changes #31

Open
2 tasks done
raineorshine opened this issue Jun 12, 2023 · 11 comments
Open
2 tasks done

Doc grows without any changes #31

raineorshine opened this issue Jun 12, 2023 · 11 comments
Assignees
Labels

Comments

@raineorshine
Copy link

raineorshine commented Jun 12, 2023

Checklist

Describe the bug

The size of the object store grows every time the page is refreshed. This occurs because the doc is encoded as an update and appended to the object store every time IndexeddbPersistence is instantiated.

The db is compacted after reaching the PREFERRED_TRIM_SIZE whenever an update is stored. However, this assumes 1) a small number of Docs, and 2) Docs will be modified. While reasonable on the surface, neither of these assumptions hold in all cases. I currently instantiate thousands of Docs for an offline-first graph database built on YJS. Many hundreds are instantiated and destroyed dynamically as the user navigates the graph, and some may never be modified.

To Reproduce

Minimal example:

https://codesandbox.io/p/sandbox/pedantic-euler-jje92f?file=%2Fsrc%2FApp.tsx%3A1%2C1-2%2C1

Notice that no modifications are made to the Doc, yet it grows in size.

Expected behavior

The persisted Doc should not grow indefinitely when no changes are made.

Environment Information

  • Browser: Chrome
  • Yjs: v13.6.2
  • y-indexeddb: v9.0.11

Additional context

I can see that the behavior was introduced in this commit.

One possible solution is adding the condition if (updates.length === 0) to beforeApplyUpdatesCallback, as it seems to only be needed on a new Doc. That stops the infinite growth, and the tests pass, however I don't know the full implications of this change. If beforeApplyUpdatesCallback needs to be invoked in other cases that are not covered by the tests that would be important to know.

Is there a condition that detects when beforeApplyUpdatesCallback does not need to be called? I have tried skipping it when the Doc is empty, but that does not work.

@himself65
Copy link

Maybe this is related to #25

@raineorshine
Copy link
Author

raineorshine commented Jun 14, 2023

Yes, it is the same cause. One difference though is that I never instantiate more than one IndexeddbPersistence per Doc. I just have a lot of Docs.

Our one need is to support connect and disconnect many times. For example we will have many ydoc in storage but only enable the foreground ydoc.

Exactly.

Perhaps if I better understand the purpose of beforeApplyUpdatesCallback I can craft a more fine-tuned solution. Infinite growth of hundreds of unmodified Doc is less than ideal.

The other approach is to compact the db much more aggressively, though my hope is that this won't be necessary.

@himself65 Can you share the source for https://www.npmjs.com/package/@toeverything/y-indexeddb? I don't see it on your GitHub.

@himself65
Copy link

@himself65 Can you share the source for https://www.npmjs.com/package/@toeverything/y-indexeddb? I don't see it on your GitHub.

https://github.com/toeverything/AFFiNE/tree/master/packages/y-indexeddb

It's open source, I might should update package.json later

raineorshine added a commit to raineorshine/y-indexeddb that referenced this issue Jun 16, 2023
DinosaurDad added a commit to DinosaurDad/y-indexeddb that referenced this issue Jun 17, 2023
No need to add an update in the constructor because we are just loading the document. This commit adds a no-op `beforeApplyUpdatesCallback` handler to avoid creating a spurious update.
@raineorshine raineorshine mentioned this issue Jun 18, 2023
@jeffrafter
Copy link

Just wanted to leave a note here that this conversation continued on the discuss board:

https://discuss.yjs.dev/t/refreshing-page-causes-y-indexeddb-to-accumulate-db-entries/984/17

@raineorshine your solution looked good:

I have resorted to a more brute force approach of triggering a database compaction on the initial sync. You can do this by setting PREFERRED_TRIM_SIZE = 0, or calling a debounced storeUpdate(this, true) after the initial sync. Here is an example of the latter approach, which I am currently using: raineorshine@178718b

I wonder if you are still using that approach?

@raineorshine
Copy link
Author

Yes, the initial compaction on load has been working well for me. Performance seems fine anecdotally (with hundreds of Docs) though I haven't done any measurements.

@HMarzban
Copy link

I've noticed that the IndexedDB size keeps growing after each reload. Although this increase does not immediately affect performance, as @raineorshine mentioned, it could become a major issue over time.

@raineorshine, what's your solution to prevent this issue, and how do you handle it in production?
@dmonad, what do you suggest we do? Should we continue using this version, or can we expect an update soon?

@raineorshine
Copy link
Author

@HMarzban My solution is here: raineorshine@178718b.

I've been using it for a while now and it works well. I haven't measured the performance impact, but one IndexedDB write on load doesn't seem very significant.

@HMarzban
Copy link

HMarzban commented Jan 1, 2024

@HMarzban My solution is here: raineorshine@178718b.

I've been using it for a while now and it works well. I haven't measured the performance impact, but one IndexedDB write on load doesn't seem very significant.

Oh, great job, sounds straightforward. did you submit a PR for your solution?

@raineorshine
Copy link
Author

No, because it comes with a performance cost and may not be the right solution for everybody. It could be opt-in though.

disarticulate added a commit to disarticulate/y-indexeddb that referenced this issue Jun 12, 2024
start update after sync to avoid yjs#25 and yjs#31 and listen for updates after the sync to avoid duplicating updates.

Client will need to ensure no updates are made before sync is completed. A more robust implementation would need to queue updates.
@disarticulate
Copy link

disarticulate commented Jun 12, 2024

Hey all @raineorshine @HMarzban @jeffrafter @himself65 I forked this and implemented what I think is the obvious bug here:

master...disarticulate:y-indexeddb:patch-1

In short, the constructor function calls to initiate then doc sync:

this._db.then(db => {
  [...]
  fetchUpdates(this, beforeApplyUpdatesCallback, afterApplyUpdatesCallback)
}

This is an async call; before this is completed a listener is already attached via:
doc.on('update', this._storeUpdate)

So the update itself gets stored when it's already listening for updates, hence the duplication. I've only implemented the minimum fix here, but a more robust implementation would add a pre-sync function listener doc.on('update', this._queueUpdate) for the time between constructor and whenSynced; apply any updates received, then replace doc.on('update', this._storeUpdate.

However, it seems like if you simply ensure the client always calls whenSynced before changes are made, it should work. I'll be testing it shortly.

@disarticulate
Copy link

disarticulate commented Jun 12, 2024

Looks like I'm still getting a update every time I refresh, but I need to create a minimal reproduction without other events occurring.

A minimal implement still gets a update. The next suspect is fetchUpdates, here:

      beforeApplyUpdatesCallback(updatesStore)
      Y.transact(idbPersistence.doc, () => {
        updates.forEach(val => Y.applyUpdate(idbPersistence.doc, val))
      }, idbPersistence, false)
      afterApplyUpdatesCallback(updatesStore)

The problem with afterApplyUpdatesCallback is it likely occurs before the updates are completed, because the transaction will be scheduled later. The solution i've implemented elsewhere in client code is to create a origin system as Y.applyUpdate takes an optional origina object. However, then you're checking for the init origin every time you get an update, althought it might suffice to simply swap function listeners once you pass the whenSynced stage.

It might actually suffice to just put afterApplyUpdatesCallback(updatesStore) inside the transaction, but I'll need to test that later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants