
mfs: clean cache on sync #751

Open · hsanjuan wants to merge 1 commit into base: main
Conversation

hsanjuan (Contributor):
There have been multiple reports of people complaining that MFS slows down to a crawl and triggers OOM events when adding many items to a folder.

One underlying issue is that the MFS Directory has a cache which can grow unbounded. All items added to a directory are cached here, even though they are already written to the underlying Unixfs folder on mfs writes.

The cache serves Child() requests only. The purpose is that a Directory can give updated information about its children without being "up to date" on its own parent: when a new write happens to a folder, the parent folder needs to be notified about the new unixfs node of its child. With this cache, that notification can be delayed until a Flush() is requested, one of the children triggers a bubbling notification, or the parent folder is traversed.
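As a simplified sketch of that lookup path (toy types and names, not the actual boxo/mfs code), Child() consults the in-memory cache before falling back to the serialized unixfs directory:

```go
package main

import "fmt"

// FSNode stands in for the real mfs.FSNode interface (simplified).
type FSNode struct{ name string }

// Directory is a minimal toy, not the real boxo/mfs type, showing how a
// per-directory cache can answer Child() lookups for entries written
// since the last sync.
type Directory struct {
	entriesCache map[string]*FSNode // children modified since last sync
}

// Child returns the cached entry when present; only on a miss would the
// real implementation look the child up in the underlying unixfs directory.
func (d *Directory) Child(name string) (*FSNode, bool) {
	if n, ok := d.entriesCache[name]; ok {
		return n, true
	}
	// the real code would consult d.unixfsDir here
	return nil, false
}

func main() {
	d := &Directory{entriesCache: map[string]*FSNode{"a.txt": {name: "a.txt"}}}
	if n, ok := d.Child("a.txt"); ok {
		fmt.Println("hit:", n.name)
	}
	if _, ok := d.Child("missing"); !ok {
		fmt.Println("miss")
	}
}
```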

When the parent folder is traversed, it writes the cached entries of its children to disk, including that of the current folder. To write the current folder, it triggers a GetNode() on it, which in turn writes all the cached nodes in the current folder to disk. This cascades. In principle, leaves have no children and should not need to be rewritten to unixfs, but without caching the TestConcurrentWrites tests start failing. Caching provides a way for concurrent writes to access a single reference to a unixfs node, somehow making things work.

Apart from cosmetic changes, this commit allows resetting the cache once it has been synced to disk. Other approaches were tested, like fully removing the cache, or resetting it when it reaches a certain size. All were unsatisfactory in that MFS breaks in one way or another (e.g. when setting maxCacheSize to 1). Setting the size to ~50 allows tests to pass, but only because no tests exercise more than 50 items.

This change does not break any tests, but after what I've seen, I assume that concurrent writes during a Flush event probably result in broken expectations, and I wouldn't be surprised if there is a way to chain writes, lookups and flushes on references to multiple folders in the tree and trigger errors where things just work now.

The main problem remains: the current setup essentially keeps all of MFS in memory. This commit could alleviate the fact that reading the directory object triggers as many unixfs.AddChild() calls as there are nodes in the directory. After this change, AddChild() would be triggered only on nodes cached since the last sync.

I have not found a reliable way of fixing MFS, and I have discovered multiple gotchas in what was going to be a quick fix.

@hsanjuan hsanjuan requested a review from a team as a code owner December 12, 2024 15:26
@hsanjuan hsanjuan changed the title mfs: limit directory cache size mfs: clean cache on sync Dec 12, 2024
@@ -99,10 +99,11 @@ func (d *Directory) localUpdate(c child) (*dag.ProtoNode, error) {
 	d.lock.Lock()
 	defer d.lock.Unlock()
 
-	err := d.updateChild(c)
+	err := d.unixfsDir.AddChild(d.ctx, c.Name, c.Node)
hsanjuan (Contributor Author):
These changes just replace one-liner wrapper functions with the actual call.

Comment on lines -128 to -152

// Update child entry in the underlying UnixFS directory.
func (d *Directory) updateChild(c child) error {
	err := d.unixfsDir.AddChild(d.ctx, c.Name, c.Node)
	if err != nil {
		return err
	}

	return nil
}

func (d *Directory) Type() NodeType {
	return TDir
}

// childNode returns a FSNode under this directory by the given name if it exists.
// it does *not* check the cached dirs and files
func (d *Directory) childNode(name string) (FSNode, error) {
	nd, err := d.childFromDag(name)
	if err != nil {
		return nil, err
	}

	return d.cacheNode(name, nd)
}

hsanjuan (Contributor Author):
one-liners removed.

Comment on lines +364 to +366

	if clean {
		d.entriesCache = make(map[string]FSNode)
	}
hsanjuan (Contributor Author):
The only relevant change.

codecov bot commented Dec 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 60.35%. Comparing base (7c459af) to head (19419fb).

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #751      +/-   ##
==========================================
- Coverage   60.40%   60.35%   -0.05%     
==========================================
  Files         245      245              
  Lines       31063    31054       -9     
==========================================
- Hits        18764    18744      -20     
- Misses      10629    10637       +8     
- Partials     1670     1673       +3     
Files with missing lines Coverage Δ
mfs/dir.go 53.48% <100.00%> (+0.58%) ⬆️

... and 13 files with indirect coverage changes

@gammazero gammazero (Contributor) left a comment:
LGTM. One question.

 }
 
-func (d *Directory) sync() error {
+func (d *Directory) syncCache(clean bool) error {
gammazero (Contributor):
Is there any situation where we would not want clean = true? Seems like we should always clear the cache.

hsanjuan (Contributor Author):
Originally I wanted to avoid cleaning on GetNode() while still cleaning when maxCacheSize was reached.

I am not sure this PR is good in general... there should probably be some refcounting to track which objects can be pushed out of the cache. Based on how the tests misbehave when things are removed, my feeling is that at least files that are open for writing should stay cached...
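The refcounting idea above could look something like this (purely a hypothetical sketch, not code from this PR): entries pinned by open file handles survive eviction, while unreferenced ones can be dropped on sync.

```go
package main

import "fmt"

// entry pairs a cached node with a reference count. The whole type is a
// hypothetical design sketch, not part of boxo/mfs.
type entry struct {
	node any
	refs int
}

type refCountedCache struct{ m map[string]*entry }

func newRefCountedCache() *refCountedCache {
	return &refCountedCache{m: make(map[string]*entry)}
}

// pin caches a node and marks it as held (e.g. by an open file).
func (c *refCountedCache) pin(name string, node any) {
	e, ok := c.m[name]
	if !ok {
		e = &entry{node: node}
		c.m[name] = e
	}
	e.refs++
}

// unpin releases one hold; the entry stays cached until evicted.
func (c *refCountedCache) unpin(name string) {
	if e, ok := c.m[name]; ok && e.refs > 0 {
		e.refs--
	}
}

// evictUnreferenced drops entries with no holders on sync, so files
// still open for writing are kept in the cache.
func (c *refCountedCache) evictUnreferenced() int {
	n := 0
	for name, e := range c.m {
		if e.refs == 0 {
			delete(c.m, name)
			n++
		}
	}
	return n
}

func main() {
	c := newRefCountedCache()
	c.pin("open.txt", nil)   // still held by a writer
	c.pin("closed.txt", nil) // will be released below
	c.unpin("closed.txt")
	fmt.Println(c.evictUnreferenced(), len(c.m))
}
```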

hsanjuan (Contributor Author):
And there's also the issue that this still allows the cache to grow without bound until a sync, and such a sync may take minutes or hours...

@@ -22,6 +22,8 @@ The following emojis are used to highlight certain changes:

### Fixed

* `mfs`: directory cache is now cleared every time the directory node is read, somewhat limiting unbounded growth and time to sync it to the underlying unixfs.
@lidel lidel (Member) commented Dec 13, 2024:
I'm scared to replace known MFS problems with new unknown MFS problems, but also not familiar with codebase well enough to have informed opinion :(

I know folks have pipelines that do a lot of concurrent writes to MFS via ipfs files CLI/RPC and likely figured out what works and what does not.

Are we sure new version removes or at least decreases surface for data loss during concurrent writes?

Perhaps useful context: in kubo, we have --flush flag that (if true) calls mfs.FlushPath after each MFS operation.

The intention behind disabling flush is to let the user avoid syncing during many small operations and delay flushing to disk until the "final" version of a file or directory is ready, but that comes at a consistency risk if more than one thread is doing mutations:

Most of the subcommands of ipfs files accept the --flush flag. It defaults
to true. Use caution when setting this flag to false. It will improve
performance for large numbers of file operations, but it does so at the cost
of consistency guarantees.

So effectively, patching a lot of things should be faster with flush=false but you need to do things sequentially. This is the mental model Kubo users have.

Are edge cases mentioned in https://github.com/ipfs/boxo/pull/751/files avoidable with --flush=true ? Or are we introducing racy risk there, where it was not before?

If we make "safe" default less "safe", could/should we tie cache cleanup to happen only in mfs.FlushPath instead (explicit --flush=true)? (call syncCache with true equivalent only if in the context of explicit mfs.FlushPath?)
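That suggestion could be sketched as follows (toy types, hypothetical names; not the actual boxo/mfs or Kubo API): internal syncs such as GetNode() would keep the cache, and only an explicit mfs.FlushPath-style call would clear it.

```go
package main

import "fmt"

// dir is a toy stand-in for an MFS directory; cacheLen models the number
// of cached entries. This is a hypothetical sketch of tying cache cleanup
// to the explicit flush path only.
type dir struct{ cacheLen int }

// syncCache writes entries out and optionally clears the cache.
func (d *dir) syncCache(clean bool) {
	if clean {
		d.cacheLen = 0
	}
}

// getNode models an internal sync (e.g. parent traversal): keep the cache.
func (d *dir) getNode() { d.syncCache(false) }

// flushPath models an explicit flush (--flush=true): clear the cache.
func flushPath(d *dir) { d.syncCache(true) }

func main() {
	d := &dir{cacheLen: 3}
	d.getNode()
	fmt.Println(d.cacheLen) // cache retained after internal sync
	flushPath(d)
	fmt.Println(d.cacheLen) // cache cleared after explicit flush
}
```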

hsanjuan (Contributor Author):
That might be an option...

I did not know we flushed by default... that also means this commit will essentially kill the cache on every operation. I cannot say with certainty, but this probably brings the new "unknown MFS problems", as it will be equivalent to disabling the cache.

Let's do this: let me spend a while longer trying to figure out why exactly concurrent writes to a single file break with things like cache-size=1, and whether something else is broken that way. Maybe I can find a workaround to feel better about it.

@gammazero (Contributor):

This helps/solves a known problem, so this change does have positive value. Can we run this in Cluster or somewhere to gain more confidence that there are no negative behaviors with this change?
