mfs: clean cache on sync #751
base: main
Conversation
There have been multiple reports of MFS slowing down to a crawl and triggering OOM events when adding many items to a folder.

One underlying issue is that the MFS Directory has a cache which can grow unbounded. All items added to a directory are cached there even though they have already been written to the underlying UnixFS folder by the mfs writes.

The cache serves Child() requests only. Its purpose is that a Directory can give updated information about its children without being "up to date" on its own parent: when a new write happens to a folder, the parent folder needs to be notified about the new unixfs node of its child. With this cache, that notification can be delayed until a Flush() is requested, one of the children triggers a bubbling notification, or the parent folder is traversed.

When the parent folder is traversed, it writes the cached entries of its children to disk, including that of the current folder. In order to write the current folder it triggers a GetNode() on it, which in turn writes all the cached nodes in the current folder to disk. This cascades. In principle, leaves do not have children and should not need to be rewritten to unixfs, but without caching the TestConcurrentWrites tests start failing. Caching provides a way for concurrent writes to access a single reference to a unixfs node, which somehow makes things work.

Apart from cosmetic changes, this commit allows resetting the cache once it has been synced to disk. Other approaches were tested, like fully removing the cache, or resetting it when it reaches a certain size. All were unsatisfactory in that MFS breaks in one way or another (e.g. when setting maxCacheSize to 1). Setting the size to ~50 allows tests to pass, but only because no test exercises 50+1 items.

This change does not break any tests, but after what I've seen I assume that concurrent writes during a Flush event probably result in broken expectations, and I wouldn't be surprised if there is a way to chain writes, lookups and flushes on references to multiple folders in the tree and trigger errors where things just work now.

The main problem remains: the current setup essentially keeps all of mfs in memory. This commit could alleviate the fact that reading the directory object triggers as many unixfs.AddChild() calls as there are nodes in the directory. After this, AddChild() would be triggered only on nodes cached since the last sync.

I have not found a reliable way of fixing MFS, and I have discovered multiple gotchas in what was going to be a quick fix.
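For readers unfamiliar with the mfs internals, here is a minimal, self-contained sketch of the mechanism being changed. The names `entriesCache`, `syncCache`, `FSNode` and `unixfsDir.AddChild` mirror the diff below; the surrounding types (`node`, `unixfsDir`, `directory`) are simplified stand-ins for illustration, not the real boxo/mfs code.

```go
// Simplified sketch (not the actual boxo/mfs code) of the idea in this PR:
// a directory caches child entries that were already written to the
// underlying UnixFS directory, and the cache is reset once it is synced.
package main

import (
	"context"
	"fmt"
	"sync"
)

// node stands in for the real FSNode / ipld.Node types in boxo/mfs.
type node struct{ name string }

// unixfsDir stands in for the underlying UnixFS directory object.
type unixfsDir struct{ children map[string]*node }

func (u *unixfsDir) AddChild(_ context.Context, name string, nd *node) error {
	u.children[name] = nd
	return nil
}

// directory mirrors the relevant fields of mfs.Directory in simplified form.
type directory struct {
	lock         sync.Mutex
	ctx          context.Context
	unixfsDir    *unixfsDir
	entriesCache map[string]*node // grows with every write until synced
}

// syncCache writes every cached entry to the underlying UnixFS directory.
// With clean=true (the change in this PR) the cache is reset afterwards,
// so it no longer keeps every entry in memory between syncs.
func (d *directory) syncCache(clean bool) error {
	d.lock.Lock()
	defer d.lock.Unlock()
	for name, nd := range d.entriesCache {
		if err := d.unixfsDir.AddChild(d.ctx, name, nd); err != nil {
			return err
		}
	}
	if clean {
		d.entriesCache = make(map[string]*node)
	}
	return nil
}

func main() {
	d := &directory{
		ctx:          context.Background(),
		unixfsDir:    &unixfsDir{children: make(map[string]*node)},
		entriesCache: map[string]*node{"a": {name: "a"}, "b": {name: "b"}},
	}
	if err := d.syncCache(true); err != nil {
		panic(err)
	}
	fmt.Println("cached entries after sync:", len(d.entriesCache)) // prints 0
}
```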
320bf6d to 19419fb
@@ -99,10 +99,11 @@ func (d *Directory) localUpdate(c child) (*dag.ProtoNode, error) {
	d.lock.Lock()
	defer d.lock.Unlock()

-	err := d.updateChild(c)
+	err := d.unixfsDir.AddChild(d.ctx, c.Name, c.Node)
These changes just substitute the one-liner helper functions with the actual call.
// Update child entry in the underlying UnixFS directory.
func (d *Directory) updateChild(c child) error {
	err := d.unixfsDir.AddChild(d.ctx, c.Name, c.Node)
	if err != nil {
		return err
	}

	return nil
}

func (d *Directory) Type() NodeType {
	return TDir
}

// childNode returns a FSNode under this directory by the given name if it exists.
// it does *not* check the cached dirs and files
func (d *Directory) childNode(name string) (FSNode, error) {
	nd, err := d.childFromDag(name)
	if err != nil {
		return nil, err
	}

	return d.cacheNode(name, nd)
}
one-liners removed.
if clean {
	d.entriesCache = make(map[string]FSNode)
}
The only relevant change.
Codecov Report: All modified and coverable lines are covered by tests ✅
@@ Coverage Diff @@
## main #751 +/- ##
==========================================
- Coverage 60.40% 60.35% -0.05%
==========================================
Files 245 245
Lines 31063 31054 -9
==========================================
- Hits 18764 18744 -20
- Misses 10629 10637 +8
- Partials 1670 1673 +3
LGTM. One question.
}

-func (d *Directory) sync() error {
+func (d *Directory) syncCache(clean bool) error {
Is there any situation where we would not want clean = true? Seems like we should always clear the cache.
Originally I wanted to not clean on GetNode() while cleaning on reaching maxCacheSize.
I am not sure this PR is good in general... probably there should be some refcounting of which objects can be pushed out of the cache. My feeling, based on how tests misbehave when things are removed, is that at least files that are open for writing should probably stay...
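To make that refcounting idea concrete, here is a purely hypothetical sketch; nothing like this exists in boxo/mfs today, and all names (`cacheEntry`, `refCountedCache`, `Retain`, `Release`, `Clean`) are invented for illustration. Entries still referenced by open handles (e.g. files open for writing) would survive a clean; only unreferenced, already-synced entries would be dropped.

```go
// FSNode stands in for the mfs FSNode interface; the rest is invented.
type FSNode interface{}

// cacheEntry pairs a cached node with the number of open handles using it.
type cacheEntry struct {
	node FSNode
	refs int
}

// refCountedCache drops only entries that nobody references anymore.
type refCountedCache struct {
	entries map[string]*cacheEntry
}

// Retain would be called when a handle (e.g. a file open for writing) is
// created, Release when it is closed.
func (c *refCountedCache) Retain(name string) {
	if e, ok := c.entries[name]; ok {
		e.refs++
	}
}

func (c *refCountedCache) Release(name string) {
	if e, ok := c.entries[name]; ok {
		e.refs--
	}
}

// Clean removes only already-synced entries with no outstanding references,
// so open files keep their single shared reference across a sync.
func (c *refCountedCache) Clean() {
	for name, e := range c.entries {
		if e.refs <= 0 {
			delete(c.entries, name)
		}
	}
}
```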
And there's also the issue that this still allows the cache to grow unbounded until a sync, and such a sync may take minutes or hours...
@@ -22,6 +22,8 @@ The following emojis are used to highlight certain changes:

 ### Fixed

+* `mfs`: directory cache is now cleared every time the directory node is read, somewhat limiting unbounded growth and time to sync it to the underlying unixfs.
I'm scared to replace known MFS problems with new unknown MFS problems, but I'm also not familiar with the codebase well enough to have an informed opinion :(

I know folks have pipelines that do a lot of concurrent writes to MFS via the ipfs files CLI/RPC and have likely figured out what works and what does not. Are we sure the new version removes, or at least decreases, the surface for data loss during concurrent writes?

Perhaps useful context: in kubo, we have a --flush flag that (if true) calls mfs.FlushPath after each MFS operation. The intention behind disabling flush is to let the user avoid syncing during many small operations and only flush to disk once the "final" version of a file or directory is ready, but that comes at a consistency risk if there is more than one thread doing mutations:

> Most of the subcommands of ipfs files accept the --flush flag. It defaults to true. Use caution when setting this flag to false. It will improve performance for large numbers of file operations, but it does so at the cost of consistency guarantees.

So effectively, patching a lot of things should be faster with flush=false, but you need to do things sequentially. This is the mental model Kubo users have.

Are the edge cases mentioned in https://github.com/ipfs/boxo/pull/751/files avoidable with --flush=true? Or are we introducing a racy risk there, where it was not before?

If we make the "safe" default less "safe", could/should we tie cache cleanup to happen only in mfs.FlushPath instead (explicit --flush=true)? (i.e. call syncCache with clean=true only in the context of an explicit mfs.FlushPath?)
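One possible shape for that suggestion, sketched against the simplified directory type from the sketch near the top of this PR description (again hypothetical, not what this PR implements): ordinary reads would sync cached entries but keep the cache, and only an explicit flush (the mfs.FlushPath / --flush=true path) would clear it.

```go
// GetNode (simplified here to return only an error) would sync cached
// entries but keep them, preserving today's behaviour for --flush=false
// pipelines that rely on the cache between operations.
func (d *directory) GetNode() error {
	return d.syncCache(false) // write entries to unixfs, keep the cache
}

// Flush, reached via an explicit mfs.FlushPath (--flush=true), would be the
// only place that also clears the cache.
func (d *directory) Flush() error {
	return d.syncCache(true) // write entries and reset the cache
}
```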
That might be an option...
I did not know we flushed by default... That also means this commit will essentially kill the cache on every operation. I cannot say with certainty, but this probably brings the new "unknown MFS problems", as it will be equivalent to disabling the cache.
Let's do something: let me spend another while trying to figure out exactly why concurrent writes to a single file break with things like cache-size=1, and whether something else is broken that way. Maybe I can find a workaround and feel better about it.
This helps/solves a known problem, so this change does have positive value. Can we run this in cluster or somewhere else to gain more confidence that there are no negative behaviors with this change?