ARC causing slow system performance on any version higher than v2.1.0 (v2.1.6+) #127
Comments
The problem is reproducible on a wide range of systems. I tested it on 10.13, 10.14, 10.15... Even a CLEAN installation with the newest pool and settings is affected. The tests were performed on two different computers.
Thanks so much for all the work you've put into tracking this down; I've also linked in the other thread you've found, which does sound like the same issue. Ranvel suggests in that thread it might be writes causing userspace to lock up, which is as good a theory as any, though my own case isn't especially write intensive. Updating ZFS often triggers
In addition to my affected main machine (2018 Mac Mini), I have two older systems, a mid 2010 Mac Mini and a late 2009 iMac (so very similar CPUs), both running High Sierra (10.13), and they each have a much more basic ZFS setup: a single compressed and encrypted zvol for hosting Time Machine backups. While ZFS performance for them isn't amazing (they have very little hardware acceleration support), neither seems to be experiencing the same issue with system slowdowns. They're both operating with much smaller ARC sizes (around 256 MB) as neither has a lot of RAM, and ARC isn't as important for what they're doing. But it makes me wonder if the issue could be related to ARC size, or perhaps it gets worse the larger it gets and these older systems just aren't at a point where it becomes noticeable?
As per Arne's post to the forum topic, another possible workaround is to disable compressed ARC like so (set in
This may allow
I'll try to verify this on my system(s) over the weekend.
Update: Unfortunately this didn't "fix" the problem, it only made it less severe; see below.
Unfortunately, while disabling compressed ARC did improve performance overall, it didn't solve this problem. The system was much more responsive with fewer entries in ARC and relatively low write activity, but as soon as I started writing large quantities of data and ARC reached around 4 GB, performance took a nose dive as normal.
For anyone else looking to test with compressed ARC disabled, make sure to disable it before importing your pool(s), otherwise you need to export then re-import them afterwards; in my case the difference was only noticeable after primary ARC was emptied.
I also tested removing my L2ARC device from my main working pool, but this made no discernible difference. The only thing that seems to help is bypassing primary ARC using
The issue is definitely proportional to the amount of primary ARC being utilised: when primary ARC was under 1 GB my system was still generally responsive, but as this amount climbed to 4 GB and beyond it became more and more unusable, and at around the 10 GB mark
Before
My conclusions from all of this are:
After upgrading to macOS Sonoma, I've been able to confirm that this extreme performance degradation does not occur under Sonoma. I have been able to upgrade to 2.2.3rc4 without any issues, and even once my ARC has filled no performance loss is observed, with no need to turn off ARC or any features. Whatever the problem is, it appears to be specific to Catalina, and possibly all pre-ARM/M1 versions of macOS, though I can't confirm this as I skipped Big Sur, Monterey and Ventura to upgrade directly to Sonoma instead.
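To relate responsiveness to the ARC sizes quoted in the comments above (under 1gb, around 4gb, around 10gb), it may help to log ARC size while reproducing the slowdown. The following is a minimal sketch and not something taken from this thread: it assumes the macOS port reports the current ARC size in bytes under the sysctl key kstat.zfs.misc.arcstats.size (check the output of sysctl kstat.zfs on your own system first), and all function names are hypothetical.

```shell
#!/bin/sh
# Assumption: sysctl key that reports the current ARC size in bytes.
# Verify it exists on your system and override ARC_SIZE_KEY if it differs.
ARC_SIZE_KEY="${ARC_SIZE_KEY:-kstat.zfs.misc.arcstats.size}"

# Print the current ARC size in bytes (hypothetical helper).
arc_size_bytes() {
    sysctl -n "$ARC_SIZE_KEY"
}

# Convert a byte count to whole MiB (integer division, rounds down).
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Print one timestamped sample in the form "HH:MM:SS arc_mib=N".
arc_log_once() {
    printf '%s arc_mib=%s\n' "$(date '+%H:%M:%S')" "$(bytes_to_mib "$(arc_size_bytes)")"
}

# Usage (not run automatically): sample every 10 seconds until interrupted.
#   while :; do arc_log_once; sleep 10; done
```

Redirecting the loop's output to a file on a non-ZFS volume keeps the log itself from adding to ARC activity.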
System information
Describe the problem you're observing
Updating to any macOS ZFS version above v2.1.0 results in extremely poor system performance while datasets are mounted and in use.
The issue appears to be with ARC, as setting primarycache=none and secondarycache=none significantly improves system performance/responsiveness, minus the cost of loading more data from disk.
Describe how to reproduce the problem
zfs set primarycache=none <pool> and zfs set secondarycache=none <pool> (may need to set for additional datasets if values are not inherited).
primarycache and secondarycache back to previous values (default is all).
Attachments
The following spindumps were all generated under v2.2.3rc4 with ARC configured as normal (in use), causing many programs to run extremely slowly as they spend large amounts of time waiting for data.
spindumps-v2.2.3rc4.zip
Unfortunately, due to the slow system responsiveness it was difficult to generate spindumps at the moments of worst performance, though I tried. Of note, spindump.6.txt was taken while attempting to decrypt a dataset into a new (unencrypted) dataset for testing, so may give useful stack traces.
Additional Notes
I'm not aware of any specific changes to ARC that are likely to have caused this drastic change in performance, but the fact that setting primarycache=none and secondarycache=none results in such an improvement in system responsiveness makes it clear that the issue is most likely either related to the ARC, or to something it relies upon.
I would assume that if this issue also affected Linux there would have been a lot more issues reported about it, so either Linux is unaffected, or macOS is affected differently (more severely), resulting in a much more noticeable drop in performance.
Many, many thanks to armdn for discovering the workaround for this issue on the forum topic originally created for it. You can view the topic here for many more spindumps and sysctl output.
As pointed out by cgiard, since the issue has occurred since at least the v2.1.6 macOS release, this makes the persistent L2ARC fixes a possible area to look at, though removing L2ARC does not appear to make a difference.
Another thread by ranvel, which you can see here, proposes that the issue is write operations causing user space to freeze. I've experienced the issue on systems that aren't write-intensive, though, so I think the interaction may be more complex.
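The cache-bypass workaround from the reproduction steps can be sketched as a pair of shell helpers. This is only a sketch under stated assumptions: the function names and the ZFS_CMD override (for dry runs) are hypothetical, while zfs set and zfs get are the standard commands. Listing locally set values first helps find datasets that will not inherit the change from the pool.

```shell
#!/bin/sh
# Hypothetical override: set ZFS_CMD="echo zfs" to print commands instead of running them.
ZFS_CMD="${ZFS_CMD:-zfs}"

# List datasets under $1 whose cache properties are set locally (not inherited),
# since these need changing individually.
arc_cache_local() {
    $ZFS_CMD get -r -H -s local -o name,property,value primarycache,secondarycache "$1"
}

# Set both cache properties on $2 to $1 (all, metadata or none).
arc_cache_set() {
    $ZFS_CMD set "primarycache=$1" "$2" &&
        $ZFS_CMD set "secondarycache=$1" "$2"
}

# Usage (not run automatically; "tank" is a placeholder pool name):
#   arc_cache_local tank      # note any locally set values first
#   arc_cache_set none tank   # bypass ARC and L2ARC
#   arc_cache_set all tank    # restore the default afterwards
```

Restoring with all assumes the previous values were the default; if arc_cache_local showed other values beforehand, set those back instead.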