Cache unexpectedly stops working #1588
Could it be related to having the ramdisk in /dev/ram0? Here you state not to use it: http://open-cas.com/guide_advanced_options.html#multi-level-caching, even though recent kernel documentation says it is not a problem. EDIT: We removed the brd ramdisk cache device and are now doing the same rsyncs... we'll see if that was the cause. Also, no errors are shown in casadm -P, so we don't know why the cache decided to 'deactivate' itself.
Without the ram cache over the nvme, just the nvme in wb mode with the same parameters, it happened again. No logs in dmesg at all this time either.
The pattern that seems to repeat is a failure about 3 hours after we start the rsyncs... This is the actual server status while the rsyncs are still running and the cache has suddenly stopped working:
Hi @jvinolas! Do you trace cache occupancy over time? Can the failure be correlated with the moment the occupancy hits 100%?
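Occupancy can be sampled periodically from the `casadm -P` statistics output. A minimal sketch, assuming cache id 1 and that the stats table prints an "Occupancy" row whose percentage appears as a decimal number (the exact table layout may vary between Open CAS versions, so check your own `casadm -P` output first):

```shell
# Extract the occupancy percentage from a `casadm -P` stats row such as:
#   | Occupancy | 123456 | 4KiB Blocks | 100.0 | % |
# Assumption: the percentage is the last decimal number on the row.
occupancy_pct() {
  grep -i 'occupancy' | grep -oE '[0-9]+\.[0-9]+' | tail -n 1
}

# Log one timestamped sample per minute; stop with Ctrl-C.
# while true; do
#   printf '%s %s\n' "$(date -Is)" "$(casadm -P -i 1 | occupancy_pct)"
#   sleep 60
# done
```

Plotting the resulting log against the rsync start time should show whether the stall coincides with occupancy reaching 100%.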
We tried the default IO class as well, also without success in evicting old data...
@jvinolas I think we got a reproduction of that behavior. I'll let you know once we identify the root cause.
Thanks. Meanwhile, is there any setup that gives us at least a write cache and does not increase occupancy beyond the dirty data, so we don't get stuck when occupancy reaches 100%?
I think the problem is not only the occupancy, but also the fact that it's 100% dirty data, which means the cache needs to perform on-demand cleaning before it can evict anything. If you need it to work as a write buffer, it may be worth trying the ACP cleaning policy. For a long, continuous stream of intensive writes it will not help much, but if your workload has periodic bursts of writes, ACP may be able to clean the cache between the bursts.
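The cleaning policy can be switched at runtime with `casadm`. A sketch, assuming cache id 1 (verify the parameter names against `casadm --set-param --help` for your installed version):

```shell
# Switch the cleaning policy of cache 1 to ACP (assumes cache id 1).
casadm --set-param --name cleaning --cache-id 1 --policy acp

# Confirm the active policy.
casadm --get-param --name cleaning --cache-id 1
```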
Also, if there is any chance that your workload contains long ranges of sequential writes, setting the sequential cutoff policy to …
We'll try and see, thanks.
We found that if we change to …
We are hitting more or less the same problem when occupancy gets to 100%: low performance... This shows how the cache is not working well after occupancy hits 100%. UPDATE: Applying seq-cutoff never (we forgot to re-apply it after the last cache stop) seems to bring back performance. UPDATE 2: It worked for about 24 minutes, then went back to low performance...
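For reference, the sequential cutoff change mentioned above can be applied at runtime with `casadm`. A sketch, assuming cache id 1 and core id 1 (both are assumptions; check the ids with `casadm -L` and the option names with `casadm --set-param --help`):

```shell
# Disable sequential cutoff on cache 1, core 1 (assumed ids), so long
# sequential streams keep going through the cache instead of bypassing it.
casadm --set-param --name seq-cutoff --cache-id 1 --core-id 1 --policy never
```

Note that, as this comment observed, the setting belongs to the running cache instance, so it has to be re-applied after every cache stop/start.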
Description
The cache was working for some days, then we had two high-iowait episodes that left it not working. The cache is caching and we see dirty data, but suddenly it stops working: high iowait and no dmesg core dump messages, only hung-task info.
The first one happened with a lot of dirty cache data while swapping from wb to wt. We tried to bring it back to wb, but we kept the high iowait from the first swap and got no cache response anymore. The only solution was to flush the cache, stop it, force-recreate it, and start it again.
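The recovery sequence described above roughly maps onto these `casadm` invocations. A sketch only: cache id 1, /dev/nvme0n1 as the cache device, and /dev/sdb as the core device are all assumptions, not values from this report:

```shell
# Flush dirty data to the backing device, then stop the cache.
casadm --flush-cache --cache-id 1
casadm --stop-cache --cache-id 1

# Force-recreate the cache in write-back mode and re-add the core device
# (/dev/nvme0n1 and /dev/sdb are placeholder device names).
casadm --start-cache --cache-device /dev/nvme0n1 --cache-mode wb --force
casadm --add-core --cache-id 1 --core-device /dev/sdb
```

Flushing before the stop is the step that avoids data loss; `--force` discards whatever metadata is left on the cache device from the previous instance.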
The second one was at about 3 am today: we were rsyncing files to this cache when it suddenly failed, with high iowait again and hung-task messages in dmesg.
Expected Behavior
Cache behaviour should be predictable; it should not stop working.
Actual Behavior
High iowait, and the cache does not seem to work anymore.
Steps to Reproduce
We could reproduce the first one:
Context
As the cache mountpoint is served over NFS with clients connected to it, we had to stop all clients, stop the cache, reformat it, and start everything again.
Possible Fix
Logs
This is what suddenly happened this morning while only doing rsyncs:
And now it happened again:
This time with no log in dmesg.
Configuration files
Your Environment