
Cache does not work unexpectedly #1588

Open
jvinolas opened this issue Nov 23, 2024 · 16 comments
Labels
bug Something isn't working


Description

The cache was working fine for some days, then we had two high-iowait incidents that left it not working. The cache is caching and we see dirty lines, but suddenly it stops working: high iowait and no core dump messages in dmesg, only hung-task info messages.

The first incident happened while the cache had a lot of dirty data and we were switching from wb to wt. We tried to switch it back to wb, but we kept the high iowait from the first switch and the cache never responded again. The only solution was to flush the cache, stop it, force-recreate it and start it again.
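
For reference, the recovery sequence looked roughly like this (a sketch with our device paths; exact flags as per casadm --help):

casadm --flush-cache --cache-id 1                        # flush the remaining dirty lines (can take a long time)
casadm --stop-cache --cache-id 1                         # stop the cache instance
casadm --start-cache --cache-device /dev/nvme0n1p1 --cache-mode wb --cache-line-size 64 --force   # force re-initialization of the cache metadata
casadm --add-core --cache-id 1 --core-device /dev/dm-4   # re-attach the core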

The second one was around 3am today, while we were rsyncing files onto this cache: suddenly it failed, with high iowait again and hung-task info messages in dmesg.

Expected Behavior

Cache behaviour should be reasonably predictable and the cache should not stop working.

Actual Behavior

High iowait and the cache no longer seems to work.

Steps to Reproduce

We could reproduce the first one:

  1. Many dirty lines, the cache being actively written to, and a switch from wb to wt (see the sketch below).
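
A sketch of the mode switch that seems to trigger it (assuming the standard casadm syntax):

casadm --set-cache-mode --cache-mode wt --cache-id 1   # switch wb -> wt while many lines are dirty
casadm --set-cache-mode --cache-mode wb --cache-id 1   # then try to switch back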

Context

As the cache mountpoint is served over NFS with clients connected to it, we had to stop all clients, stop the cache, reformat it and start everything again.

Possible Fix

Logs

This is what suddenly happened this morning while only doing rsyncs:

[screenshot image_2024-11-23_08-36-41]

[Sat Nov 23 03:00:03 2024] workqueue: blk_mq_run_work_fn hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
[Sat Nov 23 03:38:06 2024] workqueue: do_global_cleanup [dm_bufio] hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
[Sat Nov 23 04:44:07 2024] workqueue: blk_mq_run_work_fn hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
[Sat Nov 23 05:32:23 2024] workqueue: do_global_cleanup [dm_bufio] hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND
[Sat Nov 23 07:11:04 2024] workqueue: do_global_cleanup [dm_bufio] hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND

And now it happened again:

[screenshot image_2024-11-23_14-56-12]

[screenshot image_2024-11-23_14-56-38]

This time with no log in dmesg.

Configuration files

Your Environment

  • OpenCAS version (commit hash or tag): c142610
  • Operating System: Ubuntu 24.04.1 LTS (Noble Numbat)
  • Kernel version: 6.9.12-060912-generic #202408281052 SMP PREEMPT_DYNAMIC
  • Cache device type (NAND/Optane/other): ram disk (wa) over nvme (wb)
  • Core device type (HDD/SSD/other): dm_vdo over an mdadm raid0 over iSCSI disks
  • Cache configuration:
# casadm -P -i 1
Cache Id                  1
Cache Size                877915904 [4KiB Blocks] / 3348.98 [GiB]
Cache Device              /dev/nvme0n1p1
Exported Object           -
Core Devices              1
Inactive Core Devices     0
Write Policy              wb
Cleaning Policy           alru
Promotion Policy          always
Cache line size           64 [KiB]
Metadata Memory Footprint 4.0 [GiB]
Dirty for                 16090 [s] / 4 [h] 28 [m] 10 [s]
Status                    Running
# casadm -P -i 2
Cache Id                  2
Cache Size                65453856 [4KiB Blocks] / 249.69 [GiB]
Cache Device              /dev/ram0
Exported Object           -
Core Devices              1
Inactive Core Devices     0
Write Policy              wa
Cleaning Policy           alru
Promotion Policy          always
Cache line size           64 [KiB]
Metadata Memory Footprint 319.3 [MiB]
Dirty for                 0 [s] / Cache clean
Status                    Running
# casadm -L
type    id   disk             status    write policy   device
cache   1    /dev/nvme0n1p1   Running   wb             -
└core   1    /dev/dm-4        Active    -              /dev/cas1-1
cache   2    /dev/ram0        Running   wa             -
└core   1    /dev/cas1-1      Active    -              /dev/cas2-1
# casadm --get-param --name cleaning-alru -i 1
╔═════════════════════════╤═══════╗
║ Parameter name          │ Value ║
╠═════════════════════════╪═══════╣
║ Wake up time [s]        │     5 ║
║ Stale buffer time [s]   │     1 ║
║ Flush max buffers       │  9750 ║
║ Activity threshold [ms] │     0 ║
╚═════════════════════════╧═══════╝
# casadm --get-param --name cleaning-alru -i 2
╔═════════════════════════╤═══════╗
║ Parameter name          │ Value ║
╠═════════════════════════╪═══════╣
║ Wake up time [s]        │    20 ║
║ Stale buffer time [s]   │   120 ║
║ Flush max buffers       │   100 ║
║ Activity threshold [ms] │ 10000 ║
╚═════════════════════════╧═══════╝
    • Cache mode: wa ramdisk over wb nvme
    • Cache line size: 64
    • Promotion policy: always
    • Cleaning policy: alru
    • Sequential cutoff policy: full
# casadm --get-param --name seq-cutoff -i 1 -j 1
╔═════════════════════════════════════════════════════╤═══════╗
║ Parameter name                                      │ Value ║
╠═════════════════════════════════════════════════════╪═══════╣
║ Sequential cutoff threshold [KiB]                   │  1024 ║
║ Sequential cutoff policy                            │ full  ║
║ Sequential cutoff promotion request count threshold │     8 ║
╚═════════════════════════════════════════════════════╧═══════╝
# casadm --get-param --name seq-cutoff -i 2 -j 1
╔═════════════════════════════════════════════════════╤═══════╗
║ Parameter name                                      │ Value ║
╠═════════════════════════════════════════════════════╪═══════╣
║ Sequential cutoff threshold [KiB]                   │  1024 ║
║ Sequential cutoff policy                            │ full  ║
║ Sequential cutoff promotion request count threshold │     8 ║
╚═════════════════════════════════════════════════════╧═══════╝
  • Other (e.g. lsblk, casadm -P, casadm -L)
@jvinolas jvinolas added the bug Something isn't working label Nov 23, 2024

jvinolas commented Nov 23, 2024

Could it be related to having the ramdisk on /dev/ram0? You recommend against using it here http://open-cas.com/guide_advanced_options.html#multi-level-caching, even though recent kernel documentation says it is not a problem.

EDIT: We removed the brd ramdisk cache device and are now running the same rsyncs; we'll see if that was the cause. Also, no errors are shown in casadm -P, so we don't know why the cache decided to 'deactivate' itself.


jvinolas commented Nov 23, 2024

Without the ram cache on top of the nvme, just the nvme in wb mode with the same parameters, it happened again:

[screenshot]

[screenshot]

No dmesg logs at all this time either.

Cache Id                  1
Cache Size                877915904 [4KiB Blocks] / 3348.98 [GiB]
Cache Device              /dev/nvme0n1p1
Exported Object           -
Core Devices              1
Inactive Core Devices     0
Write Policy              wb
Cleaning Policy           alru
Promotion Policy          always
Cache line size           64 [KiB]
Metadata Memory Footprint 4.0 [GiB]
Dirty for                 13014 [s] / 3 [h] 36 [m] 54 [s]
Status                    Running

╔══════════════════╤═══════════╤═══════╤═════════════╗
║ Usage statistics │   Count   │   %   │   Units     ║
╠══════════════════╪═══════════╪═══════╪═════════════╣
║ Occupancy        │ 877915904 │ 100.0 │ 4KiB Blocks ║
║ Free             │         0 │   0.0 │ 4KiB Blocks ║
║ Clean            │ 642185680 │  73.1 │ 4KiB Blocks ║
║ Dirty            │ 235730224 │  26.9 │ 4KiB Blocks ║
╚══════════════════╧═══════════╧═══════╧═════════════╝

╔══════════════════════╤════════╤═══════╤══════════╗
║ Request statistics   │ Count  │   %   │ Units    ║
╠══════════════════════╪════════╪═══════╪══════════╣
║ Read hits            │   1039 │   0.2 │ Requests ║
║ Read partial misses  │      0 │   0.0 │ Requests ║
║ Read full misses     │   2309 │   0.4 │ Requests ║
║ Read total           │   3348 │   0.6 │ Requests ║
╟──────────────────────┼────────┼───────┼──────────╢
║ Write hits           │ 154914 │  29.7 │ Requests ║
║ Write partial misses │     14 │   0.0 │ Requests ║
║ Write full misses    │ 345926 │  66.2 │ Requests ║
║ Write total          │ 500854 │  95.9 │ Requests ║
╟──────────────────────┼────────┼───────┼──────────╢
║ Pass-Through reads   │      0 │   0.0 │ Requests ║
║ Pass-Through writes  │  17963 │   3.4 │ Requests ║
║ Serviced requests    │ 504202 │  96.6 │ Requests ║
╟──────────────────────┼────────┼───────┼──────────╢
║ Total requests       │ 522165 │ 100.0 │ Requests ║
╚══════════════════════╧════════╧═══════╧══════════╝

╔══════════════════════════════════╤════════════╤═══════╤═════════════╗
║ Block statistics                 │   Count    │   %   │   Units     ║
╠══════════════════════════════════╪════════════╪═══════╪═════════════╣
║ Reads from core(s)               │       2561 │   0.0 │ 4KiB Blocks ║
║ Writes to core(s)                │  745094229 │ 100.0 │ 4KiB Blocks ║
║ Total to/from core(s)            │  745096790 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼────────────┼───────┼─────────────╢
║ Reads from cache                 │  644049412 │  42.2 │ 4KiB Blocks ║
║ Writes to cache                  │  881997809 │  57.8 │ 4KiB Blocks ║
║ Total to/from cache              │ 1526047221 │ 100.0 │ 4KiB Blocks ║
╟──────────────────────────────────┼────────────┼───────┼─────────────╢
║ Reads from exported object(s)    │       4661 │   0.0 │ 4KiB Blocks ║
║ Writes to exported object(s)     │  983061770 │ 100.0 │ 4KiB Blocks ║
║ Total to/from exported object(s) │  983066431 │ 100.0 │ 4KiB Blocks ║
╚══════════════════════════════════╧════════════╧═══════╧═════════════╝

╔════════════════════╤═══════╤═════╤══════════╗
║ Error statistics   │ Count │  %  │ Units    ║
╠════════════════════╪═══════╪═════╪══════════╣
║ Cache read errors  │     0 │ 0.0 │ Requests ║
║ Cache write errors │     0 │ 0.0 │ Requests ║
║ Cache total errors │     0 │ 0.0 │ Requests ║
╟────────────────────┼───────┼─────┼──────────╢
║ Core read errors   │     0 │ 0.0 │ Requests ║
║ Core write errors  │     0 │ 0.0 │ Requests ║
║ Core total errors  │     0 │ 0.0 │ Requests ║
╟────────────────────┼───────┼─────┼──────────╢
║ Total errors       │     0 │ 0.0 │ Requests ║
╚════════════════════╧═══════╧═════╧══════════╝

The pattern that seems to repeat is a failure about 3 hours after we start the rsyncs...

This is the current server status while the rsyncs are still running and the cache has suddenly stopped working:

[screenshot]

@jvinolas (Author)

Could it be due to unresponsiveness of the core device (high iowait) filling up these buffers?

[screenshot]

Some dmesg messages in that case would be helpful. We have now raised the kernel log level to debug (echo 8) to see if anything gets logged.
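
For reference, raising the log level was roughly (assuming the standard printk interface):

echo 8 > /proc/sys/kernel/printk   # send everything up to debug level to the console
dmesg -n 8                         # equivalent way to set the console log level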


jvinolas commented Nov 24, 2024

It is still reproducible, now with these cleaning parameters:

╔═════════════════════════╤═══════╗
║ Parameter name          │ Value ║
╠═════════════════════════╪═══════╣
║ Wake up time [s]        │     5 ║
║ Stale buffer time [s]   │     1 ║
║ Flush max buffers       │   500 ║
║ Activity threshold [ms] │    25 ║
╚═════════════════════════╧═══════╝

[screenshot]

@robertbaldyga (Member)

Hi @jvinolas! Do you trace cache occupancy over time? Can the failure be correlated with the moment the occupancy hits 100%?
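
Even sampling the usage statistics periodically would help, e.g. something as simple as (a rough sketch):

while true; do date; casadm -P -i 1 | grep -E 'Occupancy|Dirty'; sleep 60; done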


jvinolas commented Nov 25, 2024

Well, it seems so... what's the solution? I mean, I assume it goes pass-through, but is there any way we can avoid that? Or even any way we could activate it again without having to stop and recreate the cache?

[screenshot]

We are not using any IO class right now. Does the default IO class config solve that problem?

@jvinolas (Author)

We tried the default IO class config, also without success in evicting old data... (what we ran is sketched below)
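
What we ran was roughly the following (a sketch; the CSV path is where our install placed the example config, so treat it as an assumption):

casadm --io-class --load-config --cache-id 1 --file /etc/opencas/ioclass-config.csv   # load the default IO class config
casadm --io-class --list --cache-id 1                                                 # check which classes are active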

@jvinolas (Author)

Looking at the code, could this function have problems on multi-socket (multi-CPU) servers? Ours is dual socket and we also have numad active. We'll try disabling numad to see if it could be interfering.

[screenshot]

@robertbaldyga (Member)

@jvinolas I think we got a reproduction of that behavior. I'll let you know once we identify the root cause.

@jvinolas (Author)

Thanks. Meanwhile, is there any setup that gives us at least a write cache and does not let occupancy grow beyond the dirty data, so we don't get stuck when it reaches full occupancy?

@robertbaldyga (Member)

I think the problem is not only the occupancy, but also the fact that it's 100% dirty data, which means the cache needs to perform on-demand cleaning in order to be able to evict anything. If you need it to work as a write buffer it may be worth trying the ACP cleaning policy. For a long, continuous stream of intensive writes it will not help much, but if your workload has periodic bursts of writes, maybe ACP will be able to clean the cache between the bursts.
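
Something along these lines should work on the running cache (a sketch; exact parameter names as per casadm --set-param --help):

casadm --set-param --name cleaning --cache-id 1 --policy acp   # switch the cleaning policy to ACP
casadm --get-param --name cleaning-acp --cache-id 1            # inspect the ACP wake-up time / flush max buffers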


robertbaldyga commented Nov 29, 2024

Also, if there is any chance that your workload contains long ranges of sequential writes, setting the sequential cutoff policy to never may help.
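
For example, mirroring the get-param call from the report (a sketch):

casadm --set-param --name seq-cutoff --cache-id 1 --core-id 1 --policy never   # disable sequential cutoff for core 1 of cache 1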

@jvinolas (Author)

We'll try and see, thanks.


jvinolas commented Dec 3, 2024

With write back, at 23:10 we set ACP and seq-cutoff to never, and the dirty data stays lower, as it should. Eviction also seems to work correctly after we hit 100% occupancy while doing some rsyncs: dirty lines keep changing, so the cache is still working and not deactivated as happened with ALRU:

[screenshot]


jvinolas commented Dec 9, 2024

We found that if we switch to pt and back to wb, the ACP policy no longer does on-demand cleaning, so we have to stop and start the cache again. For now we keep it in wb.


jvinolas commented Dec 11, 2024

We are hitting more or less the same problem when occupancy reaches 100%: low performance... Would wo mode avoid keeping clean lines, at the cost of not caching reads? If we put a ram cache in wa mode in front of the nvme in wo mode, would that do the trick and keep occupancy at just the dirty lines in the underlying nvme cache, thus avoiding the bug when hitting 100% occupancy?
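
The stack we have in mind would be roughly the following (same devices as our current setup, just wo instead of wb on the nvme level; a sketch, not tested yet):

casadm --start-cache --cache-id 1 --cache-device /dev/nvme0n1p1 --cache-mode wo --cache-line-size 64
casadm --add-core --cache-id 1 --core-device /dev/dm-4       # the dm_vdo core under the nvme cache
casadm --start-cache --cache-id 2 --cache-device /dev/ram0 --cache-mode wa --cache-line-size 64
casadm --add-core --cache-id 2 --core-device /dev/cas1-1     # level-1 exported object as core of the ram cache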

This shows how the cache stops performing well after occupancy hits 100%:

[screenshot]

UPDATED: Applying seq-cutoff never (we forgot to re-apply it after the last cache stop) seems to bring back performance:

[screenshot]

UPDATE2: It worked for about 24 minutes, then performance dropped again...

[screenshot]
