Bug Report: CGroup v1 is now deprecated #372

Open
2 of 3 tasks
rokam opened this issue Jun 20, 2024 · 26 comments
Labels: bug (Something isn't working), pinned (Prevents from getting marked as stale)

Comments

@rokam
Contributor

rokam commented Jun 20, 2024

OS Version

Arch Linux

System Information

Linux pclucas 6.9.5-arch1-1 #1 SMP PREEMPT_DYNAMIC Sun, 16 Jun 2024 19:06:37 +0000 x86_64 GNU/Linux

What happened?

systemd has deprecated cgroup v1, although v1 can still be used by adding another kernel parameter. It will still work on Debian 11, but I think the migration to v2 should be planned.
systemd/systemd#30852

Machine Type

generic-x86-64

Installer output

No response

Relevant log output

No response

ADR

  • I have read through the ADR and have confirmed that my system is compliant with the requirements
  • I understand that if my system is found to not be compliant, my issue will be closed immediately without further investigation

Code of Conduct

@rokam rokam added the bug Something isn't working label Jun 20, 2024
@frostworx

frostworx commented Jul 6, 2024

Thanks for opening this issue.

I also opened a thread on community side:

https://community.home-assistant.io/t/cgroupsv1-no-longer-supported-by-systemd/742098

edit:
an ancient issue reported in the wrong place (no reference to a new one):

home-assistant/home-assistant.io#29200

@frostworx

What is your current workaround, @rokam?
Do you also pass systemd.unified_cgroup_hierarchy=false SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE=1
on the kernel cmdline (and wait the additional 30 seconds, because this option is slated for removal)?

If so, do the Home Assistant logs work for you currently? All my Home Assistant logs are broken at the moment,
because the supervisor log requests crash with

Jul 15 15:56:32 mini hassio_supervisor[1135]: 2024-07-15 15:56:32.898 ERROR (MainThread) [aiohttp.server] Error handling request
Jul 15 15:56:32 mini hassio_supervisor[1135]: Traceback (most recent call last):
Jul 15 15:56:32 mini hassio_supervisor[1135]:   File "/usr/local/lib/python3.12/site-packages/aiohttp/web_protocol.py", line 452, in _handle_request
Jul 15 15:56:32 mini hassio_supervisor[1135]:     resp = await request_handler(request)
Jul 15 15:56:32 mini hassio_supervisor[1135]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jul 15 15:56:32 mini hassio_supervisor[1135]:   File "/usr/local/lib/python3.12/site-packages/aiohttp/web_app.py", line 543, in _handle
Jul 15 15:56:32 mini hassio_supervisor[1135]:     resp = await handler(request)
Jul 15 15:56:32 mini hassio_supervisor[1135]:            ^^^^^^^^^^^^^^^^^^^^^^
Jul 15 15:56:32 mini hassio_supervisor[1135]:   File "/usr/local/lib/python3.12/site-packages/aiohttp/web_middlewares.py", line 114, in impl
Jul 15 15:56:32 mini hassio_supervisor[1135]:     return await handler(request)
Jul 15 15:56:32 mini hassio_supervisor[1135]:            ^^^^^^^^^^^^^^^^^^^^^^
Jul 15 15:56:32 mini hassio_supervisor[1135]:   File "/usr/src/supervisor/supervisor/api/middleware/security.py", line 189, in block_bad_requests
Jul 15 15:56:32 mini hassio_supervisor[1135]:     return await handler(request)
Jul 15 15:56:32 mini hassio_supervisor[1135]:            ^^^^^^^^^^^^^^^^^^^^^^
Jul 15 15:56:32 mini hassio_supervisor[1135]:   File "/usr/src/supervisor/supervisor/api/middleware/security.py", line 205, in system_validation
Jul 15 15:56:32 mini hassio_supervisor[1135]:     return await handler(request)
Jul 15 15:56:32 mini hassio_supervisor[1135]:            ^^^^^^^^^^^^^^^^^^^^^^
Jul 15 15:56:32 mini hassio_supervisor[1135]:   File "/usr/src/supervisor/supervisor/api/middleware/security.py", line 272, in token_validation
Jul 15 15:56:32 mini hassio_supervisor[1135]:     return await handler(request)
Jul 15 15:56:32 mini hassio_supervisor[1135]:            ^^^^^^^^^^^^^^^^^^^^^^
Jul 15 15:56:32 mini hassio_supervisor[1135]:   File "/usr/src/supervisor/supervisor/api/middleware/security.py", line 283, in core_proxy
Jul 15 15:56:32 mini hassio_supervisor[1135]:     return await handler(request)
Jul 15 15:56:32 mini hassio_supervisor[1135]:            ^^^^^^^^^^^^^^^^^^^^^^
Jul 15 15:56:32 mini hassio_supervisor[1135]:   File "/usr/src/supervisor/supervisor/api/utils.py", line 103, in wrap_api
Jul 15 15:56:32 mini hassio_supervisor[1135]:     msg_data = await method(api, *args, **kwargs)
Jul 15 15:56:32 mini hassio_supervisor[1135]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jul 15 15:56:32 mini hassio_supervisor[1135]:   File "/usr/src/supervisor/supervisor/api/host.py", line 227, in advanced_logs
Jul 15 15:56:32 mini hassio_supervisor[1135]:     return await self.advanced_logs_handler(request, identifier, follow)
Jul 15 15:56:32 mini hassio_supervisor[1135]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jul 15 15:56:32 mini hassio_supervisor[1135]:   File "/usr/src/supervisor/supervisor/api/host.py", line 214, in advanced_logs_handler
Jul 15 15:56:32 mini hassio_supervisor[1135]:     async for line in journal_logs_reader(resp, log_formatter):
Jul 15 15:56:32 mini hassio_supervisor[1135]:   File "/usr/src/supervisor/supervisor/utils/systemd_journal.py", line 98, in journal_logs_reader
Jul 15 15:56:32 mini hassio_supervisor[1135]:     length_raw = await resp.content.readexactly(8)
Jul 15 15:56:32 mini hassio_supervisor[1135]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jul 15 15:56:32 mini hassio_supervisor[1135]:   File "/usr/local/lib/python3.12/site-packages/aiohttp/streams.py", line 452, in readexactly
Jul 15 15:56:32 mini hassio_supervisor[1135]:     raise asyncio.IncompleteReadError(partial, len(partial) + n)
Jul 15 15:56:32 mini hassio_supervisor[1135]: asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of 8 expected bytes

This is triggered both via ha supervisor logs and by clicking any log/protocol in HA.

It would be interesting to know whether cgroups v1 causes this on "brand-new" systems which require the ugly SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE workaround.
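
For anyone reading the traceback above: the final frames show the reader asking for exactly 8 bytes and the stream ending before any arrive. Below is a minimal sketch of that failure mode, using the standard library's asyncio.StreamReader as a stand-in for the aiohttp response stream. The helper names are made up for illustration, and treating the 8 bytes as a little-endian 64-bit length is an assumption about what journal_logs_reader does with them.

```python
import asyncio
import struct


async def read_length_prefix(payload: bytes) -> int:
    """Read an 8-byte length prefix from an in-memory stream (illustrative helper)."""
    reader = asyncio.StreamReader()
    reader.feed_data(payload)
    reader.feed_eof()  # simulate the peer closing the connection
    raw = await reader.readexactly(8)  # raises IncompleteReadError if fewer than 8 bytes arrive
    # Assumption: the prefix is an unsigned 64-bit little-endian integer.
    return struct.unpack("<Q", raw)[0]


def try_read(payload: bytes) -> str:
    """Return the decoded length, or a description of the error if the stream was too short."""
    try:
        return str(asyncio.run(read_length_prefix(payload)))
    except asyncio.IncompleteReadError as err:
        return f"IncompleteReadError: {err}"
```

Calling try_read(b"") reproduces the "0 bytes read on a total of 8 expected bytes" wording, which suggests the upstream response body was empty or cut off rather than malformed.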

@rokam
Contributor Author

rokam commented Jul 15, 2024

I'm not using Arch anymore, but that cmdline is the way to go.

@frostworx

yeah, the cmdline works "fine" for now at least.

sad to hear you left Arch - seems like I'm the only one left with this problem now ;)

@rokam
Contributor Author

rokam commented Jul 15, 2024

It seems that the supervisor already supports v2. Would you be able to remove it from the kernel cmdline?

@frostworx

frostworx commented Jul 15, 2024

thank you for trying to help!

It would explain why nobody else has that problem, but
https://www.home-assistant.io/more-info/unsupported/cgroup_version/
still suggests using cgroup v1.
I'm just digging through supervised-installer trying to find something.

they still use cgroup v1:

https://github.com/home-assistant/supervised-installer/blob/main/homeassistant-supervised/DEBIAN/postinst#L166

@frostworx

frostworx commented Jul 15, 2024

As there don't seem to be any apparent plans to change the situation, I hope using v2 doesn't break my whole HA functionality.

"Please understand that support for cgroupv1 is going to go away entirely very very soon now."

via

doesn't look like there's much time left to wait.

edit:

I switched to cgroups v2, HA still warns about it (and logs are still broken) but at least the host os is no longer in a pending dying state.
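
To double-check which hierarchy a host actually booted with, and whether the legacy parameters are still on the kernel cmdline, a small sketch like the following may help. It relies on the cgroup.controllers file, which only exists at the root of a unified (v2) hierarchy; the function names are made up for illustration.

```python
from pathlib import Path

# Kernel cmdline parameters mentioned earlier in this thread.
LEGACY_PARAMS = (
    "systemd.unified_cgroup_hierarchy",
    "SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE",
)


def cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if `root` is a unified cgroup v2 mount, otherwise 1 (legacy or hybrid)."""
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1


def legacy_cmdline_params(cmdline: str) -> list[str]:
    """Return the cgroup-v1 related tokens found in a kernel command line string."""
    return [tok for tok in cmdline.split() if tok.split("=", 1)[0] in LEGACY_PARAMS]
```

On a live system, cgroup_version() checks the default mount point and legacy_cmdline_params(Path("/proc/cmdline").read_text()) lists any leftover parameters that would need to be removed from the bootloader configuration.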

@agners
Member

agners commented Jul 15, 2024

We do use cgroupsv2 in HAOS, it works quite well.

There is one feature of the Supervisor which relies on CGroupsV1 for Supervised installations: device permission updates (see home-assistant/supervisor#3421). Unfortunately, the way this was implemented broke with CGroupsV2, and it was not straightforward to implement it in a similar fashion.

Eventually, I added device permission updates for CGroupsV2 via runc/containerd changes (see home-assistant/os-agent#92 and opencontainers/runc#3402). However, this change has not made it upstream to runc/containerd yet, so it remains a feature of the patched version in HAOS 😢

I've pinged folks again in opencontainers/runc#3402, let's see. If someone is connected to the containerd/OCI community, I'd be glad if you can make things move forward.

@frostworx

frostworx commented Jul 15, 2024

Thank you very much for the heads-up, @agners. Very appreciated!

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Sep 13, 2024
@frostworx

doesn't look fixed

@github-actions github-actions bot removed the stale label Sep 13, 2024
@hbiyik

hbiyik commented Oct 10, 2024

systemd is really blackmailing my system with an extra 30s of delay when the force kernel argument SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE=1 is set. It is really, really annoying.

https://github.com/systemd/systemd/blob/727dc1f23a2e16da7f1e24810157d5b7c9136525/src/core/main.c#L3186

@atostivint

Personally, I also gave up and disabled cgroupv1. Nothing changed on my side.

@agners
Member

agners commented Oct 11, 2024

systemd is really blackmailing my system with an extra 30s of delay when the force kernel argument SYSTEMD_CGROUP_ENABLE_LEGACY_FORCE=1 is set. It is really, really annoying.

https://github.com/systemd/systemd/blob/727dc1f23a2e16da7f1e24810157d5b7c9136525/src/core/main.c#L3186

Is that shipped in Debian Bookworm already? 🤔

We kinda have to move away from cgroupsv1, I guess we just need to accept that upstream is not interested in dynamic device permission settings for cgroupsv2 and live with the consequences: Supervised installations won't have support for dynamic hardware update 😢

@hbiyik

hbiyik commented Oct 11, 2024

Is that shipped in Debian Bookworm already? 🤔

i am on arch, i think rolling distros are impacted first.

@frostworx

good to see some motion here, thanks all.

(personally I disabled cgroups v1, but would be nice to get proper logging back one day)

@agners
Member

agners commented Oct 11, 2024

(personally I disabled cgroups v1, but would be nice to get proper logging back one day)

What do you mean by proper logging? 🤔

@frostworx

frostworx commented Oct 11, 2024

(personally I disabled cgroups v1, but would be nice to get proper logging back one day)

What do you mean by proper logging? 🤔

hm, all my home assistant logs are empty since I dropped cgroups v1.
I assumed this was related, but now after you ask I'm no longer sure it is :)
I didn't care that much and simply read the container logs when I needed to.
So thank you for asking :)

editing, because offtopic and I want to avoid mail spam:

seems like the missing logs are caused by aiohttp:

Okt 11 15:25:22 mini hassio_supervisor[1102]: 2024-10-11 15:25:22.795 ERROR (MainThread) [aiohttp.server] Error handling request
Okt 11 15:25:22 mini hassio_supervisor[1102]: Traceback (most recent call last):
Okt 11 15:25:22 mini hassio_supervisor[1102]:   File "/usr/local/lib/python3.12/site-packages/aiohttp/web_protocol.py", line 477, in _handle_request
Okt 11 15:25:22 mini hassio_supervisor[1102]:     resp = await request_handler(request)
Okt 11 15:25:22 mini hassio_supervisor[1102]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Okt 11 15:25:22 mini hassio_supervisor[1102]:   File "/usr/local/lib/python3.12/site-packages/aiohttp/web_app.py", line 559, in _handle
Okt 11 15:25:22 mini hassio_supervisor[1102]:     return await handler(request)
Okt 11 15:25:22 mini hassio_supervisor[1102]:            ^^^^^^^^^^^^^^^^^^^^^^
Okt 11 15:25:22 mini hassio_supervisor[1102]:   File "/usr/local/lib/python3.12/site-packages/aiohttp/web_middlewares.py", line 117, in impl
Okt 11 15:25:22 mini hassio_supervisor[1102]:     return await handler(request)
Okt 11 15:25:22 mini hassio_supervisor[1102]:            ^^^^^^^^^^^^^^^^^^^^^^
Okt 11 15:25:22 mini hassio_supervisor[1102]:   File "/usr/src/supervisor/supervisor/api/middleware/security.py", line 199, in block_bad_requests
Okt 11 15:25:22 mini hassio_supervisor[1102]:     return await handler(request)
Okt 11 15:25:22 mini hassio_supervisor[1102]:            ^^^^^^^^^^^^^^^^^^^^^^
Okt 11 15:25:22 mini hassio_supervisor[1102]:   File "/usr/src/supervisor/supervisor/api/middleware/security.py", line 215, in system_validation
Okt 11 15:25:22 mini hassio_supervisor[1102]:     return await handler(request)
Okt 11 15:25:22 mini hassio_supervisor[1102]:            ^^^^^^^^^^^^^^^^^^^^^^
Okt 11 15:25:22 mini hassio_supervisor[1102]:   File "/usr/src/supervisor/supervisor/api/middleware/security.py", line 285, in token_validation
Okt 11 15:25:22 mini hassio_supervisor[1102]:     return await handler(request)
Okt 11 15:25:22 mini hassio_supervisor[1102]:            ^^^^^^^^^^^^^^^^^^^^^^
Okt 11 15:25:22 mini hassio_supervisor[1102]:   File "/usr/src/supervisor/supervisor/api/middleware/security.py", line 298, in core_proxy
Okt 11 15:25:22 mini hassio_supervisor[1102]:     return await handler(request)
Okt 11 15:25:22 mini hassio_supervisor[1102]:            ^^^^^^^^^^^^^^^^^^^^^^
Okt 11 15:25:22 mini hassio_supervisor[1102]:   File "/usr/src/supervisor/supervisor/api/utils.py", line 104, in wrap_api
Okt 11 15:25:22 mini hassio_supervisor[1102]:     msg_data = await method(api, *args, **kwargs)
Okt 11 15:25:22 mini hassio_supervisor[1102]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Okt 11 15:25:22 mini hassio_supervisor[1102]:   File "/usr/src/supervisor/supervisor/api/__init__.py", line 528, in get_addon_logs
Okt 11 15:25:22 mini hassio_supervisor[1102]:     return await self._api_host.advanced_logs(request, *args, **kwargs)
Okt 11 15:25:22 mini hassio_supervisor[1102]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Okt 11 15:25:22 mini hassio_supervisor[1102]:   File "/usr/src/supervisor/supervisor/api/utils.py", line 104, in wrap_api
Okt 11 15:25:22 mini hassio_supervisor[1102]:     msg_data = await method(api, *args, **kwargs)
Okt 11 15:25:22 mini hassio_supervisor[1102]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Okt 11 15:25:22 mini hassio_supervisor[1102]:   File "/usr/src/supervisor/supervisor/api/host.py", line 257, in advanced_logs
Okt 11 15:25:22 mini hassio_supervisor[1102]:     return await self.advanced_logs_handler(request, identifier, follow)
Okt 11 15:25:22 mini hassio_supervisor[1102]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Okt 11 15:25:22 mini hassio_supervisor[1102]:   File "/usr/src/supervisor/supervisor/api/host.py", line 244, in advanced_logs_handler
Okt 11 15:25:22 mini hassio_supervisor[1102]:     async for line in journal_logs_reader(resp, log_formatter):
Okt 11 15:25:22 mini hassio_supervisor[1102]:   File "/usr/src/supervisor/supervisor/utils/systemd_journal.py", line 99, in journal_logs_reader
Okt 11 15:25:22 mini hassio_supervisor[1102]:     length_raw = await resp.content.readexactly(8)
Okt 11 15:25:22 mini hassio_supervisor[1102]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Okt 11 15:25:22 mini hassio_supervisor[1102]:   File "/usr/local/lib/python3.12/site-packages/aiohttp/streams.py", line 455, in readexactly
Okt 11 15:25:22 mini hassio_supervisor[1102]:     raise asyncio.IncompleteReadError(partial, len(partial) + n)
Okt 11 15:25:22 mini hassio_supervisor[1102]: asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of 8 expected bytes
Okt 11 15:25:22 mini homeassistant[1102]: 2024-10-11 15:25:22.796 ERROR (MainThread) [homeassistant.components.hassio.http] Client error on api addons/a0d7b954_spotify/logs request Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
Okt 11 15:25:25 mini homeassistant[1102]: 2024-10-11 15:25:25.372 ERROR (MainThread) [homeassistant.helpers.template] Template variable error: list object has no element 0 when rendering '{%- macro get_device_topic(device_id) %} {{ states((device_entities(device_id) | select('search','device_topic') | list)[0]) }} {%- endmacro %}

a quick search for "/usr/local/lib/python3.12/site-packages/aiohttp/web_protocol.py", line 477 leads me here:
home-assistant/core#126961
so maybe this is fixed (soon)

edit2:
the above issue is likely not related
the main problem "asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of 8 expected bytes" seems pretty rare, I guess I have a local problem. sorry for the noise

edit3:
in fact it WAS related: Supervisor 2024.10.2 (https://github.com/home-assistant/supervisor/releases/tag/2024.10.2) includes home-assistant/supervisor#5345 "Bump aiohttp from 3.10.9 to 3.10.10" by dependabot, which fixed the broken addon logs here :)

@agners
Member

agners commented Oct 11, 2024

hm, all my home assistant logs are empty since I dropped cgroups v1.

That is more likely because we changed to fetch logs directly from systemd-journald through systemd-journal-gatewayd. See this and this.

@frostworx

frostworx commented Oct 11, 2024

oh, thank you very much for your help, @agners!

Very likely the problem lies in gatewayd - haven't touched/looked into it since it worked. sorry for the offtopic noise all

edit:

the systemd-journal-gatewayd override

[Socket]
ListenStream=
ListenStream=/run/systemd-journal-gatewayd.sock

brought back the main home assistant logs (addons pending, but probably fixable)

Thank you very much again @agners! The pointer saved me some hours for sure!
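
For anyone wanting to verify that the socket from the override above actually answers, here is a rough standard-library sketch that speaks HTTP over the Unix socket. The /entries?boot path and the Accept: text/plain header are taken from my reading of the systemd-journal-gatewayd documentation and should be treated as assumptions; the class and function names are made up for illustration, and this is not how the Supervisor itself talks to the gateway.

```python
import http.client
import socket

# Socket path from the ListenStream= override above.
GATEWAYD_SOCKET = "/run/systemd-journal-gatewayd.sock"


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        super().__init__("localhost", timeout=timeout)
        self._socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self._socket_path)
        self.sock = sock


def fetch_journal_text(socket_path: str = GATEWAYD_SOCKET, max_bytes: int = 65536) -> str:
    """Fetch up to max_bytes of plain-text journal output for the current boot."""
    conn = UnixHTTPConnection(socket_path)
    try:
        # Assumption: "/entries?boot" limits the output to the current boot.
        conn.request("GET", "/entries?boot", headers={"Accept": "text/plain"})
        resp = conn.getresponse()
        if resp.status != 200:
            raise RuntimeError(f"gatewayd answered HTTP {resp.status}")
        return resp.read(max_bytes).decode("utf-8", errors="replace")
    finally:
        conn.close()
```

If the socket is missing or not listening, the connect step fails with an OSError (for example FileNotFoundError or ConnectionRefusedError), which is a quicker signal than an empty log view in the UI.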

@agners
Member

agners commented Oct 11, 2024

So I checked quickly what features rely on the CGroups device permission updates. Currently it is used for add-ons to update permissions for mapped devices (see supervisor/docker/addon.py#L829), e.g.:

devices:
  - /dev/ttyUSB0

And those get hot-plugged after the add-on start.

In a quick test with Supervised + CGroupsV2 it seems that the OS Agent calls runc even though it does not support updates:

2024-10-11 13:43:48.434 debian-supervised os-agent[527]: INFO: 2024/10/11 15:43:48 cgroup.go:71: Successfully called runc for 'ff108dc3c484e7f6c0760593e2e2db7b9e5f7f5e8054432fa2319391779cf116', output
2024-10-11 13:43:48.434 debian-supervised os-agent[527]: INFO: 2024/10/11 15:43:48 cgroup.go:74: Permission 'c 188:0 rwm', granted for Container 'ff108dc3c484e7f6c0760593e2e2db7b9e5f7f5e8054432fa2319391779cf116' via runc

However, the permission update attempt has no effect, since that runc version doesn't support it. From within the add-on:

[local-ssh ~]$ cat /dev/ttyUSB0 
cat: can't open '/dev/ttyUSB0': Operation not permitted

Ideally we'd make sure that OS Agent returns an error, and that this gets logged. That way we have at least an error in the Supervisor log.
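
As a side note on the 'c 188:0 rwm' string in the log above: it follows the usual device cgroup rule shape of type, major:minor, and access flags. A small sketch of deriving such a rule from a device node is shown below; the function names are illustrative and not taken from the Supervisor or OS Agent code.

```python
import os
import stat


def device_cgroup_rule(st_mode: int, st_rdev: int, access: str = "rwm") -> str:
    """Format a device cgroup rule such as 'c 188:0 rwm' from stat() fields."""
    if stat.S_ISCHR(st_mode):
        kind = "c"  # character device, e.g. a USB serial adapter
    elif stat.S_ISBLK(st_mode):
        kind = "b"  # block device
    else:
        raise ValueError("not a device node")
    return f"{kind} {os.major(st_rdev)}:{os.minor(st_rdev)} {access}"


def rule_for_path(path: str, access: str = "rwm") -> str:
    """Convenience wrapper that stats a device node on disk."""
    st = os.stat(path)
    return device_cgroup_rule(st.st_mode, st.st_rdev, access)
```

For a typical USB serial adapter, rule_for_path("/dev/ttyUSB0") would give the same kind of string that OS Agent reports as granted in the log.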

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 14, 2024
@github-actions github-actions bot closed this as not planned Dec 21, 2024
@frostworx

frostworx commented Dec 23, 2024

It seems nothing has changed here: the documentation still suggests reverting to the dead cgroupv1, so I suggest reopening this issue.

My addon logging problems mentioned above (500 Internal Server Error) seem to be Arch related (I found some stray Arch threads, but have no URL ready for now), which makes me guess that the problem is caused by an outdated homeassistant-supervised package. The package has a pinned comment about missing cgroupv2 support (which is likely the reason the package has not been bumped).

bumping this thread and the package discussion simultaneously

edit: (sorry, forgot)

Arch AUR package is here

happy holidays and a great start into a hopefully good and peaceful new year!

@agners agners removed the stale label Dec 23, 2024
@agners agners added the pinned Prevents from getting marked as stale label Dec 23, 2024
@agners agners reopened this Dec 23, 2024
@agners
Member

agners commented Dec 23, 2024

Actually, Supervisor 2024.12.0 contains #5419, so CGroupsV2 should no longer mark a Supervised installation as unsupported.

So this means you can use CGroupsV2 on Supervised installations. The only downside will be that hot-plugging of tty devices won't work if the add-on uses the above mentioned syntax. However, I think most add-ons nowadays use the more modern way of using the schema tag and a config of type device(subsystem=tty) (like OpenThread Border Router etc.) or uart: true (e.g. Z2M).

@frostworx

Thank you for the heads-up and for reopening, @agners!

Maybe I didn't follow this close enough, but I guess several other people are still sitting on cgroupv1 for having missed the news as well.

@agners
Member

agners commented Dec 23, 2024

Maybe I didn't follow this close enough, but I guess several other people are still sitting on cgroupv1 for having missed the news as well.

Yes I'd expect pretty much everyone to be still on cgroupsv1 at this point.

I think the next step would be to update the installer to no longer switch to cgroupsv1, so that new installations are no longer affected.

As for migrating existing installations, I am not sure how we handled this so far. I guess we could make the deb installation script revert what has been done previously 🤔 @ikifar2012 any thoughts on this?
