Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frequent high CPU usage on TrueNAS #65

Open
Timofiev opened this issue Feb 2, 2023 · 12 comments
Open

Frequent high CPU usage on TrueNAS #65

Timofiev opened this issue Feb 2, 2023 · 12 comments
Labels
bug Something isn't working planned To be implemented in future release

Comments

@Timofiev
Copy link

Timofiev commented Feb 2, 2023

First of all I'm really happy that someone finally created a TrueNAS integration to Home Assistant. I'm using FreeNAS and then Truenas for almost 4 years already and I always missed an integration for it and then, it finally had..

So the issue is that I realized that, regular as clockwork every 60 seconds the CPU usage spikes (see picture below) quite a lot, additionally to it's normal usage.
image
Of course I googled, and tried everything to make it stop, deleted snapshots etc. I tried to systematically stop every plugin, jail but nothing stopped it. I later noticed that during HA restart it behaves differently and after restarting HA the 60 sec intervals shift.
So I tried to disable the integrations which have any relation to my NAS and I noticed that when I disable the TrueNAS integration these spikes stop appearing again.
image
Later of course I tried again, enabling it and then disabling it and when its enabled, the minutely spikes are appearing and when it's diasbled, there are no more spikes.
image

I'm not really sure why this is happening but I'm sure that I don't want an information gathering integration to spike my NAS to a 16% CPU usage every minute when it's idling CPU usage is around 5%, which also affects power consumtion. Since I disabled the integration the power consumption decreased by about 2-3W which is a very low consumption but it's significant enough compared to it's idling consumption.
If this is an adverse effect to the integration then I must learn how to live without it but if it's a bug and can be fixed I would be more happy to use it again. Maybe it's only affects me like this, I don't know.

Software versions

  • Home Assistant version: HA 2023.1.7
  • TrueNAS integration version: I don't know how to check it
  • TrueNAS version: TrueNAS-12.0-U8
@Timofiev Timofiev added the bug Something isn't working label Feb 2, 2023
@tomaae
Copy link
Owner

tomaae commented Feb 2, 2023

whats your truenas hardware?
I dont think I can do much since that would be api causing load. its worth investigating tho, could be specific endpoint we can make optional or change.
I should get hardware for proxmox server next week, so it should make investigating this easier.

@tomaae tomaae added the planned To be implemented in future release label Feb 2, 2023
@irakhlin
Copy link

I can second this issue and seemingly only tracked it down now. I noticed my truenas webui had been very sluggish when needing to investigate an unrelated issue (truenas is running 8 core intel avoton w/ 64GB ram and ssd boot drive). I noticed an almost constant 100% usage on a single core dedicated to middlewared, sometimes dropping around 70% and intermittently hogging two full cores.

Additionally.. in a totally separate investigation of optimizing my home assistant instance I was trying to track down custom integrations with longest load time, which this seemed to be coming in double of my frigate integration around ~80 seconds. On disabling the integration middlewared usage dropped to around 20% of a core.

Do you know if disabling the individual entities that I am not using prevents the integration from fetching the data from truenas?

@Timofiev
Copy link
Author

Timofiev commented Feb 10, 2023

Thank you for trying to help.
Here is my hardware:
Motherboard with integrated CPU and graphics : ASROCK J3455-ITX
RAM: 2xKINGSTON 8GB Notebook DDR3L 1600MHz CL11 KVR16LS11/8
System storage: KINGSPEC 64GB Internal SATA 3 2.5" KS-P3-64G
Storage: 3xWD Red 4TB 3.5" 5400rpm 64MB SATA WD40EFRX
PSU: CHIEFTEC Smart SFX-250VS 250W

@tomaae
Copy link
Owner

tomaae commented Feb 10, 2023

I can second this issue and seemingly only tracked it down now. I noticed my truenas webui had been very sluggish when needing to investigate an unrelated issue (truenas is running 8 core intel avoton w/ 64GB ram and ssd boot drive). I noticed an almost constant 100% usage on a single core dedicated to middlewared, sometimes dropping around 70% and intermittently hogging two full cores.

Additionally.. in a totally separate investigation of optimizing my home assistant instance I was trying to track down custom integrations with longest load time, which this seemed to be coming in double of my frigate integration around ~80 seconds. On disabling the integration middlewared usage dropped to around 20% of a core.

Do you know if disabling the individual entities that I am not using prevents the integration from fetching the data from truenas?

truenas integration takes 80 seconds to load? thats way too long unless your HA is running on really slow hardware. For me its 26 seconds.

@irakhlin
Copy link

I can second this issue and seemingly only tracked it down now. I noticed my truenas webui had been very sluggish when needing to investigate an unrelated issue (truenas is running 8 core intel avoton w/ 64GB ram and ssd boot drive). I noticed an almost constant 100% usage on a single core dedicated to middlewared, sometimes dropping around 70% and intermittently hogging two full cores.
Additionally.. in a totally separate investigation of optimizing my home assistant instance I was trying to track down custom integrations with longest load time, which this seemed to be coming in double of my frigate integration around ~80 seconds. On disabling the integration middlewared usage dropped to around 20% of a core.
Do you know if disabling the individual entities that I am not using prevents the integration from fetching the data from truenas?

truenas integration takes 80 seconds to load? thats way too long unless your HA is running on really slow hardware. For me its 26 seconds.

Home Assistant is running on an Odroid N2 ( Home Assistant Blue) which has been more than enough power thus far but I went ahead and loaded the integration on my home assistant dev vm (hassio os 9.5) on an intel i7-12800H giving the VM 6 cores and 8GB of ram. On a completely clean instance of home assistant the integration start up time was actually 98 seconds.
image

Could this issue be potentially related to a specific call that is being made to truenas that is taking a specifically long time to return? As a brief summary of my truenas loaded integration I am getting 7 devices and 153 entities, 1 cloud sync sensor, over 40 dataset sensors, 19 disk sensors, 17 jail sensors, 2 snapshot sensors and a bunch of system sensors (the system does have 4 NICs each of which is generating multiple sensors).

@tomaae
Copy link
Owner

tomaae commented Feb 11, 2023

no idea how fast that cpu is, its an arm. comparing to my load times, its all over the place.
but not sure that would be relevant, its most likely related to truenas respond time, not HA side. avoton may be old, but should be fine unless you run many vm/jails on it.

best would be if you could run the integration in debug for like 3 minutes to capture integration bootup and at least one update cycle, for me to analyze. we should be able to identify the cause.

@irakhlin
Copy link

@tomaae I do not believe the issue is the home assistant instance as I mentioned I am now testing on a clean and brand new hassio 9.5 vm with 6 cores (of an 12th gen i7) and issue is very much the same. I did capture the logs as you mentioned and looked though them a bit, I have no issue sending them to you but preferably would not want to post them here, email maybe?

I did start looking though the logs and without going to crazy did notice that some calls were taking significantly longer to return than others, for example get_systemstats and get_disk. Commenting line 123+124 and 127+128 in truenas_controller.py dropped integration load time from 104 seconds to 28 seconds (with debugging enable).

@tomaae
Copy link
Owner

tomaae commented Feb 12, 2023

yea, thats what I said. its most likely nas itself.
I dont think there are any sensitive information in log for this integration.

get_systemstats contains a lot, including workarounds for truenas api bug. so there is need to go deeper.
get_disk should not take long, thats just 2 queries for disks, nothing special there, nor it should be a long list.

@irakhlin
Copy link

irakhlin commented Feb 12, 2023

yea, thats what I said. its most likely nas itself. I dont think there are any sensitive information in log for this integration.

get_systemstats contains a lot, including workarounds for truenas api bug. so there is need to go deeper. get_disk should not take long, thats just 2 queries for disks, nothing special there, nor it should be a long list.

Doing a deeper dive on my own it seems the middlewared process on my truenas instance was in some precarious state and sitting on over 4GB of memory. A simple restart of the middlewared process on the truenas instance speed up both the truenas web ui and dropped the middlewared memory usage to ~500MB. I still see a bit of a jump in CPU usage for the process, (from 4% to 50% on a single core) every 60 seconds but it seems to settle very quickly. Integration load time is the biggest winner here however which is now sitting at 12 seconds, down from the previous 100 seconds.

I will still say it may be a good feature add for a future release to allow more gradual selection of desired sensors during the integration configuration as I am personally not using the majority of them and it may save resources. Last note, I do have a 'cloud sync' configuration setup in truenas for azure blob storage, I did notice the API key in the log unredacted with debugging logging. Not a huge deal but just something to note.

@tomaae
Copy link
Owner

tomaae commented Feb 13, 2023

Thats an interesting finding.
I plan to have more gradual options for sensors. I'm mostly giving it time to figure out what is useful to have as optional and how to group them.

Also, could you give me key for that API key, so I can redact it? We dont want any woopsies when people are posting logs.

@irakhlin
Copy link

Thats an interesting finding. I plan to have more gradual options for sensors. I'm mostly giving it time to figure out what is useful to have as optional and how to group them.

Also, could you give me key for that API key, so I can redact it? We dont want any woopsies when people are posting logs.

That makes sense, there is most definitely a good amount of information and data that is being made available figuring out level of importance for most use cases will definitely take time. As for the log message this is where I am seeing it, on the query: cloudsync, get
query response: [{'id': 3, 'description': 'hassio-backup', 'direction': 'PUSH', 'path': '/mnt/znas/hassio', 'attributes': {'bucket': 'homeassistant', 'folder': '/', 'fast_list': False}, 'enabled': True, 'transfer_mode': 'SYNC', 'encryption': False, 'filename_encryption': False, 'encryption_password': '', 'encryption_salt': '', 'args': '', 'post_script': '', 'pre_script': '', 'snapshot': True, 'bwlimit': [{'time': '00:00', 'bandwidth': 5242880}], 'exclude': [], 'transfers': None, 'follow_symlinks': False, 'credentials': {'id': 3, 'name': 'freenas-archive', 'provider': 'AZUREBLOB', 'attributes': {'account': 'freenasarchive', 'key': '**KEY REMOVED**'}}, 'schedule': {'minute': '10', 'hour': '3', 'dom': '*', 'month': '*', 'dow': 'sun,wed,sat'}, 'locked': False, 'job': {'id': 248573, 'method': 'cloudsync.sync', 'arguments': [3], 'logs_path': '/tmp/middlewared/jobs/248573.log', 'logs_excerpt': '2023/02/08 03:10:03 INFO : Starting bandwidth limiter at 5Mi Byte/s\n2023/02/08 03:10:04 INFO : There was nothing to transfer\n2023/02/08 03:10:04 INFO : \nTransferred: \t 0 B / 0 B, -, 0 B/s, ETA -\nChecks: 10 / 10, 100%\nElapsed time: 0.4s\n\n', 'progress': {'percent': 100, 'description': 'checks: 10 / 10', 'extra': None}, 'result': None, 'error': None, 'exception': None, 'exc_info': None, 'state': 'SUCCESS', 'time_started': {'$date': 1675843800516}, 'time_finished': {'$date': 1675843804885}}}]

@tomaae
Copy link
Owner

tomaae commented Feb 13, 2023

thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working planned To be implemented in future release
Projects
None yet
Development

No branches or pull requests

3 participants