
[FEATURE][ChromeOS] Expose DSP load tracker info for the visibility to ChromeOS #5521

Open
johnylin76 opened this issue Mar 11, 2022 · 42 comments
Labels: area:SOF logging, enhancement, performance

@johnylin76
Contributor

Is your feature request related to a problem? Please describe.
Currently we don't have much DSP information in the system logs attached to audio-issue feedback reports. To debug such issues we need to rebuild the kernel with the trace feature enabled and capture logs with sof-logger. It would be very helpful if DSP load tracker info could be obtained without those debugging steps. Moreover, we could set up an autotest item to track DSP load performance across all supported products.

Describe the solution you'd like
I don't have a firm idea of the most appropriate/simple way to do this. It could either be actively reported by the DSP, or provided on demand at the host's request.
For the active approach, it should be enough to report critical info only (e.g. DSP load exceeding a threshold) so as not to add extra burden.

Describe alternatives you've considered
None

Additional context
None

@johnylin76 johnylin76 added the enhancement New feature or request label Mar 11, 2022
@lgirdwood
Member

@plbossart @ujfalusi @ranj063 any thoughts?

@kv2019i
Collaborator

kv2019i commented Mar 25, 2022

I'd say an IPC to request this information from the FW on demand would be the most suitable solution. We could e.g. have a debugfs interface (exposed by the SOF Linux driver), and when CRAS queries it (at whatever rate it chooses), the SOF driver sends an IPC and the FW replies with the current DSP load tracker data.
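
(For illustration, a minimal sketch of what the poll-style consumer could look like from user space, assuming a hypothetical debugfs node /sys/kernel/debug/sof/dsp_load; the path and payload format are placeholders, not an agreed interface.)

import time

# Hypothetical debugfs node exposed by the SOF Linux driver; path and payload
# format are placeholders, not a committed interface.
DSP_LOAD_NODE = "/sys/kernel/debug/sof/dsp_load"

def poll_dsp_load(interval_s: float = 1.0) -> None:
    """Poll the node at a rate chosen by the host (e.g. CRAS)."""
    while True:
        # Each open()+read() would make the driver send an IPC and return
        # the FW's current DSP load tracker data.
        with open(DSP_LOAD_NODE, "rb") as f:
            payload = f.read()
        print(f"read {len(payload)} bytes of DSP load tracker data")
        time.sleep(interval_s)

if __name__ == "__main__":
    poll_dsp_load()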

@lgirdwood lgirdwood added this to the v2.2 milestone Apr 6, 2022
@kv2019i kv2019i added the performance Issues related to code performance and speed optimizations. label Jun 16, 2022
@kv2019i kv2019i modified the milestones: v2.2, v2.6 Feb 17, 2023
@lgirdwood
Member

@kv2019i we have mtrace and Zephyr oops now, anything else pending?

@btian1
Contributor

btian1 commented May 12, 2023

There is one log line that may cover this topic:

[ 65.782163] ll_schedule: ll timer avg 2138, max 4647, overruns 0
[ 66.806163] ll_schedule: ll timer avg 2143, max 2924, overruns 0

Simply checking the overruns count tells you whether the DSP is overloaded.
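
(As a quick illustration of checking those lines from a captured log, a small sketch; the regular expression is my assumption based on the two lines above.)

import re
import sys

# Matches lines like:
#   [ 65.782163] ll_schedule: ll timer avg 2138, max 4647, overruns 0
LL_RE = re.compile(r"ll_schedule: ll timer avg (\d+), max (\d+), overruns (\d+)")

def report_overruns(lines):
    for line in lines:
        m = LL_RE.search(line)
        if not m:
            continue
        avg, peak, overruns = map(int, m.groups())
        if overruns:
            print(f"DSP overloaded: avg {avg}, max {peak}, overruns {overruns}")

if __name__ == "__main__":
    report_overruns(sys.stdin)  # e.g. pipe dmesg output into this script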

@kv2019i
Collaborator

kv2019i commented May 12, 2023

@lgirdwood @btian1 This is not done. The request is to have this data in user-space without enabling the FW logs (disabled in Chromebooks by default). So this requires new DSP+kernel infra. In IPC4, we have messages to report load measurements, but these are not implemented yet (neither on the FW nor on the kernel side). So this is still a valid ask, and I'm not aware of any ongoing work to implement these (we discussed the IPC4 messages in question under the cycles-budget (CPC) topic around the manifest).

@marc-hb
Collaborator

marc-hb commented May 12, 2023

The request is to have this data in user-space without enabling the FW logs (disabled in Chromebooks by default). So this requires new DSP+kernel infra.

Why is this not just a log level configuration problem?
https://docs.zephyrproject.org/latest/services/logging/index.html

Logging API is highly configurable at compile time as well as at run time. [...] During compilation logs can be filtered out on module basis and severity level. [...] Logs can also be compiled in but filtered on run time using dedicate API. Run time filtering is independent for each backend and each source of log messages. Source of log messages can be a module or specific instance of the module.

The old sof-logger has similar capabilities.

So the idea of starting another, entirely different logging framework sounds weird to me. @johnylin76, @andyross, @cujomalainey why are Chromebooks "throwing the baby out with the bathwater" and disabling logs entirely, as opposed to selectively keeping the logging they need/want?

@cujomalainey
Member

@marc-hb at least on pre-ACE systems I have been warned that enabling logging hijacks a DMA channel (which PNP is not happy about when it comes to power) and it can also block power states. Trust me, I would love to have even just ERR/WARN on.

@marc-hb
Collaborator

marc-hb commented May 15, 2023

The old sof-logger uses either shared memory for high levels or DMA for all levels or both at the same time. It's configurable.

I'm not familiar with all the Zephyr logging possibilities yet, but I know for sure that the way it's configured does NOT use any DMA right now in our CI; it relies on the shared-memory "mtrace" backend only (which means it can overrun and lose statements; "you get what you paid for").

@kv2019i
Collaborator

kv2019i commented May 17, 2023

@cujomalainey @marc-hb That's a good question. The "mtrace" Zephyr backend is friendlier to system flows (it doesn't take a DMA channel), but it's still not zero cost. There are writes to the SRAM that take DSP cycles, there's a logging thread that takes some memory and there's impact to code size (although we already have dictionary formats in Zephyr up and running limiting this). But in the end it will depend on the specific product. Our current assumption is still that at least not all release firmwares will have logging enabled (like has been the case with SOF previously). I.e. key debugging tools (like ability to extract a FW coredump), will be available even if you don't have logging enabled. Having a way to access DSP load information without logging seems like a reasonable feature, similar to the coredump. I think we are running out of runway for v2.6 though, so this probably needs to be pushed out to v2.7.

@marc-hb
Collaborator

marc-hb commented May 17, 2023

There are writes to the SRAM that take DSP cycles, there's a logging thread that takes some memory and there's impact to code size (although we already have dictionary formats in Zephyr up and running limiting this).

Logging never comes for free; OK. Something Heisenberg something?

But how would a new technique focused on logging the DSP load only avoid all these costs? By being uber-specialized and optimized for reporting DSP load only? Then someone else realizes that "oh, wait, we also need the ability to log this other critical thing too in production builds... and that other one..."

Maybe I'm missing something, but until fine-tuning the configuration of the existing logging framework has been attempted and measured, starting a brand new technique seems to fall into the "premature optimization" category.

I.e. key debugging tools (like ability to extract a FW coredump), will be available even if you don't have logging enabled

If that's already available without logging then it's different; it does really come "for free". Coredumps feel quite different from regular logging too; for instance you don't need a long-running thread streaming a series of coredumps :-)

@cujomalainey
Member

cujomalainey commented May 17, 2023

There are writes to the SRAM that take DSP cycles, there's a logging thread that takes some memory and there's impact to code size

Will happily pay that price if it means debugging time is cut because we are no longer blind to errors in the field.

Our current assumption is still that at least not all release firmwares will have logging enabled (like has been the case with SOF previously)

I think the bigger issue was that we didn't enable it in the kernel, which meant the resources were never allocated. So even though the firmware was ready, the kernel side wasn't enabled.

I think the load data could even be host-triggered into the log for our purposes, but we need to be able to run logging in production without PNP getting annoyed.

@alex-cri alex-cri modified the milestones: v2.6, v2.7 May 23, 2023
@alex-cri

@kv2019i , agree with your comment on running out of runway for v2.6. Moving to v2.7.

@lgirdwood
Member

@kv2019i can we get the overall LL load per core in an SRAM window?
@ujfalusi will probably need a kernel debugFS to read it.

@marc-hb
Collaborator

marc-hb commented Jul 5, 2023

SOF drivers sends an IPC and FW replies with current DSP load tracker data.

@ujfalusi will probably need a kernel debugFS to read it.

These seem to imply a "poll" style of user interface where a process must run and regularly poll the firmware.

The description and some other comments seem to lean more towards a "push" style of interface where the firmware regularly logs the data somehow and somewhere (a different question). In the latter case, the interval is controlled by the firmware so it is more predictable.

@cujomalainey (who has been requesting this feature) do you have a preference?

@lgirdwood
Member

SOF drivers sends an IPC and FW replies with current DSP load tracker data.

@ujfalusi will probably need a kernel debugFS to read it.

These seem to imply a "poll" style of user interface where a process must run and regularly poll the firmware.

The description and some other comments seem to lean more towards a "push" style of interface where the firmware regularly logs the data somehow and somewhere (a different question). In the latter case, the interval is controlled by the firmware so it is more predictable.

SRAM is FW push. DebugFS can be blocking read() for an update every 1sec using existing FW MCPS infra.
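
(A rough sketch of the consumer side under that push model; the debugfs path is a placeholder and the blocking-read behaviour is only an assumption based on the description above.)

# Hypothetical consumer for the push model: the FW refreshes the SRAM window
# and the debugfs read() is assumed to block until the next snapshot (~1 s via
# the existing FW MCPS infra). The path is a placeholder, not an agreed ABI.
DSP_LOAD_NODE = "/sys/kernel/debug/sof/dsp_load"

def follow_dsp_load() -> None:
    while True:
        with open(DSP_LOAD_NODE, "rb") as f:
            payload = f.read()  # blocks until the FW publishes a new update
        print(f"got {len(payload)} bytes of MCPS data")

if __name__ == "__main__":
    follow_dsp_load()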

@cujomalainey
Member

No preference, @johnylin76 thoughts?

@lgirdwood
Member

We have had a lot of focus on the per-module measurements infra in the v2.8 cycle, so this has not progressed. The ask is still valid, so let's move it to v2.9 and see if we can finally do the final bit of publishing this to the host side.

Let's target LL first, then DP second. I think we have all the data collected in FW for LL now, as CI is able to parse the logs per module.

@lgirdwood
Member

@jsarha for v2.9 we need something initially that can provide data in an unused RO SRAM window (for Intel - other vendors will choose their own memory locations) that can be read via debugFS as binary data, with userspace parsing, i.e. a Python script would read and parse the debugFS data and display the contents. Later on it could use other transports like probes.

  1. The SRAM window and FW data collection will be disabled by default - they will be enabled at runtime via a new IPC message, i.e. the IPC to enable can be sent by the kernel when the debugFS file is open()ed and the IPC to disable when it is close()d.

  2. There will be a second debugFS entry that allows the FW data collection to be tuned to the needs of the developer. Each data payload will have an ID (uint32_t), and if the host echoes the ID to debugFS then that sends an IPC (set large config) to enable that feature, e.g.

echo 123, 789, -456 > /sys/debug/blah/sof_blah

This will enable data items with IDs 123 and 789 and disable ID 456 (negative values disable).

  3. The FW data won't be parsed by the kernel (except for the total size & magic). It will just be passed through. It should have a generic header that is easy to parse, e.g.
struct payload_hdr {
    uint32_t total_size; // total size of the whole payload in bytes
    uint32_t hdr_size; // size of this header only in bytes
    uint32_t magic; // used to identify valid data
    uint32_t abi;  // ABI version, can be used by tool to warn users if too old.
    uint64_t tstamp; // aligned with host epoch.
};

Following this will be the data structures/containers. They should all lead with a standard header that the tool can use
e.g.

struct elem_hdr {
    uint32_t size; // size of this elem in bytes;
    uint32_t id; // ID of this elem type
};
/* elem body follows here */

This way the Python tool can read and parse the data; it should ignore any IDs it does not know about.
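
(Building on those headers, the parsing side of such a Python tool could look roughly like this; the debugfs path, magic value and element handling are assumptions for illustration only.)

import struct

TELEMETRY_NODE = "/sys/kernel/debug/sof/telemetry"  # placeholder path
EXPECTED_MAGIC = 0x0FEDCAFE                          # placeholder value

# struct payload_hdr: total_size, hdr_size, magic, abi (uint32_t) + tstamp (uint64_t)
PAYLOAD_HDR = struct.Struct("<IIIIQ")
# struct elem_hdr: size, id (uint32_t each)
ELEM_HDR = struct.Struct("<II")

def parse_payload(blob: bytes):
    total_size, hdr_size, magic, abi, tstamp = PAYLOAD_HDR.unpack_from(blob, 0)
    if magic != EXPECTED_MAGIC:
        raise ValueError(f"bad magic 0x{magic:08x}")
    elems = []
    offset = hdr_size
    while offset + ELEM_HDR.size <= total_size:
        size, elem_id = ELEM_HDR.unpack_from(blob, offset)
        if size < ELEM_HDR.size:
            break  # malformed element, stop parsing
        # Keep the body; unknown IDs are simply skipped by the display code.
        elems.append((elem_id, blob[offset + ELEM_HDR.size:offset + size]))
        offset += size
    return abi, tstamp, elems

if __name__ == "__main__":
    with open(TELEMETRY_NODE, "rb") as f:
        abi, tstamp, elems = parse_payload(f.read())
    print(f"ABI {abi}, tstamp {tstamp}, {len(elems)} elements")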

@kv2019i
Collaborator

kv2019i commented Mar 4, 2024

Stable-v2.9 branched, this didn't make the cut, bumping to 2.10. There is progress with both Zephyr thread analyzer and with a more generic telemetry interface, but neither is ready for merge for SOF2.9.

@kv2019i kv2019i modified the milestones: v2.9, v2.10 Mar 4, 2024
@lgirdwood
Member

@jsarha can you give an update on any PRs and output to date? Thanks!

@jsarha
Contributor

jsarha commented Apr 26, 2024

I have a couple of PRs to address this issue. First, there are Intel ADSP fixes for the Zephyr Thread Analyzer:

And then two other PRs, depending on the first one, that still need some polishing but which are functional and working at the moment:

The telemetry2 and thread-info FW code, and the Python client to decode the associated debugfs data.

Simple debugfs code in the Linux SOF driver to map the telemetry2 window to a debugfs file.

@jsarha
Contributor

jsarha commented Apr 26, 2024

Here is example output of the current implementation:

python3 thread-analyzer.py 
Open /sys/kernel/debug/sof/telemetry2 failed '[Errno 14] Bad address'
core_count: 3
Core 0 load 0%
        0x401184a8           cpu 0.0% stack 2.0%
        0x401183e0           cpu 0.0% stack 2.0%
        1 thread info        cpu 0.0% stack 8.2%
        0 thread info        cpu 0.0% stack 47.1%
        edf_workq            cpu 0.0% stack 8.2%
        sysworkq             cpu 0.0% stack 31.8%
        logging              cpu 0.0% stack 19.6%
        idle 00              cpu 99.6% stack 33.7%
Core 2 load 0%
        idle 02              cpu 99.6% stack 27.1%
        0x40118570           cpu 0.0% stack 47.1%
        2 thread info        cpu 0.0% stack 47.1%
Core 0 load 0%
        0x401184a8           cpu 0.0% stack 2.0%
        0x401183e0           cpu 0.0% stack 2.0%
        1 thread info        cpu 0.0% stack 8.2%
        0 thread info        cpu 0.0% stack 47.1%
        edf_workq            cpu 0.0% stack 8.2%
        sysworkq             cpu 0.0% stack 31.8%
        logging              cpu 0.0% stack 19.6%
        idle 00              cpu 99.6% stack 33.7%
Core 2 load 6%
        ll_thread2           cpu 0.0% stack 18.8%
        idle 02              cpu 93.3% stack 28.6%
        0x40118570           cpu 0.4% stack 47.1%
        2 thread info        cpu 0.0% stack 47.1%
Core 0 load 0%
        0x401184a8           cpu 0.0% stack 2.0%
        0x401183e0           cpu 0.0% stack 2.0%
        1 thread info        cpu 0.0% stack 8.2%
        0 thread info        cpu 0.0% stack 47.1%
        edf_workq            cpu 0.0% stack 8.2%
        sysworkq             cpu 0.0% stack 31.8%
        logging              cpu 0.0% stack 19.6%
        idle 00              cpu 99.6% stack 33.7%
Core 2 load 5%
        ll_thread2           cpu 5.5% stack 18.8%
        idle 02              cpu 94.1% stack 28.6%
        0x40118570           cpu 0.0% stack 47.1%
        2 thread info        cpu 0.0% stack 47.1%
Core 0 load 0%
        0x401184a8           cpu 0.0% stack 2.0%
        0x401183e0           cpu 0.0% stack 2.0%
        1 thread info        cpu 0.0% stack 8.2%
        0 thread info        cpu 0.0% stack 47.1%
        edf_workq            cpu 0.0% stack 8.2%
        sysworkq             cpu 0.0% stack 31.8%
        logging              cpu 0.0% stack 19.6%
        idle 00              cpu 99.6% stack 33.7%
Core 2 load 5%
        ll_thread2           cpu 5.5% stack 18.8%
        idle 02              cpu 94.1% stack 28.6%
        0x40118570           cpu 0.0% stack 47.1%
        2 thread info        cpu 0.0% stack 47.1%
Core 0 load 0%
        0x401184a8           cpu 0.0% stack 2.0%
        0x401183e0           cpu 0.0% stack 2.0%
        1 thread info        cpu 0.0% stack 8.2%
        0 thread info        cpu 0.0% stack 47.1%
        edf_workq            cpu 0.0% stack 8.2%
        sysworkq             cpu 0.0% stack 31.8%
        logging              cpu 0.0% stack 19.6%
        idle 00              cpu 99.6% stack 33.7%
Core 2 load 5%
        ll_thread2           cpu 5.5% stack 18.8%
        idle 02              cpu 94.1% stack 28.6%
        0x40118570           cpu 0.0% stack 47.1%
        2 thread info        cpu 0.0% stack 47.1%
Core 0 load 1%
        0x401184a8           cpu 0.0% stack 2.0%
        0x401183e0           cpu 0.0% stack 2.0%
        1 thread info        cpu 0.0% stack 8.2%
        0 thread info        cpu 0.0% stack 47.1%
        edf_workq            cpu 0.0% stack 8.2%
        sysworkq             cpu 0.0% stack 31.8%
        logging              cpu 0.8% stack 19.6%
        idle 00              cpu 98.4% stack 33.7%

@plbossart
Member

  1. The SRAM window and FW data collection will be disabled by default - they will be enabled at runtime via a new IPC message, i.e. the IPC to enable can be sent by the kernel when the debugFS file is open()ed and the IPC to disable when it is close()d.

@lgirdwood What if you want this FW data collection to start immediately without waiting for user-space to enable this with an IPC?

Take the example of the trace, it's on immediately. userspace can select to read it or not.

@lgirdwood
Member

  1. The SRAM window and FW data collection will be disabled by default - they will be enabled at runtime via a new IPC message, i.e. the IPC to enable can be sent by the kernel when the debugFS file is open()ed and the IPC to disable when it is close()d.

@lgirdwood What if you want this FW data collection to start immediately without waiting for user-space to enable this with an IPC?

Take the example of the trace, it's on immediately. userspace can select to read it or not.

Kconfig initially, but there could be later updates that allow developers to switch it on/off via other methods.

@plbossart
Member

@lgirdwood if it's a Kconfig, then why would you need an IPC to turn the data collection on?

@marc-hb
Collaborator

marc-hb commented Apr 30, 2024

I don't know about this case specifically, but you can have a default value in Kconfig for boot time and the ability to change it again at runtime; it's not unheard of.

(you can also have the Kconfig ability to strip the code and save space but that's a different Kconfig)

@plbossart
Member

plbossart commented Apr 30, 2024

indeed, and my issue is with the default value as false requiring an IPC to start data collection. That's a sure way to introduce non-reproducible behavior between runs since we can't control when the IPC would be sent.

@marc-hb
Collaborator

marc-hb commented Apr 30, 2024

Again, not sure about this particular case but a typical configuration for this type of features in general is:

  • OFF by default in production builds
  • ON by default in debug builds thanks to some -DCONFIG_xxx override in some sof/app/debug_overlay.conf
  • Run-time knob available in all builds for more flexibility. Not used by default in automated testing.

Problem solved?

@plbossart
Member

how would we know at the kernel level if the feature is on or off by default? There's a need to report this capability...

@johnylin76
Contributor Author

I second @marc-hb

I'm so glad to see this feature being fulfilled. Since it's a developer feature anyway, Kconfig makes more sense to me as well.

how would we know at the kernel level if the feature is on or off by default? There's a need to report this capability...

My two cents: the kernel doesn't really need to know whether this feature is supported or not, as long as no false alarm is reported by the firmware when it isn't supported.
From another angle, the kernel can never assume the feature is "on" by default because there is always the possibility of a mismatch between the two ends.

@plbossart
Member

in my experience, collecting data after a random time isn't great for early boot issues.
If you don't start from T0, then your results will depend on when the host driver sends that IPC, and that depends on a user-space program which is even worse.
Also there's a conceptual difference between COLLECTING data and REPORTING data. The first one should not be dependent on anything.

@lgirdwood
Member

All, let's take this step by step - we can incrementally add to the config/control once the basics are upstream.
Today, we just need the Kconfig option that is advertised (when enabled) in the extended manifest.
Next steps can be IPCs for runtime config/control, different backends (like probes) and so on; the important part is that we need the basics first.

@marc-hb
Collaborator

marc-hb commented May 1, 2024

I think everyone is on board with incremental, Continuous Integration but to reduce churn, back-and-forth and even going around in circles sometimes, it's good to have some high level idea of the final design and UI a.k.a. "requirements"? They don't have to be set in stone of course...

"Fail to plan, plan to fail" - my 2 cents.

@lgirdwood
Member

I think everyone is on board with incremental, Continuous Integration but to reduce churn, back-and-forth and even going around in circles sometimes, it's good to have some high level idea of the final design and UI a.k.a. "requirements"?

See above re UI in #5521 (comment); the requirement is to track DSP load, as per the subject.

@marc-hb
Collaborator

marc-hb commented May 1, 2024

I was including the user interface in the "requirements" but I admit none of this is an "exact science".

@kv2019i
Collaborator

kv2019i commented Jun 5, 2024

There are two solutions now to expose DSP load over share memory:

  • IPC4 telemetry blob, providing SOF/app-level DSP load -> merged and available but not supported by the SOF Linux driver
  • telemetry2 developed by @jsarha, providing Zephyr-level CPU/core load -- this is currently waiting for Zephyr dependencies (Thread analyzer adsp fix zephyrproject-rtos/zephyr#71408), so it is trending not to make the v2.10 cut

@lgirdwood lgirdwood modified the milestones: v2.10, v2.11 Jun 10, 2024
@kv2019i
Collaborator

kv2019i commented Sep 12, 2024

I think we are now ready with the v2.11 work in #9398 and #9451. @jsarha can you add a summary of what is available now and how to use the feature?

@johnylin76 can you review and see if this meets the original feature request's needs, or whether there is something left to do under this ticket (we can then either close it, move it to 2.12, and/or file follow-up tickets)?

@johnylin76
Contributor Author

I think we are now ready with the v2.11 work in #9398 and #9451. @jsarha can you add a summary of what is available now and how to use the feature?

@johnylin76 can you review and see if this meets the original feature request's needs, or whether there is something left to do under this ticket (we can then either close it, move it to 2.12, and/or file follow-up tickets)?

LGTM. Thanks for the great work.
