Debug stream thread info pr #9398

jsarha · 2024-08-23T09:33:29Z

This PR contains following things:

debug-stream definitions
implementation for transporting the debug-stream records over debug window slot
thread info implementation
- extracts per thread stack and cpu usage statistics from Zephyr core
- sends the statistics as debug-stream records
python script for receiving the debug-stream records over debug window slot transport and decoding them to human readable form

This PR requires following PR to compile (if the feature is enabled in Kconfig):
zephyrproject-rtos/zephyr#77467

and following PR to work (for python script to access the debug window slot):
thesofproject/linux#5154

and this fix to be able to disable tlemetry performance measurements, that tries to use the same debug-window slot (a configuration that allows both is also possible, but I wanted to get this PR out):
#9392

If one wants to try the PR in practice on MTL, one only needs the linux PR, and this branch of mine that contains all the other necessary changes:
https://github.com/jsarha/sof/tree/debug_stream_thread_info_testing

src/debug/debug_stream/debug_stream.h

src/debug/debug_stream/Kconfig

lgirdwood · 2024-08-23T13:01:12Z

src/debug/debug_stream/debug_stream_slot.c

+LOG_MODULE_REGISTER(debug_strem_slot);
+
+struct cpu_mutex {
+ struct k_mutex m;


ok, I assume mutex is per core and not not lock for all core ? - if so we should inline comment this description.

in fact, if this is nothing else but a mutex, why embed it into another structure? Just use an array of struct k_mutex objects. And if you do insist of having a separate structure for it, maybe use the debug_ namespace with it too

They are mutexes meant for different cores and AFAIK they should be on different cache-lines to avoid trouble. That is why they are inside a struct that I can force to be cache-aligned.

lgirdwood · 2024-08-23T13:01:55Z

src/debug/debug_stream/debug_stream_slot.c

+ uint32_t record_size = rec->size;
+ uint32_t record_start, buf_remain;
+
+ LOG_DBG("Sending record %u id %u len %u\n", rec->serial, rec->id, rec->size);


Should this go after the buffer check below ?

A matter or taste. I wanted to know for sure the function is at least called, when I could not see anything happening in the circular buffer.

src/debug/debug_stream/debug_stream_slot.c

lgirdwood · 2024-08-23T13:04:20Z

src/debug/debug_stream/debug_stream_slot.h

+ * Section descriptor defines core ID, offset and size of the circular
+ * buffer in the debug window slot.
+ */
+struct debug_stream_section_descriptor {


again - is user API, needs to go in another directory and needs packed/aligned.

Packed hardly does anything when everything is 32-bit words, and the descriptor is already aligned to cache-line size. The header itself must be written to beginning of slot so its automatically 4k page size aligned, which should be enough, and also if its not the idea is still to have it at the very beginning of the slot. I can put the packed aligned stuff to the header too, its not needed, and I feel it could even be misleading.

BTW I was amazed when I realized that the C standard has nothing like __packed. C is really not a low-level language ;-)

It seems unlikely but I think 32 bits fields could be 64-bits aligned on some 64 bits architectures, depending on the compiler flags, the phase of the moon etc.

https://www.catb.org/esr/structure-packing/ (sorry for the invalid certificate)
https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-packed-type-attribute
https://linux.die.net/man/1/pahole

best to be explicit here even though parent is packed and this is all 32bit words, i.e. we could have direct users that may reference this structure without the parent context (and hence not optimize for 32 but aligned/packed access).

@jsarha FWIW agree with @lgirdwood - please add packed. It's good for all ABI types to have it even if they "obviously" effectively would be packed by default by the compiler anyway.

I can of course create all kinds of constructs to invent the __aligned() again with variable sized paddings, and use __packed, but now I do not have time for it so, I leave this as it is.

@jsarha Wouldn't just removing __aligned and using __packed already solve it? And just look through those types to verify that you arrange elements from larger to smaller - first all 64-bit ones, then 32-bit ones etc. Only if you embed structures, well, then yes, just pad the structure that you're embedding with some uint8_t reserved[N] array to align to the next 64 bits, that's it

Yes, there is several ways to do this, and probably it would work even if the core specific circular-buffer descriptors were not 64-byte aligned, since after the initialization the contents is read only and the memory is accessed through uncached address-space. Still @lgirdwood suggested that also the static descriptor part should be 64-bit aligned, just to be safe. I am really fine with anything that works, but I would like to do it only once more, and the time is running little short if we want this to go to 2.11.

@lgirdwood @lyakh , I am more and more starting to think the __aligned() should be dropped from "struct debug_stream_section_descriptor", and for that matter all the padding hacks too. We can not have the ABI depend on cache-line size, or at least we should then write the offsets explicitly out in the debug window, but that is pretty much what the "struct debug_stream_section_descriptor" is for: It is all static data that is anyway initialized by core 0 before the threads running on other cores have started and its in uncached memory, and it contains the offsets of the core specific circular buffers, and their sizes. So, there should not and must not be any cash-line size dependent padding in the header that describes where the core-specific circular buffers are. I've tested this works fine with sof-mtl-nocodec-ssp0-ssp2.tplg on MTL.

@jsarha I agree. The padding that I was proposing wasn't for cache-line size alignment, it was for "natural data type alignment." I.e. if you have an 8-bit member followed by a 16-bit value and you cannot swap them, add a padding like

struct abi_data { uint8_t type; uint8_t reserved0[3]; uint32_t size; .... };

tools/debugstream/thread-info.py

src/debug/debug_stream/Kconfig

src/debug/debug_stream/debug_stream_slot.c

tools/debugstream/thread-info.py

src/debug/debug_stream/debug_stream_thread_info.c

kv2019i

No showstoppers, aside the assert usage.

src/debug/debug_stream/debug_stream.h

src/debug/debug_stream/debug_stream_thread_info.c

tools/debugstream/debug-stream.py

src/debug/debug_stream/debug_stream_slot.c

marc-hb

debug-stream.py has 352 pylint warnings: that's almost as many as the number of lines in the file! I'm NOT requesting "pylint perfection" but the number of warnings should be small enough to keep pylint usable. It's not usable right now.

The good news is: 256 out of these 352 are the same W0311 and you can get rid of ALL of them with a single black call:

apt install black
black tools/debugstream/debug-stream.py
reformatted tools/debugstream/debug-stream.py

Warning: black reformatting happens in place.

Another pro tip: sorted_pylint () { pylint "$@" | sort -t: -k2 -n ; }

tools/debugstream/debug-stream.py:370:0: W0311: Bad indentation. Found 5 spaces, expected 20 (bad-indentation)
tools/debugstream/debug-stream.py:371:0: W0311: Bad indentation. Found 5 spaces, expected 20 (bad-indentation)
tools/debugstream/debug-stream.py:372:0: W0311: Bad indentation. Found 6 spaces, expected 24 (bad-indentation)
tools/debugstream/debug-stream.py:374:0: W0311: Bad indentation. Found 2 spaces, expected 8 (bad-indentation)
tools/debugstream/debug-stream.py:375:0: W0311: Bad indentation. Found 3 spaces, expected 12 (bad-indentation)
tools/debugstream/debug-stream.py:376:0: W0311: Bad indentation. Found 3 spaces, expected 12 (bad-indentation)
tools/debugstream/debug-stream.py:377:0: W0311: Bad indentation. Found 2 spaces, expected 8 (bad-indentation)
tools/debugstream/debug-stream.py:378:0: W0311: Bad indentation. Found 3 spaces, expected 12 (bad-indentation)
tools/debugstream/debug-stream.py:379:0: W0311: Bad indentation. Found 4 spaces, expected 16 (bad-indentation)
tools/debugstream/debug-stream.py:380:0: W0311: Bad indentation. Found 4 spaces, expected 16 (bad-indentation)
tools/debugstream/debug-stream.py:381:0: W0311: Bad indentation. Found 2 spaces, expected 8 (bad-indentation)
tools/debugstream/debug-stream.py:382:0: W0311: Bad indentation. Found 2 spaces, expected 8 (bad-indentation)

tools/debugstream/debug-stream.py

marc-hb · 2024-08-26T17:15:30Z

but the number of warnings should be small enough to keep pylint usable.

Forgot this sorry: most warnings are easy to fix but in some rare cases pylint is wrong or you don't have the time to figure out how (feel free to ask for help privately). Then you can easily silence any warning with a # pylint: disable... comment. Either in the spot (preferred), or at in the whole file. Find examples in git -C sof[-test] grep pylint.*disable.

Or you can just leave some for someone else to fix later.

src/debug/debug_stream/Kconfig

tools/debugstream/debug-stream.py

jsarha · 2024-09-04T18:58:50Z

The new version should address all the comment according to reply, except for "serial" > "seqnum" and getting rid off __aligned() and using __packed and padding members instead. Oh, and I am not working from python ctypes to construct change, not yet anyway.

pylint mostly clean

jsarha · 2024-09-05T08:34:27Z

Now also "serial" -> "seqnum" change is done. Aome idle debug prints were also removed.

src/debug/debug_stream/debug_stream_thread_info.c

lyakh

As long as none of my questions actually uncover bugs and since this is new code, they can be addressed after merge, nothing seems to be critical

src/include/user/debug_stream.h

lyakh · 2024-09-05T13:51:58Z

src/debug/debug_stream/debug_stream_slot.c

+} __aligned(CONFIG_DCACHE_LINE_SIZE);
+
+/* CPU specific mutexes for each circular buffer */
+static struct cpu_mutex cpu_mutex[CONFIG_MP_MAX_NUM_CPUS];


this puts cpu_mutex[] into .bss which is uncached in present builds. That means, that you don't need to cache-line align them and to flush caches. In general, even if it were cached, you would always access it as such, mixing cached and uncached access to a mutex would very likely break its functionality. So you wouldn't need to flush caches anyway.

Oh, then that is a problem. Is all static (and global) data uncached? If that is the case, then I guess I should reserve the table from heap, but are all heap allocations automatically cache-line aligned?

The mutex only to control access within one core so AFAIU the cache should not be a problem. This is for the situation where higher priority thread interrupts record sending and that higher priority thread tries to send a record of its own.

I'd still say its safer to keep the mutexes on different cache-lines, even if the memory is uncached, so I suggest not to do anything right now, but move the mutexes to cached memory later.

@jsarha I'd keep them uncached and not bother with cache lines. Should be both simpler and safer. Though a bit slower, but don't think this slow down would be significant for you. But if you do decide to go via cache, you can get cached aliases to that memory, but then yes, you'd need to reserve a whole cache line-sized buffer for each.

lyakh · 2024-09-05T13:52:13Z

src/debug/debug_stream/debug_stream_slot.c

+ buf->next_seqno = 0;
+ buf->w_ptr = 0;
+ k_mutex_init(&cpu_mutex[i].m);
+ sys_cache_data_flush_range(&cpu_mutex[i], sizeof(cpu_mutex[i]));


I can remove this, for now, but eventually I would like have these mutexes in cached memory.

lyakh · 2024-09-05T13:53:07Z

src/debug/debug_stream/debug_stream_slot.c

+LOG_MODULE_REGISTER(debug_strem_slot);
+
+struct cpu_mutex {
+ struct k_mutex m;


in fact, if this is nothing else but a mutex, why embed it into another structure? Just use an array of struct k_mutex objects. And if you do insist of having a separate structure for it, maybe use the debug_ namespace with it too

lyakh · 2024-09-05T13:56:35Z

src/debug/debug_stream/debug_stream_slot.c

+ if (!buf)
+ return -ENODEV;
+
+ if (rec->size >= desc.buf_words) {


is .size in words? Should be clearly documented then... Or maybe use a different name or make it in bytes

Its documented here:

struct debug_stream_record { uint32_t id; /* Record id of abstract data record */ uint32_t seqno; /* Increments after each record */ uint32_t size; /* Size of the whole record in words */ uint32_t data[]; } __packed;

, but I change the data-member name.

lyakh · 2024-09-05T14:28:08Z

src/debug/debug_stream/debug_stream_thread_info.c

+ int i;
+
+ /* Mark the thread as active */
+ ud->active_threads[ud->thread_count] = tid;


thread_count sounds like it's the number of threads, i.e. it begins with 1? Can this access outside of the array?

@jsarha but does it begin with 0 or with 1? I.e. does the element ud->active_threads[0] ever get used here?

lyakh · 2024-09-05T14:30:25Z

src/debug/debug_stream/debug_stream_thread_info.c

+ * returns it.
+ */
+
+static uint32_t thread_info_get_cycles(void *tid, k_thread_runtime_stats_t *thread_stats,


can we use struct k_thread *?

We could, but I am trying to hint that the pointer is used as an id only, its not accessed at all. I'll add a comment.

lyakh · 2024-09-05T14:34:57Z

src/debug/debug_stream/debug_stream_thread_info.c

+
+ /* If there is more than THREAD_INFO_MAX_THREADS threads in this core */
+ if (i == ARRAY_SIZE(ud->previous->threads))
+ LOG_INF("No place found for %s %p", name, tid);


lyakh · 2024-09-05T14:37:05Z

src/debug/debug_stream/debug_stream_thread_info.c

+ /* If no cached value was found, look for an empty slot to store the recent value */
+ for (i = 0; i < ARRAY_SIZE(ud->previous->threads); i++) {
+ if (ud->previous->threads[i].tid == NULL) {
+ ud->previous->threads[i].tid = tid;


can be a follow-up, but maybe we can have a better name than previous here. You don't store more than one snapshot, so isn't it just "current"?

It is the cycles number from the previous round. After the calculations it will be "current", but when its used next time its again from "previous" round.

lyakh · 2024-09-05T14:48:26Z

src/debug/debug_stream/debug_stream_thread_info.c

+#define THREAD_STACK_SIZE (2048)
+static K_THREAD_STACK_ARRAY_DEFINE(info_thread_stacks, CONFIG_MP_MAX_NUM_CPUS,
+ THREAD_STACK_SIZE);
+static struct k_thread info_thread[CONFIG_MP_MAX_NUM_CPUS];


CONFIG_MP_NUM_CPUS?

CONFIG_MP_NUM_CPUS is obsolete, MP_MAX_NUM_CPUS should be used:

config MP_NUM_CPUS
int "Number of CPUs/cores [DEPRECATED]"
default MP_MAX_NUM_CPUS
range 1 12
help
This is deprecated, please use MP_MAX_NUM_CPUS instead.

config MP_MAX_NUM_CPUS
int "Maximum number of CPUs/cores"
default 1
range 1 12
help
Maximum number of multiprocessing-capable cores available to the
multicpu API and SMP features.

…tion This commit adds definition for abstract debug stream record and stream header. The header file contains the necessary documentation. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>

Implementation for debug stream transportation over debug window slot. The protocol details are documented in debug_stream_slot.h header. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>

Implementation for collecting Thread stack and CPU usage statistic from Zephyr core and send the info and debug_stream records. The records encoding is defined and documented in debug_stream_thread_info.c source file. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>

Python implementation for receiving and decoding debug-stream records from debug window slot transportation. Opens SOF debugfs file "debug_stream" reads and decodes the records from the circular buffer documented in soc_debug_window_slot.h. This initial version only knows of DEBUG_STREAM_RECORD_ID_THREAD_INFO records. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>

jsarha · 2024-09-05T17:49:54Z

Following changes were made:

No cache-line size alingment for static data circular buffer descriptors
record member "size" to "size_words"
struct record_buf called bufd and not rec to avoid confusion
Typo-fixes
TODO comment for using python construct instead of ctypes stucts for data models
thread_count overflow check
INF to WRN logging promotion for thread_count overflow
Comment out sys_cache_data_flush_range()
Add comment for using "void *tid" in thread_info_get_cycles()

lyakh

thanks for addressing comments

lyakh · 2024-09-06T06:30:51Z

src/debug/debug_stream/debug_stream_slot.c

+ buf->next_seqno = 0;
+ buf->w_ptr = 0;
+ k_mutex_init(&cpu_mutex[i].m);
+ /* The core specific mutexes are now .dss which is uncached so the


lyakh · 2024-09-06T06:32:41Z

src/debug/debug_stream/debug_stream_thread_info.c

+ int i;
+
+ /* Mark the thread as active */
+ ud->active_threads[ud->thread_count] = tid;


@jsarha but does it begin with 0 or with 1? I.e. does the element ud->active_threads[0] ever get used here?

kv2019i · 2024-09-06T07:11:55Z

@marcinszkudlinski ok for you to proceed? default for this is "n".

kv2019i · 2024-09-06T07:59:40Z

@wszypelt Can you check? This seems to have failed to "Copy build to DUT" step.

jsarha · 2024-09-06T11:19:23Z

@jsarha but does it begin with 0 or with 1? I.e. does the element ud->active_threads[0] ever get used here?

@lyakh from zero of course, like in every decent c-program:

https://github.com/jsarha/sof/blob/5fe574452db992cdda5ae5cdcfc560adc0862102/src/debug/debug_stream/debug_stream_thread_info.c#L275

https://github.com/jsarha/sof/blob/5fe574452db992cdda5ae5cdcfc560adc0862102/src/debug/debug_stream/debug_stream_thread_info.c#L229

jsarha requested review from marc-hb, lgirdwood, plbossart, mmaka1, lbetlej, dbaluta and kv2019i as code owners August 23, 2024 09:33

jsarha mentioned this pull request Aug 23, 2024

intel_adsp: debug_window: Add slot type for debug-stream transport zephyrproject-rtos/zephyr#77467

Merged

lgirdwood reviewed Aug 23, 2024

View reviewed changes

marcinszkudlinski requested changes Aug 23, 2024

View reviewed changes

src/debug/debug_stream/Kconfig Show resolved Hide resolved

lyakh reviewed Aug 26, 2024

View reviewed changes

kv2019i requested changes Aug 26, 2024

View reviewed changes

marc-hb previously requested changes Aug 26, 2024

View reviewed changes

tools/debugstream/debug-stream.py Outdated Show resolved Hide resolved

marc-hb reviewed Aug 26, 2024

View reviewed changes

jsarha force-pushed the debug_stream_thread_info_pr branch from c0345cd to b391af1 Compare September 4, 2024 18:56

jsarha requested review from iuliana-prodan, abonislawski and a team as code owners September 4, 2024 18:56

jsarha force-pushed the debug_stream_thread_info_pr branch from b391af1 to 0a67a2b Compare September 5, 2024 08:32

kv2019i approved these changes Sep 5, 2024

View reviewed changes

src/debug/debug_stream/debug_stream_thread_info.c Show resolved Hide resolved

lyakh reviewed Sep 5, 2024

View reviewed changes

Jyri Sarha added 4 commits September 5, 2024 20:34

debug: debug_stream: Add debug stream record and stream header defini…

cf4ce75

…tion This commit adds definition for abstract debug stream record and stream header. The header file contains the necessary documentation. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>

debug: debug_stream_slot: Debug slot transportation for debug_stream

7673c8a

Implementation for debug stream transportation over debug window slot. The protocol details are documented in debug_stream_slot.h header. Signed-off-by: Jyri Sarha <jyri.sarha@linux.intel.com>

jsarha force-pushed the debug_stream_thread_info_pr branch from 0a67a2b to 5fe5744 Compare September 5, 2024 17:49

lyakh approved these changes Sep 6, 2024

View reviewed changes

kv2019i requested a review from marcinszkudlinski September 6, 2024 07:11

marcinszkudlinski approved these changes Sep 6, 2024

View reviewed changes

kv2019i merged commit 9152b87 into thesofproject:main Sep 6, 2024
43 of 47 checks passed

This was referenced Sep 9, 2024

Telemetry2 thread info #9084

Closed

[FEATURE][ChromeOS] Expose DSP load tracker info for the visibility to ChromeOS #5521

Open

Debug stream thread info pr #9398

Debug stream thread info pr #9398

Conversation

jsarha commented Aug 23, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lgirdwood Sep 2, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jsarha Sep 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kv2019i left a comment

Choose a reason for hiding this comment

marc-hb left a comment

Choose a reason for hiding this comment

marc-hb commented Aug 26, 2024

jsarha commented Sep 4, 2024 • edited Loading

jsarha commented Sep 5, 2024

lyakh left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jsarha Sep 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jsarha Sep 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jsarha Sep 5, 2024 • edited Loading

Choose a reason for hiding this comment

jsarha commented Sep 5, 2024 • edited Loading

lyakh left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kv2019i commented Sep 6, 2024

kv2019i commented Sep 6, 2024

jsarha commented Sep 6, 2024

lgirdwood Sep 2, 2024 •

edited

Loading

jsarha Sep 5, 2024 •

edited

Loading

jsarha commented Sep 4, 2024 •

edited

Loading

jsarha Sep 5, 2024 •

edited

Loading

jsarha Sep 5, 2024 •

edited

Loading

jsarha Sep 5, 2024 •

edited

Loading

jsarha commented Sep 5, 2024 •

edited

Loading