Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debug stream thread info pr #9398

Merged
merged 4 commits into from
Sep 6, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions src/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -23,3 +23,5 @@ rsource "math/Kconfig"
rsource "library_manager/Kconfig"

rsource "debug/telemetry/Kconfig"

rsource "debug/debug_stream/Kconfig"
3 changes: 3 additions & 0 deletions src/debug/debug_stream/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# SPDX-License-Identifier: BSD-3-Clause

add_local_sources_ifdef(CONFIG_SOF_DEBUG_STREAM_SLOT sof debug_stream_slot.c)
26 changes: 26 additions & 0 deletions src/debug/debug_stream/Kconfig
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
# SPDX-License-Identifier: BSD-3-Clause

config SOF_DEBUG_STREAM_SLOT
bool "Enable SOF debug stream debug window slot"
help
jsarha marked this conversation as resolved.
Show resolved Hide resolved
Debug stream is an abstract stream of records of debug
information from SOF system that are streamed from SOF DSP
to the host side for decoding and presentation. This option
enables transferring the records from DSP to host over a
debug window slot.

if SOF_DEBUG_STREAM_SLOT

config SOF_DEBUG_STREAM_SLOT_NUMBER
int "Debug window slot where to put debug stream slot"
default 3
range 0 14
help
Which debug slot to reserve for Debug Stream. Remember to map
the slot with MEMORY_WIN_2_SIZE in soc/intel/intel_adsp/Kconfig,
in Zephyr source tree. The slots are 4k in size and one slot is
used for descriptors, so for slot 3 to be mapped, the WIN_2_SIZE
must be (1 + 3) * 4k = 16k or greater.

endif

156 changes: 156 additions & 0 deletions src/debug/debug_stream/debug_stream_slot.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,156 @@
// SPDX-License-Identifier: BSD-3-Clause
//
// Copyright(c) 2024 Intel Corporation.

#include <zephyr/logging/log.h>
#include <zephyr/spinlock.h>
#include <adsp_debug_window.h>
#include <adsp_memory.h>
#include <sof/common.h>
#include <rtos/string.h>
#include <user/debug_stream.h>
#include <user/debug_stream_slot.h>

LOG_MODULE_REGISTER(debug_strem_slot);

struct cpu_mutex {
struct k_mutex m;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I assume mutex is per core and not not lock for all core ? - if so we should inline comment this description.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact, if this is nothing else but a mutex, why embed it into another structure? Just use an array of struct k_mutex objects. And if you do insist of having a separate structure for it, maybe use the debug_ namespace with it too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are mutexes meant for different cores and AFAIK they should be on different cache-lines to avoid trouble. That is why they are inside a struct that I can force to be cache-aligned.

} __aligned(CONFIG_DCACHE_LINE_SIZE);

/* CPU specific mutexes for each circular buffer */
static struct cpu_mutex cpu_mutex[CONFIG_MP_MAX_NUM_CPUS];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this puts cpu_mutex[] into .bss which is uncached in present builds. That means, that you don't need to cache-line align them and to flush caches. In general, even if it were cached, you would always access it as such, mixing cached and uncached access to a mutex would very likely break its functionality. So you wouldn't need to flush caches anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, then that is a problem. Is all static (and global) data uncached? If that is the case, then I guess I should reserve the table from heap, but are all heap allocations automatically cache-line aligned?

The mutex only to control access within one core so AFAIU the cache should not be a problem. This is for the situation where higher priority thread interrupts record sending and that higher priority thread tries to send a record of its own.

I'd still say its safer to keep the mutexes on different cache-lines, even if the memory is uncached, so I suggest not to do anything right now, but move the mutexes to cached memory later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsarha I'd keep them uncached and not bother with cache lines. Should be both simpler and safer. Though a bit slower, but don't think this slow down would be significant for you. But if you do decide to go via cache, you can get cached aliases to that memory, but then yes, you'd need to reserve a whole cache line-sized buffer for each.


static const int debug_stream_slot = CONFIG_SOF_DEBUG_STREAM_SLOT_NUMBER;

static struct debug_stream_slot_hdr *debug_stream_get_slot(void)
{
return (struct debug_stream_slot_hdr *)ADSP_DW->slots[debug_stream_slot];
}

static
struct debug_stream_circular_buf *
debug_stream_get_circular_buffer(struct debug_stream_section_descriptor *desc, unsigned int core)
{
struct debug_stream_slot_hdr *hdr = debug_stream_get_slot();
void *ptr;

if (hdr->hdr.magic != DEBUG_STREAM_IDENTIFIER) {
LOG_ERR("Debug stream slot not initialized.");
return NULL;
}

*desc = hdr->section_desc[core];
LOG_DBG("Section %u (desc %u %u %u)", core, desc->core_id, desc->buf_words, desc->offset);

return (struct debug_stream_circular_buf *) (((uint8_t *)hdr) + desc->offset);
}

int debug_stream_slot_send_record(struct debug_stream_record *rec)
{
struct debug_stream_section_descriptor desc;
struct debug_stream_circular_buf *buf =
debug_stream_get_circular_buffer(&desc, arch_proc_id());
uint32_t record_size = rec->size_words;
uint32_t record_start, buf_remain;

LOG_DBG("Sending record %u id %u len %u\n", rec->seqno, rec->id, rec->size_words);

if (!buf)
return -ENODEV;

if (rec->size_words >= desc.buf_words) {
LOG_ERR("Record too big %u >= %u (desc %u %u %u)", rec->size_words,
desc.buf_words, desc.core_id, desc.buf_words, desc.offset);
return -ENOMEM;
}
k_mutex_lock(&cpu_mutex[arch_proc_id()].m, K_FOREVER);

rec->seqno = buf->next_seqno++;
rec->size_words = record_size + 1; /* +1 for size at the end of record */
record_start = buf->w_ptr;
buf->w_ptr = (record_start + record_size) % desc.buf_words;
jsarha marked this conversation as resolved.
Show resolved Hide resolved
buf_remain = desc.buf_words - record_start;
if (buf_remain < record_size) {
uint32_t rec_remain = record_size - buf_remain;
int ret;

ret = memcpy_s(&buf->data[record_start], buf_remain * sizeof(uint32_t),
rec, buf_remain * sizeof(uint32_t));
assert(!ret);
ret = memcpy_s(&buf->data[0], desc.buf_words * sizeof(uint32_t),
((uint32_t *) rec) + buf_remain, rec_remain * sizeof(uint32_t));
assert(!ret);
} else {
int ret;

ret = memcpy_s(&buf->data[record_start], buf_remain * sizeof(uint32_t),
rec, record_size * sizeof(uint32_t));
assert(!ret);
}
/* Write record size again after the record */
buf->data[buf->w_ptr] = record_size + 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that writes the size of the previous record after it? Why is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be able to find records behind w_ptr. Its used when the user space client starts or if there was a buffer overrun.

buf->w_ptr = (buf->w_ptr + 1) % desc.buf_words;

k_mutex_unlock(&cpu_mutex[arch_proc_id()].m);

LOG_DBG("Record %u id %u len %u sent\n", rec->seqno, rec->id, record_size);
return 0;
}

static int debug_stream_slot_init(void)
{
struct debug_stream_slot_hdr *hdr = debug_stream_get_slot();
size_t hdr_size = offsetof(struct debug_stream_slot_hdr,
section_desc[CONFIG_MP_MAX_NUM_CPUS]);
size_t section_area_size = ADSP_DW_SLOT_SIZE - hdr_size;
size_t section_size = ALIGN_DOWN(section_area_size /
CONFIG_MP_MAX_NUM_CPUS,
CONFIG_DCACHE_LINE_SIZE);
size_t offset = hdr_size;
int i;

LOG_INF("%u sections of %u bytes, hdr %u, secton area %u\n",
CONFIG_MP_MAX_NUM_CPUS, section_size, hdr_size,
section_area_size);

if (ADSP_DW->descs[debug_stream_slot].type != 0)
LOG_WRN("Slot %d was not free: %u", debug_stream_slot,
ADSP_DW->descs[debug_stream_slot].type);
ADSP_DW->descs[debug_stream_slot].type = ADSP_DW_SLOT_DEBUG_STREAM;

hdr->hdr.magic = DEBUG_STREAM_IDENTIFIER;
hdr->hdr.hdr_size = hdr_size;
hdr->total_size = hdr_size + CONFIG_MP_MAX_NUM_CPUS * section_size;
hdr->num_sections = CONFIG_MP_MAX_NUM_CPUS;
for (i = 0; i < CONFIG_MP_MAX_NUM_CPUS; i++) {
hdr->section_desc[i].core_id = i;
hdr->section_desc[i].buf_words =
(section_size - offsetof(struct debug_stream_circular_buf, data[0]))/
sizeof(uint32_t);
hdr->section_desc[i].offset = offset;
LOG_INF("sections %u, size %u, offset %u\n",
i, section_size, offset);
offset += section_size;
}
for (i = 0; i < CONFIG_MP_MAX_NUM_CPUS; i++) {
struct debug_stream_section_descriptor desc;
struct debug_stream_circular_buf *buf =
debug_stream_get_circular_buffer(&desc, i);

buf->next_seqno = 0;
buf->w_ptr = 0;
k_mutex_init(&cpu_mutex[i].m);
/* The core specific mutexes are now .dss which is uncached so the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.bss

* following line is commented out. However, since the mutexes are
* core specific there should be nothing preventing from having them
* in cached memory.
*
* sys_cache_data_flush_range(&cpu_mutex[i], sizeof(cpu_mutex[i]));
*/
}
LOG_INF("Debug stream slot initialized\n");

return 0;
}

SYS_INIT(debug_stream_slot_init, APPLICATION, CONFIG_KERNEL_INIT_PRIORITY_DEFAULT);
138 changes: 138 additions & 0 deletions src/include/user/debug_stream_slot.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,138 @@
/* SPDX-License-Identifier: BSD-3-Clause
*
* Copyright(c) 2024 Intel Corporation.
*/

#ifndef __SOC_DEBUG_WINDOW_SLOT_H__
#define __SOC_DEBUG_WINDOW_SLOT_H__

#include <user/debug_stream.h>

/*
* This file describes how Debug Stream can be transported over debug
* window memory slot. The debug stream is an abstract media API for
* passing debug data records from SOF DSP to the host system for
* presentation. The debug stream records are described in debug_stream.h.
*
* This example describes how Debug Stream data is transferred from
* DSP to host side using a debug memory window slot. To learn more
* see soc/intel/intel_adsp/common/include/adsp_debug_window.h
* under Zephyr source tree.
*
* DEBUG_STREAM slot is reserved from SRAM window and a header is
* written in the beginning of the slot. The header is a static data
* structure that is initialized once at DSP boot time. All elements
* in bellow example are 32-bit unsigned integers:
*
* -------------------------------------------------- ---
* | id = DEBUG_STREAM_IDENTIFIER | |
* | total_size = 4096 | 64 bytes
* | num_sections = CONFIG_MP_MAX_NUM_CPUS * | |
* | <padding> | |
* -------------------------------------------------- ---
* | section_descriptor [] = { | |
* | { | |
* | core_id = 0 | |
* | size = 1344 | 64 bytes
* | offset = 64 | |
* | } | |
* | <padding> | |
* -------------------------------------------------- ---
* | { | |
* | core_id = 1 | |
* | size = 1344 | 64 bytes
* | offset = 1344+64 | |
* | } | |
* | <padding> | |
* -------------------------------------------------- ---
* | { | |
* | core_id = 2 | |
* | size = 1344 | 64 bytes
* | offset = 2*1344+64 | |
* | } | |
* | } | |
* | <padding> | |
* -------------------------------------------------- ---
* * CONFIG_MP_MAX_NUM_CPUS is 3 in this example
*
* The header contains generic information like identifier, total
* size, and number of sections. After the generic fields there is an
* array of section descriptors. Each array element is cacheline
* aligned. The array has 'num_sections' number of elements. Each
* element in the array describes a circular buffer, one for each DSP
* core.
*
* The remaining memory in the debug window slot is divided between
* those sections. The buffers are not necessarily of equal size, like
* in this example. The circular buffers are all cache line aligned,
* 64 in this example. One section looks like this:
*
* -------------------------------------------------- ---
* | next_seqno = <counter for written objects> | |
* | w_ptr = <write position in 32-bit words> | 1344 bytes
* | buffer_data[1344/4-2] = { | |
* | <debug data records> | |
* | } | |
* -------------------------------------------------- ---
*
* The data records are described in debug_strem.h. In the case of
* debug window memory slot the next record should always be aligned
* to word (4-byte) boundary.
*
* The debug stream writes the records of abstract data to the
* circular buffer, and updates the w_ptr when the record is
* completely written. The host side receiver tries to keep up with the
* w_ptr and keeps track of its read position. The size of the record
* is written - again - after each record and before the next. This is
* to allow parsing the stream backwards in an overrun recovery
* situation. The w_ptr value is updated last, when the record is
* completely written.
*/

#include <stdint.h>

/* Core specific section descriptor
*
* Section descriptor defines core ID, offset and size of the circular
* buffer in the debug window slot.
*/
struct debug_stream_section_descriptor {
uint32_t core_id; /* Core ID */
uint32_t buf_words; /* Circular buffer size in 32-bit words */
uint32_t offset; /* Core section offset */
} __packed;

/* Debug window slot header for Debug Stream.
*
* The header should be written in the beginning of the slot.
*/
struct debug_stream_slot_hdr {
struct debug_stream_hdr hdr;
uint32_t total_size; /* total size of payload including all sections */
uint32_t num_sections; /* number of core specific sections */
struct debug_stream_section_descriptor section_desc[];
} __packed;

struct debug_stream_circular_buf {
uint32_t next_seqno;
uint32_t w_ptr;
uint32_t data[];
} __aligned(CONFIG_DCACHE_LINE_SIZE);

struct debug_stream_record;
/**
* \brief Send debug stream records over debug window slot
*
* \param[in] rec the record to be written to circular buffer
*
* The debug window slot is initialized automatically at DSP boot
* time, and the core specific circular buffer is selected
* automatically.
*
* \return 0 on success
* -ENODEV if debug stream slot is not configured
* -ENOMEM if the record is too big
*/
int debug_stream_slot_send_record(struct debug_stream_record *rec);

#endif /* __SOC_DEBUG_WINDOW_SLOT_H__ */
1 change: 1 addition & 0 deletions zephyr/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -188,6 +188,7 @@ endmacro()
add_subdirectory(../src/init/ init_unused_install/)
add_subdirectory(../src/ipc/ ipc_unused_install/)
add_subdirectory(../src/debug/telemetry/ telemetry_unused_install/)
add_subdirectory(../src/debug/debug_stream/ debug_stream_unused_install/)
add_subdirectory(test/)


Expand Down