Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debug stream thread info pr #9398

Merged
merged 4 commits into from
Sep 6, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions src/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -23,3 +23,5 @@ rsource "math/Kconfig"
rsource "library_manager/Kconfig"

rsource "debug/telemetry/Kconfig"

rsource "debug/debug_stream/Kconfig"
5 changes: 5 additions & 0 deletions src/debug/debug_stream/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# SPDX-License-Identifier: BSD-3-Clause

add_local_sources_ifdef(CONFIG_SOF_DEBUG_STREAM_SLOT sof debug_stream_slot.c)

add_local_sources_ifdef(CONFIG_SOF_DEBUG_STREAM_THREAD_INFO sof debug_stream_thread_info.c)
47 changes: 47 additions & 0 deletions src/debug/debug_stream/Kconfig
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
# SPDX-License-Identifier: BSD-3-Clause

config SOF_DEBUG_STREAM_SLOT
bool "Enable SOF debug stream debug window slot"
help
jsarha marked this conversation as resolved.
Show resolved Hide resolved
Debug stream is an abstract stream of records of debug
information from SOF system that are streamed from SOF DSP
to the host side for decoding and presentation. This option
enables transferring the records from DSP to host over a
debug window slot.

if SOF_DEBUG_STREAM_SLOT

config SOF_DEBUG_STREAM_SLOT_NUMBER
int "Debug window slot where to put debug stream slot"
default 3
range 0 14
help
Which debug slot to reserve for Debug Stream. Remember to map
the slot with MEMORY_WIN_2_SIZE in soc/intel/intel_adsp/Kconfig,
in Zephyr source tree. The slots are 4k in size and one slot is
used for descriptors, so for slot 3 to be mapped, the WIN_2_SIZE
must be (1 + 3) * 4k = 16k or greater.

config SOF_DEBUG_STREAM_THREAD_INFO
bool "Enable Zephyr thread info reporting through Debug-Stream"
select INIT_STACKS
select THREAD_MONITOR
select THREAD_STACK_INFO
select THREAD_RUNTIME_STATS
jsarha marked this conversation as resolved.
Show resolved Hide resolved
help
This activates Zephyr thread info reporting through
Debug-Stream. Thread info reports some basic live data from
the Zephyr threads, like stack usage high-water-mark and CPU
usage. Please select THREAD_NAME=y for the thread names more
than just hex numbers.

config SOF_DEBUG_STREAM_THREAD_INFO_INTERVAL
int "Thread information collection interval in seconds"
depends on SOF_DEBUG_STREAM_THREAD_INFO
default 2
range 1 10
help
Decides how often thread info runs and checks execution cycle
statistics and stack usage.

endif
156 changes: 156 additions & 0 deletions src/debug/debug_stream/debug_stream_slot.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,156 @@
// SPDX-License-Identifier: BSD-3-Clause
//
// Copyright(c) 2024 Intel Corporation.

#include <zephyr/logging/log.h>
#include <zephyr/spinlock.h>
#include <adsp_debug_window.h>
#include <adsp_memory.h>
#include <sof/common.h>
#include <rtos/string.h>
#include <user/debug_stream.h>
#include <user/debug_stream_slot.h>

LOG_MODULE_REGISTER(debug_strem_slot);

struct cpu_mutex {
struct k_mutex m;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I assume mutex is per core and not not lock for all core ? - if so we should inline comment this description.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact, if this is nothing else but a mutex, why embed it into another structure? Just use an array of struct k_mutex objects. And if you do insist of having a separate structure for it, maybe use the debug_ namespace with it too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are mutexes meant for different cores and AFAIK they should be on different cache-lines to avoid trouble. That is why they are inside a struct that I can force to be cache-aligned.

} __aligned(CONFIG_DCACHE_LINE_SIZE);

/* CPU specific mutexes for each circular buffer */
static struct cpu_mutex cpu_mutex[CONFIG_MP_MAX_NUM_CPUS];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this puts cpu_mutex[] into .bss which is uncached in present builds. That means, that you don't need to cache-line align them and to flush caches. In general, even if it were cached, you would always access it as such, mixing cached and uncached access to a mutex would very likely break its functionality. So you wouldn't need to flush caches anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, then that is a problem. Is all static (and global) data uncached? If that is the case, then I guess I should reserve the table from heap, but are all heap allocations automatically cache-line aligned?

The mutex only to control access within one core so AFAIU the cache should not be a problem. This is for the situation where higher priority thread interrupts record sending and that higher priority thread tries to send a record of its own.

I'd still say its safer to keep the mutexes on different cache-lines, even if the memory is uncached, so I suggest not to do anything right now, but move the mutexes to cached memory later.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsarha I'd keep them uncached and not bother with cache lines. Should be both simpler and safer. Though a bit slower, but don't think this slow down would be significant for you. But if you do decide to go via cache, you can get cached aliases to that memory, but then yes, you'd need to reserve a whole cache line-sized buffer for each.


static const int debug_stream_slot = CONFIG_SOF_DEBUG_STREAM_SLOT_NUMBER;

static struct debug_stream_slot_hdr *debug_stream_get_slot(void)
{
return (struct debug_stream_slot_hdr *)ADSP_DW->slots[debug_stream_slot];
}

static
struct debug_stream_circular_buf *
debug_stream_get_circular_buffer(struct debug_stream_section_descriptor *desc, unsigned int core)
{
struct debug_stream_slot_hdr *hdr = debug_stream_get_slot();
void *ptr;

if (hdr->hdr.magic != DEBUG_STREAM_IDENTIFIER) {
LOG_ERR("Debug stream slot not initialized.");
return NULL;
}

*desc = hdr->section_desc[core];
LOG_DBG("Section %u (desc %u %u %u)", core, desc->core_id, desc->buf_words, desc->offset);

return (struct debug_stream_circular_buf *) (((uint8_t *)hdr) + desc->offset);
}

int debug_stream_slot_send_record(struct debug_stream_record *rec)
{
struct debug_stream_section_descriptor desc;
struct debug_stream_circular_buf *buf =
debug_stream_get_circular_buffer(&desc, arch_proc_id());
uint32_t record_size = rec->size_words;
uint32_t record_start, buf_remain;

LOG_DBG("Sending record %u id %u len %u\n", rec->seqno, rec->id, rec->size_words);

if (!buf)
return -ENODEV;

if (rec->size_words >= desc.buf_words) {
LOG_ERR("Record too big %u >= %u (desc %u %u %u)", rec->size_words,
desc.buf_words, desc.core_id, desc.buf_words, desc.offset);
return -ENOMEM;
}
k_mutex_lock(&cpu_mutex[arch_proc_id()].m, K_FOREVER);

rec->seqno = buf->next_seqno++;
rec->size_words = record_size + 1; /* +1 for size at the end of record */
record_start = buf->w_ptr;
buf->w_ptr = (record_start + record_size) % desc.buf_words;
jsarha marked this conversation as resolved.
Show resolved Hide resolved
buf_remain = desc.buf_words - record_start;
if (buf_remain < record_size) {
uint32_t rec_remain = record_size - buf_remain;
int ret;

ret = memcpy_s(&buf->data[record_start], buf_remain * sizeof(uint32_t),
rec, buf_remain * sizeof(uint32_t));
assert(!ret);
ret = memcpy_s(&buf->data[0], desc.buf_words * sizeof(uint32_t),
((uint32_t *) rec) + buf_remain, rec_remain * sizeof(uint32_t));
assert(!ret);
} else {
int ret;

ret = memcpy_s(&buf->data[record_start], buf_remain * sizeof(uint32_t),
rec, record_size * sizeof(uint32_t));
assert(!ret);
}
/* Write record size again after the record */
buf->data[buf->w_ptr] = record_size + 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that writes the size of the previous record after it? Why is this needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be able to find records behind w_ptr. Its used when the user space client starts or if there was a buffer overrun.

buf->w_ptr = (buf->w_ptr + 1) % desc.buf_words;

k_mutex_unlock(&cpu_mutex[arch_proc_id()].m);

LOG_DBG("Record %u id %u len %u sent\n", rec->seqno, rec->id, record_size);
return 0;
}

static int debug_stream_slot_init(void)
{
struct debug_stream_slot_hdr *hdr = debug_stream_get_slot();
size_t hdr_size = offsetof(struct debug_stream_slot_hdr,
section_desc[CONFIG_MP_MAX_NUM_CPUS]);
size_t section_area_size = ADSP_DW_SLOT_SIZE - hdr_size;
size_t section_size = ALIGN_DOWN(section_area_size /
CONFIG_MP_MAX_NUM_CPUS,
CONFIG_DCACHE_LINE_SIZE);
size_t offset = hdr_size;
int i;

LOG_INF("%u sections of %u bytes, hdr %u, secton area %u\n",
CONFIG_MP_MAX_NUM_CPUS, section_size, hdr_size,
section_area_size);

if (ADSP_DW->descs[debug_stream_slot].type != 0)
LOG_WRN("Slot %d was not free: %u", debug_stream_slot,
ADSP_DW->descs[debug_stream_slot].type);
ADSP_DW->descs[debug_stream_slot].type = ADSP_DW_SLOT_DEBUG_STREAM;

hdr->hdr.magic = DEBUG_STREAM_IDENTIFIER;
hdr->hdr.hdr_size = hdr_size;
hdr->total_size = hdr_size + CONFIG_MP_MAX_NUM_CPUS * section_size;
hdr->num_sections = CONFIG_MP_MAX_NUM_CPUS;
for (i = 0; i < CONFIG_MP_MAX_NUM_CPUS; i++) {
hdr->section_desc[i].core_id = i;
hdr->section_desc[i].buf_words =
(section_size - offsetof(struct debug_stream_circular_buf, data[0]))/
sizeof(uint32_t);
hdr->section_desc[i].offset = offset;
LOG_INF("sections %u, size %u, offset %u\n",
i, section_size, offset);
offset += section_size;
}
for (i = 0; i < CONFIG_MP_MAX_NUM_CPUS; i++) {
struct debug_stream_section_descriptor desc;
struct debug_stream_circular_buf *buf =
debug_stream_get_circular_buffer(&desc, i);

buf->next_seqno = 0;
buf->w_ptr = 0;
k_mutex_init(&cpu_mutex[i].m);
/* The core specific mutexes are now .dss which is uncached so the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.bss

* following line is commented out. However, since the mutexes are
* core specific there should be nothing preventing from having them
* in cached memory.
*
* sys_cache_data_flush_range(&cpu_mutex[i], sizeof(cpu_mutex[i]));
*/
}
LOG_INF("Debug stream slot initialized\n");

return 0;
}

SYS_INIT(debug_stream_slot_init, APPLICATION, CONFIG_KERNEL_INIT_PRIORITY_DEFAULT);
Loading
Loading