Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I/O Performance Measurement: Add I/O Performance Monitor #9330

Merged
merged 8 commits into from
Sep 18, 2024

Conversation

tobonex
Copy link
Contributor

@tobonex tobonex commented Jul 25, 2024

Implements I/O performance measurement feature. It counts number of transmitted bytes or other events that happen in various interfaces. Currently measured interfaces: IPC, IDC, SNDW, HDA, SSP, DMIC.

There are some interfaces that are defined in Reference Firmware that apparently are not used directly here: USB, I2C, I3C, UART, CSI_2 , DTF. But I could not confirm this as of yet.

There are some other minor issues I will mention in comments.

Copy link
Contributor Author

@tobonex tobonex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments from me.

src/ipc/ipc-common.c Outdated Show resolved Hide resolved
return;

struct io_perf_monitor_ctx *self = &perf_monitor_ctx;
k_spinlock_key_t key = k_spin_lock(&self->lock);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this, this lock was there in reference (not sure what kind of a lock it was), but I think this has potential to disrupt the measured interfaces with a spinlock here. The entries to which we write are just statically defined memory, assigned to a module from performance monitor, and released by the module itself, so I don't think the global lock is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI answered my own question, smokes did not pass. Both this and release slot function cannot be locked. io_perf_monitor_release_slot has locks in the bitmap functions, so it should be safe.

@tobonex tobonex force-pushed the telemetry_global_io_perf_final branch 5 times, most recently from 48f43da to 55dab6c Compare July 26, 2024 17:23
@tobonex tobonex marked this pull request as ready for review July 30, 2024 15:40
Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly minor opens.

struct io_perf_monitor_ctx {
struct k_spinlock lock;
enum ipc4_perf_measurements_state_set state;
struct perf_bitmap io_performance_data_bitmap;//??????
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably need to explain this i.e. its it unused but needed for backwards compat we should say so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I do use it. It may be unnecessary though, as this only points to static memory.

src/debug/telemetry/performance_monitor.c Outdated Show resolved Hide resolved
int io_perf_monitor_init_data(struct io_perf_data_item **slot_id,
struct io_perf_data_item *init_data)
{
if (!slot_id)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put a log() here since its coming from host SW.

struct io_perf_monitor_ctx *io_perf_monitor = &perf_monitor_ctx;
struct io_perf_data_item *new_slot = io_perf_monitor_get_next_slot(io_perf_monitor);

if (!new_slot)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here for log(), anything from host could be wrong.

return;

struct io_perf_monitor_ctx *self = &perf_monitor_ctx;
//k_spinlock_key_t key = k_spin_lock(&self->lock);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

locking needed ? If multicore you could use an atomic add increment ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given "data" is 64bit, atomics won't work, but it would seem locking is probably not needed given a single slot is only used for one interface.

Copy link
Contributor Author

@tobonex tobonex Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if only one instance uses a specific slot this should be ok. I'm not 100% sure if this was the original design but it is a thing now. I did add a comment in .h now to make this apparent.

src/include/sof/debug/telemetry/performance_monitor.h Outdated Show resolved Hide resolved
src/audio/base_fw.c Outdated Show resolved Hide resolved
@@ -88,6 +90,11 @@ static void idc_handler(struct k_p4wq_work *work)
idc->received_msg.header = msg->header;
idc->received_msg.extension = msg->extension;

#ifdef CONFIG_SOF_TELEMETRY_IO_PERFORMANCE_MEASUREMENTS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would avoid the #ifdefs in client to maintain readability, i.e. the telemetry header could have the #ifdef and just declare some empty static inline stubs if its not enabled in Kconfig

src/ipc/ipc-common.c Outdated Show resolved Hide resolved
Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple minor comments. The locking is a bit hard to follow, but otherwise looks ok.

src/debug/telemetry/Kconfig Outdated Show resolved Hide resolved
return;

struct io_perf_monitor_ctx *self = &perf_monitor_ctx;
//k_spinlock_key_t key = k_spin_lock(&self->lock);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given "data" is 64bit, atomics won't work, but it would seem locking is probably not needed given a single slot is only used for one interface.


item->is_removed = true;

int idx = (item - self->io_perf_data) / sizeof(*item);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be one place where locking is needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added lock for this line alone, we'll see if we get any problems in smoke tests. This function did deadlock when i locked all the code here, but it's probably because bitmap has it's own locks too.

@tobonex tobonex force-pushed the telemetry_global_io_perf_final branch from 55dab6c to dea7611 Compare August 8, 2024 14:24
app/boards/intel_adsp_ace15_mtpm.conf Show resolved Hide resolved

ret = perf_bitmap_setbit(&self->io_performance_data_bitmap, idx);
if (ret < 0)
return NULL;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

free the bitmap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If setbit fails, free also fails (wrong idx)

item->is_removed = true;

key = k_spin_lock(&self->lock);
idx = (item - self->io_perf_data) / sizeof(*item);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks wrong - the subtraction returns the number of structures, not bytes


ret = perf_bitmap_clearbit(&self->io_performance_data_bitmap, idx);
if (ret < 0)
return ret;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is questionable - whether you should still try to free the bitmap

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only fails if idx is wrong - "free" will also fail in this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and idx can only be wrong if item is wrong, and you've already used it above to set item->is_removed = true. So, if item can really be wrong, can we check it first before modifying anything? And if it cannot be wrong, then we should probably handle "impossible" cases differently here - either just print a big scary error message or even use an assert()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there's no check for null., and there should be one, added. As item is within a certain memory range we could check if it is within it also.

}

static int
io_perf_monitor_stopped(struct io_perf_monitor_ctx *self)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this never returns errors, so can be void. Besides the name sounds like it's checking whether the monitor has stopped, while this isn't what this function is doing

#endif
}

static int io_global_perf_data_get(uint32_t *data_off_size, char *data)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why char? what does it have to do with characters / strings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better question would be: why all the other handlers have this kind of interface

@@ -317,6 +336,11 @@ static enum task_state ipc_do_cmd(void *data)
{
struct ipc *ipc = data;

#ifdef CONFIG_SOF_TELEMETRY_IO_PERFORMANCE_MEASUREMENTS
/* Increment performance counters */
io_perf_monitor_update_data(ipc->io_perf_in_msg_count, 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why IPC commands are counted for performance? In some use-cases there are many IPCs, in others none, like plain playback or capture?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ref. FW this feature did also count some events like IPC IDC and some interrupts.

@@ -481,6 +483,11 @@ static void idc_complete(void *data)
uint32_t type = iTS(idc->received_msg.header);
k_spinlock_key_t key;

#ifdef CONFIG_SOF_TELEMETRY_IO_PERFORMANCE_MEASUREMENTS
/* Increment performance counters */
io_perf_monitor_update_data(idc->io_perf_out_msg_count, 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, how does the number of IDCs describe IO performance?

src/audio/dai-zephyr.c Show resolved Hide resolved
io_perf_monitor_init_data(&dd->io_perf_bytes_count, &init_data);
}
#endif

return 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also in the INVALID case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should treat performance as something optional. Perf setup failing should not make everything else fail

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then maybe at least print a warning in the default case at line 509 above - not a blocker, can be a follow up patch

@tobonex tobonex force-pushed the telemetry_global_io_perf_final branch from dea7611 to 2cf375d Compare August 29, 2024 13:24

ret = perf_bitmap_clearbit(&self->io_performance_data_bitmap, idx);
if (ret < 0)
return ret;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and idx can only be wrong if item is wrong, and you've already used it above to set item->is_removed = true. So, if item can really be wrong, can we check it first before modifying anything? And if it cannot be wrong, then we should probably handle "impossible" cases differently here - either just print a big scary error message or even use an assert()

if (ret < 0)
return ret;

return 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, this should've been

return perf_bitmap_free(&performance_data_bitmap, idx);

but if you agree to use asserts, then something like

ret = perf_bitmap_free(&performance_data_bitmap, idx);
assert(!ret);
return ret;

You could also consider return 0 but then if built with asserts disabled, the compiler might complain about assigned but not used ret

@tobonex tobonex force-pushed the telemetry_global_io_perf_final branch from 2cf375d to a611cc2 Compare August 30, 2024 15:22
Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tobonex I assume when IO is enabled, the the data will be gathered on every IO action and will be reset when D3. Btw, it occurs to me now we are gathering a lot of good data than we can also additionally now dump via the Zephyr shell. @kv2019i

src/debug/telemetry/performance_monitor.c Show resolved Hide resolved
@@ -433,7 +433,8 @@ int io_perf_monitor_init(void)
}

/* init performance monitor using Zephyr */
SYS_INIT(io_perf_monitor_init, APPLICATION, CONFIG_APPLICATION_INIT_PRIORITY);
/* IMPORTANT , this init is now in IPC init, temporarily */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, we should also state the transition plan in the comment so that everyone know when this will be resolved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is connected to the io_perf_monitor_init comment at the top, not sure if we have a nice place for the init. Leaving it near ipc_init works, but it's a bit ugly.

If we don't have a better place for this init, I will just leave it near ipc_init for now, and merge this temporary commit to another, or rename it.

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few minor comments, but nothing blocking merge.

src/debug/telemetry/Kconfig Outdated Show resolved Hide resolved
src/debug/telemetry/performance_monitor.c Show resolved Hide resolved
src/ipc/ipc-common.c Outdated Show resolved Hide resolved
@tobonex tobonex force-pushed the telemetry_global_io_perf_final branch from a611cc2 to 08b6164 Compare September 4, 2024 14:52
@lgirdwood
Copy link
Member

Lot of failures, could have been lab power off event. rerun CI.

@lgirdwood
Copy link
Member

SOFCI TEST

@tobonex
Copy link
Contributor Author

tobonex commented Sep 11, 2024

Some CI fails are due to newly added asserts failing. I'll take a look.

@tobonex tobonex force-pushed the telemetry_global_io_perf_final branch from 08b6164 to 38c8379 Compare September 13, 2024 14:35
Adds config option for enabling I/O performance.

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
Implements I/O performance measurement feature. It counts number of
transmitted bytes or other events that happen in various interfaces.

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
Adds IO_PERF_MEASUREMENTS_STATE and IO_GLOBAL_PERF_DATA IPCs.
Those can be used to change state of I/O performance monitor and
extract the measured data.

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
Set up a measurement of number of input and output IPCs.

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
Set up a counter of input and output IDCs.

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
Adds I/O performance measurement for audio unterfaces in DAI (SNDW, DMIC,
SSP, HDA).

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
Enable I/O performance for mtl and lnl

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
@tobonex tobonex force-pushed the telemetry_global_io_perf_final branch from 38c8379 to f7d2315 Compare September 17, 2024 07:35
Zephyr's bitarray had a problem when clearing the bit before freeing
it. Not clearing it changes nothing in overall logic so we can just not
do it.
Also fixes and issue with computing bitmap idx.

Signed-off-by: Tobiasz Dryjanski <tobiaszx.dryjanski@intel.com>
@tobonex tobonex force-pushed the telemetry_global_io_perf_final branch from f7d2315 to cade98a Compare September 17, 2024 08:46
@kv2019i kv2019i merged commit 1fe942f into thesofproject:main Sep 18, 2024
44 of 47 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants