Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CPC: fix 0 CPC module clock scaling #9319

Merged
merged 8 commits into from
Aug 30, 2024
Merged

Conversation

lyakh
Copy link
Collaborator

@lyakh lyakh commented Jul 22, 2024

Currently when a 0-CPC module is used, it will raise the system clock to the maximum and keep it there. Fix this by recalculating needed KCPS every time when a pipeline is created or deleted from scratch instead of adding or subtracting module values.

@lyakh
Copy link
Collaborator Author

lyakh commented Jul 22, 2024

replaces the draft version #9318

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An open on the CPC calculation.

@@ -41,6 +41,7 @@ CONFIG_DMA_DW_LLI_POOL_SIZE=50
CONFIG_INTEL_MODULES=y
CONFIG_LIBRARY_MANAGER=y
CONFIG_LIBRARY_AUTH_SUPPORT=y
CONFIG_LIBRARY_PIPELINE_FORCE_MIN_LIFETIME=100
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, not following commit message, it not clear where the CPC calculation is transitioning from/to, i.e. CPC at pipeline start/stop is start and end point of this commit, pls also say why current method is broken. i.e. what use case.

Copy link
Collaborator

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the rebalance pipepeline code looks clean and a good improvement, but a bit unsure what to make of the new min-lifetime Kconfig.

if (comp_is_llext(current))
/* pipelines are allocated using rzalloc(), so initially .init_time = 0 */
current->pipeline->init_time = sof_cycle_get_64();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not following why this is different for llext?

UPDATE: now saa the new Kconfig for min lifetime, but maybe a comment would be good here as well.

life_time = k_cyc_to_ms_near64(sof_cycle_get_64() - ipc_pipe->pipeline->init_time);
pipe_dbg(ipc_pipe->pipeline, "Extend pipeline life beyond %llu", life_time);
if (life_time < CONFIG_LIBRARY_PIPELINE_FORCE_MIN_LIFETIME)
k_msleep(CONFIG_LIBRARY_PIPELINE_FORCE_MIN_LIFETIME - life_time);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still looks quite odd. I guess there is some race in start and stopping the pipeline tasks, but this seems awfully brittle way to address it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to go except the final patch atm, need to work out why this i problem for llext only.

@lyakh
Copy link
Collaborator Author

lyakh commented Aug 15, 2024

I think this is good to go except the final patch atm, need to work out why this i problem for llext only.

@lgirdwood This isn't difficult to answer - it's logging from LLEXT again. You might remember, that earlier I removed log prints from converted modules in their clean-up paths. Because when LLEXT modules are unloaded, if they use logging at that time, the logger thread runs after the module is unloaded and accesses unmapped memory. Now with the speaker-test it also happens with logs on initialisation and configuration paths - when pipelines are created and very quickly destroyed again, the same thing happens.
But there are CI failures ATM with this PR, which I have been trying to resolve since yesterday. Seem to be timing related because reproduction isn't 100% and changes with apparently unrelated changes to the code. Investigating.

@lyakh
Copy link
Collaborator Author

lyakh commented Aug 16, 2024

CI failures:

  1. cAVS: (actually unrelated) one platform untested
  2. MTL: known pause-release failures https://sof-ci.01.org/sofpr/PR9319/build7075/devicetest/index.html
  3. LNL: known pause-release, nocodec alsabat and suspend-resume https://sof-ci.01.org/sofpr/PR9319/build7074/devicetest/index.html

Extract code to obtain a pointer to the module base configuration,
depending on its type, into a separate function.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Extract the module_is_llext() macro into a header and add a function
to identify a loadable module type of a component.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>

DECLARE_TR_CTX(drc_tr, SOF_UUID(drc_uuid), LOG_LEVEL_INFO);
extern const struct sof_uuid drc_uuid;
extern struct tr_ctx drc_tr;

void drc_reset_state(struct drc_state *state)
{
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit is optional. With it dropped log entries look like

[  140.980950] <inf> drc:

and without it like

[  326.927381] <inf>

So using this and similar log context relocation for LLEXT modules adds a bit of context when messages are dropped, but it isn't critical

@marc-hb
Copy link
Collaborator

marc-hb commented Aug 20, 2024

MTL: known pause-release failures https://sof-ci.01.org/sofpr/PR9319/build7075/devicetest/index.html

No, we did not have any pause-release failures on MTL for a very long time. The only known pause-release failures are:

  1. [LNL] hang, timeout or crash in pause-release linux#5109 on LNL
  2. [BUG] [stable-v2.2] overrun in pause/resume tests on HDA configs #9139 with XTOS in the stable-v2.2 branch (not tested in PRs, only in daily test runs)

Copy link
Member

@cujomalainey cujomalainey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approval for DRC code

@marc-hb
Copy link
Collaborator

marc-hb commented Aug 20, 2024

No, we did not have any pause-release failures on MTL for a very long time.

In fact we didn't have ANY failure whatsoever on MTL for a very long time.

Same tests failed again in https://sof-ci.01.org/sofpr/PR9319/build7188/devicetest/index.html so this PR is definitely guilty.

EDIT: dunno what changed but ACE https://sof-ci.01.org/sofpr/PR9319/build7210/devicetest/index.html is passing now.

Adding or subtracting module CPC when starting and stopping pipelines
is brittle. Particularly it's prone to mistakes with modules, not
specifying their CPC explicitly. Instead recalculate CPC every time
a pipeline is started or stopped.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
The library manager isn't supported on TGL, disable it there.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
No need to set clock frequency if the new one is equal to the current
one.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
uuid.h uses uint32_t, uint16_t and uint8_t types, it must include
stdint.h or inttypes.h.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
Logging context can be accessed from the deferred logging
thread. Therefore it must be always available to avoid
access to unmapped memory when an LLEXT module is unloaded.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
west.yml Outdated
@@ -43,7 +43,7 @@ manifest:

- name: zephyr
repo-path: zephyr
revision: 795ac34f291c2f9895e7a3a03eafc053ad1d36f2
revision: pull/77289/head
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still needed - or has west been updated now ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lgirdwood zephyrproject-rtos/zephyr#77289 still under review, waiting for the assignee in particular

@lyakh lyakh changed the title CPC: fix 0 CPC module clock scaling [DNM] CPC: fix 0 CPC module clock scaling Aug 21, 2024
@lyakh lyakh marked this pull request as draft August 21, 2024 14:43
@lyakh
Copy link
Collaborator Author

lyakh commented Aug 21, 2024

converted to draft to avoid accidental merging while waiting for a Zephyr PR to be merged

Zephyr PR 77289 fixed deferred logging with LLEXT objects, update to
a Zephyr version, containing it.

Signed-off-by: Guennadi Liakhovetski <guennadi.liakhovetski@linux.intel.com>
@lyakh lyakh marked this pull request as ready for review August 30, 2024 07:30
@lyakh lyakh changed the title [DNM] CPC: fix 0 CPC module clock scaling CPC: fix 0 CPC module clock scaling Aug 30, 2024
@lyakh
Copy link
Collaborator Author

lyakh commented Aug 30, 2024

LNL test results are too https://sof-ci.01.org/sofpr/PR9319/build7508/devicetest/index.html truncated, MTL results https://sof-ci.01.org/sofpr/PR9319/build7509/devicetest/index.html were green here, but I'll rerun all Jenkins tests to get a better LNL coverage

@lyakh
Copy link
Collaborator Author

lyakh commented Aug 30, 2024

SOFCI TEST

@lyakh
Copy link
Collaborator Author

lyakh commented Aug 30, 2024

LNL build https://sof-ci.01.org/sofpr/PR9319/build7512/devicetest/index.html looks better now AFAICS - only known errors.

@kv2019i kv2019i merged commit 728fb98 into thesofproject:main Aug 30, 2024
44 of 47 checks passed
@lyakh lyakh deleted the cpc-min branch August 30, 2024 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants