init: zephyr: Fix memory leak during secondary core init #9006

tmleman · 2024-04-05T14:22:20Z

This patch refines the initialization process for secondary cores in a multicore environment when using Zephyr as the RTOS. The patch introduces a check_restore function specifically for Zephyr, which checks if basic core structures (IDC, notifier, schedulers) have been previously allocated and are still present in memory, indicating that the system is not undergoing a cold boot.

By adding this check, the system avoids unnecessary re-allocation of these structures during the power-up sequence of secondary cores, effectively preventing the memory leak observed during repeated power cycle tests.
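A minimal sketch of the idea, not the actual SOF code: `struct core_ctx`, `ctx_alloc()`, `secondary_core_init()` and the 64-byte sizes are hypothetical stand-ins, and only the name `check_restore` comes from the patch. It assumes the context is either zeroed (cold boot) or fully populated (structures retained across the power cycle).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-core context; the real SOF structures differ. */
struct core_ctx {
	void *idc;
	void *notifier;
	void *schedulers;
};

/* Counts allocations so that a leak on repeated power-up is visible. */
static int alloc_count;

static void *ctx_alloc(size_t size)
{
	alloc_count++;
	return calloc(1, size);
}

/* True when all structures are still in memory, i.e. not a cold boot. */
static bool check_restore(const struct core_ctx *ctx)
{
	return ctx->idc && ctx->notifier && ctx->schedulers;
}

/* Returns 0 on success, -1 if an allocation failed. */
static int secondary_core_init(struct core_ctx *ctx)
{
	if (check_restore(ctx))
		return 0; /* reuse the existing structures, allocate nothing */

	ctx->idc = ctx_alloc(64);
	ctx->notifier = ctx_alloc(64);
	ctx->schedulers = ctx_alloc(64);
	if (check_restore(ctx))
		return 0;

	/* An allocation failed: release whatever was obtained. */
	free(ctx->idc);
	free(ctx->notifier);
	free(ctx->schedulers);
	ctx->idc = ctx->notifier = ctx->schedulers = NULL;
	return -1;
}
```

With this guard, calling `secondary_core_init()` on the same context for every power cycle performs the three allocations only on the first call. Without the guard, each call would allocate again and drop the previous pointers.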

fix #9005

lyakh · 2024-04-08T07:31:44Z

src/init/init.c

+	 * has not been powered off.
+	 */
+	if (!idc || !notifier || !schedulers)
+		return 0;


Even though this function returns int (can we change it to return bool?), `return idc && notifier && schedulers;` should still work. Also, seeing this check raises a question for me: what if some of these pointers are NULL? I think the answer is that this isn't possible, right? If any of these allocations fails initially, the whole initialisation fails, so we never reach this point. That means that checking just one of these pointers should be enough: if one of them is set, all the others must be set too?
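To illustrate the point about the return type: in C, the `&&` operator always yields an `int` that is either 0 or 1, so the same expression is valid for both an `int` and a `bool` return type. The sketch below uses hypothetical file-scope pointers and function names as stand-ins for the real structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three core structures. */
static void *idc, *notifier, *schedulers;

/* int variant: && already produces 0 or 1, no explicit comparison needed. */
static int check_restore_int(void)
{
	return idc && notifier && schedulers;
}

/* bool variant: same expression, the result converts to false or true. */
static bool check_restore_bool(void)
{
	return idc && notifier && schedulers;
}
```

Both variants return a false value as soon as any one pointer is NULL.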

> can we change it to return bool?

Done

> what if some of these pointers are NULL? I think the answer is that this isn't possible, right?

True, initialization should fail at the start (first initialization). It's definitely the case with the scheduler.

> that checking just one of these pointers should be enough: if one of them is set, all the others must be set too?

I think one would be enough, but more is better, right?

> I think one would be enough, but more is better, right?

@tmleman why and how? Maybe it is better, maybe not. If you claim that it's better, then I'd like an explanation of why and how, and proper handling of it. If you detect that one of them is NULL and the others aren't, this can mean that the initialisation code got broken, so I would expect appropriate handling of such a case. Currently, a mix of NULL and non-NULL pointers indicates an inconsistent state, and we just leave it at that. A better solution could be adding an explicit "initialisation complete" flag and checking it, instead of relying on implicit indicators.
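One possible way to handle the mixed case explicitly, sketched under assumptions: the enum, `classify_restore()`, `struct core_state` and its `init_done` field are hypothetical names, not part of the patch. The first function distinguishes a cold boot, a warm restore and an inconsistent state; the second shows the explicit-flag alternative.

```c
#include <stdbool.h>
#include <stddef.h>

enum restore_state {
	RESTORE_COLD_BOOT,    /* nothing allocated yet */
	RESTORE_WARM,         /* all structures still in memory */
	RESTORE_INCONSISTENT, /* only some pointers set: init code is broken */
};

/* Classify the state instead of folding the mixed case into "cold boot". */
static enum restore_state classify_restore(const void *idc,
					   const void *notifier,
					   const void *schedulers)
{
	int present = !!idc + !!notifier + !!schedulers;

	if (present == 0)
		return RESTORE_COLD_BOOT;
	if (present == 3)
		return RESTORE_WARM;
	return RESTORE_INCONSISTENT;
}

/* Alternative: an explicit flag, set only after every step has succeeded. */
struct core_state {
	void *idc;
	void *notifier;
	void *schedulers;
	bool init_done;
};

static bool check_restore_flag(const struct core_state *state)
{
	return state->init_done;
}
```

A caller could treat `RESTORE_INCONSISTENT` as a hard error (log and fail the core power-up) rather than silently reallocating.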

> this can mean that the initialisation code got broken.

I think you have given the best answer to the question of why more is better.

Perhaps it would be helpful to add a piece of code that handles the case where only some of the components have been initialized. As for the idea of a variable (flag) indicating that the initialization has completed, I don't see the value in it at the moment. What would be the difference compared to checking the state of just one component, for example, the one that is initialized last?

At this moment, fixing regressions introduced by the previous refactor is a higher priority for me than starting another one.

plbossart · 2024-04-08T16:13:02Z

this seems to introduce a pretty bad regression on MeteorLake, see e.g. https://sof-ci.01.org/sofpr/PR9006/build3770/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=check-playback-10sec

tmleman · 2024-04-09T07:43:15Z

src/init/init.c

-		return 0;
-
-	return 1;
+	return !!idc && !!task && !!notifier && !!schedulers;


@lyakh I could use the same function if I skipped checking the 'task' pointer.

This patch refactors the `check_restore` function to return a `bool` instead of an `int`. This change enhances code readability and clarifies the intent of the function, which is to return a true or false value based on the presence of core structures in memory. No functional changes are introduced by this patch; it is purely a code quality improvement.

Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>

This patch refines the initialization process for secondary cores in a multicore environment when using Zephyr as the RTOS. The patch introduces a `check_restore` function specifically for Zephyr, which checks if basic core structures (IDC, notifier, schedulers) have been previously allocated and are still present in memory, indicating that the system is not undergoing a cold boot.

By adding this check, the system avoids unnecessary re-allocation of these structures during the power-up sequence of secondary cores, effectively preventing the memory leak observed during repeated power cycle tests.

fix thesofproject#9005

Signed-off-by: Tomasz Leman <tomasz.m.leman@intel.com>

tmleman requested review from lgirdwood, plbossart, mmaka1, lbetlej, dbaluta and kv2019i as code owners April 5, 2024 14:22

tmleman requested review from abonislawski, dnikodem, marcinszkudlinski and softwarecki April 5, 2024 14:25

softwarecki approved these changes Apr 5, 2024

kv2019i approved these changes Apr 8, 2024

kv2019i mentioned this pull request Apr 8, 2024

[BUG] error 7 unsupported request to SET_PIPELINE_STATE IPC, could not alloc size = 1536 #8966

Closed

lyakh reviewed Apr 8, 2024

tmleman force-pushed the topic/upstream/issue/9005/core_reinit branch from 42cf341 to 7da2919 Compare April 8, 2024 08:52

tmleman requested a review from lyakh April 8, 2024 12:57

tmleman force-pushed the topic/upstream/issue/9005/core_reinit branch from 7da2919 to 07cd45e Compare April 8, 2024 18:41

lyakh reviewed Apr 9, 2024

tmleman force-pushed the topic/upstream/issue/9005/core_reinit branch from 07cd45e to c6df4a6 Compare April 9, 2024 07:38

tmleman commented Apr 9, 2024

tmleman added 2 commits April 9, 2024 10:48

tmleman requested a review from lyakh April 9, 2024 09:21

dnikodem approved these changes Apr 9, 2024

lgirdwood approved these changes Apr 9, 2024

lgirdwood merged commit a43981e into thesofproject:main Apr 9, 2024
44 of 45 checks passed
