
DP: provide data to next LL module no earlier than DP deadline #8511

Merged - 1 commit merged into thesofproject:main on Dec 19, 2023

Conversation

@marcinszkudlinski (Contributor):

Let's assume a DP module with a 10ms period (a.k.a. a deadline).
It starts and finishes earlier, i.e. in 2ms, providing 10ms of data.
LL starts consuming the data in 1ms chunks and will drain the
10ms buffer in 10ms, expecting a new portion of data in the 11th ms.

BUT - the DP module's deadline is still 10ms,
regardless of whether it finished earlier, and it is completely fine
if processing in the next cycle takes the full 10ms - as long as it
fits into the deadline.

This may lead to underruns:

LL1 (1ms) ---> DP (10ms) ---> LL2 (1ms)

ticks 0..9  - LL1 is producing 1ms data portions,
              DP is waiting, LL2 is waiting
tick 10     - DP has enough data to run, it starts processing
tick 12     - DP finishes earlier, LL2 starts consuming,
              LL1 is producing data
ticks 13-19 - LL1 is producing data,
              LL2 is consuming data (both in 1ms chunks)
tick 20     - DP starts processing a new portion of 10ms data,
              having 10ms to finish
              !!!! but LL2 has already consumed 8ms !!!!
tick 22     - LL2 is consuming the last 1ms data chunk
tick 23     - DP is still processing, LL2 has no data to process
              !!! UNDERRUN !!!!
tick 29     - DP finishes properly within its deadline

Solution: even if DP finishes before its deadline, the data must be held until the deadline, so LL2 may start processing no earlier than tick 20.
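A minimal sketch of the gating mechanism (my reading of the approach; ll_cycles_to_deadline and deadline_ll_cycles come from this PR's diff, while dp_pdata, dp_task_trigger() and dp_release_output_to_ll() are illustrative names only):

struct dp_pdata {
	unsigned int deadline_ll_cycles;    /* DP deadline expressed in LL ticks, e.g. 10 */
	unsigned int ll_cycles_to_deadline; /* LL ticks remaining until the deadline */
};

/* illustrative stub: make the DP output readable by the next LL module */
static void dp_release_output_to_ll(struct dp_pdata *pdata)
{
	(void)pdata;
}

/* called when the DP task is triggered: restart the countdown */
static void dp_task_trigger(struct dp_pdata *pdata)
{
	pdata->ll_cycles_to_deadline = pdata->deadline_ll_cycles;
}

/* called on every LL tick: expose the produced data to the following
 * LL module only once the deadline has elapsed, even if the DP module
 * finished processing much earlier */
static void dp_scheduler_ll_tick(struct dp_pdata *pdata)
{
	if (pdata->ll_cycles_to_deadline > 0)
		pdata->ll_cycles_to_deadline--;

	if (pdata->ll_cycles_to_deadline == 0)
		dp_release_output_to_ll(pdata);
}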

@marcinszkudlinski (Contributor Author):

comment from prev PR:

@lgirdwood: thinking aloud - would this be better as an int, i.e. a counter that could be set to any delay value needed and decremented on each LL tick?

Such a counter is in the DP scheduler context. I did not put it here because this delayed startup matters at the 1st run only, while the counter is valid at all times.

* ticks 13-19 LL1 is producing data, LL2 is consuming data (both in 1ms chunks)
* tick 20 - DP starts processing a new portion of 10ms data, having 10ms to finish
* !!!! but LL2 has already consumed 8ms !!!!
* tick 22 - LL2 is consuming the last 1ms data chunk
Contributor:

Per your assumption, DP needs 2ms to generate 10ms of data. So the second time, DP starts processing at tick 20 and should finish at tick 22 - and at tick 22, LL2 can still get 1ms of data.

Your scenario is maybe based on the assumption that DP needs 3ms to finish the 10ms of processing.

@marcinszkudlinski (Contributor Author) commented Nov 23, 2023:

Again, why do you think that DP will finish in 2ms in the second cycle? And the 3rd? And later? What if a second pipeline starts with 2 more DPs with 3ms deadlines (having shorter deadlines and therefore being scheduled before the 10ms one)?

And - why do you think that every process takes the same number of CPU cycles?

The only guaranteed time is the deadline. The infrastructure must ensure that the data flow is not disturbed as long as the module finishes within its deadline.

@marcinszkudlinski (Contributor Author) commented Nov 23, 2023:

I.e., say we have 2 DPs - one with a 10ms period and 5ms processing time, the second with a 5ms period and 2ms processing time:

LL --> DP1 (10ms / 5ms) ---> LL ---> DP2 (5ms / 2ms) --> LL

1st cycle - DP2 is not yet processing, so DP1 gets full CPU power:
| DP1 1,2,3,4,5 | finishes in 5ms

later - CPU time is sliced between DP1 and DP2

|DP1 1,2,3|   |DP2 1, 2|  |DP1 4,5| (idle for 1ms) |DP2 1, 2|  |DP1 1,2,3|   |DP2 1, 2|  |DP1 4,5|
                                 ^
                           finish in 7ms

The CPU is loaded at 90% (5ms/10ms = 50% for DP1 plus 2ms/5ms = 40% for DP2); DP1 finishes in 7ms (deadline met) even though it was actively processing for only 5ms, and DP2 finishes in 2ms (deadline met).
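(Conceptually, the DP scheduler always runs the ready task with the earliest deadline - a generic EDF selection sketch, with names of my own invention rather than SOF's actual scheduler code:)

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct dp_task_slot {
	bool ready;              /* a full input chunk is available */
	uint64_t deadline_tick;  /* absolute deadline, in LL ticks */
};

/* pick the ready task with the earliest deadline; DP2 (5ms deadline)
 * is therefore scheduled ahead of DP1 (10ms) whenever both are ready */
static struct dp_task_slot *edf_pick(struct dp_task_slot *tasks, int n)
{
	struct dp_task_slot *best = NULL;
	int i;

	for (i = 0; i < n; i++) {
		if (!tasks[i].ready)
			continue;
		if (!best || tasks[i].deadline_tick < best->deadline_tick)
			best = &tasks[i];
	}
	return best;
}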

Contributor:

I understand the underrun case; I was just checking the example - the later example shows exactly the real scenario for a DSP workload.

So, the questions still are:

  1. Is the DP processing period configurable? I mean, can it also be configured as 2ms?
  2. If not, then there would be significant delays. Take DTS as an example: if DTS requires a delay < 10ms, with the above case we have a 20ms+ delay - how do we handle this situation?

@marcinszkudlinski (Contributor Author):

It is configurable by the module itself: it can set any period it wants during the prepare method, or the period will be calculated based on OBS and the data rate.

@marcinszkudlinski (Contributor Author):

Rebased to newest head - retriggering stalled CI.


/* trigger the task */
curr_task->state = SOF_TASK_STATE_RUNNING;
k_sem_give(&pdata->sem);
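/* restart the countdown of LL ticks until this task's deadline */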
pdata->ll_cycles_to_deadline = pdata->deadline_ll_cycles;
Collaborator:

I think this should go before the semaphore to avoid a race

@marcinszkudlinski (Contributor Author):

It does not matter: the semaphore give releases a module thread, which has no access to the pdata context, and the context itself is protected by scheduler_dp_lock()/scheduler_dp_unlock().

Anyway, I see CI is still stalled so I have to retrigger anyway - I'll change it, since it looks suspicious at first glance.
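(The reordered version would presumably be just:)

/* trigger the task */
curr_task->state = SOF_TASK_STATE_RUNNING;
pdata->ll_cycles_to_deadline = pdata->deadline_ll_cycles;
k_sem_give(&pdata->sem);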

@kv2019i (Collaborator) commented Nov 24, 2023:

Note: we can ignore the "sof-ci/jenkins/pr-fw-build" failures. We have a new test job, "sof-ci/jenkins/pr-build", that covers both the FW and the tools in one job (and @keqiaozhang, we need to make the new one "Required").

@marcinszkudlinski marcinszkudlinski changed the title DP: provide data to next LL module no earlier than DP deadline [DNM] DP: provide data to next LL module no earlier than DP deadline Nov 24, 2023
@marcinszkudlinski (Contributor Author) commented Nov 24, 2023:

Please DNM - some internal full-range tests failed because of this patch; I must double-check the root cause.

@lgirdwood (Member):

Please DNM - some internal full-range tests failed because of this patch; I must double-check the root cause.

BTW, a west update fixed a lot of CI results today, so please make sure you rebase before retesting. Thanks!

@lgirdwood lgirdwood added this to the v2.9 milestone Dec 11, 2023
@marcinszkudlinski marcinszkudlinski changed the title [DNM] DP: provide data to next LL module no earlier than DP deadline DP: provide data to next LL module no earlier than DP deadline Dec 15, 2023
@marcinszkudlinski (Contributor Author):

Rebased to newest main. Removing DNM, please proceed.

@btian1 (Contributor) left a comment:

If LL2 starts from 20ms, what if at that time DP is already ready to output the second 10ms of data to the output buffer - will DP be put on hold until LL2 has consumed the first 10ms?

@marcinszkudlinski (Contributor Author):

If LL2 starts from 20ms, what if at that time DP is already ready to output the second 10ms of data to the output buffer - will DP be put on hold until LL2 has consumed the first 10ms?

The buffer at the DP output is set to 2*OBS and DP always starts on its 10ms period, so even if DP finishes in 0.000001ms there will still be enough space to store the processed data - because the following LL will have drained the previous 10ms of data by then.
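(A sketch of that sizing argument; the helper name is illustrative only:)

#include <stddef.h>

/* obs is the output buffer size for one DP period (e.g. 10ms of audio).
 * One OBS-sized chunk may still be held until the deadline and drained
 * by the next LL module while DP is already writing the following
 * chunk - so two chunks must fit simultaneously. */
static size_t dp_output_buffer_size(size_t obs)
{
	return 2 * obs;
}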

As to latency:
DP must produce data within the deadline; the deadline is the worst case for DP.
If it is certain that DP processing will always finish faster, there's no problem with setting a shorter deadline - even 1ms for 10ms of processing.
The default deadline is the processing period, i.e. the longest possible value. The deadline cannot, under any circumstances, be longer than the processing period, because the following LL module wouldn't get data on time.

Example - let's say we have a DP module with 10ms chunks, 8ms processing time, and a 10ms deadline.

  1. use LL with 10ms period

DMIC DMA buffer (10ms) - processing may start when the data are fully loaded, so 10ms latency here

  • process of 10ms data LL1
  • process of 10ms data LL2
  • process of 10ms data LL3
  • process of 10ms data LL4
  • process of 10ms data LL5 (in total 10ms of LL processing - including the 8ms of DP)
    HOST DMA buffer (10ms) - transfer may start when the data are loaded, immediately when LL finishes.

Total latency -> 20ms, no matter whether processing takes 8ms or 2ms - always 2 * the LL period.

  2. LL 1ms + DP with 10ms data chunks and a 10ms deadline

DMIC DMA (1ms) - processing may start when the data are loaded, so 1ms latency here

  • process of 1ms data LL1
  • process of 1ms data LL2 (1ms latency for processing)
    accumulate 10ms of data for DP (processing may start when buffer is loaded - 10 ms latency)
  • process of 10ms data in DP - takes 8ms, but with the 10ms declared deadline it counts as 10ms latency (22ms in total so far)
    accumulate 10ms of data for DP (no latency here - just data storage till deadline)
  • process of 1ms data LL4
  • process of 1ms data LL5 (1ms latency for processing)
    HOST DMA (1ms) transfer may start when the data are loaded, immediately when LL finishes.

Total latency is 23ms - only 3ms more than in the 10ms LL case.

It may be shorter if you are certain that DP will finish sooner in every single cycle - e.g. in 4.9ms:

  3. LL 1ms + DP with 10ms data chunks and a 5ms deadline

DMIC DMA (1ms) - processing may start when the data are loaded, so 1ms latency here

  • process of 1ms data LL1
  • process of 1ms data LL2 (1ms latency for processing)
    accumulate 10ms of data for DP (processing may start when buffer is loaded - 10 ms latency)
  • process of 10ms data in DP - takes 4.9ms, with a declared deadline of 5ms, so latency is 5ms
    accumulate 10ms of data for DP (no latency here)
  • process of 1ms data LL4
  • process of 1ms data LL5 (1ms latency for processing)
    HOST DMA (1ms) transfer may start when the data are loaded, immediately when LL finishes.

Total latency is 18ms - shorter than with the 10ms LL!

DP in fact introduces only 3ms of additional delay, because data must wait for the next LL cycle - but that's it. You can actually get shorter latency using DP.
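(The budget in the two DP cases above can be summed up in a rough model - my own reading of the numbers, not a formula from the code:)

/* Approximate pipeline latency in ms for LL(1ms) -> DP -> LL(1ms):
 * 1ms DMIC DMA + 1ms LL-in + chunk accumulation + DP deadline + 1ms LL-out.
 * (10, 10) -> 23ms (case 2), (10, 5) -> 18ms (case 3). */
static int pipeline_latency_ms(int dp_chunk_ms, int dp_deadline_ms)
{
	return 1 + 1 + dp_chunk_ms + dp_deadline_ms + 1;
}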

@btian1 (Contributor) commented Dec 19, 2023:

"buffer at DP output is set to 2*OBS, DP always starts in 10ms period, so even if DP finishes in 0.000001ms - there will still be enough space for store processed data - because following LL will drain 10ms data till then"

Thanks. For this case, how do we handle a linear buffer for LL2? I added some comments in another PR. The options are:

  1. use a wrapped buffer as the DP output buffer.
  2. always move data to the head once 1ms is consumed by LL2.

Will you take 2?

@marcinszkudlinski (Contributor Author) commented Dec 19, 2023:

@btian1

use wrapped buffer in DP output buffer.

It is a wrapped/circular buffer.

always move data to head once 1ms consumed by LL2.

What good would that do?

  • DP won't be able to use the additional space,
  • LL2 won't be able to rely on linearity - as it won't know whether it's being fed by DP (linear in your case) or by LL (currently circular).

And it would, no doubt, cost additional copying every LL cycle.
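(For reference, a generic ring-buffer read with wrap handling - an illustrative sketch, not SOF's actual comp_buffer/audio_stream API - showing the split copy a consumer performs instead of moving data to the head:)

#include <stddef.h>
#include <string.h>

struct ring_buf {
	char *data;
	size_t size;  /* e.g. 2 * OBS */
	size_t rpos;  /* current read position */
};

/* read n bytes; the region may wrap around the end of the buffer,
 * so the copy is split in two - no data is ever moved to the head */
static void ring_read(struct ring_buf *rb, char *dst, size_t n)
{
	size_t contiguous = rb->size - rb->rpos;

	if (n <= contiguous) {
		memcpy(dst, rb->data + rb->rpos, n);
	} else {
		memcpy(dst, rb->data + rb->rpos, contiguous);
		memcpy(dst + contiguous, rb->data, n - contiguous);
	}
	rb->rpos = (rb->rpos + n) % rb->size;
}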

@lgirdwood lgirdwood merged commit 3d4883a into thesofproject:main Dec 19, 2023
43 of 44 checks passed
@marcinszkudlinski marcinszkudlinski deleted the dp-fix2 branch June 21, 2024 07:29