DP: provide data to next LL module no earlier than DP deadline #8511
comment from prev PR:
Such counter is in DP scheduler context
 * ticks 13-19 LL1 is producing data, LL2 is consuming data (both in 1ms chunks)
 * tick 20 - DP starts processing a new portion of 10ms data, having 10ms to finish
 * !!!! but LL2 has already consumed 8ms !!!!
 * tick 22 - LL2 is consuming the last 1ms data chunk
By your assumption, DP needs 2ms to generate 10ms of data. In the second cycle DP starts processing at tick 20, so it should finish at tick 22, and at tick 22 LL2 can get its 1ms of data.
The underrun scenario may be based on the assumption that DP needs 3ms to finish processing 10ms of data.
Again, why do you think that DP will finish in 2ms in the second cycle? And the 3rd? And later ones? What if a second pipeline starts with 2 more DPs with 3ms deadlines (shorter deadlines, and therefore scheduled before the 10ms one)?
And why do you think that every processing cycle takes the same number of CPU cycles?
The only guaranteed time is the deadline. The infrastructure must ensure that the data flow is not disturbed as long as the module finishes within its deadline.
i.e.
we have 2 DPs: one with a 10ms period and 5ms processing time, the second with a 5ms period and 2ms processing time
LL --> DP1 (10ms / 5ms) ---> LL ---> DP2 (5ms / 2ms) --> LL
1st cycle - DP2 is not yet processing, full CPU power goes to DP1
| DP1 1,2,3,4,5 | finishes in 5ms
later - CPU time is sliced between DP1 and DP2
|DP1 1,2,3| |DP2 1, 2| |DP1 4,5| (idle for 1ms) |DP2 1, 2| |DP1 1,2,3| |DP2 1, 2| |DP1 4,5|
                               ^
                               finishes in 7ms
CPU load is 90%. DP1 finishes in 7ms (deadline met) even though it was processing for only 5ms; DP2 finishes in 2ms (deadline met)
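The point that finish time and processing time differ can be replayed with a small simulation. This is only a sketch under stated assumptions: it uses plain earliest-deadline-first ordering in 1ms ticks, releases both tasks at tick 0, and breaks deadline ties in favour of the lower index. `struct dp_task` and `edf_simulate()` are hypothetical names for this illustration, not SOF code.

```c
#include <assert.h>

/* Hypothetical task descriptor for this sketch; all times are in 1 ms ticks. */
struct dp_task {
	int period;         /* period, also used as the relative deadline */
	int cost;           /* CPU ticks needed in every period */
	int remaining;      /* work left in the current period */
	int release;        /* tick at which the current period started */
	int worst_response; /* longest release-to-finish time seen so far */
};

/* Run earliest-deadline-first for `horizon` ticks; return missed deadlines. */
int edf_simulate(struct dp_task *tasks, int n, int horizon)
{
	int misses = 0;

	for (int i = 0; i < n; i++) {
		assert(tasks[i].period > 0 && tasks[i].cost > 0);
		tasks[i].remaining = 0;
		tasks[i].release = 0;
		tasks[i].worst_response = 0;
	}

	for (int t = 0; t < horizon; t++) {
		int pick = -1;

		/* Start a new period where due; leftover work is a missed deadline. */
		for (int i = 0; i < n; i++) {
			if (t % tasks[i].period != 0)
				continue;
			if (tasks[i].remaining > 0)
				misses++;
			tasks[i].remaining = tasks[i].cost;
			tasks[i].release = t;
		}

		/* Pick the ready task with the earliest absolute deadline. */
		for (int i = 0; i < n; i++) {
			if (tasks[i].remaining == 0)
				continue;
			if (pick < 0 ||
			    tasks[i].release + tasks[i].period <
			    tasks[pick].release + tasks[pick].period)
				pick = i;
		}
		if (pick < 0)
			continue; /* idle tick */

		/* Give the picked task one tick of CPU time. */
		if (--tasks[pick].remaining == 0) {
			int response = t + 1 - tasks[pick].release;

			if (response > tasks[pick].worst_response)
				tasks[pick].worst_response = response;
		}
	}
	return misses;
}
```

With DP1 as `{ .period = 10, .cost = 5 }` and DP2 as `{ .period = 5, .cost = 2 }`, a 20-tick run reports no missed deadlines, while DP1 completes 7 ticks after its release although it only used 5 ticks of CPU. The exact slicing differs from the hand-drawn timeline because of the tie-breaking assumption.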
I understand the underrun case; I was just checking the example. The later example shows exactly the real scenario for a DSP workload.
So, the remaining questions are:
- Is the DP processing period configurable? I mean, can it also be configured as 2ms?
- If not, there would be significant delays. Take DTS as an example: if DTS requires a delay < 10ms, the case above gives a delay of 20ms+. How do we handle this situation?
It is configurable by the module itself.
It can set any period it wants in its prepare method;
otherwise the period is calculated from OBS and the data rate.
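As a rough illustration of deriving a period from OBS and the data rate, here is a minimal sketch. It assumes OBS is the module's output buffer size in bytes and that the data rate is sample rate times channels times bytes per sample; `dp_period_us_from_obs()` is a hypothetical helper, not the function used in the scheduler.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: time needed to produce one OBS worth of data,
 * in microseconds, for a given stream format.
 */
uint32_t dp_period_us_from_obs(uint32_t obs_bytes, uint32_t rate_hz,
			       uint32_t channels, uint32_t bytes_per_sample)
{
	uint64_t bytes_per_sec = (uint64_t)rate_hz * channels * bytes_per_sample;

	assert(bytes_per_sec > 0);

	/* 64-bit intermediate so obs_bytes * 1000000 cannot overflow */
	return (uint32_t)((uint64_t)obs_bytes * 1000000u / bytes_per_sec);
}
```

For example, 48 kHz stereo with 4-byte samples is 384000 bytes per second, so an OBS of 3840 bytes corresponds to a 10000 us (10ms) period and an OBS of 384 bytes to 1ms.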
5bae733
to
a07e07c
Compare
rebase to newest head - retrigger stalled CI
src/schedule/zephyr_dp_schedule.c
Outdated
/* trigger the task */
curr_task->state = SOF_TASK_STATE_RUNNING;
k_sem_give(&pdata->sem);
pdata->ll_cycles_to_deadline = pdata->deadline_ll_cycles;
I think this should go before the semaphore to avoid a race
It does not matter: the semaphore give releases a module thread, which has no access to the pdata context. The context itself is protected by scheduler_dp_lock / scheduler_dp_unlock.
Anyway, I see CI is still stalled, so I have to retrigger it; I'll change the order because it does look suspicious at first glance.
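A sketch of the reordered trigger, for illustration only. The field names `deadline_ll_cycles` and `ll_cycles_to_deadline` come from the diff above; `struct stub_sem`, `stub_sem_give()`, `struct dp_pdata_sketch` and `dp_trigger_sketch()` are stand-ins invented here so the fragment compiles without Zephyr, and the task state update and locking are left out.

```c
#include <stdint.h>

/* Stand-in for Zephyr's semaphore, so the sketch builds on a host. */
struct stub_sem {
	unsigned int count;
};

static void stub_sem_give(struct stub_sem *sem)
{
	sem->count++;
}

/* Hypothetical subset of the per-task scheduler data. */
struct dp_pdata_sketch {
	struct stub_sem sem;
	uint32_t deadline_ll_cycles;    /* deadline expressed in LL cycles */
	uint32_t ll_cycles_to_deadline; /* countdown until data may be released */
};

void dp_trigger_sketch(struct dp_pdata_sketch *pdata)
{
	/* Arm the deadline countdown first... */
	pdata->ll_cycles_to_deadline = pdata->deadline_ll_cycles;

	/* ...and only then release the module thread. */
	stub_sem_give(&pdata->sem);
}
```

As noted in the reply, the order is not functionally required when the context is already protected by the scheduler lock; arming the counter first just makes the code easier to read as race-free.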
Note: we can ignore the "sof-ci/jenkins/pr-fw-build" failures. We have a new test job, "sof-ci/jenkins/pr-build", that covers both FW and the tools in one job (and @keqiaozhang, we need to make the new one "Required").
a07e07c
to
7d581d5
Compare
Please DNM: some internal full-range tests failed because of this patch, I must double check the root cause.
Btw, west update fixed a lot of CI results today, so please make sure you rebase before retesting. Thanks!
Let's assume a DP with a 10ms period (a.k.a. a deadline). It starts and finishes earlier, i.e. in 2ms, providing 10ms of data.
LL starts consuming data in 1ms chunks and will drain the 10ms buffer in 10ms, expecting a new portion of data on the 11th ms.
BUT - the DP module deadline is still 10ms, regardless of whether it finished earlier, and it is completely fine for processing in the next cycle to take the full 10ms, as long as it fits into the deadline.
It may lead to underruns:
LL1 (1ms) ---> DP (10ms) --> LL2 (1ms)
ticks 0..9 -> LL1 is producing 1ms data portions, DP is waiting, LL2 is waiting
tick 10 - DP has enough data to run, it starts processing
tick 12 - DP finishes earlier, LL2 starts consuming, LL1 is producing data
ticks 13-19 LL1 is producing data, LL2 is consuming data (both in 1ms chunks)
tick 20 - DP starts processing a new portion of 10ms data, having 10ms to finish
          !!!! but LL2 has already consumed 8ms !!!!
tick 22 - LL2 is consuming the last 1ms data chunk
tick 23 - DP is still processing, LL2 has no data to process
          !!! UNDERRUN !!!!
tick 29 - DP finishes properly within the deadline time
Solution: even if DP finishes before its deadline, the data must be held till deadline time, so LL2 may start processing no earlier than tick 20
Signed-off-by: Marcin Szkudlinski <marcin.szkudlinski@intel.com>
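The scenario from the commit message can be replayed with a small tick simulation. This is a sketch under assumptions, not the scheduler code: DP cycle k starts at tick 10 * (k + 1), data released during a tick is first consumed by LL2 on the next tick, and the 2*OBS buffer limit is not modelled. `first_underrun_tick()` and `DP_PERIOD` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define DP_PERIOD 10 /* DP period and deadline, in 1 ms LL ticks */

/*
 * proc_ticks[k] is the processing time of DP cycle k.
 * Returns the first tick at which LL2 finds no data, or -1 if none.
 */
int first_underrun_tick(const int *proc_ticks, int n_cycles,
			bool hold_until_deadline)
{
	int buffered = 0;         /* 1 ms chunks available to LL2 */
	bool ll2_started = false; /* LL2 starts after the first release */
	int horizon = DP_PERIOD * (n_cycles + 1);

	for (int t = 0; t <= horizon; t++) {
		/* LL2 runs first: it only sees data released in earlier ticks. */
		if (ll2_started) {
			if (buffered == 0)
				return t;
			buffered--;
		}

		/* Check whether any DP cycle hands over its output at this tick. */
		for (int k = 0; k < n_cycles; k++) {
			int start = DP_PERIOD * (k + 1);
			int release;

			assert(proc_ticks[k] > 0 && proc_ticks[k] <= DP_PERIOD);
			release = hold_until_deadline ? start + DP_PERIOD
						      : start + proc_ticks[k];
			if (release == t) {
				buffered += DP_PERIOD;
				ll2_started = true;
			}
		}
	}
	return -1;
}
```

With processing times {2, 10}, releasing data as soon as DP finishes gives an underrun at tick 23, as in the timeline above; holding the data until the deadline gives none. The exact tick numbers depend on the hand-over convention assumed here.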
7d581d5
to
6a0f3d3
Compare
rebased to newest main
If LL2 starts at 20ms, what happens if DP is by then already ready to write the second 10ms of data to the output buffer? Will DP be put on hold until LL2 has consumed the first 10ms?
The buffer at the DP output is set to 2*OBS, and DP always starts on a 10ms period, so even if DP finishes in 0.000001ms there will still be enough space to store the processed data, because the following LL will have drained 10ms of data by then.
As to latency:
Example - let's say we have a DP module with 10ms chunks, 8ms processing time, 10ms deadline
DMIC DMA buffer (10ms) - processing may start when the data are fully loaded, so 10ms latency here
Total latency -> 20ms, no matter if processing takes 8ms or 2ms - always 2*LL period
DMIC DMA (1ms) - processing may start when the data are loaded, so 1ms latency here
Total latency is 23ms
It may be shorter if you can be certain that DP will finish sooner in every single cycle, e.g. in 4.9ms:
LL 1ms + DP 10ms with 5ms deadline
DMIC DMA (1ms) - processing may start when the data are loaded, so 1ms latency here
Total latency is 18ms, shorter than with 10ms LL!!
DP in fact introduces only 3ms of additional delay because data must wait for the next LL cycle, but that's it.
"buffer at DP output is set to 2*OBS, DP always starts in 10ms period, so even if DP finishes in 0.000001ms - there will still be enough space for store processed data - because following LL will drain 10ms data till then"
Thanks. For this case, how is a linear buffer for LL2 handled? I added some comments in another PR.
will you take 2?
It is a wrapped/circular buffer.
What good would it do?
It will, no doubt, cost additional copying every LL cycle.
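For readers less familiar with the wrapped-buffer approach, here is a minimal sketch of a byte ring sized at 2*OBS. It is an illustration only: `OBS_BYTES` is a tiny made-up value, and `struct ring_buf`, `ring_write()` and `ring_read()` are hypothetical names, not the SOF buffer API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OBS_BYTES  8u               /* tiny illustrative value */
#define RING_BYTES (2u * OBS_BYTES) /* DP output buffer: 2 * OBS */

struct ring_buf {
	uint8_t data[RING_BYTES];
	size_t wr;   /* next write offset */
	size_t rd;   /* next read offset */
	size_t used; /* bytes currently stored */
};

/* Copy len bytes in; returns len, or 0 if there is not enough free space. */
size_t ring_write(struct ring_buf *r, const uint8_t *src, size_t len)
{
	size_t first;

	assert(r && src);
	if (len > RING_BYTES - r->used)
		return 0;

	/* Part that fits before the end of the array, then the wrapped rest. */
	first = RING_BYTES - r->wr;
	if (first > len)
		first = len;
	memcpy(&r->data[r->wr], src, first);
	memcpy(r->data, src + first, len - first);

	r->wr = (r->wr + len) % RING_BYTES;
	r->used += len;
	return len;
}

/* Copy len bytes out; returns len, or 0 if fewer than len bytes are stored. */
size_t ring_read(struct ring_buf *r, uint8_t *dst, size_t len)
{
	size_t first;

	assert(r && dst);
	if (len > r->used)
		return 0;

	first = RING_BYTES - r->rd;
	if (first > len)
		first = len;
	memcpy(dst, &r->data[r->rd], first);
	memcpy(dst + first, r->data, len - first);

	r->rd = (r->rd + len) % RING_BYTES;
	r->used -= len;
	return len;
}
```

Because the read and write offsets wrap, the data left in the buffer never has to be moved back to the start, which is the extra per-cycle copy a linear buffer would need.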