agent: refactor agent to be more useful #1855
2cb6c03 to c82408b
src/lib/agent.c (outdated)
	/* warning timeout */
	if (delta > sa->warn_timeout)
		trace_sa_error("validate(), ll drift detected, delta = "
			       "%u", delta);
What about logging the delay in microseconds instead of the bare delta? I am afraid that during debugging the raw delta won't tell us much, and we will end up converting it manually anyway.
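A minimal sketch of the conversion suggested in this comment, assuming the delta is a count of timer ticks and the timer frequency is known. The helper name `ticks_to_us` and the `clk_hz` parameter are hypothetical and are not taken from the patch:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: convert a tick delta
 * into microseconds so the trace shows a human-readable delay.
 * clk_hz is the assumed frequency of the timer that produced the delta.
 * The multiplication can overflow for extremely large tick counts;
 * that is ignored here for brevity.
 */
static uint64_t ticks_to_us(uint64_t ticks, uint32_t clk_hz)
{
	if (clk_hz == 0)
		return 0; /* avoid division by zero on a misconfigured clock */

	return ticks * 1000000ULL / clk_hz;
}
```

With a helper like this, the warning could print the converted value rather than the raw delta, so nobody has to redo the arithmetic while debugging.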
SOFCI TEST
5c427a8 to a702441
SOFCI TEST
7614dd8 to 1ee5b4a
SOFCI TEST
@xiulipan This change causes an agent panic on cAVS platforms under QEMU, but it works perfectly fine on real HW. Can you look into this?
@tlauda Sure, I will check the QEMU failure to see whether it is a QEMU issue or a FW issue.
1ee5b4a to b47cb70
@tlauda I checked the code with QEMU. It seems QEMU cannot keep up with the 1 ms agent scheduler. I am still working on the QEMU code to make it run as fast as HW, but that goal seems very difficult to achieve. @lgirdwood Any idea how to make QEMU meet the requirements of this high-rate timer setting?
@xiulipan It's intended to detect drift in our system tick. I've already added a huge panic threshold until some additional tweaks are implemented that will stabilize it further. Why isn't it failing on other platforms? The requirement is the same.
@xiulipan @tlauda @lgirdwood Perhaps this is the place where things might be configured more flexibly per specific platform/environment requirements:
As Tomek explained, the intention was to add a heartbeat to what we may call a system tick. Typically it is set to 1 ms, but it may differ per platform. Therefore I'd rephrase the formula to use a CONFIG_-style timeout that can be set to longer periods in environments that are unable to keep that pace.
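A rough sketch of what such a per-platform setting could look like, assuming the period is expressed in microseconds and is overridable at build time. `CONFIG_SYSTICK_PERIOD` is the name that appears later in this thread; the fallback value, the helper `agent_warn_timeout_us`, and the 1.5x factor (borrowed from the 1.5 ms figure mentioned further down) are illustrative assumptions only:

```c
#include <stdint.h>

/*
 * Assumed to be in microseconds. A slower environment such as QEMU
 * could override it at build time, e.g. -DCONFIG_SYSTICK_PERIOD=10000.
 */
#ifndef CONFIG_SYSTICK_PERIOD
#define CONFIG_SYSTICK_PERIOD 1000
#endif

/*
 * Hypothetical helper: derive the agent warning timeout from the tick
 * period instead of hard-coding it. The 1.5x factor is an assumption
 * made for illustration, not the formula used by the patch.
 */
static uint64_t agent_warn_timeout_us(uint32_t period_us)
{
	return (uint64_t)period_us + period_us / 2;
}
```

Calling `agent_warn_timeout_us(CONFIG_SYSTICK_PERIOD)` would then scale the threshold automatically when a platform configures a longer tick.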
b47cb70 to 19cc72b
@tlauda @macchian I tried to refine the cAVS timer implementation in QEMU so that timer ticks work as expected, but it is not that easy. The older agent had a threshold of 750 ms, but now we have 1.5 ms.
51597b0 to 37282e8
@xiulipan When will you have time to build a separate config for QEMU? It's the only thing blocking this PR.
@tlauda For that I will need to rework the whole logic for both Travis and Jenkins CI, which will not be easy. Best ETA is next week.
@xiulipan Not a problem. Go for it.
@tlauda Can you check https://sof-ci.01.org/sofpr/PR1855/build3509/boottest/dump-skl.txt ? It seems 5 ms still fails randomly. I will try 10 ms or 100 ms.
@xiulipan Did 10 ms help?
37282e8 to 89fb047
@xiulipan Any updates?
@tlauda Could you rebase? I will rerun the test about 10 more times to avoid a regression in CI.
89fb047 to 3d8e030
@xiulipan Done.
@tlauda Let's merge this. I have already made the QEMU test non-blocking, with a 10 ms agent time. I also ran the test manually 100 times and no DSP panic happened. Be aware, though, that the QEMU test may still fail because of the agent. Please check the QEMU test result so that a false alarm does not block testing.
The agent works much better than the old one. Nice patch.
3d8e030 to 6f851c5
@xiulipan I'm not sure if I understand correctly. So from now on the Travis QEMU test will always fail?
@xiulipan Where did you adjust the new config for QEMU? In the scripts I see that you still build the firmware with just one flag changed for QEMU.
@tlauda Don't worry about Travis; I will send a patch for it.
@jajanusz I did a find-and-replace in the source code to change the CONFIG_SYSTICK_PERIOD default value.
@tlauda The Travis update is here: #1996. See https://travis-ci.org/xiulipan/sof/builds/602186422 for a local test with PR1855 and the Travis update. Please make sure we merge these PRs in the right order.
Adds CONFIG_SYSTICK_PERIOD to Kconfig. It drives the timer-based low-latency scheduler and will also be used as the timeout check value for the system agent. Signed-off-by: Tomasz Lauda <tomasz.lauda@linux.intel.com>
Removes PLATFORM_LL_DEFAULT_TIMEOUT definition. It has been replaced by CONFIG_SYSTICK_PERIOD. Signed-off-by: Tomasz Lauda <tomasz.lauda@linux.intel.com>
Signed-off-by: Tomasz Lauda <tomasz.lauda@linux.intel.com>
Refactors agent's functionality to really verify DSP responsiveness. Instead of updating last_idle time on passive level before entering waiti, let's change it to last_check and update it on actual scheduler task execution. Task is executed above passive level, so it should preempt every other long running processing. Since we don't have any additional precautions to guarantee that low latency tick will happen exactly at the scheduled time, let's define warning and panic thresholds to control stability of the system. Signed-off-by: Tomasz Lauda <tomasz.lauda@linux.intel.com>
Removes disable and enable functions, because they are no longer needed. Agent will now update its last_check time above passive level, so every long running task will be preempted. Signed-off-by: Tomasz Lauda <tomasz.lauda@linux.intel.com>
Removes PLATFORM_IDLE_TIME definition, because it's no longer used. Signed-off-by: Tomasz Lauda <tomasz.lauda@linux.intel.com>
Changes scheduler_init_edf and task_main_init function definitions as sof structure is no longer needed. It was only used to pass agent object to main idle task. Signed-off-by: Tomasz Lauda <tomasz.lauda@linux.intel.com>
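The agent refactoring commit in the list above describes updating last_check on each scheduler task execution and comparing the elapsed time against warning and panic thresholds. A minimal sketch of that idea follows. The field names `last_check` and `warn_timeout` appear in this PR; `panic_timeout`, the struct and enum names, and the function `sa_sketch_validate` are assumptions made for illustration and do not reproduce the actual agent.c:

```c
#include <stdint.h>

/* Hypothetical result codes; the real agent traces or panics instead. */
enum sa_sketch_status {
	SA_SKETCH_OK,
	SA_SKETCH_WARN,
	SA_SKETCH_PANIC,
};

/* Simplified stand-in for the agent state (layout is an assumption). */
struct sa_sketch {
	uint64_t last_check;	/* time of the previous validation */
	uint64_t warn_timeout;	/* drift that should only be logged */
	uint64_t panic_timeout;	/* drift treated as a fatal stall */
};

/*
 * Intended to be called from the periodic scheduler task, which runs
 * above passive level. 'now' is the current time in the same units
 * as the timeouts.
 */
static enum sa_sketch_status sa_sketch_validate(struct sa_sketch *sa,
						uint64_t now)
{
	uint64_t delta = now - sa->last_check;
	enum sa_sketch_status status = SA_SKETCH_OK;

	if (delta > sa->panic_timeout)
		status = SA_SKETCH_PANIC;
	else if (delta > sa->warn_timeout)
		status = SA_SKETCH_WARN;

	/* record this run so the next call measures one tick interval */
	sa->last_check = now;

	return status;
}
```

Because the check runs inside the tick itself, a late tick shows up directly as a larger delta, which is why a slow environment such as QEMU needs more generous thresholds.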
6f851c5 to 368a739