schedule: zephyr_ll: Fix schedule bug for multi instances #8653

Closed
wants to merge 1 commit
3 changes: 2 additions & 1 deletion src/include/sof/schedule/ll_schedule_domain.h
@@ -222,7 +222,8 @@ static inline bool domain_is_pending(struct ll_schedule_domain *domain,
 {
 	bool ret;
 
-	assert(domain->ops->domain_is_pending);
+	if (!domain->ops->domain_is_pending)
+		return true;
Collaborator: You add a .domain_is_pending method to the DMA scheduling domain, so presumably this change isn't needed?

Collaborator: @lyakh I think zephyr_domain.c doesn't implement this, so this is needed so as not to break non-DMA LL cases.
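For reference, with this change the helper reads roughly as below (a sketch assembled from the hunk above; the function's tail is not visible in this diff, so the final return is assumed). Domains that don't provide the op, such as the Zephyr timer domain in zephyr_domain.c, keep the old "always pending" behaviour:

static inline bool domain_is_pending(struct ll_schedule_domain *domain,
				     struct task *task, struct comp_dev **comp)
{
	bool ret;

	/* no .domain_is_pending op: treat every task as pending so
	 * non-DMA LL scheduling keeps its pre-change behaviour
	 */
	if (!domain->ops->domain_is_pending)
		return true;

	ret = domain->ops->domain_is_pending(domain, task, comp);

	return ret;	/* assumed: the tail is not shown in the diff */
}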


 	ret = domain->ops->domain_is_pending(domain, task, comp);

37 changes: 36 additions & 1 deletion src/schedule/zephyr_dma_domain.c
@@ -12,6 +12,7 @@
 #include <rtos/alloc.h>
 #include <sof/lib/cpu.h>
 #include <sof/lib/dma.h>
+#include <sof/lib/notifier.h>
 #include <sof/platform.h>
 #include <sof/schedule/ll_schedule.h>
 #include <sof/schedule/ll_schedule_domain.h>
@@ -98,12 +99,15 @@ static int zephyr_dma_domain_register(struct ll_schedule_domain *domain,
 static int zephyr_dma_domain_unregister(struct ll_schedule_domain *domain,
 					struct task *task,
 					uint32_t num_tasks);
+static bool zephyr_dma_domain_is_pending(struct ll_schedule_domain *domain,
+					 struct task *task, struct comp_dev **comp);
 static void zephyr_dma_domain_task_cancel(struct ll_schedule_domain *domain,
 					  struct task *task);
 
 static const struct ll_schedule_domain_ops zephyr_dma_domain_ops = {
 	.domain_register = zephyr_dma_domain_register,
 	.domain_unregister = zephyr_dma_domain_unregister,
+	.domain_is_pending = zephyr_dma_domain_is_pending,
 	.domain_task_cancel = zephyr_dma_domain_task_cancel
 };

@@ -150,6 +154,18 @@ static void zephyr_dma_domain_thread_fn(void *p1, void *p2, void *p3)
 	}
 }
 
+void pipe_task_notify(void *arg, enum notify_id type, void *data)
+{
+	struct pipeline_task *pipe_task = (void *)arg;
+	struct task *task;
+
+	if (!pipe_task)
+		return;
+	task = &pipe_task->task;
+
+	task->state = SOF_TASK_STATE_PENDING;
+}
+
 static void dma_irq_handler(void *data)
 {
 	struct zephyr_dma_domain_irq *irq_data;
@@ -169,8 +185,11 @@ static void dma_irq_handler(void *data)
 	list_for_item(i, &irq_data->channels) {
 		chan_data = container_of(i, struct zephyr_dma_domain_channel, list);
 
-		if (dma_interrupt_legacy(chan_data->channel, DMA_IRQ_STATUS_GET))
+		if (dma_interrupt_legacy(chan_data->channel, DMA_IRQ_STATUS_GET)) {
 			dma_interrupt_legacy(chan_data->channel, DMA_IRQ_CLEAR);
+			notifier_event(chan_data, NOTIFIER_ID_DMA_IRQ,
+				       NOTIFIER_TARGET_CORE_LOCAL, NULL, 0);
+		}
 	}
 
 	/* clear IRQ - the mask argument is unused ATM */
@@ -351,6 +370,9 @@ static int register_dma_irq(struct zephyr_dma_domain *domain,
 
 		irq_local_enable(flags);
 
+		notifier_register(pipe_task, chan_data, NOTIFIER_ID_DMA_IRQ,
+				  pipe_task_notify, 0);
+
 		return 0;
 	}
 }
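Taken together, the zephyr_dma_domain.c hunks wire up a per-channel wakeup path. The excerpt below is a condensed flow sketch built only from calls visible in this diff (not a standalone compile unit; locking and error handling elided):

/* 1. Registration (register_dma_irq): subscribe the pipeline task to
 *    DMA IRQ events from its own channel
 */
notifier_register(pipe_task, chan_data, NOTIFIER_ID_DMA_IRQ,
		  pipe_task_notify, 0);

/* 2. IRQ time (dma_irq_handler): after clearing a channel's status,
 *    fire a core-local event with that channel as the caller, so only
 *    its subscribers are notified
 */
notifier_event(chan_data, NOTIFIER_ID_DMA_IRQ,
	       NOTIFIER_TARGET_CORE_LOCAL, NULL, 0);

/* 3. Callback (pipe_task_notify): mark only the owning task pending;
 *    zephyr_ll_run() later skips tasks that are still not pending
 */
task->state = SOF_TASK_STATE_PENDING;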
@@ -597,6 +619,19 @@ static int zephyr_dma_domain_unregister(struct ll_schedule_domain *domain,
 	return 0;
 }
 
+static bool zephyr_dma_domain_is_pending(struct ll_schedule_domain *domain,
+					 struct task *task, struct comp_dev **comp)
+{
+	struct pipeline_task *pipe_task;
+
+	pipe_task = pipeline_task_get(task);
+
+	if (!pipe_task->registrable)
Contributor:
What if you have 2 registrable pipeline tasks and 2 non-registrable ones? If I recall correctly this case is supported by the mixer topologies we have (registrable 1 is playback while registrable 2 is capture). With the current approach we're still going to end up executing the non-registrable tasks (i.e. those that transfer data into the mixer) every time the registrable tasks are scheduled. Also, I'm not sure messing around with the state of the tasks inside the domain is an OK idea, as they're used by the scheduler.

Also, is this patch still needed? If I recall correctly you already submitted a workaround for this issue. If not, then I would suggest waiting for the transition to the native interface + timer domain. For now, it seems to work fine on i.MX93 and I'm hoping it will also work for our other platforms.

Author:
We did have a workaround for this issue, but it does not fix the root cause. As I said before, the root cause is a scheduling bug.
If you think this patch isn't needed, we'd better switch to the native interface + timer domain in the next release.

+		return true;
+	else
+		return (task->state == SOF_TASK_STATE_PENDING);
+}
 
 static void zephyr_dma_domain_task_cancel(struct ll_schedule_domain *domain,
 					  struct task *task)
 {
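To make the reviewer's mixer concern concrete, here is a minimal standalone model of the new check (hypothetical names and topology, illustration only, not code from this PR or from a real topology file):

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* toy stand-in for pipeline_task plus the predicate added above */
struct model_task {
	const char *name;
	bool registrable;
	bool pending;	/* would be set by pipe_task_notify() on DMA IRQ */
};

static bool model_is_pending(const struct model_task *t)
{
	if (!t->registrable)
		return true;	/* non-registrable tasks always report pending */
	return t->pending;
}

int main(void)
{
	/* hypothetical mixer topology: 2 registrable + 2 feeder tasks */
	const struct model_task tasks[] = {
		{ "playback (registrable)", true,  true  },
		{ "capture (registrable)",  true,  false },
		{ "mixer feeder 1",         false, false },
		{ "mixer feeder 2",         false, false },
	};

	/* only playback's channel fired, yet both feeders report
	 * pending and would therefore run as well
	 */
	for (size_t i = 0; i < sizeof(tasks) / sizeof(tasks[0]); i++)
		printf("%-24s -> %s\n", tasks[i].name,
		       model_is_pending(&tasks[i]) ? "runs" : "skipped");

	return 0;
}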
14 changes: 12 additions & 2 deletions src/schedule/zephyr_ll.c
@@ -188,6 +188,7 @@ static void zephyr_ll_run(void *data)
 	for (list = sch->tasks.next; !list_is_empty(&sch->tasks); list = sch->tasks.next) {
 		enum task_state state;
 		struct zephyr_ll_pdata *pdata;
+		struct ll_schedule_domain *domain = sch->ll_domain;
 
 		task = container_of(list, struct task, list);
 		pdata = task->priv_data;
@@ -198,19 +199,26 @@ static void zephyr_ll_run(void *data)
 			continue;
 		}
 
-		pdata->run = true;
-		task->state = SOF_TASK_STATE_RUNNING;
+		if (domain_is_pending(domain, task, NULL)) {
+			pdata->run = true;
+			task->state = SOF_TASK_STATE_RUNNING;
+		}
 
 		/* Move the task to a temporary list */
 		list_item_del(list);
 		list_item_append(list, &task_head);
 
+		/* avoid executing a task that is not ready to run */
+		if (task->state != SOF_TASK_STATE_RUNNING)
Member: Ditto, re inline comments - the scheduler can be complex, so let's add comments to help.

Author: Done.

Author:
Also, I can see the failures in CI, and they are related to the Intel tests. But I can't run the tests on my side because we have no Intel board.
Can you please help? I will also look at the patch to see if there is anything I can change to fix the CI failures.

Collaborator: Ack, @TangleZ I'll try to debug this a bit early next week; agreed, this is difficult to debug without a board available.

Author: Thanks for your help!

+			continue;
 
 		zephyr_ll_unlock(sch, &flags);
 
 		/*
 		 * task's .run() should only return either
 		 * SOF_TASK_STATE_COMPLETED or SOF_TASK_STATE_RESCHEDULE
 		 */
 
 		state = do_task_run(task);
 		if (state != SOF_TASK_STATE_COMPLETED &&
 		    state != SOF_TASK_STATE_RESCHEDULE) {
@@ -237,6 +245,8 @@ static void zephyr_ll_run(void *data)
 			break;
 		}
 	}
+	/* update task state */
+	task->state = state;
Collaborator:
I'm wondering if it's this addition on L247-248 that is causing some unexpected failures. It seems to affect test configurations that use the chain-dma feature (which basically bypasses creation of full audio pipelines and uses a simple copy LL task to push data through two interfaces). I can see DSP panics in configurations where chain-dma is enabled:
https://sof-ci.01.org/sofpr/PR8653/build1416/devicetest/index.html

Author:
According to the CI page I can't see any info about the failures on my side:
TEST #{{ internalResultID }}
CI build start time:
On-device test with:
Kernel Build Info
SOF Build Info
PR ID: {{ planResult.linuxPrID }}
Linux Branch: {{ planResult.linuxBranch }}
Commit: Merge {{ shortenCommit(planResult.linuxParents.head) }} into {{ shortenCommit(planResult.linuxParents.target) }} ⇒ {{ shortenCommit(planResult.linuxSource.linuxCommit) }}
Kconfig Branch: {{ shortenCommit(planResult.kconfigBranch) }}
Kconfig Commit: {{ shortenCommit(planResult.linuxSource.kconfigCommit) }}
PR ID: {{ planResult.sofPrID }}
SOF Branch: {{ planResult.sofBranch }}
Commit: Merge {{ shortenCommit(planResult.sofParents.head) }} into {{ shortenCommit(planResult.sofParents.target) }} ⇒ {{ shortenCommit(planResult.sofSource.sofCommit) }}
Zephyr Commit: {{ shortenCommit(planResult.sofSource.zephyrCommit) }}
Copy Link

Can you please help me understand where the DSP panics are shown? I will try to fix them.

Member: @TangleZ can you refresh and enable scripts in your browser for this site? It should then show you the logs.

Collaborator:
@TangleZ Sorry for the late response, I was out for Xmas. Here's an example log showing a panic:
https://sof-ci.01.org/sofpr/PR8653/build1416/devicetest/index.html?model=TGLU_UP_HDA-ipc4&testcase=multiple-pause-resume-50

This is running "~/sof-test/test-case/multiple-pause-resume.sh -r 50" on an Intel TGL based laptop. The panic hits when the test uses the HDMI/DP PCMs that use chain-dma.

 	}
 
 	/* Move tasks back */
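Putting the zephyr_ll.c changes together, the per-task body of zephyr_ll_run() now behaves roughly like this condensed sketch (assembled from the hunks above; locking, temporary-list bookkeeping details and the COMPLETED/RESCHEDULE error checking are abridged):

/* mark a task RUNNING only when its domain reports it pending; for
 * the DMA domain that means the task's own channel IRQ has fired
 */
if (domain_is_pending(domain, task, NULL)) {
	pdata->run = true;
	task->state = SOF_TASK_STATE_RUNNING;
}

/* move the task to the temporary list, then skip it if it was not
 * marked RUNNING above
 */
list_item_del(list);
list_item_append(list, &task_head);
if (task->state != SOF_TASK_STATE_RUNNING)
	continue;

state = do_task_run(task);
/* record .run()'s result - the assignment the chain-dma discussion
 * above refers to
 */
task->state = state;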