
check-playback: add check for FW performance #959

Closed

Conversation

Contributor

@kv2019i kv2019i commented Sep 14, 2022

NOTE: This PR is based on top of #956

check-playback: add check for FW performance

Add a tool to analyze the performance traces that are emitted
by the FW low-latency scheduler, and use this tool in the check
playback test case.

The tool requires output of FW logs and currently only supports
the mtrace output available in SOF Zephyr IPC4 builds. This can
be extended later for other output types.

The sof-ll-timer-check.py script can do some simple global checks,
like observing occurrences of low-latency scheduler overruns, and
raises an error in that case.

For most of the analysis, sof-ll-timer-check.py needs reference
data detailing what the expected performance level should be.
To implement this, a simple JSON database is added via
sof-ll-timer-check-db.json that is used to query the reference
data. As this data is manually maintained, reference data is
expected to be used only for a small set of key use-cases on
any given platform.
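
As a rough illustration of the intended flow, here is a hedged sketch based
on the fields visible in the review diff below ("test-key", "ll-timer-avg",
AVG_ERROR_MARGIN); it is not the actual script, and the log parsing that
produces avg_vals is assumed and not shown:

import json
import sys
from statistics import median

# allowed error margin before an error is raised (1.05 -> 5%)
AVG_ERROR_MARGIN = 1.05

def check_against_reference(avg_vals, ref_key, db_path):
    # avg_vals: per-report ll-timer averages parsed from the FW log
    median_avg = median(avg_vals)
    with open(db_path) as dbfile:
        ref_data_all = json.load(dbfile)
    median_avg_ref = None
    for entry in ref_data_all:
        if entry["test-key"] == ref_key:
            median_avg_ref = entry["ll-timer-avg"]
            break
    if not median_avg_ref:
        print("No reference data for key '%s', unable to check performance against reference" % ref_key)
        return
    if median_avg > median_avg_ref * AVG_ERROR_MARGIN:
        sys.exit("ERROR: median avg %d exceeds reference %d" % (median_avg, median_avg_ref))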

cc:

keqiaozhang and others added 6 commits September 14, 2022 09:27
Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
mtrace is a new logging tool that streams data from the Linux
SOF driver mtrace debugfs interface to standard output.

Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
This function can check if the running firmware is IPC4 Zephyr.

Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
mtrace only supports IPC4 platforms, and at the moment mtrace must
run while the tests are running. If mtrace is not running, the DSP
will stop outputting logs when its buffer is full. So we need to keep
mtrace running during the test, or the error log won't be seen.

Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
The mtrace-reader.py process needs to be killed after the test
to prevent conflicts between test cases.

Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
Add a tool to analyze the performance traces that are emitted
by the FW low-latency scheduler, and use this tool in the check
playback test case.

The tool requires output of FW logs and currently only supports
the mtrace output available in SOF Zephyr IPC4 builds. This can
be extended later for other output types.

The sof-ll-timer-check.py script can do some simple global checks,
like observing occurrences of low-latency scheduler overruns, and
raises an error in that case.

For most of the analysis, sof-ll-timer-check.py needs reference
data detailing what the expected performance level should be.
To implement this, a simple JSON database is added via
sof-ll-timer-check-db.json that is used to query the reference
data. As this data is manually maintained, reference data is
expected to be used only for a small set of key use-cases on
any given platform.

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
@kv2019i kv2019i requested a review from a team as a code owner September 14, 2022 10:23
is_ipc4_zephyr && {
data_file=$LOG_ROOT/mtrace.txt
test_reference_key="${platform}-${tplg_basename}-ipc4-zephyr-check-playback-${dev}"
TOPDIR="$(dirname "${BASH_SOURCE[0]}")"/..
Collaborator

Move TOPDIR to the top of the file (and then use it in source ... lib.sh). Copy what's already been done in many other scripts in this directory.

[
{
"test-key": "tgl-sof-tgl-nocodec.tplg-ipc4-zephyr-check-playback-hw:0,0" ,
"ll-timer-avg": 2589
Collaborator

Suggested change
"ll-timer-avg": 2589
"expected-ll-timer-avg": 2589
Suggested change
"ll-timer-avg": 2589
"maximum-ll-timer-avg": 2589
Suggested change
"ll-timer-avg": 2589
"maximum-ll-timer-avg-before-error-margin": 2589

You get the idea.

@@ -0,0 +1,59 @@
#!/usr/bin/env python3

'''Script to analyze performance data in SOF FW log output'''

if overruns:
print("ERROR: %s overruns detected" % overruns, file=sys.stderr)
sys.exit(-1)
Collaborator

Suggested change
sys.exit(-1)
sys.exit("ERROR: %s overruns detected" % overruns")

prints the message on stderr and returns 1.

print("Measurements:\t\t%d" % len(avg_vals))
print("Median avg reported:\t%d" % median_avg_vals)
print("Median max reported:\t%d" % median(max_vals))
print("Highest max reported:\t%d" % max(max_vals))
Collaborator
@marc-hb marc-hb Sep 14, 2022

import logging
# Can be invoked only once and only before any actual logging
logging.basicConfig(level=logging.DEBUG)

...

logging.info(""Measurements:\t\t%d" % len(avg_vals)")

and that's it, nothing else needed. Fancy stuff can be more easily added later if/when needed. https://docs.python.org/3/howto/logging.html

median_avg_ref = None
dbfile = open(sys.argv[3])
ref_key = sys.argv[2]
ref_data_all = json.load(dbfile)
Collaborator

Have you considered a .ini file?
https://docs.python.org/3/library/configparser.html

It's less powerful and less flexible but more user-friendly and more programmer-friendly.

Long story short: do you expect a hierarchy of values?
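
For illustration, a flat configparser-based equivalent of one database entry
could look like the sketch below (the section and option names are taken from
the JSON example above; this is not code from the PR):

import configparser

INI_EXAMPLE = """
[tgl-sof-tgl-nocodec.tplg-ipc4-zephyr-check-playback-hw:0,0]
ll-timer-avg = 2589
"""

config = configparser.ConfigParser()
config.read_string(INI_EXAMPLE)
ref_key = "tgl-sof-tgl-nocodec.tplg-ipc4-zephyr-check-playback-hw:0,0"
# A missing key raises KeyError with a clear message, so an explicit
# "no reference data" fallback is not strictly needed.
median_avg_ref = config[ref_key].getint("ll-timer-avg")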

Contributor Author

That should be considered, yeah. I frankly don't yet know whether the whole concept is pragmatic/feasible, so I just wanted to get some minimum working set, and JSON was good for this.

break

if not median_avg_ref:
print("No reference data for key '%s', unable to check performance against reference" % ref_key)
Collaborator

You probably wouldn't need that code with a simpler .ini file; it would likely just fail with a meaningful stack trace and error message.

# allowed error margin before error is raised for
# lower observed performance (1.05 -> 5% difference
# required to raise error)
AVG_ERROR_MARGIN = 1.05
Collaborator

This feels overkill / YAGNI... why not just bump the thresholds by 5%? There are only two of them.

Contributor Author

@marc-hb I added this on purpose. Basically, anyone setting baseline values is going to run a test and copy the values over to the database. We'll probably add an option to the test script itself so that if you pass "--reference-capture", the script will run the test and measure performance, but instead of comparing against the expected value, it will just write the measured values out in JSON/INI format so they can be incorporated into the database (after reviewing, of course, that the values are a sane new baseline).

So to make the above possible, it's better that the margin is stored elsewhere.
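
A minimal sketch of what such a hypothetical "--reference-capture" output step
could look like (the helper name and output path are illustrative, not part of
the PR):

import json

def dump_reference_entry(test_key, measured_median_avg, out_path):
    # Write the measured value in the same shape as
    # sof-ll-timer-check-db.json so it can be reviewed and merged
    # into the database as a new baseline.
    entry = {"test-key": test_key, "ll-timer-avg": measured_median_avg}
    with open(out_path, "w") as out:
        json.dump([entry], out, indent=2)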

@@ -0,0 +1,59 @@
#!/usr/bin/env python3
Collaborator

Most of what this script does is parse a log file, but there's neither "parsing" nor "log" in the name. This is test code, so I don't think this is an "implementation detail" that should be hidden.

data_file=$LOG_ROOT/mtrace.txt
test_reference_key="${platform}-${tplg_basename}-ipc4-zephyr-check-playback-${dev}"
TOPDIR="$(dirname "${BASH_SOURCE[0]}")"/..
$TOPDIR/tools/sof-ll-timer-check.py ${data_file} $test_reference_key $TOPDIR/tools/sof-ll-timer-check-db.json
Collaborator

Now it's just in check-playback.sh, but I expect this to be found in multiple tests in the future. So please add some new check_firmware_load() (pick the name) shell function / indirection that can be quickly deployed across many tests, quickly turned on/off test-suite wide, easily switched to a less crude method that does not parse the logs, etc.

Collaborator

marc-hb commented Sep 14, 2022

Have you considered a .ini file?
https://docs.python.org/3/library/configparser.html
It's less powerful and less flexible but more user-friendly and more programmer-friendly.
Long story short: do you expect a hierarchy of values?

Answering myself: probably yes, more complex JSON is needed considering the huge range of hardware and software configurations, workloads, etc.

Which brings the next question: should these values be hardcoded in sof-test commits? Most likely not, because so far we've successfully kept sof-test and sof mostly independent; in concrete terms, most sof-test versions can test most sof versions and configurations. Otherwise how could you bisect anything? Including performance regressions!

So SOF performance thresholds should really not be hardcoded in sof-test; they should come from elsewhere. Not sure where, unfortunately. In CI maybe?

Of course, having small JSON examples in sof-test is fine, but they should not be used in CI by default.

tplg_basename=$(basename $tplg)
platform=$(sof-dump-status.py -p)
is_ipc4_zephyr && {
data_file=$LOG_ROOT/mtrace.txt
Collaborator

AFAIK this new test is merely scanning logs so it should be hardcoded to neither IPC4 nor Zephyr nor mtrace.

Contributor Author

@marc-hb Well, that's complicated. The log output structure is specific to Zephyr (it uses the formatting of the Zephyr logging subsystem). IPC4 should not be a dependency, but we do need separate code to handle sof-logger output.

Contributor Author

One big TODO that needs to be addressed is limiting the analysis to only cover logs emitted during the test run. Here the full mtrace.txt is analyzed, but it is not guaranteed that the DSP went to D3 between the test iterations. So some kind of capture of the timestamp of the last log line before the test starts, and pruning of the log to only cover the new entries added during test execution, is needed.
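
One possible shape for that pruning, sketched in Python (snapshotting the line
count is an assumption for illustration; the real implementation might key off
Zephyr timestamps instead):

def snapshot_log_length(log_path):
    # Record how many lines mtrace.txt has before the test starts.
    try:
        with open(log_path) as log:
            return sum(1 for _ in log)
    except FileNotFoundError:
        return 0

def lines_added_during_test(log_path, start_line):
    # Return only the entries appended after the snapshot, i.e. the
    # log lines emitted during this test run.
    with open(log_path) as log:
        return list(log)[start_line:]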

Contributor

Seems complicated...

  1. Can the analysis be run last, after the log contributors have finished?
  2. Can the analysis be run as a "follow," so it gathers updates automatically?
  3. Can the logs be backed up and purged after the initial run?

It's possible 3 may not work, as it may still present the same challenges.

Contributor Author

kv2019i commented Sep 16, 2022

@marc-hb wrote:

So SOF performance thresholds should really not be hardcoded in sof-test, they should come from elsewhere. Not sure where unfortunately, in CI maybe?

That's a very good point, and the answer is probably somewhere else (wherever the CI running sof-test keeps track of the device configurations).

OTOH, considering this probably needs a PoC run in CI to get a sense of how useful the approach is, the initial database might just as well be in sof-test for simplicity. If it proves useful and it becomes clear the database is going to be populated and maintained, the data needs to be split out. Oh well, that probably doesn't help much, so we could just as well keep the database out of sof-test from day 1.

@miRoox miRoox requested a review from btian1 September 22, 2022 08:45
Contributor Author

kv2019i commented Sep 22, 2022

Closing this, @btian1 will continue with a slightly modified approach.

@kv2019i kv2019i closed this Sep 22, 2022