check-playback: add check for FW performance #959
Conversation
Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
mtrace is a new logging tool that streams data from the Linux SOF driver mtrace debugfs interface to standard output. Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
This function checks whether the running firmware is an IPC4 Zephyr build. Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
mtrace only supports IPC4 platforms, and at the moment it must be running while the tests run. If mtrace is not running, the DSP will stop outputting logs once its buffer is full, so we need to keep mtrace running during the test or the error logs will be lost. Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
The mtrace-reader.py process needs to be killed after the test to prevent conflicts between test cases. Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
Add a tool to analyze the performance traces that are emitted by the FW low-latency scheduler, and use this tool in the check-playback test case. The tool requires FW log output and currently only supports the mtrace output available in SOF Zephyr IPC4 builds. This can be extended later for other output types. The sof-ll-timer-check.py script can do some simple global checks, like observing occurrences of low-latency scheduler overruns and raising an error in that case. For most of the analysis, sof-ll-timer-check.py needs reference data detailing what the expected performance level should be. To implement this, a simple JSON database is added via sof-ll-timer-check-db.json and is used to query the reference data. As this data is manually maintained, it is expected that reference data will only be used for a small set of key use-cases for any given platform. Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
is_ipc4_zephyr && {
    data_file=$LOG_ROOT/mtrace.txt
    test_reference_key="${platform}-${tplg_basename}-ipc4-zephyr-check-playback-${dev}"
    TOPDIR="$(dirname "${BASH_SOURCE[0]}")"/..
Move TOPDIR to the top of the file (and then use it in source ... lib.sh). Copy what's already been done in many other scripts in this directory.
[
  {
    "test-key": "tgl-sof-tgl-nocodec.tplg-ipc4-zephyr-check-playback-hw:0,0",
    "ll-timer-avg": 2589
"ll-timer-avg": 2589 | |
"expected-ll-timer-avg": 2589 |
"ll-timer-avg": 2589 | |
"maximum-ll-timer-avg": 2589 |
"ll-timer-avg": 2589 | |
"maximum-ll-timer-avg-before-error-margin": 2589 |
You get the idea.
@@ -0,0 +1,59 @@
#!/usr/bin/env python3

'''Script to analyze performance data in SOF FW log output'''
if overruns:
    print("ERROR: %s overruns detected" % overruns, file=sys.stderr)
    sys.exit(-1)
sys.exit(-1) → sys.exit("ERROR: %s overruns detected" % overruns)
sys.exit() with a string argument prints the message on stderr and exits with status 1.
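A minimal sketch of that equivalence (the overruns value here is hypothetical):

import sys

overruns = 3  # hypothetical count, for illustration only

# Passing a non-integer to sys.exit() is equivalent to printing it to
# stderr and exiting with status 1, i.e. the two lines below collapse
# into sys.exit("ERROR: %s overruns detected" % overruns)
print("ERROR: %s overruns detected" % overruns, file=sys.stderr)
sys.exit(1)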
print("Measurements:\t\t%d" % len(avg_vals)) | ||
print("Median avg reported:\t%d" % median_avg_vals) | ||
print("Median max reported:\t%d" % median(max_vals)) | ||
print("Highest max reported:\t%d" % max(max_vals)) |
import logging
# Can be invoked only once and only before any actual logging
logging.basicConfig(level=logging.DEBUG)
...
logging.info("Measurements:\t\t%d" % len(avg_vals))

and that's it, nothing else needed. Fancy stuff can be more easily added later if/when needed. https://docs.python.org/3/howto/logging.html
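A runnable sketch of that suggestion (the avg_vals values are made up):

import logging

# Can be invoked only once, and only before any actual logging
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s: %(message)s")

avg_vals = [2580, 2601, 2593]  # hypothetical measurements

# logging also supports lazy %-style arguments, so the string is
# only built if the message passes the level filter
logging.info("Measurements:\t\t%d", len(avg_vals))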
median_avg_ref = None
dbfile = open(sys.argv[3])
ref_key = sys.argv[2]
ref_data_all = json.load(dbfile)
Have you considered a .ini file? https://docs.python.org/3/library/configparser.html
It's less powerful and less flexible but more user-friendly and more programmer-friendly.
Long story short: do you expect a hierarchy of values?
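For illustration, a minimal sketch of the same lookup with configparser (the .ini file name and layout are made up, not part of this PR):

import configparser

# Hypothetical sof-ll-timer-check-db.ini contents:
#   [tgl-sof-tgl-nocodec.tplg-ipc4-zephyr-check-playback-hw:0,0]
#   ll-timer-avg = 2589

config = configparser.ConfigParser()
config.read("sof-ll-timer-check-db.ini")

ref_key = "tgl-sof-tgl-nocodec.tplg-ipc4-zephyr-check-playback-hw:0,0"
# getint() raises NoSectionError/NoOptionError with a clear message
# when the key is missing, so no manual "not found" check is needed
median_avg_ref = config.getint(ref_key, "ll-timer-avg")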
That should be considered, yeah. I frankly don't yet know whether even the whole concept is pragmatic/feasible, so I just wanted to get a minimum working set, and JSON was good for this.
break

if not median_avg_ref:
    print("No reference data for key '%s', unable to check performance against reference" % ref_key)
You probably wouldn't need that code with a simpler .ini file; it would likely just fail with a meaningful stack trace and error message.
# allowed error margin before error is raised for
# lower observed performance (1.05 -> 5% difference
# required to raise error)
AVG_ERROR_MARGIN = 1.05
This feels like overkill / YAGNI... why not just bump the thresholds by 5%? There are only two of them.
@marc-hb I added this on purpose. Basically, anyone setting baseline values is going to run a test and copy the values over to the database. We will probably add an option to the test script itself so that if you pass "--reference-capture", the script will run the test and measure performance, but instead of comparing against the expected value, it will just write the measured values out in JSON/ini format, so they can be incorporated into the database (after reviewing, of course, that the values are a sane new baseline).
To make the above possible, it's better that the margin is stored elsewhere.
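A rough sketch of how such a capture mode could work (the function name and flag handling are hypothetical, not code from this PR):

import json
import sys

AVG_ERROR_MARGIN = 1.05  # the margin lives in the checker, not in the database

def check_or_capture(test_key, median_avg, ref_db_path, capture=False):
    if capture:
        # --reference-capture mode: emit the measured value as a database
        # entry that can be reviewed and merged into sof-ll-timer-check-db.json
        print(json.dumps({"test-key": test_key, "ll-timer-avg": median_avg},
                         indent=2))
        return
    with open(ref_db_path) as dbfile:
        ref_data_all = json.load(dbfile)
    for entry in ref_data_all:
        if entry["test-key"] == test_key:
            if median_avg > entry["ll-timer-avg"] * AVG_ERROR_MARGIN:
                sys.exit("ERROR: avg %d is more than %d%% above reference %d"
                         % (median_avg, (AVG_ERROR_MARGIN - 1) * 100,
                            entry["ll-timer-avg"]))
            return
    print("No reference data for key '%s'" % test_key)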
@@ -0,0 +1,59 @@
#!/usr/bin/env python3
Most of what this script does is parse a log file, but there's neither "parsing" nor "log" in the name. This is test code, so I don't think this is an "implementation detail" that should be hidden.
data_file=$LOG_ROOT/mtrace.txt
test_reference_key="${platform}-${tplg_basename}-ipc4-zephyr-check-playback-${dev}"
TOPDIR="$(dirname "${BASH_SOURCE[0]}")"/..
$TOPDIR/tools/sof-ll-timer-check.py ${data_file} $test_reference_key $TOPDIR/tools/sof-ll-timer-check-db.json
Now it's just in check-playback.sh, but I expect this to be found in multiple tests in the future. So please add some new check_firmware_load() (pick the name) shell function / indirection that can be quickly deployed across many tests, quickly turned on/off test-suite-wide, easily switched to a less crude method that does not parse the logs, etc.
Answering myself: probably yes, more complex JSON is needed considering the huge range of hardware and software configurations, workloads, etc.
Which brings the next question: should these values be hardcoded in sof-test commits? Most likely not, because so far we've successfully kept sof-test and sof mostly independent; in concrete terms, most sof-test versions can test most sof versions and configurations. Otherwise how could you bisect anything? Including performance regressions! So SOF performance thresholds should really not be hardcoded in sof-test; they should come from elsewhere. Not sure where, unfortunately; in CI maybe? Of course, having small JSON examples in sof-test is fine, but they should not be used in CI by default.
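One way CI could supply the reference database without hardcoding it in sof-test (the environment variable name here is made up for illustration):

import os

# CI could export e.g. SOF_PERF_REF_DB=/path/to/its/own/reference-db.json;
# fall back to the small in-tree example database otherwise
ref_db_path = os.environ.get("SOF_PERF_REF_DB",
                             "tools/sof-ll-timer-check-db.json")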
tplg_basename=$(basename $tplg)
platform=$(sof-dump-status.py -p)
is_ipc4_zephyr && {
    data_file=$LOG_ROOT/mtrace.txt
AFAIK this new test is merely scanning logs so it should be hardcoded to neither IPC4 nor Zephyr nor mtrace.
@marc-hb Well, that's complicated. The log output structure is specific to Zephyr (it uses the formatting of the Zephyr logging subsystem). IPC4 should not be a dependency, but we do need separate code to handle sof-logger output.
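For context, a sketch of what parsing that Zephyr-specific structure could look like (the prefix regex assumes Zephyr's default log format, e.g. "[00:00:01.234,567] <inf> ..."; actual SOF mtrace output may differ):

import re

# Assumed Zephyr log prefix: [hh:mm:ss.mmm,uuu] <level>
ZEPHYR_PREFIX_RE = re.compile(r"^\[(\d+):(\d+):(\d+)\.(\d+),(\d+)\] <(\w+)>")

def parse_prefix(line):
    '''Return (timestamp_us, level) for a Zephyr-formatted log line.'''
    m = ZEPHYR_PREFIX_RE.match(line)
    if not m:
        return None
    h, mn, s, ms, us = (int(x) for x in m.groups()[:5])
    return ((h * 3600 + mn * 60 + s) * 1000000 + ms * 1000 + us, m.group(6))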
One big TODO that needs to be addressed is limiting the analysis to only cover logs that happened during the test run. Here the full mtrace.txt is analyzed, but it is not guaranteed the DSP went to D3 between the test iterations. So some kind of capture of the timestamp of the last log line before the test starts, and pruning of the log to only cover the new entries added during test execution, is needed.
Seems complicated...
- Can the analysis be run last, after the log contributors have finished?
- Can the analysis be run as a "follow", so it gathers updates automatically?
- Can the logs be backed up and purged after the initial run?
It's possible 3 may not work, as it may still present the same challenges.
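A minimal sketch of the offset-based pruning idea from the TODO above (the helper names are illustrative, not from this PR):

def mark_log_position(path):
    '''Record the size of the log file before the test starts.'''
    with open(path) as f:
        f.seek(0, 2)  # move to the end of the file
        return f.tell()

def read_new_entries(path, offset):
    '''Return only the log lines appended after the recorded offset.'''
    with open(path) as f:
        f.seek(offset)
        return f.readlines()

# Before the test: start = mark_log_position("mtrace.txt")
# After the test:  new_lines = read_new_entries("mtrace.txt", start)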
@marc-hb wrote:
That's a very good point, and the answer is probably somewhere else (wherever the CI running sof-test keeps track of the device configurations). OTOH, considering this probably needs a PoC run in CI to get a sense of the usefulness of the approach, the initial database might just as well be in sof-test for simplicity. If it proves useful and it becomes clear the database is going to be populated and maintained, the data needs to be split out. Oh well, probably this doesn't help much, so we could just as well put the database out from day 1.
Closing this, @btian1 will continue with a slightly modified approach.
NOTE: This PR is based on top of #956
check-playback: add check for FW performance
cc: