
check-playback: add check for FW performance #959

Closed

Conversation

Contributor

@kv2019i kv2019i commented Sep 14, 2022

NOTE: This PR is based on top of #956

check-playback: add check for FW performance

Add a tool to analyze the performance traces that are emitted
by the FW low-latency scheduler, and use this tool in the check
playback test case.

The tool requires output of FW logs and currently only supports
the mtrace output available in SOF Zephyr IPC4 builds. This can
be extended later for other output types.

The sof-ll-timer-check.py script can do some simple global checks,
like observing occurrences of low-latency scheduler overruns, and
raises an error in that case.

For most of the analysis, sof-ll-timer-check.py needs reference
data detailing what the expected performance level should be.
To implement this, a simple JSON database is added via
sof-ll-timer-check-db.json that is used to query the reference
data. As this data is manually maintained, reference data is
expected to be used only for a small set of key use-cases on
any given platform.
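
As a rough illustration of the intended flow, here is a hedged sketch based
on the fields visible in the review diff below ("test-key", "ll-timer-avg",
AVG_ERROR_MARGIN); it is not the actual script, and the log parsing that
produces avg_vals is assumed and not shown:

import json
import sys
from statistics import median

# allowed error margin before an error is raised (1.05 -> 5%)
AVG_ERROR_MARGIN = 1.05

def check_against_reference(avg_vals, ref_key, db_path):
    # avg_vals: per-report ll-timer averages parsed from the FW log
    median_avg = median(avg_vals)
    with open(db_path) as dbfile:
        ref_data_all = json.load(dbfile)
    median_avg_ref = None
    for entry in ref_data_all:
        if entry["test-key"] == ref_key:
            median_avg_ref = entry["ll-timer-avg"]
            break
    if not median_avg_ref:
        print("No reference data for key '%s', unable to check performance against reference" % ref_key)
        return
    if median_avg > median_avg_ref * AVG_ERROR_MARGIN:
        sys.exit("ERROR: median avg %d exceeds reference %d" % (median_avg, median_avg_ref))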

cc:

keqiaozhang and others added 6 commits September 14, 2022 09:27
Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
mtrace is a new logging tool that streams data from the Linux
SOF driver mtrace debugfs interface to standard output.

Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
This function can check if the running firmware is IPC4 Zephyr.

Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
mtrace only supports IPC4 platforms, and at the moment mtrace must
run while the tests are running. If mtrace is not running, the DSP
will stop outputting logs when its buffer is full. So we need to keep
mtrace running during the test, or the error log won't be seen.

Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
The mtrace-reader.py process needs to be killed after the test
to prevent conflicts between test cases.

Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
Add a tool to analyze the performance traces that are emitted
by the FW low-latency scheduler, and use this tool in the check
playback test case.

The tool requires output of FW logs and currently only supports
the mtrace output available in SOF Zephyr IPC4 builds. This can
be extended later for other output types.

The sof-ll-timer-check.py script can do some simple global checks,
like observing occurrences of low-latency scheduler overruns, and
raises an error in that case.

For most of the analysis, sof-ll-timer-check.py needs reference
data detailing what the expected performance level should be.
To implement this, a simple JSON database is added via
sof-ll-timer-check-db.json that is used to query the reference
data. As this data is manually maintained, reference data is
expected to be used only for a small set of key use-cases on
any given platform.

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
@kv2019i kv2019i requested a review from a team as a code owner September 14, 2022 10:23
is_ipc4_zephyr && {
data_file=$LOG_ROOT/mtrace.txt
test_reference_key="${platform}-${tplg_basename}-ipc4-zephyr-check-playback-${dev}"
TOPDIR="$(dirname "${BASH_SOURCE[0]}")"/..
Collaborator

Move TOPDIR to the top of the file (and then use it in source ... lib.sh). Copy what's already been done in many other scripts in this directory.

[
{
"test-key": "tgl-sof-tgl-nocodec.tplg-ipc4-zephyr-check-playback-hw:0,0" ,
"ll-timer-avg": 2589
Collaborator

Suggested change
"ll-timer-avg": 2589
"expected-ll-timer-avg": 2589
Suggested change
"ll-timer-avg": 2589
"maximum-ll-timer-avg": 2589
Suggested change
"ll-timer-avg": 2589
"maximum-ll-timer-avg-before-error-margin": 2589

You get the idea.

@@ -0,0 +1,59 @@
#!/usr/bin/env python3

'''Script to analyze performance data in SOF FW log output'''

if overruns:
print("ERROR: %s overruns detected" % overruns, file=sys.stderr)
sys.exit(-1)
Collaborator

Suggested change
sys.exit(-1)
sys.exit("ERROR: %s overruns detected" % overruns")

prints the message on stderr and returns 1.

print("Measurements:\t\t%d" % len(avg_vals))
print("Median avg reported:\t%d" % median_avg_vals)
print("Median max reported:\t%d" % median(max_vals))
print("Highest max reported:\t%d" % max(max_vals))
Collaborator
@marc-hb marc-hb Sep 14, 2022

import logging
# Can be invoked only once and only before any actual logging
logging.basicConfig(level=logging.DEBUG)

...

logging.info(""Measurements:\t\t%d" % len(avg_vals)")

and that's it, nothing else needed. Fancy stuff can be more easily added later if/when needed. https://docs.python.org/3/howto/logging.html

median_avg_ref = None
dbfile = open(sys.argv[3])
ref_key = sys.argv[2]
ref_data_all = json.load(dbfile)
Collaborator

Have you considered a .ini file?
https://docs.python.org/3/library/configparser.html

It's less powerful and less flexible but more user-friendly and more programmer-friendly.

Long story short: do you expect a hierarchy of values?
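
For illustration, a flat configparser-based equivalent of one database entry
could look like the sketch below (the section and option names are taken from
the JSON example above; this is not code from the PR):

import configparser

INI_EXAMPLE = """
[tgl-sof-tgl-nocodec.tplg-ipc4-zephyr-check-playback-hw:0,0]
ll-timer-avg = 2589
"""

config = configparser.ConfigParser()
config.read_string(INI_EXAMPLE)
ref_key = "tgl-sof-tgl-nocodec.tplg-ipc4-zephyr-check-playback-hw:0,0"
# A missing key raises KeyError with a clear message, so an explicit
# "no reference data" fallback is not strictly needed.
median_avg_ref = config[ref_key].getint("ll-timer-avg")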

Contributor Author

That should be considered, yeah. I frankly don't yet know whether the whole concept is pragmatic/feasible, so I just wanted to get some minimum working set, and JSON was good for this.

break

if not median_avg_ref:
print("No reference data for key '%s', unable to check performance against reference" % ref_key)
Collaborator

You probably wouldn't need that code with a simpler .ini file; it would likely just fail with a meaningful stack trace and error message.

# allowed error margin before error is raised for
# lower observed performance (1.05 -> 5% difference
# required to raise error)
AVG_ERROR_MARGIN = 1.05
Collaborator

This feels overkill / YAGNI... why not just bump the thresholds by 5%? There are only two of them.

Contributor Author

@marc-hb I added this on purpose. Basically, anyone setting baseline values is going to run a test and copy the values over to the database. We'll probably add an option to the test script itself so that if you pass "--reference-capture", the script will run the test and measure performance, but instead of comparing against the expected value, it will just write the measured values out in JSON/INI format so they can be incorporated into the database (after reviewing, of course, that the values are a sane new baseline).

So to make the above possible, it's better that the margin is stored elsewhere.
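
A minimal sketch of what such a hypothetical "--reference-capture" output step
could look like (the helper name and output path are illustrative, not part of
the PR):

import json

def dump_reference_entry(test_key, measured_median_avg, out_path):
    # Write the measured value in the same shape as
    # sof-ll-timer-check-db.json so it can be reviewed and merged
    # into the database as a new baseline.
    entry = {"test-key": test_key, "ll-timer-avg": measured_median_avg}
    with open(out_path, "w") as out:
        json.dump([entry], out, indent=2)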

@@ -0,0 +1,59 @@
#!/usr/bin/env python3
Collaborator

Most of what this script does is parse a log file, but there's neither "parsing" nor "log" in the name. This is test code, so I don't think this is an "implementation detail" that should be hidden.

data_file=$LOG_ROOT/mtrace.txt
test_reference_key="${platform}-${tplg_basename}-ipc4-zephyr-check-playback-${dev}"
TOPDIR="$(dirname "${BASH_SOURCE[0]}")"/..
$TOPDIR/tools/sof-ll-timer-check.py ${data_file} $test_reference_key $TOPDIR/tools/sof-ll-timer-check-db.json
Collaborator

Now it's just in check-playback.sh, but I expect this to be found in multiple tests in the future. So please add some new check_firmware_load() (pick the name) shell function / indirection that can be quickly deployed across many tests, quickly turned on/off test-suite wide, easily switched to a less crude method that does not parse the logs, etc.

Collaborator

marc-hb commented Sep 14, 2022

Have you considered a .ini file?
https://docs.python.org/3/library/configparser.html
It's less powerful and less flexible but more user-friendly and more programmer-friendly.
Long story short: do you expect a hierarchy of values?

Answering myself: probably yes, more complex JSON is needed considering the huge range of hardware and software configurations, workloads, etc.

Which brings the next question: should these values be hardcoded in sof-test commits? Most likely not, because so far we've successfully kept sof-test and sof mostly independent; in concrete terms, most sof-test versions can test most sof versions and configurations. Otherwise how could you bisect anything? Including performance regressions!

So SOF performance thresholds should really not be hardcoded in sof-test; they should come from elsewhere. Not sure where, unfortunately. In CI maybe?

Of course, having small JSON examples in sof-test is fine, but they should not be used in CI by default.

tplg_basename=$(basename $tplg)
platform=$(sof-dump-status.py -p)
is_ipc4_zephyr && {
data_file=$LOG_ROOT/mtrace.txt
Collaborator

AFAIK this new test is merely scanning logs so it should be hardcoded to neither IPC4 nor Zephyr nor mtrace.

Contributor Author

@marc-hb Well, that's complicated. The log output structure is specific to Zephyr (it uses the formatting of the Zephyr logging subsystem). IPC4 should not be a dependency, but we do need separate code to handle sof-logger output.

Contributor Author

One big TODO that needs to be addressed is limiting the analysis to only cover logs emitted during the test run. Here the full mtrace.txt is analyzed, but it is not guaranteed that the DSP went to D3 between the test iterations. So some kind of capture of the timestamp of the last log line before the test starts, and pruning of the log to only cover the new entries added during test execution, is needed.
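
One possible shape for that pruning, sketched in Python (snapshotting the line
count is an assumption for illustration; the real implementation might key off
Zephyr timestamps instead):

def snapshot_log_length(log_path):
    # Record how many lines mtrace.txt has before the test starts.
    try:
        with open(log_path) as log:
            return sum(1 for _ in log)
    except FileNotFoundError:
        return 0

def lines_added_during_test(log_path, start_line):
    # Return only the entries appended after the snapshot, i.e. the
    # log lines emitted during this test run.
    with open(log_path) as log:
        return list(log)[start_line:]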

Contributor

Seems complicated...

  1. Can the analysis be run last, after the log contributors have finished?
  2. Can the analysis be run as a "follow," so it gathers updates automatically?
  3. Can the logs be backed up and purged after the initial run?

It's possible 3 may not work, as it may still present the same challenges.

Contributor Author

kv2019i commented Sep 16, 2022

@marc-hb wrote:

So SOF performance thresholds should really not be hardcoded in sof-test, they should come from elsewhere. Not sure where unfortunately, in CI maybe?

That's a very good point, and the answer is probably somewhere else (wherever the CI running sof-test keeps track of the device configurations).

OTOH, considering this probably needs a PoC run in CI to get a sense of how useful the approach is, the initial database might just as well be in sof-test for simplicity. If it proves useful and it becomes clear the database is going to be populated and maintained, the data needs to be split out. Oh well, that probably doesn't help much, so we could just as well keep the database out of sof-test from day 1.

@miRoox miRoox requested a review from btian1 September 22, 2022 08:45
Contributor Author

kv2019i commented Sep 22, 2022

Closing this, @btian1 will continue with a slightly modified approach.

@kv2019i kv2019i closed this Sep 22, 2022