DRC: math: Replace exponential function for performance #8435

ShriramShastry · 2023-11-03T05:54:36Z

For DRC performance, replace exp_small_fixed() with sofm_exp_int32().
Included supporting change to include sofm_exp_int32() within exp_fixed()
and repositioned exp_small_fixed() for future use.

ShriramShastry · 2023-11-03T14:04:16Z

This PR still needs further fixes before it compiles.

singalsu · 2023-11-03T14:21:33Z

In my test this patch dropped drc.2.1 average MCPS from 140 to 96 in topology sof-hda-efx-generic-4ch.tplg. Excellent!

The topology I used initializes DRC to passthrough, so I applied another configuration with "sof-ctl -n 36 -s threshold_-35_knee_27_ratio_8.txt". The blob that I used is not in SOF git yet, but I'll attach it here:

threshold_-35_knee_27_ratio_8.txt

singalsu · 2023-11-03T14:56:27Z

src/math/exp_fcn.c

+ * The input is Q3.29
+ * The output is Q9.23
+ */
+int32_t exp_small_fixed(int32_t x)


I don't think you need this function here since you are using sofm_exp_int32().

btian1 · 2023-11-06T03:14:35Z

src/math/decibels.c

 	 */
-	y0 = Q_SHIFT_RND(exp_small_fixed(Q_SHIFT_LEFT(xs, 27, 29)), 23, 20);
+	y0 = Q_SHIFT_RND(sofm_exp_int32(Q_SHIFT_LEFT(xs, 27, 28)), 23, 20);


Lines 78-81 can still be optimized further with intrinsic code.

This file doesn't include the Xtensa header file. Do you mean adding HiFi implementations of the functions in this file?

Yes, each for loop deserves an intrinsic implementation.

Please check if it improves speed. In some cases the C multiply has been as fast.

I'll give Q_CONVERT_FLOAT, Q_SHIFT_RND, Q_SHIFT_LEFT, and Q_MULTSR_32X32 a shot as well. If I don't succeed, I'll try again in the next PR.

The table below shows the cycle counts of the macros for generic C (Q_*) and HiFi intrinsics (XT_*), using identical input data samples.

Note: The HiFi function for CONVERT_FLOAT calculates the math accurately.

I have completed the implementation of Q_SHIFT_RND, Q_SHIFT_LEFT, Q_MULTSR_32X32, and Q_CONVERT_FLOAT.

I'll correct the last few CI errors.

I've finished addressing every CI error.

@ShriramShastry, there is still a code style error.

lgirdwood · 2023-11-10T17:16:41Z

@singalsu @ShriramShastry @andrula-song I think we need some options that let us decide at build time, via Kconfig, between speed and accuracy for certain maths ops. E.g. for 16-bit output we may not need full 24/32-bit computations, and so on. There may also be usages where speed is more important than bit accuracy, but that's up to you guys to identify and target.

singalsu · 2023-11-10T17:31:37Z

Yep, I wanted Sriram to avoid changing the old exponent function code, since it is used by many other features. The new version for DRC is significantly faster but trades off a little accuracy. We need to check case by case which exponent function to use.

singalsu · 2023-11-16T10:19:14Z

@ShriramShastry Any updates to this? The MCPS saving is large so we should get this cleaned up and merged.

lgirdwood

LGTM. @andrula-song pls review.

lgirdwood · 2023-11-24T17:03:08Z

@ShriramShastry some conflicts, pls rebase as CI won't complete.

btian1 · 2023-12-11T03:11:09Z

src/math/decibels.c

 	 */
-	y0 = Q_SHIFT_RND(exp_small_fixed(Q_SHIFT_LEFT(xs, 27, 29)), 23, 20);
+	y0 = Q_SHIFT_RND(sofm_exp_int32(Q_SHIFT_LEFT(xs, 27, 28)), 23, 20);


@ShriramShastry, there is still a code style error.

btian1 · 2023-12-11T03:12:16Z

src/math/decibels.c

@@ -105,10 +106,10 @@ int32_t exp_fixed(int32_t x)
 		n++;
 	}

-	/* exp_small_fixed() input is Q3.29, while x1 is Q5.27
-	 * exp_small_fixed() output is Q9.23, while y0 is Q12.20
+	/* sofm_exp_int32() input is Q4.28, while x1 is Q5.27


In the comment you mention that the input is Q4.28, but what is x1? Does it mean the sofm_exp_int32() output?
If it means the output, please change x1 to output.
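
For readers following the Q-format question, here is a minimal sketch of the two conversions in the hunk above: a Q5.27 value shifted to Q4.28 for the function input, and a Q9.23 result rounded to Q12.20. The helper names q_shift_left and q_shift_rnd are hypothetical stand-ins, not the SOF macros, and the sketch assumes non-negative values to keep the shifts portable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: go from src_q to dst_q fractional bits, where
 * dst_q >= src_q. The value gains precision bits and loses headroom.
 */
static int32_t q_shift_left(int32_t x, int src_q, int dst_q)
{
	assert(dst_q >= src_q);
	return (int32_t)((int64_t)x * ((int64_t)1 << (dst_q - src_q)));
}

/* Hypothetical stand-in: go from src_q to dst_q fractional bits, where
 * src_q > dst_q, rounding to nearest. Assumes x >= 0.
 */
static int32_t q_shift_rnd(int32_t x, int src_q, int dst_q)
{
	int shift = src_q - dst_q;

	assert(shift > 0 && x >= 0);
	return (int32_t)(((int64_t)x + ((int64_t)1 << (shift - 1))) >> shift);
}
```

For example, 0.5 is 67108864 in Q5.27 and q_shift_left(67108864, 27, 28) gives 134217728, which is 0.5 in Q4.28. Likewise 1.5 is 12582912 in Q9.23 and q_shift_rnd(12582912, 23, 20) gives 1572864, which is 1.5 in Q12.20.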

btian1 · 2023-12-11T03:18:33Z

src/math/exp_fcn.c

+ * Output is Q12.20, 0.0 .. +2048.0
+ */
+
+int32_t exp_fixed(int32_t x)


Your patch sequence is quite strange. Normally we should have the C version first and then the HiFi version. It seems you already added the exp_fixed HiFi version in a previous patch, and now this patch comes with the C version.

The critical and hard requirement is that every git commit compiles and passes all the tests. CI does unfortunately not check this (for various, complicated reasons).

@btian1 if you think some git commit does not compile and pass the tests then please "Request Changes" and block this PR.

The simplest way to solve all these problems and more is to not submit all your commits at the same time:

https://docs.zephyrproject.org/latest/contribute/contributor_expectations.html#defining-smaller-prs

btian1 · 2023-12-11T03:20:59Z

src/math/exp_fcn_hifi.c

@@ -302,10 +367,10 @@ int32_t exp_fixed(int32_t x)
 	int i;
 	int n = 0;

-	if (x < Q_CONVERT_FLOAT(-11.5, 27))
+	if (x < exp_hifi_q_convert_float(-11.5, 27))


Is this not evaluated at the pre-compiler stage? Could you show the asm code for comparison?

Sorry, I don't get the necessity in this situation; could you please explain?

I mean, why not directly use Q_CONVERT_FLOAT(-11.5, 27)?

btian1 · 2023-12-11T03:21:24Z

src/math/exp_fcn_hifi.c

+	return xt_o;
+}
+
+#define ONE_Q20         exp_hifi_q_convert_float(1.0, 20)	  /* Use Q12.20 */


These definitions are repeated again and again; please move them to one common header.

btian1 · 2023-12-11T03:22:36Z

src/math/exp_fcn_hifi.c

 	y = ONE_Q20;
 	for (i = 0; i < (1 << n); i++)
-		y = (int32_t)Q_MULTSR_32X32((int64_t)y, y0, 20, 20, 20);
+		y = (int32_t)exp_hifi_q_multsr_32x32((int64_t)y, y0, 20, 20, 20);


This patch is not related to the title (exp). You can't use one PR to cover all the changes; please move this patch out of this PR.
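
As background for the loop in this hunk, here is a rough sketch of what the repeated Q12.20 multiply computes. If y0 approximates exp(x / 2^n), multiplying it into an accumulator 2^n times gives back exp(x). The names q20_mul and q20_repeat_mul are hypothetical and the rounding is a plain round-to-nearest; this is not the SOF macro or the HiFi intrinsic version.

```c
#include <assert.h>
#include <stdint.h>

#define ONE_Q20 (1 << 20)	/* 1.0 in Q12.20 */

/* Hypothetical helper: Q12.20 * Q12.20 -> Q12.20, rounded to nearest.
 * Assumes non-negative operands, which holds for exp() results.
 */
static int32_t q20_mul(int32_t a, int32_t b)
{
	int64_t p = (int64_t)a * b;	/* product has 40 fractional bits */

	assert(a >= 0 && b >= 0);
	return (int32_t)((p + ((int64_t)1 << 19)) >> 20);
}

/* Multiply y0 into the accumulator (1 << n) times, mirroring the loop in
 * the hunk. The caller must make sure the result stays below 2048.0, the
 * upper end of the Q12.20 range.
 */
static int32_t q20_repeat_mul(int32_t y0, int n)
{
	int32_t y = ONE_Q20;
	int i;

	for (i = 0; i < (1 << n); i++)
		y = q20_mul(y, y0);

	return y;
}
```

With y0 = 1572864 (1.5 in Q12.20), q20_repeat_mul(y0, 1) returns 2359296 (2.25) and q20_repeat_mul(y0, 2) returns 5308416 (5.0625).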

btian1 · 2023-12-11T03:23:46Z

src/math/window.c

@@ -114,7 +115,7 @@ void win_povey_16b(int16_t win[], int length)
 		/* Calculate x^0.85 as exp(0.85 * log(x)) */
 		x2 = (int32_t)(ln_int32((uint32_t)x1) >> 1) - WIN_LOG_2POW31_Q26;
 		x3 = sat_int32(Q_MULTSR_32X32((int64_t)x2, WIN_085_Q31, 26, 31, 27)); /* Q5.27 */
-		x4 = exp_fixed(x3); /* Q5.27 -> Q12.20 */
+		x4 = sofm_exp_fixed(x3); /* Q5.27 -> Q12.20 */


You could do the rename from the beginning, starting with the first patch; then this patch could be removed.

singalsu · 2023-12-11T15:03:03Z

@ShriramShastry I added your signed-off-by into #8605. Better to separate the fix from this large PR to get the common testbench build issue fixed.

singalsu · 2023-12-13T09:22:36Z

It seems the C standards do not ensure that macros with floating-point literals are calculated in the C pre-processor. Maybe things have changed with xt-clang.

https://stackoverflow.com/questions/21241031/does-the-c-preprocessor-handle-floating-point-math-constants

It would be safest to change macros like #define SOME_COEF Q_CONVERT_FLOAT(-11.5, 27) into #define SOME_COEF -1543503872 /* -11.5 Q5.27 */.

singalsu · 2023-12-13T10:23:19Z

@ShriramShastry Please check the disassembly of your Q_CONVERT_FLOAT() usage. We looked into TGL and MTL builds with @btian1 and could not find floating-point code generated from the current usages of the macro. No float overhead was found. So let's not start replacing these yet, as I worried in my previous comment, before we understand this better.
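
One way to run the suggested disassembly check is sketched below. Note that the preprocessor only expands the macro text; whether the floating-point expression is folded into an integer constant is up to the compiler and its optimization settings. Q_FROM_FLOAT here is a simplified stand-in that scales and truncates, not the SOF Q_CONVERT_FLOAT definition, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a float-to-Q conversion macro: scale by 2^q and
 * truncate. Exact for values such as -11.5 that need few fractional bits.
 */
#define Q_FROM_FLOAT(f, q)	((int32_t)((f) * (double)((int64_t)1 << (q))))

#define LIMIT_VIA_MACRO		Q_FROM_FLOAT(-11.5, 27)
#define LIMIT_AS_INTEGER	(-1543503872)	/* -11.5 in Q5.27 */

/* Two spellings of the same comparison, to compare in the disassembly */
static int below_limit_macro(int32_t x)
{
	return x < LIMIT_VIA_MACRO;
}

static int below_limit_integer(int32_t x)
{
	return x < LIMIT_AS_INTEGER;
}

/* Run-time sanity check that both spellings are the same number */
static void check_limits_match(void)
{
	assert(LIMIT_VIA_MACRO == LIMIT_AS_INTEGER);
}
```

Building this with the optimization level used for the firmware and inspecting the assembly of the two comparison functions should show whether both compare against the same integer immediate. If they do, the macro form adds no floating-point code on that toolchain.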

btian1 · 2023-12-14T01:58:49Z

test/cmocka/src/math/window/CMakeLists.txt

@@ -8,4 +8,5 @@ cmocka_test(window
 	${PROJECT_SOURCE_DIR}/src/math/base2log.c
 	${PROJECT_SOURCE_DIR}/src/math/decibels.c
 	${PROJECT_SOURCE_DIR}/src/math/exp_fcn.c
+	${PROJECT_SOURCE_DIR}/src/math/exp_fcn_hifi.c


Please squash this patch with the previous one. As Marc said, each patch needs to pass all CI separately, and here there is an obvious CI error.

The macros are moved to header file. There are no functional changes. Signed-off-by: shastry <malladi.sastry@intel.com>

Unused variables from HiFi4/5 were reshuffled and placed in order to use HiFi3 code. If the variable 'ret' is used uninitialized whenever the 'if' condition is false, set it to false. Signed-off-by: shastry <malladi.sastry@intel.com>

This change allows the fast exponent library to replace the decibels library for applications like DRC where exponent function is used in hot code parts. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com> Signed-off-by: shastry <malladi.sastry@intel.com>

In Zephyr CMakeLists, add exponential source files to facilitate the compilation of math C and HiFi code. Signed-off-by: shastry <malladi.sastry@intel.com>

The exp_fixed() function is replaced by fast sofm_exp_fixed() and sofm_db2lin() functions. It saves 40 MCPS, from 123 to 83 MCPS in a test run in TGL platform. Signed-off-by: shastry <malladi.sastry@intel.com>

singalsu · 2024-01-08T11:25:10Z

src/include/sof/math/exp_fcn.h

@@ -26,6 +26,38 @@

 #endif

-int32_t sofm_exp_int32(int32_t x);
+/* TODO: Is there a MCPS difference */
+#define USING_QCONVERT	1


I already approved, but let's check on real devices whether setting this to zero speeds up the code. In theory it shouldn't have an impact, since the Q_CONVERT_FLOAT macro should be evaluated in the C pre-processor. Once confirmed, we can remove the direct integers. Or, if a difference is seen, remove the Q_CONVERT_FLOAT part.

OK, can we do this as a next step in a follow-up PR and test drive this now?

ShriramShastry force-pushed the DRC_math_exp_optimization_dev branch from afb34d5 to fb9cd9c Compare November 3, 2023 13:58

singalsu requested a review from andrula-song November 3, 2023 14:14

singalsu requested changes Nov 3, 2023

btian1 reviewed Nov 6, 2023

ShriramShastry force-pushed the DRC_math_exp_optimization_dev branch 4 times, most recently from 78515d8 to 61ea083 Compare November 24, 2023 06:53

lgirdwood approved these changes Nov 24, 2023

lgirdwood added this to the v2.9 milestone Nov 24, 2023

ShriramShastry force-pushed the DRC_math_exp_optimization_dev branch from 61ea083 to 06bc952 Compare November 24, 2023 17:00

ShriramShastry force-pushed the DRC_math_exp_optimization_dev branch from 06bc952 to 292e99f Compare November 27, 2023 05:31

ShriramShastry marked this pull request as ready for review November 27, 2023 05:44

ShriramShastry requested review from a team, plbossart, mmaka1, lbetlej, dbaluta and kv2019i as code owners November 27, 2023 05:44

ShriramShastry requested review from singalsu and btian1 November 27, 2023 05:45

ShriramShastry force-pushed the DRC_math_exp_optimization_dev branch 3 times, most recently from 3601def to 91f8cf5 Compare November 27, 2023 15:25

ShriramShastry force-pushed the DRC_math_exp_optimization_dev branch 2 times, most recently from a3519e1 to e1c8418 Compare December 7, 2023 11:44

ShriramShastry requested a review from marc-hb as a code owner December 7, 2023 11:44

ShriramShastry force-pushed the DRC_math_exp_optimization_dev branch 5 times, most recently from 9fa354a to 69ebd9a Compare December 8, 2023 05:37

ShriramShastry requested review from btian1, lgirdwood and andrula-song December 8, 2023 06:04

btian1 reviewed Dec 11, 2023

marc-hb reviewed Dec 11, 2023

singalsu requested changes Dec 11, 2023

singalsu reviewed Dec 11, 2023

singalsu reviewed Dec 13, 2023

btian1 reviewed Dec 14, 2023

ShriramShastry and others added 5 commits January 8, 2024 12:40

Math: Exp: Rename and move common macros for generic and HiFi

3ba94c6

The macros are moved to header file. There are no functional changes. Signed-off-by: shastry <malladi.sastry@intel.com>

Zephyr: Patch Zephyr CMakeLists with exponential source files

7a8430c

In Zephyr CMakeLists, add exponential source files to facilitate the compilation of math C and HiFi code. Signed-off-by: shastry <malladi.sastry@intel.com>

Audio: DRC: Use fast exponent functions

0589da0

The exp_fixed() function is replaced by fast sofm_exp_fixed() and sofm_db2lin() functions. It saves 40 MCPS, from 123 to 83 MCPS in a test run in TGL platform. Signed-off-by: shastry <malladi.sastry@intel.com>

ShriramShastry force-pushed the DRC_math_exp_optimization_dev branch from a4aa65c to 0589da0 Compare January 8, 2024 07:22

singalsu approved these changes Jan 8, 2024

singalsu reviewed Jan 8, 2024

lgirdwood approved these changes Jan 8, 2024

lgirdwood merged commit 3ed4ddd into thesofproject:main Jan 8, 2024
41 of 44 checks passed
