-
Notifications
You must be signed in to change notification settings - Fork 318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRC: math: Replace exponential function for performance #8435
DRC: math: Replace exponential function for performance #8435
Conversation
afb34d5
to
fb9cd9c
Compare
For compilation, this PR requires further fixes. |
In my test this patch dropped drc.2.1 average MCPS from 140 to 96 in topology sof-hda-efx-generic-4ch.tplg. Excellent! The used topology initializes DRC to passthrough, so I applied another with "sof-ctl -n 36 -s threshold_-35_knee_27_ratio_8.txt". The blob that I used is not in SOF git yet but I'll attach it here: |
src/math/exp_fcn.c
Outdated
* The input is Q3.29 | ||
* The output is Q9.23 | ||
*/ | ||
int32_t exp_small_fixed(int32_t x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need this function here since you are using sofm_exp_int32().
src/math/decibels.c
Outdated
*/ | ||
y0 = Q_SHIFT_RND(exp_small_fixed(Q_SHIFT_LEFT(xs, 27, 29)), 23, 20); | ||
y0 = Q_SHIFT_RND(sofm_exp_int32(Q_SHIFT_LEFT(xs, 27, 28)), 23, 20); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line 78 - 81 still can be optimized further with instrinsic code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file didn't include xtensa header file, do you mean add hifi implementation of functions in this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, each for loop deserve an intrinsic implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check if it improves speed. In some cases the C multiply has been as fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll give { Q_CONVERT_FLOAT
~, { Q_SHIFT_RND
~, { Q_SHIFT_LEFT
~, and Q_MULTSR_32X32
} a shot as well. If I don't succeed, I'll try again with the next PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have completed the implementation of Q_SHIFT_RND, Q_SHIFT_LEFT, and Q_MULTSR_32X32, and Q_CONVERT_FLOAT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll correct the last few CI errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've finished addressing every CI error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ShriramShastry , still code style error.
@singalsu @ShriramShastry @andrula-song I think we need to have some options where we can decide at build time via Kconfig between speed and accuracy for certain maths ops. i.e. for 16bit output we may not need full 24/32 computations and so on. There may also be usages where speed is more important than bit accuracy, but thats up to you guys to identify and target. |
Yep, I wanted to avoid Sriram to change the old exponent function code since it is used by many other features. The new version for DRC is significantly faster but trades off slightly accuracy. We need to check case by case with exponent function to utilize. |
@ShriramShastry Any updates to this? The MCPS saving is large so we should get this cleaned up and merged. |
78515d8
to
61ea083
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @andrula-song pls review.
61ea083
to
06bc952
Compare
@ShriramShastry some conflicts, pls rebase as CI wont complete. |
06bc952
to
292e99f
Compare
3601def
to
91f8cf5
Compare
a3519e1
to
e1c8418
Compare
9fa354a
to
69ebd9a
Compare
src/math/decibels.c
Outdated
*/ | ||
y0 = Q_SHIFT_RND(exp_small_fixed(Q_SHIFT_LEFT(xs, 27, 29)), 23, 20); | ||
y0 = Q_SHIFT_RND(sofm_exp_int32(Q_SHIFT_LEFT(xs, 27, 28)), 23, 20); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ShriramShastry , still code style error.
src/math/decibels.c
Outdated
@@ -105,10 +106,10 @@ int32_t exp_fixed(int32_t x) | |||
n++; | |||
} | |||
|
|||
/* exp_small_fixed() input is Q3.29, while x1 is Q5.27 | |||
* exp_small_fixed() output is Q9.23, while y0 is Q12.20 | |||
/* sofm_exp_int32() input is Q4.28, while x1 is Q5.27 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for comments, you mentioned input is Q4.28, however, what is x1? means sofm_exp_int32() output?
if means output, please change x1 to output.
src/math/exp_fcn.c
Outdated
* Output is Q12.20, 0.0 .. +2048.0 | ||
*/ | ||
|
||
int32_t exp_fixed(int32_t x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your patch sequence are quite strange, normally, we should have c version first, then hifi version, seems you already have exp_fixed hifi in previous patch, now this patch comes with c version, this is strange sequence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The critical and hard requirement is that every git commit compiles and passes all the tests. CI does unfortunately not check this (for various, complicated reasons).
@btian1 if you think some git commit does not compile and pass the tests then please "Request Changes" and block this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your patch sequence are quite strange,
every git commit compiles and passes all the tests. CI does unfortunately not check this
The simplest way to solve all these problems and more is to not submit all your commits at the same time:
https://docs.zephyrproject.org/latest/contribute/contributor_expectations.html#defining-smaller-prs
src/math/exp_fcn_hifi.c
Outdated
@@ -302,10 +367,10 @@ int32_t exp_fixed(int32_t x) | |||
int i; | |||
int n = 0; | |||
|
|||
if (x < Q_CONVERT_FLOAT(-11.5, 27)) | |||
if (x < exp_hifi_q_convert_float(-11.5, 27)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not build with pre-compiler stage? could you show the asm code to compare?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I don't get the necessity in this situation; could you please explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean why not directly use:Q_CONVERT_FLOAT(-11.5, 27)?
src/math/exp_fcn_hifi.c
Outdated
return xt_o; | ||
} | ||
|
||
#define ONE_Q20 exp_hifi_q_convert_float(1.0, 20) /* Use Q12.20 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are always repeating definition, please move to one common header.
y = ONE_Q20; | ||
for (i = 0; i < (1 << n); i++) | ||
y = (int32_t)Q_MULTSR_32X32((int64_t)y, y0, 20, 20, 20); | ||
y = (int32_t)exp_hifi_q_multsr_32x32((int64_t)y, y0, 20, 20, 20); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this patch is not related with title(exp), you can't use one PR to cover all the changes, please move out this patch from this PR.
src/math/window.c
Outdated
@@ -114,7 +115,7 @@ void win_povey_16b(int16_t win[], int length) | |||
/* Calculate x^0.85 as exp(0.85 * log(x)) */ | |||
x2 = (int32_t)(ln_int32((uint32_t)x1) >> 1) - WIN_LOG_2POW31_Q26; | |||
x3 = sat_int32(Q_MULTSR_32X32((int64_t)x2, WIN_085_Q31, 26, 31, 27)); /* Q5.27 */ | |||
x4 = exp_fixed(x3); /* Q5.27 -> Q12.20 */ | |||
x4 = sofm_exp_fixed(x3); /* Q5.27 -> Q12.20 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
somehow, you can rename from the beginning or start from first patch, then this patch will be removed.
@ShriramShastry I added your signed-off-by into #8605. Better to separate the fix from this large PR to get the common testbench build issue fixed. |
Seems it's not ensured by C standards that literals floating point macros are calculated in the C pre-processor. Maybe things have changed with xt-clang. It would be safest to change macros like |
@ShriramShastry Please check disassembly of your Q_CONVERT_FLOAT() usage. We looked into TGL and MTL builds with @btian1 and we could not find floating point code generation from current usages of the macro. No, float overhead found. So, let's not yet start to replace these before understanding better as I worried in my previous comment. |
@@ -8,4 +8,5 @@ cmocka_test(window | |||
${PROJECT_SOURCE_DIR}/src/math/base2log.c | |||
${PROJECT_SOURCE_DIR}/src/math/decibels.c | |||
${PROJECT_SOURCE_DIR}/src/math/exp_fcn.c | |||
${PROJECT_SOURCE_DIR}/src/math/exp_fcn_hifi.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please squash this patch with previous one, as Marc said, you need get each patch pass all CI separately, here, there is obvious CI error.
The macros are moved to header file. There are no functional changes. Signed-off-by: shastry <malladi.sastry@intel.com>
Unused variables from HiFi4/5 were reshuffled and placed in order to use HiFi3 code. If the variable 'ret' is used uninitialized whenever the 'if' condition is false, set it to false. Signed-off-by: shastry <malladi.sastry@intel.com>
This change allows the fast exponent library to replace the decibels library for applications like DRC where exponent function is used in hot code parts. Signed-off-by: Seppo Ingalsuo <seppo.ingalsuo@linux.intel.com> Signed-off-by: shastry <malladi.sastry@intel.com>
In Zephyr CMakeLists, add exponential source files to facilitate the compilation of math C and HiFi code. Signed-off-by: shastry <malladi.sastry@intel.com>
The exp_fixed() function is replaced by fast sofm_exp_fixed() and sofm_db2lin() functions. It saves 40 MCPS, from 123 to 83 MCPS in a test run in TGL platform. Signed-off-by: shastry <malladi.sastry@intel.com>
a4aa65c
to
0589da0
Compare
@@ -26,6 +26,38 @@ | |||
|
|||
#endif | |||
|
|||
int32_t sofm_exp_int32(int32_t x); | |||
/* TODO: Is there a MCPS difference */ | |||
#define USING_QCONVERT 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already approved, but let's check in real devices if setting this to zero speeds up the code. In theory it shouldn't impact since Q_CONVERT_FLOAT macro should be evaluated in C pre-processor. Once confirmed we can remove the direct integers. Or if difference seen, remove the Q_CONVERT_FLOAT part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, can we do this as next steps in a follow up PR and test drive this now.
For DRC performance, replace
exp_small_fixed()
withsofm_exp_int32()
.Included supporting change to include
sofm_exp_int32()
withinexp_fixed()
and repositioned
exp_small_fixed()
for future use.