improve arm_correlate_q7 for CM0 (#178)
GCC and Clang are unable to detect the code similarity and merge the two __SSAT calls.
Let's help them emit better code.

Co-authored-by: Christophe Favergeon <48906714+christophe0606@users.noreply.github.com>
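The pattern described in the commit message can be sketched in isolation. The snippet below is a minimal illustration, not the CMSIS-DSP code: `ssat8`, `store_branch`, and `store_stride` are hypothetical names, and `ssat8` is a portable stand-in assumed to behave like `__SSAT(x, 8)`. It shows the duplicated saturating store in the two branches and the single store driven by a signed stride.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable stand-in for the CMSIS __SSAT(x, 8) intrinsic:
 * clamps x to the signed 8-bit range [-128, 127]. */
static int32_t ssat8(int32_t x)
{
    if (x > 127)  return 127;
    if (x < -128) return -128;
    return x;
}

/* Before: the saturating store appears once per branch, so a compiler
 * that does not merge the two calls emits the saturation twice.
 * Note: >> on a negative value is implementation-defined in C; it is an
 * arithmetic shift on the usual Arm toolchains, as the original assumes. */
static void store_branch(int8_t *dst, const int32_t *sums, size_t n, int reverse)
{
    for (size_t k = 0; k < n; k++) {
        if (reverse)
            *dst-- = (int8_t) ssat8(sums[k] >> 7);
        else
            *dst++ = (int8_t) ssat8(sums[k] >> 7);
    }
}

/* After: a single store; the direction is carried by a signed stride
 * (+1 to fill forward, -1 to fill backward). */
static void store_stride(int8_t *dst, const int32_t *sums, size_t n, int32_t inc)
{
    for (size_t k = 0; k < n; k++) {
        *dst = (int8_t) ssat8(sums[k] >> 7);
        dst += inc;
    }
}
```

Both functions write the same values; the second simply leaves the compiler one store and one saturation to emit per iteration instead of a branch with two copies.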
SiarheiVolkau and christophe0606 authored Jun 24, 2024
1 parent a9c26d6 commit fd088ac
Showing 1 changed file with 6 additions and 8 deletions.
14 changes: 6 additions & 8 deletions Source/FilteringFunctions/arm_correlate_q7.c
@@ -921,15 +921,15 @@ void arm_correlate_q7(
const q7_t *pIn2 = pSrcB + (srcBLen - 1U); /* InputB pointer */
q31_t sum; /* Accumulator */
uint32_t i = 0U, j; /* Loop counters */
-uint32_t inv = 0U; /* Reverse order flag */
+int32_t inc = 1; /* Destination address modifier */
uint32_t tot = 0U; /* Length */

/* The algorithm implementation is based on the lengths of the inputs. */
/* srcB is always made to slide across srcA. */
/* So srcBLen is always considered as shorter or equal to srcALen */
/* But CORR(x, y) is reverse of CORR(y, x) */
/* So, when srcBLen > srcALen, output pointer is made to point to the end of the output buffer */
-/* and a varaible, inv is set to 1 */
+/* and a varaible, inc is set to -1 */
/* If lengths are not equal then zero pad has to be done to make the two
* inputs of same length. But to improve the performance, we include zeroes
* in the output instead of zero padding either of the the inputs*/
@@ -968,8 +968,8 @@ void arm_correlate_q7(
srcALen = srcBLen;
srcBLen = j;

-/* Setting the reverse flag */
-inv = 1;
+/* Filling destination in reverse order */
+inc = -1;
}

/* Loop to calculate convolution for output length number of times */
@@ -990,10 +990,8 @@ void arm_correlate_q7(
}

/* Store the output in the destination buffer */
-if (inv == 1)
-*pDst-- = (q7_t) __SSAT((sum >> 7U), 8U);
-else
-*pDst++ = (q7_t) __SSAT((sum >> 7U), 8U);
+*pDst = (q7_t) __SSAT((sum >> 7U), 8U);
+pDst += inc;
}

#endif /* #if !defined(ARM_MATH_CM0_FAMILY) */
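
The comments in the first hunk rely on the fact that CORR(x, y) is the reverse of CORR(y, x), which is why the function can swap its inputs and fill the destination backwards. The sketch below illustrates that property with a naive integer correlation. It is an assumption-laden simplification: `corr_strided` and `corr_ref` are hypothetical helpers, the output has `lenA + lenB - 1` samples, and the q7 scaling, saturation, and output zero-padding of the real `arm_correlate_q7` are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive full cross-correlation of a (length lenA) with b (length lenB).
 * Output sample k corresponds to lag = k - (lenB - 1):
 *     r[k] = sum over n of a[n + lag] * b[n]
 * Results are written to out[start], out[start + inc], ... so the same
 * loop can fill the buffer forward (inc = 1) or backward (inc = -1). */
static void corr_strided(const int32_t *a, size_t lenA,
                         const int32_t *b, size_t lenB,
                         int32_t *out, ptrdiff_t start, ptrdiff_t inc)
{
    size_t total = lenA + lenB - 1;
    ptrdiff_t pos = start;

    for (size_t k = 0; k < total; k++) {
        int32_t sum = 0;
        for (size_t n = 0; n < lenB; n++) {
            ptrdiff_t idx = (ptrdiff_t) n + (ptrdiff_t) k - (ptrdiff_t) (lenB - 1);
            if (idx >= 0 && (size_t) idx < lenA)
                sum += a[idx] * b[n];
        }
        out[pos] = sum;
        pos += inc;
    }
}

/* Correlate x with y so that the shorter input always slides across the
 * longer one. When y is longer, the inputs are swapped and the output is
 * filled from the end, which undoes the reversal caused by the swap. */
static void corr_ref(const int32_t *x, size_t lenX,
                     const int32_t *y, size_t lenY,
                     int32_t *out)
{
    if (lenY > lenX)
        corr_strided(y, lenY, x, lenX, out, (ptrdiff_t) (lenX + lenY - 2), -1);
    else
        corr_strided(x, lenX, y, lenY, out, 0, 1);
}
```

With these definitions, `corr_ref(y, 2, x, 3, out)` takes the swapped, backward-filling path and should produce the same samples as calling `corr_strided(y, 2, x, 3, out, 0, 1)` directly.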