Google AEC rework: dynamic formats, legacy platform support #8919

andyross · 2024-03-07T17:32:10Z

This is almost exactly the code as it stands in PR #8571, but rebased on top of current main (which descends from mtl-008-drop-stable) and with as many individual changes split out as I could. Basically, this unbreaks AEC for IPC3 targets (mt8195 in particular) with a unified implementation, supports dynamic stream sample formats, significantly improves performance and code size, and removes a few hundred lines of duplicated/unparametrized code. Tested on mt8195 and MTL; I will pull out an ADL board and try that as soon as I get a chance.

See individual patch commit messages for more details.

lgirdwood

@singalsu can you review.
Can someone from @thesofproject/google also review. Thanks.

lgirdwood · 2024-03-08T13:54:03Z

src/audio/google/google_rtc_audio_processing.c

+ * are single-core component and know we can safely use the cache for
+ * AEC work.  XTOS SOF is cached by default, so stub the Zephyr API.
+ */
+#define arch_xtensa_cached_ptr(p) (p)


I think this should be automatically picked up in headers when building for non-XTOS targets. I know we removed it for the Zephyr target, but for XTOS it should still be around.

Actually, this is the Zephyr API, not the old SOF one. I'm translating in the opposite direction (basically writing Zephyr code in the module, with fakes for legacy MediaTek builds).

@lgirdwood we will be porting this to platforms still using XTOS so it is needed

lgirdwood · 2024-03-08T13:55:55Z

src/audio/google/google_rtc_audio_processing.c

@@ -399,9 +399,6 @@ static int google_rtc_audio_processing_init(struct processing_module *mod)
 		return -EINVAL;
 	}

-	cd->config.output_fmt = mod->priv.cfg.input_pins[SOF_AEC_DMIC_QUEUE_ID].audio_fmt;


Good point on the topology tagging. Can you create a FEATURE# for it and map out what would be needed, etc.?

Will do. Really this is a special case of a general flaw, which is that the topology data isn't reliably exposed to the firmware. There's a big library of code in the kernel to interpret it and translate it to ipc4 commands, but 80% of the time all we really want is the tuple array down here.

lgirdwood · 2024-03-08T14:01:51Z

src/audio/google/google_rtc_audio_processing.c

+static ALWAYS_INLINE float clamp_rescale(float max_val, float x)
+{
+	float min = -1.0f;
+	float max = 1.0f - 1.0f / max_val;
+
+	return max_val * (x < min ? min : (x > max ? max : x));
+}
+
+static ALWAYS_INLINE float s16_to_float(const char *ptr)
+{
+	float scale = -(float)SHRT_MIN;
+	float x = *(int16_t *)ptr;
+
+	return (1.0f / scale) * x;
+}
+
+static ALWAYS_INLINE void float_to_s16(float x, char *dst)
+{
+	*(int16_t *)dst = (int16_t)clamp_rescale(-(float)SHRT_MIN, x);
+}
+
+static ALWAYS_INLINE float s32_to_float(const char *ptr)
+{
+	float scale = -(float)INT_MIN;
+	float x = *(int32_t *)ptr;
+
+	return (1.0f / scale) * x;
+}
+
+static ALWAYS_INLINE void float_to_s32(float x, char *dst)
+{
+	*(int32_t *)dst = (int32_t)clamp_rescale(-(float)INT_MIN, x);
+}


I think we already have these conversions in a maths header. @singalsu where can @andyross find the existing conversions?

We actually don't, not in the form needed here. There is a set of fixed-point conversion macros, but those use maximal/64-bit precision for accuracy (they're intended for creating constants), and there is a pcm_converter library of C functions that each take a whole array as an argument. The idea here is to get optimized code that does the fully unrolled de/interleave, format conversion and buffer rollover in one go. There was some

Also, to be fair, this is replacing code that also doesn't use the existing converters. That one was based on xcc/xt-clang intrinsics whereas this one is plain C, but both generate the same HiFi/FPU output (actually xt-clang at -O3 will even autovectorize this, though we don't use that mode).
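To make the "de/interleave, format conversion and buffer rollover in one go" idea concrete, here is a simplified sketch. It is not the PR's implementation: `deinterleave_s16_to_float()` is a hypothetical name, and the input is modeled as a plain int16_t ring with a read index rather than the SOF source API. It shows a single loop that converts S16 to float, splits channels, and wraps at the end of the ring without a second pass, which is the kind of loop the compiler can then unroll or vectorize.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the PR's code: convert `frames` interleaved S16
 * frames from a circular buffer into per-channel float arrays, handling the
 * wrap-around at the end of the ring inside the same loop. */
static void deinterleave_s16_to_float(const int16_t *ring, size_t ring_samples,
				      size_t read_pos, int channels,
				      float *const *dst, size_t frames)
{
	/* 1 / 32768, i.e. 1 / -SHRT_MIN; exactly representable in float */
	const float scale = 1.0f / 32768.0f;
	size_t pos = read_pos;

	for (size_t f = 0; f < frames; f++) {
		for (int ch = 0; ch < channels; ch++) {
			dst[ch][f] = scale * (float)ring[pos];
			/* Roll over to the start of the ring in place rather
			 * than splitting the copy into two passes. */
			if (++pos == ring_samples)
				pos = 0;
		}
	}
}
```

Handling the rollover per sample keeps the loop body branch-light and avoids a separate "copy up to the end, then copy from the start" path; a production version would more likely hoist the wrap check out to chunk boundaries.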

@singalsu can we split these out into a common location after merge? It would save a lot of duplication. Also, we should plan for SIMD versions of these, especially since we are passing in arrays here.
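If these converters do move to a common location, one detail worth handling in the S32 direction: 2^31 - 1 is not representable as a float, so clamping to `1.0f - 1.0f / 2^31` in float rounds back to `1.0f`, and the scaled value becomes 2^31, which is out of range for int32_t (converting it is undefined behavior in C). The sketch below, with the hypothetical name `float_to_s32_sat()`, saturates explicitly instead. It is an illustration, not code from the PR.

```c
#include <stdint.h>

/* Hypothetical saturating float -> S32 conversion. The comparisons are
 * written negated so that NaN also takes a saturating branch instead of
 * reaching the (undefined) out-of-range cast. */
static inline int32_t float_to_s32_sat(float x)
{
	const float full_scale = 2147483648.0f; /* 2^31, exact in float */
	float y = x * full_scale;

	if (!(y > -full_scale))
		return INT32_MIN; /* <= -1.0, or NaN */
	if (!(y < full_scale))
		return INT32_MAX; /* >= 1.0 */
	/* |y| < 2^31 here, so the cast (truncation) is well defined */
	return (int32_t)y;
}
```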

andyross

Sorry, came by to nag for reviews and realized I still had stale comments. But regardless: nag for reviews please? This still rebases fine and smokes OK on MTL and mt8195 for me.

lkoenig

This looks good to me. Did you try it on a device and check the audio?

lgirdwood

@marcinszkudlinski pls review.

marcinszkudlinski

accessing comp_buffer structures will stop working soon enough, it cannot be done this way.
@andyross, pipeline ID may be moved to struct sof_audio_stream_params and be accessible through sink/src API

marcinszkudlinski · 2024-04-15T13:58:06Z

src/audio/google/google_rtc_audio_processing.c

@@ -602,18 +604,16 @@ static int google_rtc_audio_processing_prepare(struct processing_module *mod,
 		return -EINVAL;
 	}

+	struct comp_buffer *b0 = list_first_item(&dev->bsource_list, struct comp_buffer, sink_list);


when using sink/source interface it is NOT guaranteed that comp_buffer structures are in use.
sink and source API may be provided by other entities.
At the moment there's a workaround in use: DpQueues always work as a "shadow buffer" for comp_buffer, so this code will work, but not for long.

So... I get that. It's relying on a stale/deprecated API scheme. But see the commit message. The problem is that the mechanism currently being used to answer the question "Is this source/sink connected to a component on the same pipeline or a different one?" is just incorrect as it stands. Those numbers are ephemeral values assigned at runtime in the kernel (e.g. they're like file descriptors), and the only reason they work here is because they've been hand-matched to the values chosen for this specific device with this topology.

So, I guess I'd argue we should pick the correct but to-be-replaced solution over the incorrect one. For sure a proper API is required. Basically this is a variant of a common SOF problem: the topology as it exists in the kernel is only incompletely exposed inside the firmware (everything gets cooked by the kernel, which drops a lot of information), so everywhere we're having to use trickery to recover it.

marcinszkudlinski · 2024-04-15T14:04:15Z

src/audio/google/Kconfig

@@ -24,14 +24,6 @@ config COMP_GOOGLE_RTC_AUDIO_PROCESSING
 	  This component takes raw microphones input and playback reference
 	  and outputs an echo-free microphone signal.



The internals of the AEC library are floating point already.

Just a comment: in my measurements of the AEC binary I got, the 16-bit API used to be faster than the 32-bit one.

Is that maybe polluted by conversion overhead? Or maybe it's a cache footprint issue? My understanding is that the behavior is identical once you get across the API boundary. @lkoenig might have input.

Internally the int16 are converted back to float32. If the int16 is faster than the float32, we might need to look into it.

@singalsu any comments here?

@lkoenig I might have had an obsolete version of AEC binary, so my observations may be inaccurate.

From an AEC quality point of view, I think 32-bit float is a good choice over 16-bit integer. The raw mic signal level is typically low, since there is headroom for very loud ambient noise.

marcinszkudlinski · 2024-04-15T14:10:26Z

src/audio/google/google_rtc_audio_processing.c

@@ -93,6 +108,130 @@ void GoogleRtcFree(void *ptr)
 	return rfree(ptr);
 }

+static ALWAYS_INLINE float clamp_rescale(float max_val, float x)
+{
+	float min = -1.0f;


I suggest using the HiFi coprocessor for this (performance!!)

The compiler automatically emits HiFi instructions here (or FPU, for example on platforms that have HiFi3 w/o floating point support). There was some disassembly posted months ago in a previous iteration of this PR that showed what the inner loops look like. Basically the resulting code here is pretty much optimal. Actually if you build with -O3 (we don't) xt-clang is capable of autovectorizing the loop for another ~2x benefit on HiFi4/5 platforms.

lyakh · 2024-04-30T06:30:36Z

src/audio/google/google_rtc_audio_processing.c

+	struct comp_buffer *b0 = list_first_item(&dev->bsource_list, struct comp_buffer, sink_list);
+	struct comp_buffer *b1 = list_next_item(b0, sink_list);
+
+	cd->aec_reference_source = (is_ref_buffer(dev, b0) ? 0 : 1);


superfluous parentheses

lyakh · 2024-04-30T06:30:53Z

src/audio/google/google_rtc_audio_processing.c

+	struct comp_buffer *b1 = list_next_item(b0, sink_list);
+
+	cd->aec_reference_source = (is_ref_buffer(dev, b0) ? 0 : 1);
+	cd->raw_microphone_source = (cd->aec_reference_source == 1) ? 0 : 1;


How about 1 - cd->aec_reference_source?

Let's keep it simple; if we need to remap IDs, then let's have a comment saying why/how.
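Along the lines of that request, the index mapping can carry its own explanation. A minimal sketch, assuming the component always has exactly two inputs; `struct aec_input_map` and `map_aec_inputs()` are hypothetical names, while the two field names mirror the ones in the excerpt above.

```c
#include <stdbool.h>

/* Hypothetical sketch: the AEC component has exactly two inputs, the
 * playback reference and the raw microphone. Whichever index is not the
 * reference must be the microphone, so the mapping is spelled out rather
 * than written as arithmetic on the index. */
struct aec_input_map {
	int aec_reference_source;
	int raw_microphone_source;
};

static struct aec_input_map map_aec_inputs(bool input0_is_reference)
{
	struct aec_input_map m;

	m.aec_reference_source = input0_is_reference ? 0 : 1;
	/* The microphone is simply the other of the two inputs. */
	m.raw_microphone_source = m.aec_reference_source == 0 ? 1 : 0;
	return m;
}
```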

singalsu

Looks OK to me as far as I can understand this.

cujomalainey · 2024-05-14T16:37:30Z

src/audio/google/google_rtc_audio_processing.c

+ * are single-core component and know we can safely use the cache for
+ * AEC work.  XTOS SOF is cached by default, so stub the Zephyr API.
+ */
+#define arch_xtensa_cached_ptr(p) (p)


@lgirdwood we will be porting this to platforms still using XTOS so it is needed

cujomalainey · 2024-05-14T16:38:28Z

src/audio/google/google_rtc_audio_processing.c

+#endif
+
+#ifndef __ZEPHYR__
+#define ALWAYS_INLINE inline __attribute__((always_inline))


maybe put this in some rtos header for XTOS?
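As a rough idea of what such a header could hold, the sketch below gathers the two legacy shims quoted in this thread behind a single `__ZEPHYR__` guard, so Zephyr builds keep the real definitions from Zephyr headers. Where it would live and the helper `aec_cached_buf()` are hypothetical.

```c
/* Hypothetical XTOS compatibility block collecting the Zephyr-isms used by
 * the AEC component. Zephyr builds define __ZEPHYR__ and use the real
 * definitions instead. */
#ifndef __ZEPHYR__

/* XTOS SOF is cached by default, so the cached alias is the pointer itself. */
#define arch_xtensa_cached_ptr(p) (p)

#define ALWAYS_INLINE inline __attribute__((always_inline))

#endif /* !__ZEPHYR__ */

/* Example user (hypothetical): component code is written against the Zephyr
 * API and compiles unchanged on legacy builds. */
static ALWAYS_INLINE float *aec_cached_buf(float *p)
{
	return arch_xtensa_cached_ptr(p);
}
```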

cujomalainey · 2024-05-14T16:39:51Z

src/audio/google/google_rtc_audio_processing.c

+static __aligned(PLATFORM_DCACHE_ALIGN)
+uint8_t aec_mem_blob[CONFIG_COMP_GOOGLE_RTC_AUDIO_PROCESSING_MEMORY_BUFFER_SIZE_KB * 1024];
+
+#define NUM_FRAMES (CONFIG_COMP_GOOGLE_RTC_AUDIO_PROCESSING_SAMPLE_RATE_HZ \


this should be dynamic with IPC4, right?
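One way the frame count could follow the stream rather than Kconfig is sketched below. It assumes the library consumes fixed 10 ms blocks, as WebRTC's AudioProcessing does; the rest of the NUM_FRAMES macro above is cut off, so its exact definition is not reproduced here. `aec_frames_per_block()` and `AEC_BLOCK_MS` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical runtime replacement for a compile-time frame count:
 * assumes 10 ms processing blocks, so the block size is derived from the
 * rate negotiated at prepare() time. */
#define AEC_BLOCK_MS 10

static inline uint32_t aec_frames_per_block(uint32_t rate_hz)
{
	return rate_hz * AEC_BLOCK_MS / 1000;
}
```

The statically sized memory blob could stay Kconfig-driven even if the block size becomes dynamic, since it only has to cover the largest supported rate.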

cujomalainey · 2024-05-14T16:40:34Z

src/audio/google/google_rtc_audio_processing.c

+	return max_val * (x < min ? min : (x > max ? max : x));
+}
+
+static ALWAYS_INLINE float s16_to_float(const char *ptr)


I feel like these should exist somewhere more generic so other components can use them if they are more optimized than the existing macros.

lgirdwood · 2024-05-31T08:06:16Z

accessing comp_buffer structures will stop working soon enough, it cannot be done this way. @andyross, pipeline ID may be moved to struct sof_audio_stream_params and be accessible through sink/src API

@andyross I think this old method is probably more and more in untested territory today (before it's fully deleted). Are you able to update?

andyross · 2024-05-31T16:35:46Z

I can take a look at what's needed to put bidirectional pipeline IDs into sink/src, sure. There's a little design worry, though. Are we all sure that's the right spot? Are all streams guaranteed to be connected on both sides to something "on a pipeline"? Are there lifecycle concerns (e.g. does a stream ever get reconnected)?

I guess if we're going to do that, can't we put a module backpointer into the stream instead? That would be more general IMHO, and preserve the old capability of traversing the component/stream graph via a known API.

lgirdwood · 2024-06-04T09:44:58Z

I can take a look at what's needed to put bidirectional pipeline IDs into sink/src, sure. There's a little design worry, though. Are we all sure that's the right spot? Are all streams guaranteed to be connected on both sides to something "on a pipeline"? Are there lifecycle concerns (e.g. does a stream ever get reconnected)?

I guess if we're going to do that, can't we put a module backpointer into the stream instead? That would be more general IMHO, and preserve the old capability of traversing the component/stream graph via a known API.

@marcinszkudlinski @mwasko can you guys comment here. Thanks !

cujomalainey · 2024-06-04T20:06:17Z

accessing comp_buffer structures will stop working soon enough, it cannot be done this way. @andyross, pipeline ID may be moved to struct sof_audio_stream_params and be accessible through sink/src API

@andyross I think this old method is probably more and more in untested territory today (before it's fully deleted). Are you able to update?

Can this work be expedited? It is blocking us on a lot of devices. Also, I would assume this is breaking DSM as well.

andyross · 2024-06-04T20:09:15Z

I think I'm seeing a path here. Give me a day or so and be prepared for a little refactoring.

andyross · 2024-06-07T16:32:01Z

OK, reworked to move the comp_buffer handling out of the source identification path via some refactoring of the buffer fields. There's still some use to detect "is source active", which is something the new scheme doesn't have an API for. But that's not needed for IPC4 builds, as the pipelines are always synchronized at the kernel level, so I can hide it under an #ifdef.

This is now sitting on some slightly wordy refactoring patches, not all of which are strictly required. Will submit those separately; please review there and keep the AEC reviews as clean as possible.

andyross · 2024-06-07T16:40:53Z

See #9210 and #9211 for split-out review of the early patches in this series.

marcinszkudlinski · 2024-06-11T13:05:06Z

src/audio/google/google_rtc_audio_processing.c

 	comp_data_blob_handler_free(cd->tuning_handler);
 	rfree(cd);
 	return 0;
 }

+#define PIPE_ID(srcsink) ((srcsink)->audio_stream_params->pipeline_id)


I also wrote a note in the other PR.
Use

static inline uint32_t sink_get_pipeline_id(struct sof_sink *sink) { return sink->audio_stream_params->pipeline_id; }

and put it in sink_api.h

I would like to keep as many sink/source internals "private" as possible
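The accessor pattern being asked for looks roughly like this. The struct definitions are minimal stand-ins for illustration only (the real SOF sof_sink/sof_source carry many more fields); the accessor names match the sink/source_get_pipeline_id() methods mentioned later in this thread, and the field path matches the PIPE_ID macro above.

```c
#include <stdint.h>

/* Minimal stand-in types, for illustration only. */
struct sof_audio_stream_params {
	uint32_t pipeline_id;
};

struct sof_sink {
	struct sof_audio_stream_params *audio_stream_params;
};

struct sof_source {
	struct sof_audio_stream_params *audio_stream_params;
};

/* Accessors keep the layout private to the sink/source API headers:
 * modules ask for the pipeline ID instead of dereferencing internals. */
static inline uint32_t sink_get_pipeline_id(struct sof_sink *sink)
{
	return sink->audio_stream_params->pipeline_id;
}

static inline uint32_t source_get_pipeline_id(struct sof_source *source)
{
	return source->audio_stream_params->pipeline_id;
}
```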

changes addressed, just one tiny comment

andyross · 2024-06-11T19:50:09Z

Rebase on current version of #9211, use the sink/source_get_pipeline_id() methods added there.

The operation of the AEC component uses a single buffer as an internal heap. This is very large, over half the available SRAM at component creation time on MTL. That's just a poor fit for the heap. It would be trivial to create a fragmentation scenario by creating/destroying components (which happens under user control all the way out in Linux userspace!) where AEC can't initialize and microphone input breaks. Longer term we can look at moving this usage back to the heap by integrating the component's internal allocations with the SOF/Zephyr heap (which is quite performant), allowing it to make fine-grained allocations which will work more robustly. Signed-off-by: Andy Ross <andyross@google.com>
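A reduced sketch of the "single buffer as an internal heap" idea: a statically reserved, aligned region with a trivial bump allocator, so the large reservation never competes with the system heap at init time. This is an illustration under stated assumptions, not the PR's allocator: `aec_blob_alloc()`, `aec_blob_reset()`, `BLOB_SIZE` and `BLOB_ALIGN` are hypothetical (the diff excerpt in this thread sizes the real buffer from a Kconfig value and aligns it to PLATFORM_DCACHE_ALIGN).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes; the real blob is sized via Kconfig. */
#define BLOB_SIZE 4096
#define BLOB_ALIGN 64

static _Alignas(BLOB_ALIGN) uint8_t aec_blob[BLOB_SIZE];
static size_t aec_blob_used;

static void *aec_blob_alloc(size_t size)
{
	size_t rounded;

	if (size > BLOB_SIZE)
		return NULL;
	/* Round each allocation up so the next one stays aligned. */
	rounded = (size + BLOB_ALIGN - 1) & ~(size_t)(BLOB_ALIGN - 1);
	if (rounded > BLOB_SIZE - aec_blob_used)
		return NULL;

	void *p = &aec_blob[aec_blob_used];

	aec_blob_used += rounded;
	return p;
}

/* A bump allocator cannot free individual blocks; the whole region is
 * reset when the component is torn down. */
static void aec_blob_reset(void)
{
	aec_blob_used = 0;
}
```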

This technique doesn't work. The ID returned by get_source_id() is not fixed by topology. Those numbers are derived from a component ID allocated in the Linux kernel via ida_alloc(). The specific values will depend on the state and history of the allocator, we can't compare them via numerical identity here in the firmware even if they happen to work right now due to topology ordering. Fall back to the older technique of checking whether the input source is on the same pipeline as the AEC component to determine if it's the microphone input. Signed-off-by: Andy Ross <andyross@google.com>
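The fallback technique described in this commit can be sketched as a purely relative comparison: the microphone input is the one on the AEC component's own pipeline, and no absolute ID value is ever assumed. The function names are hypothetical, and the pipeline IDs are passed in as plain integers rather than read through the sink/source API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: a source on the same pipeline as the AEC component
 * is the raw microphone; the playback reference comes from another one.
 * IDs are only compared for equality, never against fixed numbers. */
static bool is_microphone_source(uint32_t aec_pipeline_id,
				 uint32_t source_pipeline_id)
{
	return source_pipeline_id == aec_pipeline_id;
}

/* Returns the index (0 or 1) of the microphone input, or -1 if neither or
 * both inputs share the AEC pipeline (an invalid topology). */
static int find_microphone_input(uint32_t aec_pipeline_id,
				 const uint32_t src_pipeline_ids[2])
{
	bool mic0 = is_microphone_source(aec_pipeline_id, src_pipeline_ids[0]);
	bool mic1 = is_microphone_source(aec_pipeline_id, src_pipeline_ids[1]);

	if (mic0 == mic1)
		return -1;
	return mic0 ? 0 : 1;
}
```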

The internals of the AEC library are floating point already. If we're going to support using the float variant of the API, we should use it always to avoid all the complexity. Signed-off-by: Andy Ross <andyross@google.com>

In point of fact, the AEC code has never worked on SOF main, and this code doesn't either. It got left here as new code was added to support MTL, and it's just a wart. Really, there's nothing "IPC4ish" about the new code at all; there's no reason a single code path can't be used for both, since the process/source/sink APIs are identical. The new code doesn't work with legacy builds, mind you. But it will. Remove the stuff that will never be used. Signed-off-by: Andy Ross <andyross@google.com>

On IPC3 pipelines, triggers can arrive at this component due to changes in the reference pipeline. Those aren't for us, and have the effect of incorrectly resetting the capture stream if someone stops playback. Earlier product branches handled this logic in the pipeline layer, but that never reached SOF main, and it's easier to do here by just ignoring the event. On IPC4, triggers never propagate across pipelines (and in any case dependent pipeline state management happens in the host kernel), so this becomes a benign noop. Signed-off-by: Andy Ross <andyross@google.com>
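The filtering described here amounts to a small decision, sketched below under one loud assumption: that the component can tell which pipeline a trigger originated from. How the PR actually makes that distinction is not shown in this thread, and `aec_filter_trigger()` and its enum are hypothetical.

```c
#include <stdint.h>

/* Hypothetical trigger filter: a trigger originating on another pipeline
 * (e.g. the playback/reference pipeline on IPC3) is ignored, so stopping
 * playback does not reset the capture stream. On IPC4 such triggers never
 * arrive, so the check never fires. */
enum aec_trigger_action { AEC_TRIGGER_HANDLE, AEC_TRIGGER_IGNORE };

static enum aec_trigger_action aec_filter_trigger(uint32_t aec_pipeline_id,
						  uint32_t trigger_pipeline_id)
{
	if (trigger_pipeline_id != aec_pipeline_id)
		return AEC_TRIGGER_IGNORE;
	return AEC_TRIGGER_HANDLE;
}
```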

Put the AEC tunables inside an if COMP_GOOGLE_RTC_AUDIO_PROCESSING for hygiene. This prevents them from appearing in .config files of SOF builds where AEC was never enabled at all. Set MOCK via a default instead of select. Select is unoverridable: it forces MOCK to be used whenever COMP_STUBS=y, but it's more flexible to allow the app to pick and choose which components get stubbed (STUBS is often set at the platform layer). Signed-off-by: Andy Ross <andyross@google.com>

andyross · 2024-06-18T17:41:35Z

Final rebase now that #9211 has merged; still tests OK for me on mt8195 and MTL. Should be good to go now, no code changes since the last version.

kv2019i · 2024-06-19T12:00:48Z

@andyross The Zephyr LLEXT failure seems related to code added in this PR: https://github.com/thesofproject/sof/actions/runs/9569679341/job/26382722247?pr=8919
Other test results still pending... will wait to see those.

Big rewrite of the core processing code of AEC: Support both S32 and S16 input and output formats, dynamically selected at prepare() time based on stream configuration. Copy/convert data in maximally inlined/unrolled loops, using cleanly generated (no duplication!) custom conversion utilities for each format variant. Orthogonalize and elaborate the validation code in prepare(). Check all state for all input/output streams. Decouple AEC operation from the input stream, filling zeros on underflow and allowing AEC to run in circumstances where no playback data exists and to recover when it starts/stops. IPC3 setups can exploit this now; unfortunately, IPC4 always starts connected pipelines from the host kernel, so it sees no benefit. Fix a latency bug in the original code where it would copy the processed results to the output stream before the call to ProcessCapture() instead of after, leading to a needless delay. Copy the results as soon as they are available; if the output buffer backs up, we'll continue at the next call to process(). Signed-off-by: Andy Ross <andyross@google.com>
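The "filling zeros on underflow" behavior can be illustrated with a mono float buffer: copy whatever reference frames exist and pad the rest of the block with silence, so processing can run whether or not playback is active. This is a simplified sketch, not the PR's code; `fill_reference_block()` is a hypothetical name and the real code works on the SOF source API rather than a flat array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: fill one reference block from the available data,
 * treating missing playback as silence. Returns the number of real frames
 * consumed so the caller can advance its read position. */
static size_t fill_reference_block(float *block, size_t block_frames,
				   const float *avail, size_t avail_frames)
{
	size_t n = avail_frames < block_frames ? avail_frames : block_frames;

	memcpy(block, avail, n * sizeof(*block));
	/* Underflow: pad with zeros (all-zero bits is 0.0f in IEEE 754). */
	memset(block + n, 0, (block_frames - n) * sizeof(*block));
	return n;
}
```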

andyross · 2024-06-19T15:27:42Z

Ah, indeed. Just the build warning, right? Fixed.

lyakh · 2024-06-20T07:12:45Z

src/audio/google/google_rtc_audio_processing.c

+		dst[i] = &dst_bufs[i][frame0];
+
+	err = source_get_data(src, bytes, (void *)&buf, (void *)&bufstart, &bufsz);
+	assert(err == 0);


Why not handle errors properly? Particularly since assert() is a no-op in release builds, or is Google always building with them enabled?

The only documented error happens when the size passed is larger than available data, which is untrue by construction as we've already queried the buffer state up the stack. Thus the assert, to catch bugs with that logic.
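One option that satisfies both points is to keep the assert() as a tripwire for logic bugs in debug builds and still propagate the error, since assert() compiles to nothing when NDEBUG is defined. In the sketch below, `fake_source_get_data()` is a hypothetical stand-in that fails only when more bytes are requested than are available (the failure mode described above); the -EINVAL value is an assumption, not the SOF API's documented return code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a source read that only fails when the
 * request exceeds the available data. */
static int fake_source_get_data(size_t avail, size_t req)
{
	return req > avail ? -EINVAL : 0;
}

static int read_block(size_t avail, size_t req)
{
	int err = fake_source_get_data(avail, req);

	/* Catch broken size bookkeeping loudly in debug builds... */
	assert(err == 0);
	/* ...but still fail gracefully when NDEBUG removes the assert. */
	if (err)
		return err;
	/* process the data here */
	return 0;
}
```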

lyakh

I'd say the assert()s can be clarified/cleaned up in a follow-up.

andyross · 2024-06-21T13:51:55Z

Ping for merge? Any last reviews needed?

marc-hb · 2024-06-27T01:03:21Z

This was merged with sparse annotations broken:
https://github.com/thesofproject/sof/actions/runs/9584654607/job/26428767199

Tentative fixup from @andyross in #9265

andyross requested review from a team, RanderWang, marcinszkudlinski and pblaszko as code owners March 7, 2024 17:32

andyross requested review from lyakh, cujomalainey, eddy1021, johnylin76 and kv2019i March 7, 2024 17:34

lgirdwood reviewed Mar 8, 2024

andyross commented Mar 20, 2024

lkoenig reviewed Mar 25, 2024

lgirdwood approved these changes Apr 15, 2024

marcinszkudlinski previously requested changes Apr 15, 2024

lyakh reviewed Apr 30, 2024

singalsu approved these changes May 3, 2024

cujomalainey reviewed May 14, 2024

andyross force-pushed the aec-rework branch from 18cc95a to 2c8fb0a June 7, 2024 16:29

andyross requested review from bardliao, yaochunhung, kuanhsuncheng, ranj063, jxstelter, fkwasowi and abonislawski as code owners June 7, 2024 16:29

andyross requested a review from dbaluta as a code owner June 7, 2024 16:29

andyross force-pushed the aec-rework branch from 2c8fb0a to bb28b2d June 7, 2024 16:29

andyross mentioned this pull request Jun 7, 2024

component/module refactoring pass #9211

Merged

marcinszkudlinski reviewed Jun 11, 2024

cujomalainey mentioned this pull request Jun 11, 2024

Audio: aec: optimize acoustic echo cancellation processing #8877

Open

andyross force-pushed the aec-rework branch from bb28b2d to fb41c0f June 11, 2024 19:49

marcinszkudlinski approved these changes Jun 12, 2024

andyross added 6 commits June 18, 2024 08:23

google_aec: Remove FLOAT_API configurability

53c9fad

The internals of the AEC library are floating point already. If we're going to support using the float variant of the API, we should use it always to avoid all the complexity. Signed-off-by: Andy Ross <andyross@google.com>

andyross force-pushed the aec-rework branch from fb41c0f to 142549e June 18, 2024 17:40

andyross force-pushed the aec-rework branch from 142549e to 018cec3 June 19, 2024 15:27

lyakh reviewed Jun 20, 2024

lyakh approved these changes Jun 20, 2024

cujomalainey merged commit 28a5265 into thesofproject:main Jun 21, 2024
40 of 46 checks passed

lyakh mentioned this pull request Jun 26, 2024

[TEST] Volume LLEXT #9172

Closed

marc-hb mentioned this pull request Jun 28, 2024

Global UUID registry, cleanup, simplification #9261

Merged
