Refactor audio handling #277

ryze312 · 2024-03-08T13:34:51Z

Hey, here is another PR from me!
Made a few changes to audio handling and cleaned up some of the code. In particular, audio devices get started/stopped on demand upon entering a voice call. This allows audio backends like PipeWire to suspend sinks when there isn't anything active connected to them.

AudioManager:

Audio devices get started/stopped on voice connect/disconnect
Reduce code duplication with Open/TryOpen functions
Don't store device configs and ids

AudioDevices:

Rename [Get/Set]ActivePlaybackDevice to [Get/Set]ActivePlaybackDeviceIter
Declare getters as const and [[nodiscard]]
Reduce code duplication by using Get[Playback/Capture]DeviceIDFromModel
Add GetActive[Playback/Capture]

TODO:

Start/stop voice audio devices on demand
Move notification sounds handling to SystemAudio
Start/stop notification audio device after timeout as well
Split voice audio into multiple devices per user
Add options for persistent audio devices and split voice audio
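
The on-demand start/stop behaviour described above could be sketched roughly as follows. This is not the code from this PR: `OnDemandDevice` and its two callbacks are hypothetical stand-ins (in Abaddon they would presumably wrap miniaudio's `ma_device_start` and `ma_device_stop`), and the user counter is an assumption about how several consumers might share one device.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical wrapper: starts the underlying device when the first user
// appears and stops it when the last one leaves, so that the backend
// (e.g. PipeWire) can suspend the sink while nothing is connected.
class OnDemandDevice {
public:
    OnDemandDevice(std::function<bool()> start, std::function<void()> stop)
        : m_start(std::move(start)), m_stop(std::move(stop)) {
        assert(m_start && m_stop);
    }

    // Call on voice connect. Returns false if the device failed to start.
    bool Acquire() {
        if (m_users == 0 && !m_start()) return false;
        ++m_users;
        return true;
    }

    // Call on voice disconnect. Unbalanced calls are ignored.
    void Release() {
        if (m_users == 0) return;
        if (--m_users == 0) m_stop();
    }

    [[nodiscard]] bool IsRunning() const { return m_users > 0; }

private:
    std::function<bool()> m_start;
    std::function<void()> m_stop;
    std::size_t m_users = 0;  // not thread-safe; call from one thread only
};
```

The counter is what would let a later "persistent device" option simply hold one extra `Acquire()` for the lifetime of the application.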

ryze312 · 2024-03-08T13:52:53Z

The Windows build seems to be failing because of the latest commit involving macOS in master

ouwou · 2024-03-09T06:16:26Z

i like. build error will go away if you merge master back in.

for notification sounds: you might need to fiddle with cmake for this (in case notification sounds are enabled and voice is disabled). although maybe u can just change the include guard from WITH_VOICE to WITH_MINIAUDIO. other issue would be how to actually play it considering they use the high-level engine api from miniaudio while voice uses low-level device api. we can just shove both into the AudioManager but if we want to avoid that duplication, we can either use the high-level api for everything incl voice (the engine config can take context and playback device and ma_engine_read_pcm_frames can be used in the device callback). or stay with low-level and use something like ma_decoder_init_file but idk how well that will work considering eg multiple of same sound at once or repeating it

per-user devices seems niche but im ok with doing that if its not too much of a mess to work with


ryze312 · 2024-03-09T17:17:54Z

use the high-level api for everything incl voice

I don't think that's possible, looks like engine API is meant only for playback. For captures we'll still have to resort to low-level API, especially if we want to support split voice audio, since the audio devices will need to be started/stopped dynamically.

per-user devices seems niche but im ok with doing that if its not too much of a mess to work with

My main use case for this would be for recording voice calls into separate audio channels in OBS, so that I can manipulate each user's audio independently.

ouwou · 2024-03-10T04:05:45Z

For captures we'll still have to resort to low-level

true. for playback then it might be better to just use engine api and mix device init/callback stuff like i said. capture seems like it would remain the same

My main use case for this would be for recording voice calls into separate audio channels in OBS, so that I can manipulate each user's audio independently.

ok sure im cool with that. as long as it doesnt clobber up the ui for most people who wont need that


ryze312 · 2024-03-17T14:32:39Z

Currently rewriting and splitting AudioManager functionality into multiple classes. Is there a particular reason ma_format_s16 is used instead of ma_format_f32? RNNoise uses f32 samples and Opus can encode that too.

ouwou · 2024-03-17T17:37:18Z

uhh i probably figured it was simpler since im not doing any fancy processing and opus's float function just internally converts to s16. rnnoise doesnt actually use f32 it uses s16 also. if you think you get something out of switching over to f32 then go for it

ryze312 · 2024-03-17T18:56:42Z

rnnoise doesnt actually use f32 it uses s16 also

rnnoise_process_frame uses float*, the original code converts int16_t to float and then passes that to the function. Seems strange.

abaddon/src/audio/manager.cpp

Lines 571 to 575 in b3a8356

    
    static float rnnoise_input[480];
    for (size_t i = 0; i < 480; i++) {
        rnnoise_input[i] = static_cast<float>(pcm[i * 2]);
    }
    m_vad_prob = std::max(m_vad_prob.load(), rnnoise_process_frame(m_rnnoise[0], denoised_left, rnnoise_input));

ryze312 · 2024-03-17T19:13:05Z

opus's float function just internally converts to s16

That's only the case if it was compiled with FIXED_POINT

#ifdef FIXED_POINT
// snip
#else
opus_int32 opus_encode_float(OpusEncoder *st, const float *pcm, int analysis_frame_size,
                      unsigned char *data, opus_int32 out_data_bytes)
{
   int frame_size;
   frame_size = frame_size_select(analysis_frame_size, st->variable_duration, st->Fs);
   return opus_encode_native(st, pcm, frame_size, data, out_data_bytes, 24,
                             pcm, analysis_frame_size, 0, -2, st->channels, downmix_float, 1);
}
#endif

https://github.com/xiph/opus/blob/95dbea83486b90256785aa3c75dd2827f591a34c/src/opus_encoder.c#L2522-L2529

ryze312 · 2024-03-17T19:19:12Z

Using s16 will typically lead to unnecessary conversions:

Device (f32) -> miniaudio -> Abaddon (s16) -> RNNoise (f32) -> Abaddon (s16) -> Opus (f32)

So we might just use f32 and opus_encode_float and there will be no conversions:

Device (f32) -> miniaudio -> Abaddon (f32) -> RNNoise (f32) -> Abaddon (f32) -> Opus (f32)
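
To make the conversion hops concrete, here is a minimal sketch of the s16/f32 round trip that the first pipeline performs at each format boundary. The helper names are made up for illustration, and the scaling constants are one common convention, not necessarily what miniaudio or Abaddon use internally.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// s16 -> f32: map [-32768, 32767] onto [-1.0, 1.0).
inline float S16ToF32(std::int16_t sample) {
    return static_cast<float>(sample) / 32768.0f;
}

// f32 -> s16: clamp first, since float audio may exceed [-1, 1].
inline std::int16_t F32ToS16(float sample) {
    const float clamped = std::clamp(sample, -1.0f, 1.0f);
    return static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
}

// Convert a whole interleaved buffer, as each "(s16) -> (f32)" hop would.
inline std::vector<float> ToF32(const std::vector<std::int16_t>& pcm) {
    std::vector<float> out(pcm.size());
    std::transform(pcm.begin(), pcm.end(), out.begin(), S16ToF32);
    return out;
}
```

One caveat worth checking against the RNNoise sources: its demo appears to feed floats in the 16-bit range rather than [-1, 1] (which would match the unscaled `static_cast<float>` in the existing code), so an f32 pipeline may still need a scale factor around `rnnoise_process_frame`.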

ouwou · 2024-03-18T00:05:22Z

ok sure lets do f32 then. bonus points for it also being easier to reason with since its easy -1 to 1

ouwou · 2024-03-31T01:59:26Z

forgot to mention but i rewrote the buffer stuff just a bit to add a jitter buffer since some ppl were complaining about sound so lemme know if u have questions on how that works. admittedly its not the best implementation

ouwou · 2024-04-01T08:33:27Z

just kidding im gonna revert that tomorrow cuz it looks like its causing more problems than it solves

ryze312 · 2024-04-06T19:06:34Z

just kidding im gonna revert that tomorrow cuz it looks like its causing more problems than it solves

Oh okay, I'll see if I can implement it anyway. It looks like the original code uses std::deque, which is potentially risky in case of overruns or other kinds of blocking. We might want to implement a proper ring buffer.

ouwou · 2024-04-07T01:04:17Z

you do get underruns but it just results in a little crackle. i encounter it sometimes but its not too bad but i guess its worse if you have bad internet. i think something like libspeexdsp's jitter buffer would be ideal but i dont want to bring it in as a dependency tbh. every jitter buffer i found seems to operate on RTP payloads too so i guess we should also (which my attempt didnt do lol)

ryze312 · 2024-04-07T18:01:49Z

you do get underruns but it just results in a little crackle.

I meant that FeedMeOpus just pushes to std::deque without any limit to the buffer size. In case of a stall in audio callback, this might lead to delay and/or increase in memory consumption, unless it can catch up by taking frames faster than they get decoded.

Deque is also dynamically sized, meaning it can cause a reallocation, which is generally discouraged in real-time audio processing.

I think we should implement a proper ring buffer with fixed size, so that Opus decoder stops pushing samples if the audio thread can't keep up.
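
A fixed-size buffer along these lines could look like the sketch below. It uses only the standard library and assumes exactly one writer thread (the decoder) and one reader thread (the audio callback); `SampleRing` is a hypothetical name, not a class from this PR.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-producer/single-consumer ring buffer of float samples.
// Storage is allocated once in the constructor; Write and Read never allocate.
class SampleRing {
public:
    explicit SampleRing(std::size_t capacity) : m_data(capacity + 1) {
        assert(capacity > 0);
    }

    // Producer side (decoder thread). Returns how many samples were stored;
    // anything that does not fit is dropped instead of growing the buffer.
    std::size_t Write(const float* src, std::size_t count) {
        const std::size_t size = m_data.size();
        const std::size_t head = m_head.load(std::memory_order_relaxed);
        const std::size_t tail = m_tail.load(std::memory_order_acquire);
        const std::size_t free_slots = (tail + size - head - 1) % size;
        const std::size_t n = std::min(count, free_slots);
        for (std::size_t i = 0; i < n; ++i) m_data[(head + i) % size] = src[i];
        m_head.store((head + n) % size, std::memory_order_release);
        return n;
    }

    // Consumer side (audio callback). Returns how many samples were read;
    // the caller should fill the rest of its output with silence.
    std::size_t Read(float* dst, std::size_t count) {
        const std::size_t size = m_data.size();
        const std::size_t tail = m_tail.load(std::memory_order_relaxed);
        const std::size_t head = m_head.load(std::memory_order_acquire);
        const std::size_t available = (head + size - tail) % size;
        const std::size_t n = std::min(count, available);
        for (std::size_t i = 0; i < n; ++i) dst[i] = m_data[(tail + i) % size];
        m_tail.store((tail + n) % size, std::memory_order_release);
        return n;
    }

private:
    std::vector<float> m_data;           // one slot stays empty to tell full from empty
    std::atomic<std::size_t> m_head{0};  // next write index, owned by the producer
    std::atomic<std::size_t> m_tail{0};  // next read index, owned by the consumer
};
```

Because `Write` reports how much it stored, the decoder side can tell when the audio thread is falling behind and decide whether to skip decoding or drop the frame.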

ouwou · 2024-04-08T01:23:56Z

might lead to delay and/or increase in memory consumption

yeah true. i thought about that but chose to worry about it later 😼. miniaudio has a ring buffer for audio too https://miniaud.io/docs/manual/index.html#RingBuffers
i just wonder whatll happen if it gets congested. the problem with the jitter buffer i wrote was if it filled up too much and started trying to discard data from the start then it would kind of get stuck doing that and the end result is choppy ugly audio. so i wonder if something like that would happen with a ring buffer if it loops back to the front? im sure theres a solution to that but idk it

ryze312 · 2024-04-09T16:04:33Z

miniaudio has a ring buffer for audio too

Haven't seen that, should be easy to integrate

so i wonder if something like that would happen with a ring buffer if it loops back to the front

I don't think this would happen with a ring buffer, since the writer (decoder) is going to discard new frames instead of the old ones in case the reader (audio callback) cannot keep up. The reader can then detect that the buffer is full and apply something like interpolation to catch up with the writer. This might distort the audio (higher pitch), but it won't end up choppy.
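
The catch-up idea could be sketched as plain linear interpolation: the reader consumes more input samples than it outputs. This is only an illustration of the principle for a mono buffer; `CatchUp` is a hypothetical helper, and a real implementation would more likely use a proper resampler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Squeeze `in` into `out_len` samples by linear interpolation.
// Consuming more input than output is produced lets the reader catch up,
// at the cost of raising the pitch by a factor of in.size() / out_len.
inline std::vector<float> CatchUp(const std::vector<float>& in, std::size_t out_len) {
    assert(out_len > 0 && in.size() >= out_len);
    std::vector<float> out(out_len);
    const double step = static_cast<double>(in.size()) / static_cast<double>(out_len);
    for (std::size_t i = 0; i < out_len; ++i) {
        const double pos = static_cast<double>(i) * step;
        const std::size_t lo = static_cast<std::size_t>(pos);
        const std::size_t hi = (lo + 1 < in.size()) ? lo + 1 : lo;
        const float frac = static_cast<float>(pos - static_cast<double>(lo));
        out[i] = in[lo] + (in[hi] - in[lo]) * frac;
    }
    return out;
}
```

For example, squeezing 6 buffered samples into a 4-sample callback drains the backlog 1.5 times faster than normal playback.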

ouwou · 2024-04-10T00:23:28Z

maybe. might be worth looking at how something like mumble does it. i know they use libspeexdsp for jitter so we can see how that interacts

ryze312 · 2024-06-04T18:02:40Z

I think this should be all

ouwou · 2024-06-10T23:22:24Z

still need to fix the build when compiling without audio support so a bunch of stuff probably needs to be wrapped in #ifdef WITH_MINIAUDIO
ill give everything another test soon ™

ryze312 · 2024-06-11T18:43:34Z

a bunch of stuff probably needs to be wrapped in #ifdef WITH_MINIAUDIO

I think the right approach would be to not include the sources in CMake at all. Not only does compiling them increase the compilation time, but having to include the guard in each file also seems tedious.

Just did that, seems to compile and work fine.

ryze312 · 2024-06-11T18:46:18Z

Uh nevermind

ryze312 · 2024-06-11T20:10:52Z

The build seems to be failing now because of LTO. Might be this issue?

ouwou · 2024-06-11T20:45:54Z

maybe this? diasurgical/devilutionX@8693a0f

ryze312 · 2024-06-12T14:29:08Z

Sure, I'll see if disabling decloning for those specific files works, if not, we can try disabling it for the entire executable.

ryze312 · 2024-06-12T16:27:41Z

I have no idea what's breaking it now, I'll just disable LTO for Windows

ouwou · 2024-06-13T02:43:23Z

theres a good bit of missing headers. it builds without them thanks to precompiled headers meaning everything ends up transiently included but they should be added anyways (it breaks my intellisense too). you can disable precompiled headers in cmake and it should give you errors where stuff is missing. adding headers wont hurt compile time (at least in my testing) if they are included in the precompiled headers

ouwou · 2024-06-25T07:20:10Z

ok giving it some more testing

1. suppress noise seems to only denoise the right channel
2. StripRTPExtensionHeader is broken. obviously this isnt ur fault but if i fix it in master itll cause a conflict here so it should probably be fixed here. i think i can add a pretty easy fix so ill comment a diff soon-ish
3. i get a segfault eventually at mutex.hpp:93 🙀 0xbaadf00dbaadf00d suggests the memory is getting freed somewhere.

#0  0x00007ffbf8172f58 in ?? () from C:\msys64\mingw64\bin\libwinpthread-1.dll
#1  0x00007ff79f6b1f1e in __gthread_mutex_lock (__mutex=0xbaadf00dbaadf00d) at C:/msys64/mingw64/include/c++/14.1.0/x86_64-w64-mingw32/bits/gthr-default.h:762
#2  std::mutex::lock (this=0xbaadf00dbaadf00d) at C:/msys64/mingw64/include/c++/14.1.0/bits/std_mutex.h:113
#3  std::unique_lock<std::mutex>::lock (this=<synthetic pointer>) at C:/msys64/mingw64/include/c++/14.1.0/bits/unique_lock.h:147
#4  std::unique_lock<std::mutex>::unique_lock (this=<synthetic pointer>, __m=...) at C:/msys64/mingw64/include/c++/14.1.0/bits/unique_lock.h:73
#5  Mutex<AbaddonClient::Audio::Voice::Opus::OpusDecoder>::Lock (this=0xbaadf00dbaadf00d) at C:/path/src/misc/mutex.hpp:93
#6  AbaddonClient::Audio::Voice::Playback::DecodePool::OnDecodeMessage (message=...) at C:/path/src/audio/voice/playback/decode_pool.cpp:54
#7  0x00007ff79f6b20b1 in AbaddonClient::Audio::Voice::Playback::DecodePool::DecodeThread (channel=...) at C:/path/src/audio/voice/playback/decode_pool.cpp:44
#8  0x00007ffb72fc578f in ?? () from C:\msys64\mingw64\bin\libstdc++-6.dll
#9  0x00007ffbf8174dbb in ?? () from C:\msys64\mingw64\bin\libwinpthread-1.dll
#10 0x00007ffc098eaf5a in msvcrt!_beginthreadex () from C:\Windows\System32\msvcrt.dll
#11 0x00007ffc098eb02c in msvcrt!_endthreadex () from C:\Windows\System32\msvcrt.dll
#12 0x00007ffc09f07344 in KERNEL32!BaseThreadInitThunk () from C:\Windows\System32\kernel32.dll
#13 0x00007ffc0b8bcc91 in ntdll!RtlUserThreadStart () from C:\Windows\SYSTEM32\ntdll.dll
#14 0x0000000000000000 in ?? ()

ouwou · 2024-07-05T08:04:38Z

e6191d9 should address point 2. shouldnt be too hard to port over here

ryze312 · 2024-07-11T19:18:10Z

suppress noise seems to only denoise the right channel

Seems to happen if you use RNNoise as VAD and then switch it to gate.

ryze312 · 2024-07-11T19:23:31Z

i get a segfault eventually at mutex.hpp:93

Not sure what could be wrong here, shared_ptr should prevent Mutex from being freed
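
For readers following the backtrace, here is a rough sketch of the kind of value-owning mutex that the `Mutex<...>::Lock` frame implies, together with the `shared_ptr` ownership being relied on. It is an assumption based on the frame names only, not the contents of `src/misc/mutex.hpp`; `Guard`, `MakeIncrementTask` and the member names are invented.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical value-owning mutex: the data is only reachable through a guard
// that holds the lock for as long as it lives.
template <typename T>
class Mutex {
public:
    class Guard {
    public:
        Guard(std::mutex& mutex, T& value) : m_lock(mutex), m_value(value) {}
        T& operator*() { return m_value; }
        T* operator->() { return &m_value; }

    private:
        std::unique_lock<std::mutex> m_lock;
        T& m_value;
    };

    template <typename... Args>
    explicit Mutex(Args&&... args) : m_value(std::forward<Args>(args)...) {}

    [[nodiscard]] Guard Lock() { return Guard(m_mutex, m_value); }

private:
    std::mutex m_mutex;
    T m_value;
};

// A task that captures the shared_ptr by value co-owns the Mutex, so the
// object stays alive until the task itself is destroyed, even if the
// original owner has already dropped its pointer.
inline std::function<int()> MakeIncrementTask(std::shared_ptr<Mutex<int>> counter) {
    return [counter = std::move(counter)] {
        auto guard = counter->Lock();
        return ++*guard;
    };
}
```

The guarantee only holds when the `shared_ptr` itself is copied into whatever is handed to the worker; a reference or raw pointer to it would not extend the lifetime.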

ryze312 changed the title ~~Refactor audio handlng~~ Refactor audio handling Mar 8, 2024

ouwou reviewed Mar 9, 2024


ryze312 added 2 commits March 11, 2024 17:36

Apply suggestions

cf64646

- Replace std::move with reference - Log warning instead of assert on opening/closing devices - Remove branching in logging + extract into a function

ryze312 force-pushed the audio_refactor branch from a5064e7 to cf64646 Compare March 11, 2024 14:44

ryze312 added 5 commits May 12, 2024 23:43

Implement RAII mutex

eb1451a

Implement slice

8e1d681

Implement data channel

0992139

Implement thread pool

555a3cd

Fix RemoveSSRC never getting called

dbef57a

ryze312 marked this pull request as ready for review June 4, 2024 18:07

Fix includes, include audio sources only when needed

9d5eb73

ryze312 force-pushed the audio_refactor branch from eaf9d00 to 9d5eb73 Compare June 11, 2024 19:35

ryze312 force-pushed the audio_refactor branch 5 times, most recently from 8c9aa44 to e323410 Compare June 12, 2024 16:17

Disable LTO on Windows

12e701c

ryze312 force-pushed the audio_refactor branch from e323410 to 12e701c Compare June 12, 2024 16:30

Document separate_sources option in README

98306b2

ryze312 added 2 commits June 13, 2024 12:47

Add missing headers

2d96b38

Merge branch 'master' into audio_refactor

c3cf374

Update RTP stripping

06fb2a9

ryze312 added 3 commits July 11, 2024 23:24

Fix left denoised channel not being written

8d4384e

Sync with upstream

d6ae366

Remove duplicate definition of GetPayloadOffset

e48e487
