Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor audio handling #277

Open
wants to merge 42 commits into
base: master
Choose a base branch
from
Open

Refactor audio handling #277

wants to merge 42 commits into from

Conversation

ryze312
Copy link
Contributor

@ryze312 ryze312 commented Mar 8, 2024

Hey, here is another PR from me!
Made a few changes to audio handling and cleaned up some of the code. In particular audio devices get started/stopped on demand upon entering voice call. This allows audio backends like PipeWire to suspend sinks when they there isn't anything active connected to them.

AudioManager:

  • Audio devices get started/stopped on voice connect/disconnect
  • Reduce code duplication with Open/TryOpen functions
  • Don't store device configs and ids

AudioDevices:

  • Rename [Get/Set]ActivePlaybackDevice to [Get/Set]ActivePlaybackDeviceIter
  • Declare getters as const and [[nodiscard]]
  • Reduce code duplication by using Get[Playback/Capture]DeviceIDFromModel
  • Add GetActive[Playback/Capture]

TODO:

  • Start/stop voice audio devices on demand
  • Move notification sounds handling to SystemAudio
  • Start/stop notification audio device after timeout as well
  • Split voice audio into multiple devices per user
  • Add options for persistent audio devices and split voice audio

@ryze312
Copy link
Contributor Author

ryze312 commented Mar 8, 2024

The Windows build seems to be failing because of the latest commit involving macOS in master

@ryze312 ryze312 changed the title Refactor audio handlng Refactor audio handling Mar 8, 2024
@ouwou
Copy link
Member

ouwou commented Mar 9, 2024

i like. build error will go away if you merge master back in.

for notification sounds: you might need to fiddle with cmake for this (in case notification sounds are enabled and voice is disabled). although maybe u can just change the include guard from WITH_VOICE to WITH_MINIAUDIO. other issue would be how to actually play it considering they use the high-level engine api from miniaudio while voice uses low-level device api. we can just shove both into the AudioManager but if we want to avoid that duplication, we can either use the high-level api for everything incl voice (the engine config can take context and playback device and ma_engine_read_pcm_frames can be used in the device callback). or stay with low-level and use something like ma_decoder_init_file but idk how well that will work considering eg multiple of same sound at once or repeating it

per-user devices seems niche but im ok with doing that if its not too much of a mess to work with

src/audio/manager.cpp Outdated Show resolved Hide resolved
src/audio/manager.cpp Outdated Show resolved Hide resolved
src/audio/manager.cpp Outdated Show resolved Hide resolved
@ryze312
Copy link
Contributor Author

ryze312 commented Mar 9, 2024

use the high-level api for everything incl voice

I don't think that's possible, looks like engine API is meant for only for playback. For captures we'll still have to resort to low-level API, especially if we want to support split voice audio, since the audio devices will need to be started/stopped dynamically.

per-user devices seems niche but im ok with doing that if its not too much of a mess to work with

My main use case for this would be for recording voice calls into separate audio channels in OBS, so that I can manipulate each user's audio independently.

@ouwou
Copy link
Member

ouwou commented Mar 10, 2024

For captures we'll still have to resort to low-level

true. for playback then it might be better to just use engine api and mix device init/callback stuff like i said. capture seems like it would remain the same

My main use case for this would be for recording voice calls into separate audio channels in OBS, so that I can manipulate each user's audio independently.

ok sure im cool with that. as long as it doesnt clobber up the ui for most people who wont need that

AudioManager:
* Audio devices get started/stopped on voice connect/disconnect
* Reduce code duplication with Open/TryOpen functions
* Don't store device configs and ids

AudioDevices:
* Rename [Get/Set]ActivePlaybackDevice to [Get/Set]ActivePlaybackDeviceIter
* Declare getters as const and [[nodiscard]]
* Reduce code duplication by using Get[Playback/Capture]DeviceIDFromModel
* Add GetActive[Playback/Capture]
- Replace std::move with reference
- Log warning instead of assert on opening/closing devices
- Remove branching in logging + extract into a function
@ryze312
Copy link
Contributor Author

ryze312 commented Mar 17, 2024

Currently rewriting and splitting AudioManager functionality into multiple classes. Is there are a particular reason ma_format_s16 is used instead of ma_format_f32? RNNoise uses f32 samples and OPUS can encode that too.

@ouwou
Copy link
Member

ouwou commented Mar 17, 2024

uhh i probably figured it was simpler since im not doing any fancy processing and opus's float function just internally converts to s16. rnnoise doesnt actually use f32 it uses s16 also. if you think you get something out of switching over to f32 then go for it

@ryze312
Copy link
Contributor Author

ryze312 commented Mar 17, 2024

rnnoise doesnt actually use f32 it uses s16 also

rnnoise_process_frame uses float*, the original code converts int16_t to float and then passes that to the function. Seems strange.

static float rnnoise_input[480];
for (size_t i = 0; i < 480; i++) {
rnnoise_input[i] = static_cast<float>(pcm[i * 2]);
}
m_vad_prob = std::max(m_vad_prob.load(), rnnoise_process_frame(m_rnnoise[0], denoised_left, rnnoise_input));

@ryze312
Copy link
Contributor Author

ryze312 commented Mar 17, 2024

opus's float function just internally converts to s16

That's only the case if it was compiled with FIXED_POINT

#ifdef FIXED_POINT
// snip
#else
opus_int32 opus_encode_float(OpusEncoder *st, const float *pcm, int analysis_frame_size,
                      unsigned char *data, opus_int32 out_data_bytes)
{
   int frame_size;
   frame_size = frame_size_select(analysis_frame_size, st->variable_duration, st->Fs);
   return opus_encode_native(st, pcm, frame_size, data, out_data_bytes, 24,
                             pcm, analysis_frame_size, 0, -2, st->channels, downmix_float, 1);
}

https://github.com/xiph/opus/blob/95dbea83486b90256785aa3c75dd2827f591a34c/src/opus_encoder.c#L2522-L2529

@ryze312
Copy link
Contributor Author

ryze312 commented Mar 17, 2024

Using s16 will typically lead to unnecessary conversions:

Device (f32) -> miniaudio -> Abaddon (s16) -> RNNoise (f32) -> Abaddon (s16) -> Opus (f32)

So we might just use f32 and opus_encode_float and there will be no conversions:

Device (f32) -> miniaudio -> Abaddon (f32) -> RNNoise (f32) -> Abaddon (f32) -> Opus (f32)

@ouwou
Copy link
Member

ouwou commented Mar 18, 2024

ok sure lets do f32 then. bonus points for it also being easier to reason with since its easy -1 to 1

@ouwou
Copy link
Member

ouwou commented Mar 31, 2024

forgot to mention but i rewrote the buffer stuff just a bit to add a jitter buffer since some ppl were complaining about sound so lemme know if u have questions on how that works. admittedly its not the best implementation

@ouwou
Copy link
Member

ouwou commented Apr 1, 2024

just kidding im gonna revert that tomorrow cuz it looks like its causing more problems than it solves

@ryze312
Copy link
Contributor Author

ryze312 commented Apr 6, 2024

just kidding im gonna revert that tomorrow cuz it looks like its causing more problems than it solves

Oh okay, I'll see if I can implement it anyway. It looks like the original code uses std::deque which is potentially risky in case of overruns/other kinds of blocks, we might want to implement a proper ring buffer

@ouwou
Copy link
Member

ouwou commented Apr 7, 2024

you do get underruns but it just results in a little crackle. i encounter it sometimes but its not too bad but i guess its worse if you have bad internet. i think something like libspeexdsp's jitter buffer would be ideal but i dont want to bring it in as a dependency tbh. every jitter buffer i found seems to operate on RTP payloads too so i guess we should also (which my attempt didnt do lol)

@ryze312
Copy link
Contributor Author

ryze312 commented Apr 7, 2024

you do get underruns but it just results in a little crackle.

I meant that FeedMeOpus just pushes to std::deque without any limit to the buffer size. In case of a stall in audio callback, this might lead to delay and/or increase in memory consumption, unless it can catch up by taking frames faster than they get decoded.

Deque is also a dynamically sized meaning it can cause a reallocation, which is something is generally discouraged in real-time audio processing.

I think we should implement a proper ring buffer with fixed size, so that Opus decoder stops pushing samples if the audio thread can't keep up.

@ouwou
Copy link
Member

ouwou commented Apr 8, 2024

might lead to delay and/or increase in memory consumption

yeah true. i thought about that but chose to worry about it later 😼. miniaudio has a ring buffer for audio too https://miniaud.io/docs/manual/index.html#RingBuffers
i just wonder whatll happen if it gets congested. the problem with the jitter buffer i wrote was if it filled up too much and started trying to discard data from the start then it would kind of get stuck doing that and the end result is choppy ugly audio. so i wonder if something like that would happen with a ring buffer if it loops back to the front? im sure theres a solution to that but idk it

@ryze312
Copy link
Contributor Author

ryze312 commented Apr 9, 2024

miniaudio has a ring buffer for audio too

Haven't seen that, should be easy to integrate

so i wonder if something like that would happen with a ring buffer if it loops back to the front

I don't think this would happen with ring buffer, since the writer (decoder) is going to discard new frames instead of the old ones in case the reader (audio callback) cannot keep up. The reader can then detect that the buffer is full and and apply something like interpolation to catch up with the writer. This might distort the audio (higher pitch), but it won't end up choppy.

@ouwou
Copy link
Member

ouwou commented Apr 10, 2024

maybe. might be worth looking at how something like mumble does it. i know they use libspeexdsp for jitter so we can see how that interacts

@ryze312
Copy link
Contributor Author

ryze312 commented Jun 4, 2024

I think this should be all

@ryze312 ryze312 marked this pull request as ready for review June 4, 2024 18:07
@ouwou
Copy link
Member

ouwou commented Jun 10, 2024

still need to fix the build when compiling without audio support so a bunch of stuff probably needs to be wrapped in #ifdef WITH_MINIAUDIO
ill give everything another test soon ™

@ryze312
Copy link
Contributor Author

ryze312 commented Jun 11, 2024

a bunch of stuff probably needs to be wrapped in #ifdef WITH_MINIAUDIO

I think the right approach would be not including the sources in CMake at all, not only does it increase the compilation time, but also having to include the guard in each file seems tedious.

Just did that, seems to compile and work fine.

@ryze312
Copy link
Contributor Author

ryze312 commented Jun 11, 2024

Uh nevermind

@ryze312
Copy link
Contributor Author

ryze312 commented Jun 11, 2024

The build seems to be failing now because of LTO. Might be this issue?

@ouwou
Copy link
Member

ouwou commented Jun 11, 2024

maybe this? diasurgical/devilutionX@8693a0f

@ryze312
Copy link
Contributor Author

ryze312 commented Jun 12, 2024

Sure, I'll see if disabling decloning for those specific files works, if not, we can try disabling it for the entire executable.

@ryze312 ryze312 force-pushed the audio_refactor branch 5 times, most recently from 8c9aa44 to e323410 Compare June 12, 2024 16:17
@ryze312
Copy link
Contributor Author

ryze312 commented Jun 12, 2024

I have no idea what's breaking it now, I'll just disable LTO for Windows

@ouwou
Copy link
Member

ouwou commented Jun 13, 2024

theres a good bit of missing headers. it builds without them thanks to precompiled headers meaning everything ends up transiently included but they should be added anyways (it breaks my intellisense too). you can disable precompiled headers in cmake and it should give you errors where stuff is missing. adding headers wont hurt compile time (at least in my testing) if they are included in the precompiled headers

@ouwou
Copy link
Member

ouwou commented Jun 25, 2024

ok giving it some more testing

  1. suppress noise seems to only denoise the right channel
  2. StripRTPExtensionHeader is broken. obviously this isnt ur fault but if i fix it in master itll cause a conflict here so it should probably be fixed here. i think i can add a pretty easy fix so ill comment a diff soon-ish
  3. i get a segfault eventually at mutex.hpp:93 🙀 0xbaadf00dbaadf00d suggests the memory is getting freed somewhere.
#0  0x00007ffbf8172f58 in ?? () from C:\msys64\mingw64\bin\libwinpthread-1.dll
#1  0x00007ff79f6b1f1e in __gthread_mutex_lock (__mutex=0xbaadf00dbaadf00d) at C:/msys64/mingw64/include/c++/14.1.0/x86_64-w64-mingw32/bits/gthr-default.h:762
#2  std::mutex::lock (this=0xbaadf00dbaadf00d) at C:/msys64/mingw64/include/c++/14.1.0/bits/std_mutex.h:113
#3  std::unique_lock<std::mutex>::lock (this=<synthetic pointer>) at C:/msys64/mingw64/include/c++/14.1.0/bits/unique_lock.h:147
#4  std::unique_lock<std::mutex>::unique_lock (this=<synthetic pointer>, __m=...) at C:/msys64/mingw64/include/c++/14.1.0/bits/unique_lock.h:73
#5  Mutex<AbaddonClient::Audio::Voice::Opus::OpusDecoder>::Lock (this=0xbaadf00dbaadf00d) at C:/path/src/misc/mutex.hpp:93
#6  AbaddonClient::Audio::Voice::Playback::DecodePool::OnDecodeMessage (message=...) at C:/path/src/audio/voice/playback/decode_pool.cpp:54
#7  0x00007ff79f6b20b1 in AbaddonClient::Audio::Voice::Playback::DecodePool::DecodeThread (channel=...) at C:/path/src/audio/voice/playback/decode_pool.cpp:44
#8  0x00007ffb72fc578f in ?? () from C:\msys64\mingw64\bin\libstdc++-6.dll
#9  0x00007ffbf8174dbb in ?? () from C:\msys64\mingw64\bin\libwinpthread-1.dll
#10 0x00007ffc098eaf5a in msvcrt!_beginthreadex () from C:\Windows\System32\msvcrt.dll
#11 0x00007ffc098eb02c in msvcrt!_endthreadex () from C:\Windows\System32\msvcrt.dll
#12 0x00007ffc09f07344 in KERNEL32!BaseThreadInitThunk () from C:\Windows\System32\kernel32.dll
#13 0x00007ffc0b8bcc91 in ntdll!RtlUserThreadStart () from C:\Windows\SYSTEM32\ntdll.dll
#14 0x0000000000000000 in ?? ()

@ouwou
Copy link
Member

ouwou commented Jul 5, 2024

e6191d9 should address point 2. shouldnt be too hard to port over here

@ryze312
Copy link
Contributor Author

ryze312 commented Jul 11, 2024

suppress noise seems to only denoise the right channel

Seems to happen if you use RNNoise as VAD and then switch it to gate.

@ryze312
Copy link
Contributor Author

ryze312 commented Jul 11, 2024

i get a segfault eventually at mutex.hpp:93

Not sure what could be wrong here, shared_ptr should prevent Mutex from being freed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants