Reimplement fifo as a proper lock free SCSP ring buffer #13877

m0dB · 2024-11-10T19:56:15Z

Reimplement fifo as a proper lock free SCSP ring buffer with atomics for thread-safety instead of using PaUtilRingBuffer as a backend.

Added unit tests.

One might consider that the tsan warnings for PaUtilRingBuffer are not "serious", but they sure are ugly and spam my tsan logs.

Fixes #13864 #13863 #13866 #13868

Swiftb0y · 2024-11-10T19:58:58Z

There are loads of ringbuffers in mixxx already and we already vendor https://github.com/rigtorp/SPSCQueue which has been proven to be correct.

Swiftb0y · 2024-11-10T19:59:55Z

Sorry for not mentioning this earlier. I leave the decision to you whether you want to keep this implementation or use rigtorp's implementation.

m0dB · 2024-11-10T20:00:58Z

Oh, it is unfortunate I didn't know that. Also unfortunate that FIFO wasn't using that. Can we bring all these ring buffers back to a single implementation?

m0dB · 2024-11-10T20:04:48Z

rigtorp/SPSCQueue.h is not comparable though. It uses queue semantics, to push and pop single items. The ringbuffer I implemented (and what PaUtilRingBuffer does) is to read / write buffers (typically of samples), either by copy or by accessing regions in the ringbuffer directly.

Swiftb0y · 2024-11-10T20:42:44Z

Thanks. I'll try to give this a review. Though I unfortunately don't have the most experience with multithreaded programming.

Swiftb0y

A couple surface-level comments. I'll follow up with a proper audit of the thread safety aspects later.

src/test/fifotest.cpp

Swiftb0y · 2024-11-10T20:47:29Z

src/test/fifotest.cpp

+    std::vector<float> data(1024);
+    FIFO<float> fifo(1024);


a (potentially Value-Parameterized) test that tests a couple more (non power-of-two) queue sizes would make sense, wdyt?

Yes, also the offset near the numeric limit of the indices should be parametrized

Note though that internally the queue sizes are always rounded up to the next power of two. Still, good to add parametrized tests.

right, but if the size and the rounded up size are always the same and the implementation mistakenly uses the wrong values, the edge case is not caught. So even though it may not matter with the current implementation we should add a unit test to ensure this mistake is not made accidentally in the future.

Sure, sure, and once parametrized, we can add whatever size.

done. please check and resolve.
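
To make the point about non-power-of-two sizes and indices near the numeric limit concrete, here is a minimal sketch. It assumes, as the snippets in this PR suggest, that the indices are free-running unsigned counters that are only masked when addressing the storage. The helper names are hypothetical and not taken from fifo.h.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: smallest power of two >= n, for 1 <= n <= 2^31.
inline std::uint32_t roundUpToPowerOfTwo(std::uint32_t n) {
    assert(n >= 1 && n <= 0x80000000u);
    std::uint32_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Number of readable items for free-running (unmasked) indices.
// Unsigned subtraction is modulo 2^32, so the result stays correct
// even after writeIndex has wrapped past the numeric limit.
inline std::uint32_t itemsAvailable(std::uint32_t readIndex, std::uint32_t writeIndex) {
    return writeIndex - readIndex;
}

// Position inside the storage for a free-running index.
// Only valid when capacity is a power of two.
inline std::uint32_t slot(std::uint32_t index, std::uint32_t capacity) {
    return index & (capacity - 1);
}
```

A requested size of 1000 ends up with a capacity of 1024, which is exactly the case where a test can catch code that mixes up the requested and the rounded size.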

Swiftb0y · 2024-11-10T20:49:14Z

src/util/fifo.h

+        std::memcpy(pData, m_data.data() + (readIndex & m_mask), n * sizeof(DataType));
+        std::memcpy(pData + n, m_data.data(), (count - n) * sizeof(DataType));


Pretty sure this requires https://en.cppreference.com/w/cpp/types/is_trivially_copyable, right?

right, so then we should either consider adding a static_assert for that or use the appropriate more generic algorithm (std::copy?).

Yes, I’ll change it to std::copy. Old habits die hard and I often notice that you are not plagued by the baggage of over 30 years of hardcore C and primitive C++ 😅

std::copy is slow:

mixxx/src/util/sample.h

Line 62 in 2397016

// Benchmark results on 32 bit SSE2 Atom Cpu (Linux)

We want here a replacement for the original implementation, that only works with trivial types. For C++ objects, we have already an alternative.

fyi, A quick and dirty quickbench reveals that there is zero difference (apart from the zero-sized case, in which case std::copy wins).

Nice. Thank you.

Yes, I agree with Niko, my tests show the same. I find the for loop in mixxx/src/util/sample.h more dubious!

Yeah, that probably worked around a missed inefficiency on 32-bit platforms, but I think that edge case is negligible nowadays. Who runs 32-bit binaries let alone 32-bit operating systems anymore (if 64-bit binaries are available)?

using std::copy, please check and resolve.
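
For reference, the two options discussed above (a static_assert on trivial copyability, or std::copy) can also be combined. The following is only a sketch with a hypothetical free function, not the code in fifo.h. It assumes a power-of-two sized storage and a free-running read index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Copies `count` items out of a power-of-two sized ring, starting at the
// free-running index `readIndex` and wrapping to the start when needed.
template<typename T>
void copyOutWrapped(const std::vector<T>& ring,
        std::size_t readIndex,
        T* pDest,
        std::size_t count) {
    // std::copy would also work for non-trivial types; the assert only
    // documents that this sketch targets sample-like data.
    static_assert(std::is_trivially_copyable<T>::value,
            "intended for trivially copyable types only");
    assert(count <= ring.size());
    const std::size_t mask = ring.size() - 1;
    const std::size_t start = readIndex & mask;
    // First part: from `start` up to the end of the storage.
    const std::size_t n = std::min(ring.size() - start, count);
    std::copy(ring.data() + start, ring.data() + start + n, pDest);
    // Second part: the remainder, taken from the start of the storage.
    std::copy(ring.data(), ring.data() + (count - n), pDest + n);
}
```

For trivially copyable types, mainstream standard libraries typically lower std::copy to a memmove, which is consistent with the benchmark observation above.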

Swiftb0y · 2024-11-10T20:51:43Z

src/util/fifo.h


 #include "util/class.h"
 #include "util/math.h"

+using ring_buffer_size_t = uint32_t;


can you share some insight why not just std::size_t?

Also works for me. Or uint_fast32_t. I gladly adapt to what you think fits the mixxx codebase best.

I think for indexing into std::vector it always converts to a std::size_t anyways, so if we want to avoid that one sign extension instruction we should use what's most compatible with the underlying container (and avoid exposing it if possible).

Sure. I’ll change it to size_t

using std::size_t, please check and resolve.

Swiftb0y · 2024-11-10T20:54:35Z

src/util/fifo.h

    std::vector<DataType> m_data;
-    PaUtilRingBuffer m_ringBuffer;
+    std::atomic<uint32_t> m_writeIndex;
+    std::atomic<uint32_t> m_readIndex;
    DISALLOW_COPY_AND_ASSIGN(FIFO);


can you implement move semantics and adhere to the rule of 5?

I’m just following the original FIFO here, apparently this was sufficient for the current use.

Haven't looked at the diff, but I have a suspicion that the original implementation was written before move semantics were a thing (C++11), so we might as well go the extra step. We then may also look for unnecessarily heap-allocated FIFOs that work around the lack of move semantics and inline those (the eliminated pointer dereference may result in a nice runtime speedup).

Ok, but let’s do that in a follow up PR. The goal of these PRs is to get rid of all the tsan warnings.

The heap allocation happens twice: once by the std::vector and probably a second time for the control structure.
We may consider using a std::array based version where the size becomes a template parameter.
This way we also get around the unnecessary default initialization of the vector.

If it's feasible to shift the size calculation to compile-time, I'm all for it.

I haven't looked at the actual use of this, so I don't know if using size as a template argument is even an option. Anyway, let's keep additional improvements beyond fixing the tsan issue for later PRs.
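
A rough sketch of what the std::array idea could look like, assuming the size is known at compile time. The class name and members are hypothetical, and whether the actual FIFO users can provide a compile-time size is left open in this thread.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical storage with a compile-time capacity. The power-of-two
// check and the mask are resolved at compile time.
template<typename T, std::size_t N>
class StaticRingStorage {
    static_assert(N != 0 && (N & (N - 1)) == 0, "N must be a power of two");

  public:
    static constexpr std::size_t kMask = N - 1;

    // Maps a free-running index to a slot in the storage.
    T& at(std::size_t freeRunningIndex) {
        return m_data[freeRunningIndex & kMask];
    }

    constexpr std::size_t capacity() const {
        return N;
    }

  private:
    // No initializer: for trivial T the elements are left uninitialized
    // when the object is default-initialized, unlike std::vector<T>(N).
    std::array<T, N> m_data;
};
```

Since the storage lives inside the object, there is no separate heap allocation for the data.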

daschuer · 2024-11-11T07:16:21Z

src/util/fifo.h

+    uint32_t readAvailable() const {
+        const uint32_t readIndex = m_readIndex.load(std::memory_order_relaxed);
+        const uint32_t writeIndex = m_writeIndex.load(std::memory_order_acquire);
+        return writeIndex - readIndex;


Can you add a comment about the case where a read or write happens between the two atomic accesses?

Can you please also explain the use of the memory barriers? Is it correct?

Yes, I can write some comments about the atomics and the memory barriers.

comments added. please review and resolve.

I did not see the comment about the memory barriers and what happens if a read and a write happen in between reading the two atomics. Is it not yet pushed?

It is the last comment, after the declaration of the two atomics.
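
To illustrate the question about an update landing between the two loads, here is a sketch that assumes the single-producer/single-consumer discipline from this PR: readAvailable is only called on the consumer thread and writeAvailable only on the producer thread. The struct and function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct Indices {
    std::atomic<std::uint32_t> readIndex{0};
    std::atomic<std::uint32_t> writeIndex{0};
};

// Consumer thread only. readIndex is only modified by this thread, so a
// relaxed load always sees its latest value. The producer may advance
// writeIndex between the two loads, but writeIndex only ever grows, so
// the result is at worst an under-estimate of what is readable.
inline std::uint32_t readAvailableSketch(const Indices& idx) {
    const std::uint32_t r = idx.readIndex.load(std::memory_order_relaxed);
    // The producer may publish more items here.
    const std::uint32_t w = idx.writeIndex.load(std::memory_order_acquire);
    return w - r;
}

// Producer thread only. Symmetric argument: the consumer can only free up
// space in the meantime, so the result is at worst an under-estimate of
// what is writable.
inline std::uint32_t writeAvailableSketch(const Indices& idx, std::uint32_t capacity) {
    const std::uint32_t r = idx.readIndex.load(std::memory_order_acquire);
    const std::uint32_t w = idx.writeIndex.load(std::memory_order_relaxed);
    return capacity - (w - r);
}
```

Under these assumptions, a concurrent update can only make the returned value too small, never too large, so the caller never reads unwritten data or overwrites unread data.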

uklotzde · 2024-11-11T15:33:23Z

Without inspecting the code: Implementing lock-free data structures on your own is most often not a good idea. Not only in terms of correctness. It is also a maintenance burden.

I have made this mistake myself. It is never too late to change your mind.

Just some friendly advice.

Swiftb0y · 2024-11-11T16:23:41Z

Agreed. This is probably a good time to post CppCon 2014: Herb Sutter "Lock-Free Programming (or, Juggling Razor Blades)"

uklotzde · 2024-11-11T17:26:06Z

This implementation supports bulk operations and should cover most use cases: https://github.com/cameron314/concurrentqueue

With a single producer/consumer it should behave like an ordinary SPSC queue.

Instead of vendoring a second library, it might be worth migrating the rigtorp/SPSCQueue code.

daschuer · 2024-11-11T17:40:39Z

In that case let's stick with the established Portaudio version. It is exactly designed for our use case of fast bulk transfer of samples and is just missing sanitizer annotations.

m0dB · 2024-11-11T18:08:10Z

In that case let's stick with the established Portaudio version. It is exactly designed for our use case of fast bulk transfer of samples and is just missing sanitizer annotations.

No, please! Thread sanitizer flags it up! Sure, it will work, but it is not correct. Also, seeing that Portaudio has not been thread sanitised, I would not be surprised if the hang I experienced comes from Portaudio itself. But I consider code that does not execute cleanly with thread sanitizer broken!

I know the pitfalls of lock-free programming very well. I am confident this implementation is correct. I am not against using other options, but I needed something that would easily adapt to the FIFO class API. None of the mentioned options fit this purpose. They are about reading / writing single items, not about reading / writing blocks of memory and accessing regions in the ringbuffer.

If you consider that there is some opensource and established solution that adapts well to the FIFO class API, be my guest. But in the meantime, please don't block this PR until there is a better option.

m0dB · 2024-11-11T20:07:33Z

I am going to revert the FIFO API to use int. I am having to touch way too many files to fix the Windows compilation.

daschuer · 2024-11-11T22:51:01Z

With force push, we need to review the whole PR over and over again. It is better to add commits so that we can track the changes. If you would like to end up with only one commit, feel free to squash everything before merging into the target branch.
See also: https://github.com/mixxxdj/mixxx/blob/main/CONTRIBUTING.md

m0dB · 2024-11-12T11:10:31Z

With force push, we need to review the whole PR over and over again. It is better to add commits so that we can track the changes. If you would like to end up with only one commit, feel free to squash everything before merging into the target branch. See also: https://github.com/mixxxdj/mixxx/blob/main/CONTRIBUTING.md

Yeah, sorry about that. I had added all these casts all over the code to make Windows accept size_t instead of int as the return type, and I could just have checked out those files from 2.5 instead of rebasing.

Anyway, the only things to review are fifo.h itself and the fifotest.cpp, nothing else has changed now (in other words, it really is a drop-in replacement).

daschuer · 2024-11-12T13:22:36Z

Thank you for the comment about the atomics. This puts a certain requirement on readAvailable() and writeAvailable(); can you document that at the functions themselves? Both read two atomics. Can you also describe how the situation is dealt with where a read and a write happen between the two accesses?

I am a bit concerned, because the original implementation uses PaUtil_FullMemoryBarrier() differently.
Every memory barrier flushes the prefetched CPU cache, which has performance implications. On the other hand, a missing memory barrier will introduce broken functionality.

Is PaUtil_RingBuffer() broken because of this? I am in doubt, because it is established code we have used without any issues for years.
The TSAN findings are IMHO only false positives because of the missing instrumentation with __tsan_acquire and __tsan_release.

On the other hand, we may introduce extra barriers here that may slow down the queue unnecessarily.
https://github.com/PortAudio/portaudio/commits/57aa393109ec996799d3a5846c9ecb0a65b64644/src/common/pa_ringbuffer.c has some commits regarding barriers.

However, if we are convinced it is actually broken, we will suffer the same issue when using it via PortAudio and we should contribute a fix back. That's the reason why we need to fully understand the whole topic.

Swiftb0y · 2024-11-12T14:15:03Z

The TSAN findings are IMHO only false positives because of the missing instrumentation with __tsan_acquire and __tsan_release.

I think the fact that these are missing is by itself an argument against using it, since that indicates IMO that the code is not well maintained (also evident from the commit history). Moreover, it also makes sense to make ourselves less dependent on PortAudio in case we ever want to switch to another library that does care about features such as hotplug.

However, if we are convinced it is actually broken, we will suffer the same issue when using it via PortAudio and we should contribute a fix back. That's the reason why we need to fully understand the whole topic.

This is a C++ implementation though and the PA code is pure C, a port wouldn't be trivial if even feasible.

daschuer · 2024-11-12T15:38:27Z

This is a C++ implementation though and the PA code is pure C, a port wouldn't be trivial if even feasible.

I am not talking about contributing this new C++ FIFO back. If we are certain that the Portaudio FIFO has an issue, we need to point it out and propose a fix in the C domain.

m0dB · 2024-11-12T17:41:02Z

For me the fact that thread sanitizer flags this up is sufficient. I treat this the same as compiler warnings. One might think that a warning is harmless, and if so, even use pragmas to disable a warning. I much prefer to treat all compiler warnings as errors, and actually fix them.

I don't feel like going down the rabbit hole of investigating the PortAudio source code. I am sure this implementation is correct and it allows me to continue my thread sanitizer investigation. If you don't want to use it and prefer to stick with the original PortAudio ringbuffer, I would find that disappointing. But since it is a drop-in replacement, I will use it during my thread sanitizer investigation and that's it. And I will use it for my local builds.

As for efficiency, there is one thing missing: the atomics should be aligned to the cache line size so that they do not share a cache line, as documented here:
https://en.cppreference.com/w/cpp/thread/hardware_destructive_interference_size , but I didn't want to add more noise to this already noisy conversation.
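
A sketch of what such an alignment could look like. The struct is hypothetical and not part of this PR. The fallback value of 64 bytes is an assumption for toolchains that do not provide std::hardware_destructive_interference_size.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

#ifdef __cpp_lib_hardware_interference_size
constexpr std::size_t kCacheLineSize = std::hardware_destructive_interference_size;
#else
// Assumption: 64 bytes is a common cache line size on x86-64.
constexpr std::size_t kCacheLineSize = 64;
#endif

// Each index gets its own cache line, so the producer updating writeIndex
// does not invalidate the line holding readIndex, and vice versa.
struct PaddedIndices {
    alignas(kCacheLineSize) std::atomic<std::uint32_t> writeIndex{0};
    alignas(kCacheLineSize) std::atomic<std::uint32_t> readIndex{0};
};

static_assert(alignof(PaddedIndices) >= kCacheLineSize,
        "struct must be at least cache line aligned");
static_assert(sizeof(PaddedIndices) >= 2 * kCacheLineSize,
        "each index must occupy its own cache line");

// Byte distance between the two indices inside one object.
inline std::size_t indexDistance(const PaddedIndices& p) {
    const auto* w = reinterpret_cast<const unsigned char*>(&p.writeIndex);
    const auto* r = reinterpret_cast<const unsigned char*>(&p.readIndex);
    assert(r > w); // members are laid out in declaration order
    return static_cast<std::size_t>(r - w);
}
```

Whether this gives a measurable gain for the FIFO use in Mixxx would need to be benchmarked; the thread does not include such a measurement.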

daschuer · 2024-11-13T00:15:38Z

It looks like we have mixed up things in the discussion. Let's clarify this:

1. We have discussed whether it is the right way forward to implement and probably maintain our own SCSP as proposed in this PR. I have no opinion here, but I agree that it is not an easy task. Therefore we need to be extra cautious to get it right. The unit tests are great, but in the end they run on specific hardware, not on the C++ programming model that covers all our user targets.

Regarding this topic, I have some concerns about whether the memory barriers work without unnecessarily introducing a performance penalty. Can you please describe it in a source code comment?

2. From your posts, I got the impression that you consider the PortAudio implementation broken. If this is the case, we have a way bigger issue, because we rely on PortAudio a lot. Therefore we need to describe the issue to PortAudio upstream so that it can be fixed. We can do this in a separate bug report. This also helps with 1., to show that the barriers are just right and not wrong as in PA.

I treat this the same as compiler warnings. One might think that a warning is harmless, and if so, even use pragmas to disable a warning. I much prefer to treat all compiler warnings as errors, and actually fix them.

I am of exactly the same opinion.

I don't feel like going down the rabbit hole of investigating the PortAudio source code.

That's OK for me, but whoever steps in will probably need your help. Is that OK?

I have the following ideas:

1. Add instrumentation to the PA queue as described here to check if it is really broken: https://github.com/google/sanitizers/wiki/ThreadSanitizerAtomicOperations
2. Put the new and the PA implementation into the benchmark used here: https://github.com/cameron314/concurrentqueue (in my understanding, both are equally fast and also "Knock-your-socks-off")

I think 1. is easy. Just use these defines:

#define PaUtil_FullMemoryBarrier()  __tsan_atomic_thread_fence(memory_order_seq_cst)
#define PaUtil_ReadMemoryBarrier()  __tsan_atomic_thread_fence(memory_order_acquire)
#define PaUtil_WriteMemoryBarrier() __tsan_atomic_thread_fence(memory_order_release)

Can you confirm this? Do you have a CI run or something we can use for this test?

2. can be complex. I am not sure if we can drop our implementation into that test or if we need something else. Do you have ideas?

m0dB · 2024-11-13T23:19:45Z

I added some multithread R/W tests. This is running without thread sanitizer. The variations are, for both my implementation and the PA ring buffer:

RW : reading and writing as fast as possible (as soon as space / data is available)
RW_Wait: reading and writing with a fixed buffer size (waiting until buffer size is available)
RegionRW: like RW, but instead of copying, access the ring buffer memory directly
RegionRW_Wait: like RW_Wait, idem

The test duration serves as a simple benchmark. As you can see, the differences are negligible.

[----------] 8 tests from FifoTest
[ RUN ] FifoTest.MultiThreadRW
[ OK ] FifoTest.MultiThreadRW (545 ms)
[ RUN ] FifoTest.MultiThreadRW_PA
[ OK ] FifoTest.MultiThreadRW_PA (597 ms)
[ RUN ] FifoTest.MultiThreadRW_Wait
[ OK ] FifoTest.MultiThreadRW_Wait (530 ms)
[ RUN ] FifoTest.MultiThreadRW_PA_Wait
[ OK ] FifoTest.MultiThreadRW_PA_Wait (527 ms)
[ RUN ] FifoTest.MultiThreadRegionRW
[ OK ] FifoTest.MultiThreadRegionRW (527 ms)
[ RUN ] FifoTest.MultiThreadRegionRW_PA
[ OK ] FifoTest.MultiThreadRegionRW_PA (549 ms)
[ RUN ] FifoTest.MultiThreadRegionRW_Wait
[ OK ] FifoTest.MultiThreadRegionRW_Wait (513 ms)
[ RUN ] FifoTest.MultiThreadRegionRW_PA_Wait
[ OK ] FifoTest.MultiThreadRegionRW_PA_Wait (540 ms)
[----------] 8 tests from FifoTest (4333 ms total)

As there is no winner performance-wise, I see the following ways forward:

1. Stick with the PA ringbuffer based FIFO and
a. Either patch it so tsan doesn't trip over it (adding tsan annotations)
b. Or I use my implementation when I am running tsan (I could put it behind an ifdef which is set by cmake when building for tsan).
2. Use my implementation of FIFO anyway.

The advantage of 2 is that we don't depend on PA in case we ever want to migrate to something else (I might consider using CoreAudio directly on macOS, if I would address the hot plug issues) and that we don't have to go through the hassle of patching PA.

The advantage of 1 is that the PA ringbuffer has seen more mileage than my implementation, but I am fully confident mine is correct (and the tests and tsan show this).

I don't have a strong opinion, so unless we all agree on 2, I think 1.b is the easiest way forward, and we keep my implementation lying around just in case we need it in the future. And at least we have the FIFO covered with some unit tests!

But if you feel like doing 1.a, adding tsan annotations to PA and providing me with a patch, I am happy to test that.

daschuer · 2024-11-13T00:26:25Z

src/util/fifo.h

+    // ringbuffer, the remainder is read from the start.
+    int read(DataType* pData, size_type count) {
+        size_type readIndex = m_readIndex.load(std::memory_order_relaxed);
+        const size_type writeIndex = m_writeIndex.load(std::memory_order_acquire);


Here as well, I think m_readIndex.load() can be moved after m_writeIndex.load().

No issue, same as above.

m0dB · 2024-11-14T09:09:22Z

From your posts, I got the impression that you consider the port audio implementation broken.

I consider it broken in the sense that it hasn't been updated to run correctly with thread sanitizer.

daschuer · 2024-11-14T14:21:38Z

Did you consider adding a single option to our CMakeLists.txt that adds all the required options for tsan? That would make future work with it easier.

daschuer · 2024-11-14T14:29:58Z

src/util/fifo.h

+    // Returns the space in the ringbuffer available for write
+    int writeAvailable() const {
+        const size_type readIndex = m_readIndex.load(std::memory_order_acquire);
+        const size_type writeIndex = m_writeIndex.load(std::memory_order_relaxed);


Following the logic below should we swap both loads?

No, it doesn't matter. The important thing here is that m_readIndex is loaded with std::memory_order_acquire, as it is changed in the other thread, and we want to make sure we see the changed value here. m_writeIndex will not be modified in any thread other than the current one, so its correct value is always guaranteed in this thread. Swapping the two loads would make no difference.
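
The acquire/release pairing described here can be sketched in a stripped-down ring buffer. This is an illustration under the assumptions of this thread (one producer thread, one consumer thread, power-of-two capacity, free-running indices), with a hypothetical class name; it is not the fifo.h from this PR.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

template<typename T>
class SpscRingSketch {
  public:
    explicit SpscRingSketch(std::size_t capacityPow2)
            : m_data(capacityPow2), m_mask(capacityPow2 - 1) {
        assert(capacityPow2 != 0 && (capacityPow2 & m_mask) == 0);
    }

    // Producer thread only.
    std::size_t write(const T* pData, std::size_t count) {
        // Own index: relaxed is enough. Other thread's index: acquire.
        const std::size_t w = m_writeIndex.load(std::memory_order_relaxed);
        const std::size_t r = m_readIndex.load(std::memory_order_acquire);
        count = std::min(count, m_data.size() - (w - r));
        const std::size_t pos = w & m_mask;
        const std::size_t n = std::min(m_data.size() - pos, count);
        std::copy(pData, pData + n, m_data.data() + pos);
        std::copy(pData + n, pData + count, m_data.data());
        // Release: the copies above become visible before the new index.
        m_writeIndex.store(w + count, std::memory_order_release);
        return count;
    }

    // Consumer thread only.
    std::size_t read(T* pData, std::size_t count) {
        const std::size_t r = m_readIndex.load(std::memory_order_relaxed);
        const std::size_t w = m_writeIndex.load(std::memory_order_acquire);
        count = std::min(count, w - r);
        const std::size_t pos = r & m_mask;
        const std::size_t n = std::min(m_data.size() - pos, count);
        std::copy(m_data.data() + pos, m_data.data() + pos + n, pData);
        std::copy(m_data.data(), m_data.data() + (count - n), pData + n);
        // Release: the slots are handed back only after they were copied.
        m_readIndex.store(r + count, std::memory_order_release);
        return count;
    }

  private:
    std::vector<T> m_data;
    const std::size_t m_mask;
    std::atomic<std::size_t> m_writeIndex{0};
    std::atomic<std::size_t> m_readIndex{0};
};
```

In each function the two loads are independent of each other, which is why their order does not affect correctness; what matters is which index is loaded with acquire and which is stored with release.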

daschuer · 2024-11-14T14:31:07Z

src/util/fifo.h

+            DataType** dataPtr2,
+            ring_buffer_size_t* sizePtr2) {
+        const size_type readIndex = m_readIndex.load(std::memory_order_acquire);
+        size_type writeIndex = m_writeIndex.load(std::memory_order_relaxed);


The same here, swap statements?

daschuer · 2024-11-14T14:33:12Z

src/util/fifo.h

+    }
+    // Advance the read index with count values, or maximum until the write index.
+    // Returns the new read index (wrapped inside the buffer size)
+    int flushReadData(size_type count) {


Is this function allowed from reader, writer or both? Do we need a second one for the other counterpart?

Only allowed from the reader, but apparently it is only used once (in ./engine/sidechain/shoutconnection.cpp). We do not need a second one for the counterpart, and in fact we might as well remove this one and simply call releaseReadRegions there, which would amount to the same.

And releaseReadRegions and releaseWriteRegions would be more aptly named advanceReadIndex, advanceWriteIndex. But all of that is IMO beyond the PR.

Yes, so documenting this as a source comment is sufficient.

Swiftb0y · 2024-11-14T15:17:11Z

src/util/fifo.h

+    // ringbuffer, the remainder is read from the start.
+    int read(DataType* pData, size_type count) {


Can we add overloads that take std::span instead, for use in new code? We can keep the ptr+size ones since this is a drop-in replacement, if we document that the span-based ones are preferred (assuming you agree with that). Ideally the implementation would also live in the std::span variant, but I can understand if you don't want to do that refactoring. I can offer to do that refactoring in exchange if you review the refactored code.

Sure, but that is a refactoring of this class, independent of the underlying implementation, be it PA ring buffer, or my implementation.

yeah, you're right. Let's concentrate on getting the lock-free stuff right first.

m0dB · 2024-11-14T19:05:45Z

Note: As mentioned with respect to the memory ordering: the read functions should only be called in the consumer thread, and the write functions only in the producer thread.

But there is indeed nothing in the API that enforces this. But likewise, there is also nothing that protects the FIFO from being used by more than a single consumer and single producer. I can think of mechanisms to enforce this (at a cost), but I think it should be enough to simply document it. TSAN will detect such abuse anyway :-)

daschuer · 2024-11-15T13:03:23Z

Here is the pa_ringbuffer with tsan functions:
https://github.com/daschuer/mixxx/tree/pa_tsan
Would be nice if you can give this a try.

daschuer · 2024-11-15T13:15:27Z

I think in terms of functionality this is ready to go. I have two editorial complaints:

1. Comments about non-obvious restrictions on which thread may call each function
2. Symmetric use of atomic load operations.

After our discussion everything is obvious. My goal is that new readers do not stumble the same way.

m0dB · 2024-11-15T15:11:06Z

Here is the pa_ringbuffer with tsan functions: https://github.com/daschuer/mixxx/tree/pa_tsan Would be nice if you can give this a try.

Thanks, I will when I have a moment!

m0dB · 2024-11-17T16:36:43Z

I added -g flag

target_compile_options(PortAudioRingBuffer PUBLIC -fsanitize=thread -g)

but I didn't get debug symbols, so I guess the ringbuffer from libportaudio itself is used? Anyway, I replaced PaUtil with PaMixxx and the local files and now I do get line info. And still tsan warnings, despite the _tsan.... barriers.

Details

WARNING: ThreadSanitizer: data race (pid=26154)
  Read of size 4 at 0x00016b70b02c by thread T7:
    #0 PaMixxx_GetRingBufferReadAvailable pa_ringbuffer.c:82 (mixxx-test:arm64+0x10192ac14)
    #1 PaMixxx_GetRingBufferReadRegions pa_ringbuffer.c:159 (mixxx-test:arm64+0x10192b114)
    #2 PaMixxx_ReadRingBuffer pa_ringbuffer.c:224 (mixxx-test:arm64+0x10192b72c)
    #3 PA::FIFO<int>::read(int*, int) fifo.h:34 (mixxx-test:arm64+0x1003947f8)
    #4 MultiThreadRW<PA::FIFO<int>>::read() fifotest.cpp:249 (mixxx-test:arm64+0x1003936b0)

  Previous write of size 4 at 0x00016b70b02c by thread T6:
    #0 PaMixxx_AdvanceRingBufferWriteIndex pa_ringbuffer.c:145 (mixxx-test:arm64+0x10192b084)
    #1 PaMixxx_WriteRingBuffer pa_ringbuffer.c:214 (mixxx-test:arm64+0x10192b6a8)
    #2 PA::FIFO<int>::write(int const*, int) fifo.h:37 (mixxx-test:arm64+0x100393860)
    #3 MultiThreadRW<PA::FIFO<int>>::write() fifotest.cpp:235 (mixxx-test:arm64+0x100393464)

WARNING: ThreadSanitizer: data race (pid=26154)
  Write of size 4 at 0x00016b70b030 by thread T7:
    #0 PaMixxx_AdvanceRingBufferReadIndex pa_ringbuffer.c:193 (mixxx-test:arm64+0x10192b444)
    #1 PaMixxx_ReadRingBuffer pa_ringbuffer.c:235 (mixxx-test:arm64+0x10192b8d0)
    #2 PA::FIFO<int>::read(int*, int) fifo.h:34 (mixxx-test:arm64+0x1003947f8)
    #3 MultiThreadRW<PA::FIFO<int>>::read() fifotest.cpp:249 (mixxx-test:arm64+0x1003936b0)

  Previous read of size 4 at 0x00016b70b030 by thread T6:
    #0 PaMixxx_GetRingBufferReadAvailable pa_ringbuffer.c:82 (mixxx-test:arm64+0x10192ac30)
    #1 PaMixxx_GetRingBufferWriteAvailable pa_ringbuffer.c:88 (mixxx-test:arm64+0x10192acc4)
    #2 PA::FIFO<int>::writeAvailable() const fifo.h:31 (mixxx-test:arm64+0x1003937e0)
    #3 MultiThreadRW<PA::FIFO<int>>::write() fifotest.cpp:231 (mixxx-test:arm64+0x100393378)

daschuer · 2024-11-19T23:15:36Z

src/util/fifo.h

+        writeIndex = writeIndex & m_mask;
+        const size_type n = std::min(m_size - writeIndex, count);
+        std::copy(pData, pData + n, m_data.data() + writeIndex);
+        std::copy(pData + n, pData + count, m_data.data());


This saves some µs

Suggested change

-        std::copy(pData + n, pData + count, m_data.data());
+        if ((count - n) > 0) {
+            std::copy(pData + n, pData + count, m_data.data());
+        }

Tested with Intel Core Ultra 5 125U

Before

debug [Main] Stat("read 1","count=2714,sum=594029ns,average=218.876ns,min=29ns,max=6259ns,variance=61965ns^2,stddev=248.928ns")
debug [Main] Stat("read 65536","count=68,sum=1.38058e+06ns,average=20302.6ns,min=61ns,max=187712ns,variance=7.63838e+08ns^2,stddev=27637.6ns")
debug [Main] Stat("write 1","count=187,sum=55599ns,average=297.321ns,min=40ns,max=866ns,variance=12077.8ns^2,stddev=109.899ns")
debug [Main] Stat("write 2048","count=876,sum=1.54796e+06ns,average=1767.07ns,min=712ns,max=14696ns,variance=393266ns^2,stddev=627.109ns")

After

debug [Main] Stat("read 1","count=2726,sum=570707ns,average=209.357ns,min=28ns,max=3766ns,variance=41299.8ns^2,stddev=203.224ns")
debug [Main] Stat("read 65536","count=78,sum=1.51157e+06ns,average=19379.1ns,min=73ns,max=184084ns,variance=7.06956e+08ns^2,stddev=26588.6ns")
debug [Main] Stat("write 1","count=183,sum=44592ns,average=243.672ns,min=70ns,max=619ns,variance=10338.9ns^2,stddev=101.681ns")
debug [Main] Stat("write 2048","count=994,sum=1.70078e+06ns,average=1711.05ns,min=625ns,max=15022ns,variance=464792ns^2,stddev=681.757ns")

daschuer · 2024-11-19T23:17:58Z

src/util/fifo.h

+        readIndex = readIndex & m_mask;
+        const size_type n = std::min(m_size - readIndex, count);
+        std::copy(m_data.data() + readIndex, m_data.data() + readIndex + n, pData);
+        std::copy(m_data.data(), m_data.data() + count - n, pData + n);


Suggested change

-        std::copy(m_data.data(), m_data.data() + count - n, pData + n);
+        if ((count - n) > 0) {
+            std::copy(m_data.data(), m_data.data() + count - n, pData + n);
+        }

github-actions bot added build code quality labels Nov 10, 2024

Swiftb0y reviewed Nov 10, 2024

daschuer reviewed Nov 11, 2024

daschuer requested changes Nov 11, 2024
github-actions bot added engine soundio labels Nov 11, 2024

m0dB force-pushed the tsan-fix-fifo branch from da7919d to e9e21a2 Compare November 11, 2024 20:33

implement fifo with proper atomics instead of using PaUtilRingBuffer as a backend

4270579

m0dB force-pushed the tsan-fix-fifo branch from e9e21a2 to 4270579 Compare November 11, 2024 20:34

missing cast for return value

dd7e160

m0dB added 2 commits November 12, 2024 19:14

remove static casts in test

432326a

adapt test to using signed instead of unsigned

b4b0153

fifo multithread test

3059a3b

use ring_buffer_size_t

ee45bd2

daschuer requested changes Nov 14, 2024

daschuer reviewed Nov 14, 2024

Swiftb0y reviewed Nov 14, 2024

daschuer requested changes Nov 19, 2024