Send buffer changes #346
Conversation
Left some first-pass comments/questions, will try to review further and pull down for a look.
return ConnectStateMachineAsync(true, connectOpts, cancellationToken);
}

Interlocked.Add(ref _counter.PendingMessages, 1);
Question: Is there a reason to use this over Interlocked.Increment?
copy/paste error. (Was there an argument about Add() being faster?🤷♂️)
Fixed.
(Was there an argument about Add() being faster? 🤷♂️)

It might be?
As always, the devil is in the details. I can't comment for ARM, but for x86/x64 I did start to look at Agner Fog's instruction tables. However:
- It has been a minute since I have peeked at these tables.
- They do not list LOCK INC.
- It's entirely possible that the JIT decides to pick the best option regardless...
However, digging did lead me to this, which claims that INC doesn't update the carry flag, unlike ADD. Whether that still makes a difference on modern architectures, I can't say...
This almost certainly qualifies as a micro-optimization either way; .Increment()/.Decrement() feel more readable, but I'm happy with either option.
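(For reference, the two forms under discussion have the same effect for a ±1 delta; a tiny stand-alone sketch, using a local variable rather than the PR's counter field:)

using System.Threading;

long pending = 0;
Interlocked.Increment(ref pending);    // pending == 1
Interlocked.Add(ref pending, 1);       // pending == 2
Interlocked.Decrement(ref pending);    // pending == 1
Interlocked.Add(ref pending, -1);      // pending == 0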
}
finally
{
    _semLock.Release();
    Interlocked.Add(ref _counter.PendingMessages, -1);
Per above, also why not Interlocked.Decrement?
PipeWriter bw;
lock (_lock)
{
    EnqueueCommand(success);
    if (_pipeWriter == null)
        ThrowOnDisconnected();
    bw = _pipeWriter!;
}

_protocolWriter.WriteConnect(bw, connectOpts!);
await bw.FlushAsync(cancellationToken).ConfigureAwait(false);
This feels curious enough that a comment may be helpful?
Main question on first glance: what does the lock + capture buy us over the Semaphore usage?
_pipeWriter is replaced on reconnect in Reset(). I wanted to avoid any possible stale reads. AFAIK, in this case semaphore or channel 'async locks' wouldn't provide any memory consistency guarantees.
public ValueTask PublishAsync<T>(string subject, T? value, NatsHeaders? headers, string? replyTo, INatsSerialize<T> serializer, CancellationToken cancellationToken)
{
#pragma warning disable CA2016
#pragma warning disable VSTHRD103
    if (!_semLock.Wait(0))
#pragma warning restore VSTHRD103
#pragma warning restore CA2016
    NatsPooledBufferWriter<byte>? headersBuffer = null;
    if (headers != null)
    {
        return PublishStateMachineAsync(false, subject, value, headers, replyTo, serializer, cancellationToken);
        if (!_pool.TryRent(out headersBuffer))
            headersBuffer = new NatsPooledBufferWriter<byte>();
        _headerWriter.Write(headersBuffer, headers);
    }

    if (_flushTask is { IsCompletedSuccessfully: false })
    {
        return PublishStateMachineAsync(true, subject, value, headers, replyTo, serializer, cancellationToken);
    }
    NatsPooledBufferWriter<byte> payloadBuffer;
    if (!_pool.TryRent(out payloadBuffer!))
        payloadBuffer = new NatsPooledBufferWriter<byte>();
    if (value != null)
        serializer.Serialize(payloadBuffer, value);

    return PublishLockedAsync(subject, replyTo, payloadBuffer, headersBuffer, cancellationToken);
}
Interestingly (not sure if it was a factor in this change), it looks like this would more or less minimize the concerns I brought up in #318.
lock (_lock)
{
    EnqueueCommand(success);
    if (_pipeWriter == null)
        ThrowOnDisconnected();
    bw = _pipeWriter!;
}
Given the re-use of this pattern, is there any potential value in us trying to consolidate the shared logic of the lock/get (and possibly more) into a method call, to lower maintenance concerns? (Maybe not at this stage, but thought it was worth asking.)
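(For illustration only, a minimal sketch of what such a consolidated helper might look like; the name EnqueueAndGetWriter is made up, and it simply wraps the quoted lock/capture pattern:)

private PipeWriter EnqueueAndGetWriter(bool success)
{
    lock (_lock)
    {
        // Enqueue the command and capture the current PipeWriter under the lock,
        // throwing if the connection has been torn down.
        EnqueueCommand(success);
        if (_pipeWriter == null)
            ThrowOnDisconnected();
        return _pipeWriter!;
    }
}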
private async ValueTask PublishLockedAsync(string subject, string? replyTo, NatsPooledBufferWriter<byte> payloadBuffer, NatsPooledBufferWriter<byte>? headersBuffer, CancellationToken cancellationToken)
{
    // Interlocked.Add(ref _counter.PendingMessages, 1);
    await _semLock.WaitAsync(cancellationToken).ConfigureAwait(false);
Speaking of, putting the WaitAsync here is probably 'better' overall (the old Wait(0) -> state machine pattern has some thrashing concerns), but it does raise some interesting questions in my head again as to the driver.
(Below is a language-lawyer disclaimer and... well, hopefully we aren't worried about it, but if we are, I wanted to bring it up.)
This is going to be more fair than the old method (because it's always WaitAsync), but it is worth noting that SemaphoreSlim, like most locking primitives in .NET, does not guarantee fairness as to which thread gets released first (citation).
If we -do- need stronger ordering/fairness on these, we will need to consider an alternative... and also remember the caveats mentioned in the citation; they are certainly not 'absolutes', however they may require consideration in a solution (again, only if needed; this can be a rabbit hole).
Just swapped this out with a channel. WDYT? (@stebet also raised concerns about SemaphoreSlim before.)
// 8520 should fit into 6 packets on 1500 MTU TLS connection or 1 packet on 9000 MTU TLS connection
// assuming 40 bytes TCP overhead + 40 bytes TLS overhead per packet
const int maxSendMemoryLength = 8520;
var sendMemory = new Memory<byte>(new byte[maxSendMemoryLength]);
Question: Does this get re-started between connections? And if yes, is there a way to possibly reuse it?
It lives as long as the CommandWriter object (and in turn the NatsConnection object). On reconnect the pipeline is renewed (not pooled at the moment), but the read loop stays in place.
// The ArrayPool<T> class has a maximum threshold of 1024 * 1024 for the maximum length of
// pooled arrays, and once this is exceeded it will just allocate a new array every time
// of exactly the requested size. In that case, we manually round up the requested size to
// the nearest power of two, to ensure that repeated consecutive writes when the array in
// use is bigger than that threshold don't end up causing a resize every single time.
if (minimumSize > 1024 * 1024)
<3 the exposition here! Great breakdown of the why.
Not me, unfortunately 😅. It was originally taken from the Community Toolkit (as mentioned in the NatsBufferWriter.cs class). I'll make sure to put an attribution comment in later.
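(Side note for readers: a minimal sketch of the round-up the quoted comment describes, assuming System.Numerics.BitOperations is available; the actual code in the PR may differ.)

using System.Numerics;

static int RoundUpRequestSize(int minimumSize)
{
    const int PoolThreshold = 1024 * 1024;
    // Above the ArrayPool<T> threshold, round up to the next power of two so
    // repeated growth requests land on the same size instead of resizing each time.
    return minimumSize > PoolThreshold
        ? checked((int)BitOperations.RoundUpToPowerOf2((uint)minimumSize))
        : minimumSize;
}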
# Conflicts:
#   src/NATS.Client.Core/Commands/ProtocolWriter.cs
#   src/NATS.Client.Core/Internal/NatsPipeliningWriteProtocolProcessor.cs
#   tests/NATS.Client.TestUtilities/InMemoryTestLoggerFactory.cs
Left food for thought, primarily around the channel changes.
In summary:
As we are using the channel as a semaphore this way (i.e. using a full channel to signal waiting, as opposed to an empty channel to signal waiting), it is better to use WriteAsync throughout if we want better ordering fairness and to minimize thrashing on concurrent writes.
If we -do- need to 'prioritize' certain locks, we may want to consider how to do so (the same problem exists with a semaphore).
while (!_channelLock.Writer.TryWrite(1))
{
    await _channelLock.Writer.WaitToWriteAsync().ConfigureAwait(false);
}
Hmm. This inversion of channel usage does cause some challenges.
I think, -especially- in the dispose,

try
{
    await _channelLock.Writer.WriteAsync(1).ConfigureAwait(false);
}
catch (ChannelClosedException)
{
}

would normally be a bit less 'thrashy' in the case of a failed write.
It may be worth considering this in other cases as well where possible; however, I need to take a look at the overall usage to provide an informed statement.
}
finally
{
    _semLock.Release();
    while (!_channelLock.Reader.TryRead(out _))
If we aren't doing so elsewhere, we should probably .TryComplete() the channel here, so that if any other writers/readers happen to be waiting, they will return false from WaitToWrite/WaitToRead or throw on WriteAsync/ReadAsync.
Note: If we use TryComplete, we should make sure anything using WaitToXXXAsync bails out in an appropriate way when it returns false.
so would you say using ReadAsync() would be better?
🤔
- Need to think more about the .Dispose(), but probably yes.
- I think in general, ReadAsync should be used (unless a strong argument exists for thrashing on non-critical ops).
- I'll make a sample helper for the pattern I'm thinking of (or pull the branch and provide a commit link, or a draft PR against your branch, if that's OK and I get the opportunity to).
Where using a BoundedChannel vs a SemaphoreSlim can be challenging is in deciding, for lack of better words, 'the right pivot' on how to use the channel. Depending on the need, and on how the underlying synchronization abstraction is implemented, that choice can have a sizeable impact on performance.
Unless there is a compelling reason to do otherwise, I think I'm settled on .ReadAsync being the better idea for what we are doing here.
If we need more, we can go deeper; BoundedChannel is lower-alloc than SemaphoreSlim in many cases, though there may be future opportunities for a better pooled abstraction[0].
[0] - In general, if a channel has a waiting value on read, it's no-alloc. If it's the FIRST waiter and there's no cancellation token, it gets a pooled instance; if there's a token or the pooled waiter is already in use, it allocates. A little better than SemaphoreSlim, and in some ways it has more reliable semantics, but it can be tricky at first.
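(To make the pattern under discussion concrete, here is a rough, hypothetical sketch — not the PR's code, and the type/member names are invented — of a capacity-1 bounded channel used as an async lock: acquire with WriteAsync, release with TryRead, and TryComplete on dispose.)

using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

internal sealed class ChannelLock : IDisposable
{
    // Capacity 1: a full channel means "locked".
    private readonly Channel<int> _channel = Channel.CreateBounded<int>(1);

    public async ValueTask AcquireAsync(CancellationToken cancellationToken = default)
    {
        // WriteAsync queues contended callers roughly FIFO instead of letting them
        // thrash on TryWrite/WaitToWriteAsync loops; it throws ChannelClosedException
        // once the channel has been completed.
        await _channel.Writer.WriteAsync(1, cancellationToken).ConfigureAwait(false);
    }

    // Release by draining the single token.
    public void Release() => _channel.Reader.TryRead(out _);

    // Completing the channel wakes pending writers with ChannelClosedException.
    public void Dispose() => _channel.Writer.TryComplete();
}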
{
    Interlocked.Increment(ref _counter.PendingMessages);

    while (!_channelLock.Writer.TryWrite(1))
OK, yeah, I think this pattern (the .TryWrite loop), even for sub/publish, raises some fairness/behavioral consistency concerns.
In general, WriteAsync is going to provide a slightly 'more fair', ordered queue for callers than the .TryWrite() loop, because for a bounded channel, once an item is read, things happen in this order:
- If there are any pending WriteAsync calls in the queue, and any of them are not cancelled, the first one is marked as completed and its item is added to the queue (this is still inside the main read lock, so it's all thread safe) and the method returns.
- If the previous step didn't return, run through the entire set of waiters and wake them all up.
So, in general, WriteAsync will be a lot less thrashy (a queue vs a free-for-all when contention happens), and it's worth noting that on the happy path (no contention) it won't alloc.
But that order of operations is important to consider; for instance, in extreme cases one could use WriteAsync for 'priority' operations and take the thrashing pain on non-priority ops (or some sort of magic).
TBH I was aiming at high throughput on a tight loop, but that's just my test and I am actually not seeing much change between TryWrite/loop vs. WriteAsync. Happy to go with WriteAsync here.
TBH I was aiming at high throughput on a tight loop, but that's just my test and I am actually not seeing much change between TryWrite/loop vs. WriteAsync. Happy to go with WriteAsync here.
How hard would it be to add some concurrent tests where we see what happens when we have a lot of threads trying to do the needful?
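(As a rough illustration of the kind of test being suggested — not from the PR; it assumes a locally running NATS server, and the subject name and message counts are arbitrary:)

using System.Linq;
using System.Threading.Tasks;
using NATS.Client.Core;
using Xunit;

public class ParallelPublishTests
{
    [Fact]
    public async Task Many_tasks_publishing_concurrently_should_not_throw()
    {
        await using var nats = new NatsConnection();

        const int writers = 16;
        const int messagesPerWriter = 1_000;

        // Hammer PublishAsync from many tasks at once to exercise the
        // CommandWriter locking/buffering paths under contention.
        var tasks = Enumerable.Range(0, writers).Select(_ => Task.Run(async () =>
        {
            for (var i = 0; i < messagesPerWriter; i++)
            {
                await nats.PublishAsync("test.parallel.publish", i);
            }
        }));

        await Task.WhenAll(tasks);
    }
}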
}
internal sealed class NatsPooledBufferWriter<T> : IBufferWriter<T>, IObjectPoolNode<NatsPooledBufferWriter<T>>
{
    private const int DefaultInitialBufferSize = 256;
It would be nice in the future if there was a way to set this (i.e. cases where people know they have tiny messages in low-memory environments, or cases where people know they have 1K payloads as the norm).
OTOH, I don't think it's a big deal for the latter case (which is the only one I know of being 'real').
Can we derive a value from the write buffer size, for example? I'm thinking it might not be relevant if the implementation changes in the future.
I think a derived value would be fine if the math can work out.
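(A made-up example of what such a derivation could look like; the divisor and clamp bounds here are arbitrary assumptions, not anything from the PR:)

// Hypothetical: derive the initial pooled-buffer size from the configured
// write buffer instead of a hard-coded 256, with an arbitrary floor/ceiling.
static int DeriveInitialBufferSize(int writerBufferSize) =>
    Math.Clamp(writerBufferSize / 256, 64, 4096);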
namespace NATS.Client.Core.Commands;

internal struct PingCommand
internal class PingCommand : IValueTaskSource<TimeSpan>, IObjectPoolNode<PingCommand>
Nice, this will close #321! <3
Should we pull this into a separate PR? It's kind of irrelevant here, and this PR might linger a bit (if it ever goes in, that is).
A separate PR wouldn't hurt if it's not painful to do in the existing codebase; if we have benchmarks to be sure the alloc savings don't cost us, even better?
Ping command moved here: #358
Taking a look at this now
This might also be something to take a look at to make things simpler for locking: https://dotnet.github.io/dotNext/features/threading/exclusive.html
That library is now net8.0 only. I tried pulling the code in, but there is way too much dependency; I gave up at 6 KLOC.
I added a new benchmark to test Parallel Publish in #367 and here are the results:
[Benchmark results: This PR]
Looking great!
Signed-off-by: Caleb Lloyd <caleb@synadia.com>
LGTM 🎉
[Benchmark charts: This PR vs Main]