Possibility for datagrams to be dropped when not able to send #4320

Open
wants to merge 28 commits into
base: main

@iiztp iiztp commented May 23, 2024

Description

When a Token-Bucket Filter (TBF) is in place and the application sends more data than the filter allows, I observed a growing delay between sending and receiving: the longer the overload lasts, the larger the delay becomes.
After some discussion, it was established that msquic buffers datagrams when it can't send them immediately.
This PR makes it possible to drop them instead.
I added a flag to QUIC_SEND_FLAGS: "QUIC_SEND_FLAG_DGRAM_CANCEL_ON_BLOCKED".
When this flag is set, the send path checks whether any datagrams are still queued to send and, if so, drops (cancels) them.
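
To make the behaviour described here concrete, the following is a minimal, self-contained C++ sketch of a send queue with cancel-on-blocked semantics. It is a model, not msquic code: `DatagramQueue`, `Flush`, the `SendFlags` enum and its numeric values are hypothetical names chosen for illustration, and the rule that nothing queued behind a blocked datagram gets sent is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-ins for illustration only. These are NOT the msquic
// definitions, and the flag value is made up.
enum SendFlags : uint32_t {
    SEND_FLAG_NONE = 0x0,
    SEND_FLAG_DGRAM_CANCEL_ON_BLOCKED = 0x1,
};

struct Datagram {
    uint32_t Id;
    size_t Length;
    uint32_t Flags;
};

struct DatagramQueue {
    std::deque<Datagram> Pending;
    size_t Sent = 0;
    size_t Canceled = 0;

    void Enqueue(const Datagram& D) { Pending.push_back(D); }

    // Try to flush the queue with AllowanceBytes of send allowance.
    // Datagrams that cannot be sent right now are either kept in the queue
    // (default) or canceled (when they carry the cancel-on-blocked flag).
    void Flush(size_t AllowanceBytes) {
        std::deque<Datagram> StillPending;
        bool Blocked = false;
        for (const Datagram& D : Pending) {
            if (!Blocked && D.Length <= AllowanceBytes) {
                AllowanceBytes -= D.Length;
                ++Sent;
            } else {
                // Assumption: once blocked, later datagrams do not jump the queue.
                Blocked = true;
                if (D.Flags & SEND_FLAG_DGRAM_CANCEL_ON_BLOCKED) {
                    ++Canceled;
                } else {
                    StillPending.push_back(D);
                }
            }
        }
        Pending.swap(StillPending);
    }
};
```

For example, enqueuing five flagged 1000-byte datagrams followed by one unflagged datagram and calling `Flush(2500)` sends two, cancels three, and leaves the unflagged datagram pending.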

Testing

Do any existing tests cover this change?

I don't think so.

Are new tests needed?

Maybe...

Documentation

Is there any documentation impact for this change?

Just the new flag: "QUIC_SEND_FLAG_DGRAM_CANCEL_ON_BLOCKED"

Some more info

The images below show the one-way delay of datagrams sent through a TBF after sending more than the filter allows.
Without the flag (before the patch):
[image]

With the flag:
[image]

Values are formatted as "{packet number}, {one-way delay in ms}".
With the flag, the delays are close to those I get with plain UDP (which normally drops packets when it can't send them).
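
The difference between the two measurements can be reproduced with a small simulation. The sketch below is a simplified model and not the setup used for the plots: it assumes a plain fixed-rate bottleneck instead of a real TBF with a burst allowance, and `Simulate`, `Result`, and all parameter names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

struct Result {
    std::vector<uint64_t> DelaysUs; // one-way delay of each delivered datagram
    uint32_t Dropped = 0;
};

// Simulate a sender that emits one datagram of SizeBytes every IntervalUs
// microseconds through a bottleneck that drains RateBytesPerMs.
// With DropWhenBlocked == false, blocked datagrams are buffered (delay grows).
// With DropWhenBlocked == true, blocked datagrams are dropped (delay stays flat).
Result Simulate(uint32_t Count, uint64_t SizeBytes, uint64_t IntervalUs,
                uint64_t RateBytesPerMs, bool DropWhenBlocked) {
    Result R;
    // Time the bottleneck needs to drain one datagram.
    const uint64_t ServiceUs = SizeBytes * 1000 / RateBytesPerMs;
    // Time at which the bottleneck has drained everything accepted so far.
    uint64_t LinkFreeAtUs = 0;
    for (uint32_t i = 0; i < Count; ++i) {
        const uint64_t NowUs = i * IntervalUs;
        const bool Blocked = LinkFreeAtUs > NowUs;
        if (Blocked && DropWhenBlocked) {
            ++R.Dropped;
            continue;
        }
        const uint64_t StartUs = Blocked ? LinkFreeAtUs : NowUs;
        LinkFreeAtUs = StartUs + ServiceUs;
        R.DelaysUs.push_back(LinkFreeAtUs - NowUs);
    }
    return R;
}
```

With ten 1000-byte datagrams sent 1 ms apart through a 500 bytes/ms bottleneck, the buffering run yields delays that grow from 2 ms to 11 ms, while the dropping run delivers five datagrams at 2 ms each and drops the other five.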

@iiztp iiztp requested a review from a team as a code owner May 23, 2024 16:23
@iiztp
Author

iiztp commented May 23, 2024

@microsoft-github-policy-service agree


codecov bot commented May 24, 2024

Codecov Report

Attention: Patch coverage is 95.23810% with 1 line in your changes missing coverage. Please review.

Project coverage is 85.52%. Comparing base (ae542fa) to head (6f716c2).

Files with missing lines    Patch %    Lines
src/core/datagram.c         95.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4320      +/-   ##
==========================================
- Coverage   87.17%   85.52%   -1.66%     
==========================================
  Files          56       56              
  Lines       17354    17375      +21     
==========================================
- Hits        15129    14860     -269     
- Misses       2225     2515     +290     

Review thread on src/core/datagram.c (outdated, resolved)
@nibanks
Member

nibanks commented May 24, 2024

Are new tests needed?

Maybe...

Yeah, we will need to add some tests, as well as make the minor edits to the docs.

@nibanks
Member

nibanks commented May 24, 2024

You will also need to run .\scripts\generate-dotnet.ps1 to update the .NET files.

@nibanks nibanks added the labels external (Proposed by non-MSFT), Area: API, and Area: Core (Related to the shared, core protocol logic) May 24, 2024
Member

@nibanks nibanks left a comment

Looking pretty good so far! Thanks!

Review threads (outdated, resolved): src/core/datagram.c (9), src/core/datagram.h (2), src/inc/msquic.h (2), src/core/send.c (1)
Still removing newlines
@nibanks
Member

nibanks commented Jun 1, 2024

The changes look good, but how should we add a test to verify them? One minimal option is to update spinquic.cpp to ensure it uses this new flag (should just require changing one number). But ideally, it'd be nice to have a functional test case.

@iiztp
Author

iiztp commented Jun 1, 2024

Maybe we could just queue more than we can send, i.e. call DatagramSend in a loop until we exceed the MTU, which is 1500 over Ethernet or Wi-Fi (so maybe 2 datagrams of a thousand bytes, then multiple datagrams that we can drop) or, AFAIK, ~65000 for localhost (2 datagrams of 45,000 bytes, then multiple datagrams that we can drop)?

I don't know if this would work. Otherwise, we could create a new CI job that runs a QUIC server under special traffic control rules (e.g. a bandwidth limit) and checks that the packets we allowed to be dropped are dropped while the others are not. However, the results would differ on each run and wouldn't give us a general idea.

@nibanks
Member

nibanks commented Jun 6, 2024

I think a test that queues up more than ten 1000-byte datagrams before starting the connection, and then calls connection start, might trigger the discard notification.

Review thread on submodules/googletest (outdated, resolved)
@iiztp
Author

iiztp commented Jun 22, 2024

Quick question on this PR: some tests have failed (because they took too long to run, or because their connection was interrupted?). Do I have anything more to do, such as fixing the failed tests? I saw that the same tests fail for everyone, so I'm not sure.

@nibanks
Member

nibanks commented Oct 27, 2024

I'm not sure what happened, but it looks like you pushed a lot of unexpected new files.

@iiztp
Author

iiztp commented Oct 27, 2024

Other than merging, I did nothing else :/

@nibanks
Member

nibanks commented Oct 28, 2024

I think the problem is the msquicdocs folder shouldn't be in the main branch. Can you please remove that?

@@ -255,6 +257,22 @@ QuicTestDatagramSend(
TEST_TRUE(Client.GetDatagramsSuspectLost() > InitialLostCount);
CxPlatSleep(100);
#endif
for (int i = 0; i < 20; i++) {
Member

Can we create a new test for this, instead of putting it in this existing one? Also, it's possible, though unlikely, that you don't get any cancelled datagrams if they all happen to fit in the congestion control window. So I recommend that the new test queue (send) all these datagrams on the connection before you call Client.Start, to ensure you've queued up more datagrams than can be sent in the initial congestion window.
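
As a rough way to size such a test, the sketch below estimates how many equally sized datagrams queued before the connection starts would fit in the initial congestion window and how many would be left to cancel. It is a back-of-the-envelope model, not msquic behaviour: `PlanPreStartDatagrams`, the one-datagram-per-packet assumption, the per-packet overhead, and the window size are all hypothetical inputs, and it ignores the handshake packets that also consume window.

```cpp
#include <cstdint>

struct PreStartPlan {
    uint32_t Sendable; // datagrams that fit in the initial window
    uint32_t Canceled; // datagrams left over, expected to be canceled
};

// Estimate the split for Queued datagrams of DatagramBytes each, assuming one
// datagram per packet, PacketOverheadBytes of header overhead per packet, and
// an initial congestion window of InitialWindowBytes.
PreStartPlan PlanPreStartDatagrams(uint32_t Queued, uint32_t DatagramBytes,
                                   uint32_t PacketOverheadBytes,
                                   uint32_t InitialWindowBytes) {
    const uint32_t PerPacket = DatagramBytes + PacketOverheadBytes;
    if (PerPacket == 0) {
        return { Queued, 0 }; // nothing consumes window, so nothing is blocked
    }
    const uint32_t Fits = InitialWindowBytes / PerPacket;
    const uint32_t Sendable = Queued < Fits ? Queued : Fits;
    return { Sendable, Queued - Sendable };
}
```

With a hypothetical 12,800-byte window and 50 bytes of overhead, 20 queued 1000-byte datagrams give 12 sendable and 8 cancelled, whereas 10 queued datagrams give 0 cancelled, in which case the test would not observe any cancellation.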

Labels
external (Proposed by non-MSFT), Area: API, Area: Core (Related to the shared, core protocol logic)

2 participants