improve resiliency of quic test #57020

Merged
merged 3 commits into from
Aug 12, 2021

wfurt
Member

@wfurt wfurt commented Aug 7, 2021

The core change is in CreateConnectedQuicConnection.
I modified it to ignore ConnectionRefused and retry up to 3 times with an increasing timeout.
While this also logs to xUnit, it will crash in debug builds to collect a core dump if all 3 attempts fail.
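
For readers following along, the retry idea described above can be sketched roughly as follows. This is not the code from this PR: `ConnectFailedException`, `ConnectRetry.ConnectWithRetryAsync`, the `connect` delegate, and the default timeout values are all hypothetical names and numbers chosen for illustration. The sketch assumes the failure surfaces as an exception whose `HResult` carries a `SocketError` value.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an exception that carries a portable error code in HResult.
public sealed class ConnectFailedException : Exception
{
    public ConnectFailedException(int hresult) : base("connect failed") => HResult = hresult;
}

public static class ConnectRetry
{
    // Hypothetical helper: calls 'connect' up to maxAttempts times.
    // Only ConnectionRefused is retried; any other failure propagates immediately.
    public static async Task<T> ConnectWithRetryAsync<T>(
        Func<CancellationToken, Task<T>> connect,
        int maxAttempts = 3,
        int baseTimeoutMs = 1000)
    {
        for (int attempt = 1; ; attempt++)
        {
            // The timeout grows linearly with the attempt number (assumed policy).
            using var cts = new CancellationTokenSource(baseTimeoutMs * attempt);
            try
            {
                return await connect(cts.Token).ConfigureAwait(false);
            }
            catch (ConnectFailedException ex)
                when (ex.HResult == (int)SocketError.ConnectionRefused && attempt < maxAttempts)
            {
                // Treated as transient: fall through and try again.
            }
        }
    }
}
```

On the last attempt the exception filter is false, so the final ConnectionRefused failure reaches the caller, which is where a test helper could log and fail.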

I also ran into the fact that it is not easy to catch a particular QUIC exception. MsQuicStatusCodes has different values on each platform, so it is not easy to use them even as const int in tests, besides the fact that we only propagate a string and not the actual error code.

I added a mini PAL function to map the low-level error code to something globally meaningful.
I picked a few relevant ones and mapped them to SocketError, aka WSA*.
We may also throw a QuicConnectionAbort exception (especially if we derive from IOException), but I was not ready to make that change.
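
As an illustration only, a mapping of this kind might look like the sketch below. `PlatformStatusCodes`, its fields, and `StatusMapper.MapStatusToHResult` are hypothetical; the real MsQuic status values are platform-specific and are deliberately not reproduced here, so the caller supplies them. The fallback to `E_FAIL` is also an assumption.

```csharp
using System.Net.Sockets;

// Hypothetical container for the platform-specific status values.
// The numbers come from the platform layer; none are hard-coded here.
public sealed class PlatformStatusCodes
{
    public uint ConnectionRefused;
    public uint ConnectionTimeout;
    public uint HostUnreachable;
}

public static class StatusMapper
{
    // Generic failure HRESULT (E_FAIL), used when no specific mapping exists (assumed fallback).
    public const int E_FAIL = unchecked((int)0x80004005);

    // Maps a low-level status to a portable SocketError value stored as an int.
    public static int MapStatusToHResult(uint status, PlatformStatusCodes codes)
    {
        if (status == codes.ConnectionRefused) return (int)SocketError.ConnectionRefused;
        if (status == codes.ConnectionTimeout) return (int)SocketError.TimedOut;
        if (status == codes.HostUnreachable) return (int)SocketError.HostUnreachable;
        return E_FAIL;
    }
}
```

With something like this in place, a test can filter on a stable value, for example `ex.HResult == (int)SocketError.ConnectionRefused`, instead of a per-platform constant.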

Since I don't have a good way to test it, it needs the most attention.

I updated RunClientServer to use the helper above so the retry logic is in a single place (so far).

I updated a bunch of tests to use the helper. There may be more, and this really only works for tests expecting success. Tests expecting a different failure will still need some more work.

I updated a few places I ran into that had missing validation or a generic Exception.

Added some counters to debug builds for MsQuicListener. This may help when getting a core dump and/or stepping through in a debugger.

contributes to #55979

@wfurt wfurt requested a review from a team August 7, 2021 04:11
@wfurt wfurt self-assigned this Aug 7, 2021
@ghost

ghost commented Aug 7, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Author: wfurt
Assignees: wfurt
Labels:

area-System.Net.Quic

Milestone: -

@dotnet-issue-labeler

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@@ -37,6 +36,12 @@ private sealed class State

public QuicOptions ConnectionOptions = new QuicOptions();
public SslServerAuthenticationOptions AuthenticationOptions = new SslServerAuthenticationOptions();
#if DEBUG
Member

Do we really need this? I'm not a fan of cluttering the product code with debugging helpers.

Member Author

really? => no.

I spent a fair amount of time debugging test failures, and I kept adding and removing various instrumentation, wishing it was already there.
I have worked on products in the past where counters were part of the diagnostic strategy.
I wish we could keep them for the product as well, but I'm not ready to push for it at this point.

Perhaps @geoffkizer or @stephentoub would have a preference.
I can certainly take them out.

Member

We need a third opinion here 😄 @geoffkizer @stephentoub could you chime in here?

Contributor

I don't mind having stuff like this as long as we think it's generally useful. I think it starts to verge into clutter if it's not super useful and/or not clear what the value is.

I would suggest
(a) Add comments on the field declarations explaining what each of these does
(b) Let's all evaluate each one and agree if we think it's worth keeping or not

return new QuicException($"{message} Error Code: {MsQuicStatusCodes.GetError(status)}", innerException, MapMsQuicStatusToHResult(status));
}

internal static int MapMsQuicStatusToHResult(uint status)
Member

I assume this is a temporary solution and we should define our own status codes that'll abstract the msquic ones.
That could probably be covered by #32066.

Member Author

We may change the exception base and define QUIC-specific codes. #32066 is primarily talking about the messages.

However, the HResults I picked are generally portable; for example, you can use https://www.hresult.info to look them up. What comes out should pretty much match what is happening in MsQuic.
Hopefully this will not be thrown directly in 6.0, and we can finalize this when we do the exception cleanup and make it public.

Member

Okay so basically we would rather expand the usage of HResult to cover all the relevant codes from msquic.
I assume this will be thrown directly from S.N.Quic, but not from S.N.Http. QuicException is a public API of S.N.Quic. But we're talking 7.0 time frame here.

Anyway, what I was trying to say/suggest is that we should cover all the result codes eventually. However, this is perfectly good enough for the problem we have here.

Member

@ManickaP ManickaP left a comment

Apart from the open discussion, LGTM.
Once that's decided, we can merge. Approving so as not to hold it back.

@ManickaP
Member

BTW, we should close #55979 with this fix, we shouldn't keep it hanging around. We can reopen if this together with #57190 doesn't help.

@wfurt wfurt merged commit 88b3ebf into dotnet:main Aug 12, 2021
@wfurt
Member Author

wfurt commented Aug 12, 2021

Since this does not address #55979, I moved it to 7.0. IMHO this is only a workaround to pass our tests, and I feel it deserves some more attention.

@karelz karelz added this to the 6.0.0 milestone Aug 17, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Sep 16, 2021
4 participants