fix e2e EnvoyShutdown #3283
Signed-off-by: shawnh2 <shawnhxh@outlook.com>
/retest
Hi @shawnh2! The purpose of the test is to run load during the restart and assert that no requests are lost. Running it after the restart means that we're not really testing for hitless shutdown anymore.
Thanks for the clarification. While root-causing this e2e test case, I found that sometimes, after the restart process, a request is sent to a URL that returns 404, meaning some requests are lost. Maybe sending another request to probe the URL path right after the restart process can avoid this situation.
Maybe the upgrade process isn't hitless anymore. I hadn't noticed the 404 issue before, so perhaps that's a new thing. It may be worth implementing a temporary workaround to prevent instability, but we should also address the underlying issue that causes some requests to result in a 404 error during an upgrade.
Hi @shawnh2, @alexwo, that's interesting input. Yes, I assume that if the failures are 404s, then we have a problem with new Envoy proxies receiving traffic before they are programmed. In that case, I think we can split this activity into two parts:
WDYT?
It's also possible that existing proxies fail, as they may refresh their cache when attempting to obtain a new cache snapshot version from newly started EGs. Maybe something like initialDelaySeconds, as suggested here, can help: Or potentially: I'm not sure that the EG infra controller will re-deploy proxies in every EG upgrade test; perhaps only if there are changes that require re-deployment?
This test was skipped in #3306. Can you remove this SkipTests?
Hey @shawnh2, is this PR still needed?
Closing this for now; will raise a fix if we have a clue.
What type of PR is this?
What this PR does / why we need it:
test
into codecov ignore path
Which issue(s) this PR fixes:
Fixes #3262