fix e2e EnvoyShutdown #3283

shawnh2 · 2024-04-26T08:47:52Z

What type of PR is this?

What this PR does / why we need it:

- add test into codecov ignore path
- try to fix "flaky: e2e TestEGUpgrade/EnvoyShutdown" (#3262): increase the HTTPRoute Duration and move load generation after the deployment has been restarted
- fix the order of merge-gateways e2e test in makefile

Which issue(s) this PR fixes:

Fixes #3262

Signed-off-by: shawnh2 <shawnhxh@outlook.com>

shawnh2 · 2024-04-28T10:01:36Z

/retest

guydc · 2024-04-29T11:56:32Z

> move load generation after the deployment has been restarted

Hi @shawnh2! The purpose of the test is to run load during the restart and assert that no requests are lost. Running it after the restart means that we're no longer really testing for hitless shutdown.
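For illustration, the strict check described here (load runs during the restart, and any lost request fails the test) could be sketched roughly as below. This is a minimal sketch and not the code in the repository's e2e suite: the `Tally` type and its methods are hypothetical names, and the load generator that would feed it is left out. Counting by status code also makes it visible whether the losses are 404s or something else.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Tally counts responses by HTTP status code.
// Hypothetical helper: key 0 stands for transport errors (no response).
type Tally map[int]int

// Add records the outcome of one request sent during the restart.
func (t Tally) Add(code int, err error) {
	if err != nil {
		t[0]++
		return
	}
	t[code]++
}

// Lost returns how many requests were not answered with 200.
func (t Tally) Lost() int {
	lost := 0
	for code, n := range t {
		if code != 200 {
			lost += n
		}
	}
	return lost
}

// AssertHitless returns an error with a per-status breakdown if any
// request was lost, and nil if every request got a 200.
func (t Tally) AssertHitless() error {
	lost := t.Lost()
	if lost == 0 {
		return nil
	}
	codes := make([]int, 0, len(t))
	for code := range t {
		codes = append(codes, code)
	}
	sort.Ints(codes)
	parts := make([]string, 0, len(codes))
	for _, code := range codes {
		parts = append(parts, fmt.Sprintf("%d=%d", code, t[code]))
	}
	return fmt.Errorf("%d requests lost during restart (status counts: %s)",
		lost, strings.Join(parts, ", "))
}
```

A test would call `Add` for every response observed while the deployment restarts and fail if `AssertHitless` returns a non-nil error.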

shawnh2 · 2024-04-30T02:38:55Z

> > move load generation after the deployment has been restarted
>
> Hi @shawnh2! The purpose of the test is to run load during the restart and assert that no requests are lost. Running it after the restart means that we're no longer really testing for hitless shutdown.

Thanks for the clarification. After root-causing this e2e test case, I found that sometimes, after the restart process, a request is sent to a URL that returns 404, meaning some requests are lost.

Maybe sending another request to check the URL path right after the restart process can avoid this situation.

alexwo · 2024-04-30T08:00:57Z

> > > move load generation after the deployment has been restarted
> >
> > Hi @shawnh2! The purpose of the test is to run load during the restart and assert that no requests are lost. Running it after the restart means that we're no longer really testing for hitless shutdown.
>
> Thanks for the clarification. After root-causing this e2e test case, I found that sometimes, after the restart process, a request is sent to a URL that returns 404, meaning some requests are lost.
>
> Maybe sending another request to check the URL path right after the restart process can avoid this situation.

Maybe the upgrade process isn't hitless anymore. I hadn't noticed the 404 issue before, so perhaps that's a new thing. Perhaps it's worth implementing a temporary workaround to prevent instability, but we should also address the underlying issue that causes some requests to result in a 404 error during an upgrade.

guydc · 2024-04-30T12:07:22Z

Hi @shawnh2, @alexwo - that's interesting input. Yes, I assume that if the failures are 404s, then we have a problem with new envoy proxies receiving traffic before they are programmed.

In that case, I think we can split this activity into two parts:

1. Run load and log load stats, but only assert on success of converging single-user requests. Basically, only make sure that restarted proxies are eventually functional and stop this test from being flaky.
2. Investigate how to fix the underlying issue causing 404s.

WDYT?
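The first part of this proposal could look roughly like the sketch below: load keeps running through the restart and its stats are only logged, while the assertion is limited to single requests converging to success afterwards. Everything here is an assumption made for illustration (`LoadStats`, `RunLoad`, `WaitConverged`, and the `send` callback standing in for one HTTP request); it is not the helper set used by the actual e2e suite.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// LoadStats collects counters while load runs. In this sketch they are
// logged for diagnosis only and never asserted on.
type LoadStats struct {
	mu       sync.Mutex
	Total    int
	Failures int
}

// Record notes the outcome of one load request.
func (s *LoadStats) Record(ok bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.Total++
	if !ok {
		s.Failures++
	}
}

// String renders the counters for logging.
func (s *LoadStats) String() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return fmt.Sprintf("load stats: total=%d failures=%d", s.Total, s.Failures)
}

// RunLoad calls send repeatedly until stop is closed, recording each result.
// send is a hypothetical callback returning the HTTP status of one request.
func RunLoad(stop <-chan struct{}, send func() (int, error), interval time.Duration, stats *LoadStats) {
	for {
		select {
		case <-stop:
			return
		default:
		}
		code, err := send()
		stats.Record(err == nil && code == 200)
		time.Sleep(interval)
	}
}

// WaitConverged polls send until it returns 200 `needed` times in a row,
// or returns an error once timeout has elapsed.
func WaitConverged(send func() (int, error), needed int, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	streak := 0
	for time.Now().Before(deadline) {
		code, err := send()
		if err == nil && code == 200 {
			streak++
			if streak >= needed {
				return nil
			}
		} else {
			streak = 0 // any failure resets the run of successes
		}
		time.Sleep(interval)
	}
	return fmt.Errorf("requests did not converge to 200 within %s", timeout)
}
```

In a test, `RunLoad` would run in a goroutine across the restart, the test would log `stats.String()` once it stops, and only the error from `WaitConverged` would fail the test. Requiring several consecutive successes is a design choice to avoid passing on a single lucky request.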

alexwo · 2024-04-30T12:46:11Z

> Hi @shawnh2, @alexwo - that's interesting input. Yes, I assume that if the failures are 404s, then we have a problem with new envoy proxies receiving traffic before they are programmed.
>
> In that case, I think we can split this activity into two parts:
>
> 1. Run load and log load stats, but only assert on success of converging single-user requests. Basically, only make sure that restarted proxies are eventually functional and stop this test from being flaky.
> 2. Investigate how to fix the underlying issue causing 404s.
>
> WDYT?

It's also possible that existing proxies fail, as they may refresh their cache when attempting to obtain a new cache snapshot version from newly started EGs. Maybe something like the initialDelaySeconds suggested here can help:
#2810 (comment)

Or potentially:
#2918

I'm not sure that the EG infra controller will re-deploy proxies in every EG upgrade test; perhaps only if there are changes that require re-deployment?

zirain · 2024-04-30T14:09:01Z

This test was skipped in #3306; can you remove this SkipTests entry?

gateway/test/e2e/upgrade/eg_upgrade_test.go

Line 56 in d32256c

    
           tests.EnvoyShutdownTest.ShortName, // https://github.com/envoyproxy/gateway/issues/3262

arkodg · 2024-05-16T22:56:50Z

hey @shawnh2, is this PR still needed?

shawnh2 · 2024-05-17T00:28:58Z

Closing this for now; will raise a fix if we have a clue.

shawnh2 added 2 commits April 26, 2024 12:53

polish code comment and add test to codecov ignore path

12c6c64

Signed-off-by: shawnh2 <shawnhxh@outlook.com>

increase http runner duration and fix e2e makefile

7304149

Signed-off-by: shawnh2 <shawnhxh@outlook.com>

shawnh2 requested a review from a team as a code owner April 26, 2024 08:47

shawnh2 marked this pull request as draft April 26, 2024 09:12

shawnh2 added 2 commits April 28, 2024 14:54

move load generation process behind restart deployment

5661d2c

Signed-off-by: shawnh2 <shawnhxh@outlook.com>

Merge branch 'main' into fix-e2e-envoyshutdown

9df130c

shawnh2 marked this pull request as ready for review April 28, 2024 09:59

zirain added 2 commits April 30, 2024 21:48

Merge branch 'main' into fix-e2e-envoyshutdown

c55d30c

Merge branch 'main' into fix-e2e-envoyshutdown

0d1b075

shawnh2 closed this May 17, 2024

shawnh2 deleted the fix-e2e-envoyshutdown branch May 17, 2024 00:29
