
bug: race condition? when using mergeGateway and multiple gateways #2968

Closed
zetaab opened this issue Mar 18, 2024 · 4 comments · Fixed by #3332
zetaab (Contributor) commented Mar 18, 2024

Description:

I have the following installation:

% kubectl get gateway -A             
NAMESPACE              NAME       CLASS         ADDRESS          PROGRAMMED   AGE
echoserver             foobar     eg-internal   10.222.156.49    True         34m
envoy-gateway-system   internal   eg-internal   10.222.156.49    True         50m

Full YAML spec: https://gist.github.com/zetaab/8caa34f5072d5a8efc5c2425c331c561

HTTPRoutes: https://gist.github.com/zetaab/149545f3e0ae17c0b925bafd3512d1eb

When I add HTTPRoutes and the Envoy proxies restart, all services randomly become unavailable. When the Envoy pods start, I see the following in the logs:

[2024-03-18 06:46:10.571][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/https'
[2024-03-18 06:46:10.573][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/http'

In this state all services work, including services that come from gateways other than the one owning the https listener. However, when the log shows:

[2024-03-18 07:17:24.768][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'echoserver/foobar/https-foo'
[2024-03-18 07:17:24.769][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/http'

nothing works:

 % curl https://foo.bar -v -k     
*   Trying 10.222.156.49:443...
* Connected to foo.bar (10.222.156.49) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
* LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to foo.bar:443 
* Closing connection
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to foo.bar:443
% curl https://eg-int.company.com -v
*   Trying 10.222.156.49:443...
* Connected to eg-int.company.com (10.222.156.49) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to eg-int.company.com:443 
* Closing connection
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to eg-int.company.com:443 

Why are the listeners sometimes loaded in a different order?

Repro steps:

  1. Enable mergeGateways.
  2. Create two different Gateways.
  3. Add and delete HTTPRoutes.
  4. Fairly soon a listener fails for no visible reason (there is no error anywhere, but the port is simply not listening).

Environment:
Envoy Gateway (eg) 1.0.0

Logs:

I took the listener configurations with egctl:

egctl config envoy-proxy listener -n envoy-gateway-system -l gateway.envoyproxy.io/owning-gatewayclass=eg-internal

not working:
https://gist.github.com/zetaab/2e0f2f00174d4b6189290e095ebe5cf5

working:
https://gist.github.com/zetaab/08d2f7b3bfbd04a28be14bd990552214

As can be seen, in the broken state the dump is missing 400+ lines of configuration. When this happens, I have to delete all gateways other than the primary one (in envoy-gateway-system) and then add the other gateways back. After that, everything works again.

@zetaab zetaab added the triage label Mar 18, 2024

This issue has been automatically marked as stale because it has not had activity in the last 30 days.

@github-actions github-actions bot added the stale label Apr 17, 2024
zetaab (Contributor, Author) commented Apr 25, 2024

This is still an issue with 1.0.1.

@github-actions github-actions bot removed the stale label Apr 25, 2024
zetaab (Contributor, Author) commented Apr 26, 2024

This may have something to do with the infrastructure annotations on the Gateway. I am now trying to reproduce it with the latest main.

zetaab (Contributor, Author) commented May 6, 2024

envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.682][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/http'
envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.687][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/https'
envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.720][1][info][upstream] [source/common/listener_manager/lds_api.cc:63] lds: remove listener 'envoy-gateway-system/internal/https'
envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.724][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'echoserver/foobar/https-foo'
envoy-eg-internal-7f4ff7e4-579df9cdbb-lrm4k envoy [2024-05-05 00:02:18.725][1][info][upstream] [source/common/listener_manager/lds_api.cc:99] lds: add/update listener 'envoy-gateway-system/internal/http'

-> Nothing works. Envoy removes the original https listener and adds the foobar listener.

I went through the configuration, and in my opinion it looks pretty much the same.
