
Linkerd-proxy routing traffic to wrong pod #12941

Open
bc185174 opened this issue Aug 6, 2024 · 6 comments
Comments

@bc185174

bc185174 commented Aug 6, 2024

What is the issue?

We are occasionally seeing traffic from our applications being routed to the wrong pod. We noticed this when we started getting 403 responses from linkerd-proxy due to policy rejection, even though the policies were correctly configured.

After enabling debug logs, we noticed that the proxy was routing traffic to a different IP than the one the application was trying to resolve.

From the proxy log output below, the resolved IP is 100.127.166.1; however, when querying the destination pod using the linkerd CLI, the IP we expect traffic to go to is 100.127.166.50.

Proxy logs:

{"timestamp":"[ 34142.675035s]","level":"DEBUG","fields":{"message":"Remote proxy error","error":"client 100.127.166.59:34612: server: 100.127.166.1:5984: unauthorized request on route"},"target":"linkerd_app_outbound::http::handle_proxy_error_headers","spans":[{"addr":"100.127.166.1:5984","name":"forward"}],"threadId":"ThreadId(1)"}

Linkerd CLI output:

linkerd diagnostics endpoints api-service-0.api-service.api-service.svc.cluster.local:8080 --kubeconfig=/etc/kubernetes/zylevel0.conf --linkerd-namespace=linkerd --destination-pod linkerd-destination-lqv74
NAMESPACE           IP               PORT   POD                   SERVICE
api-service         100.127.166.50   8080   api-service-0         api-service.api-service

This issue is resolved when we stop/start the linkerd-proxy container using the crictl CLI; note that we did not restart the application container. Is there anything else we can check or provide to help debug this?
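
For reference, this is roughly how we are cross-checking the proxy's view against the cluster's endpoints and bouncing only the proxy container (the container ID lookup is illustrative; the kubelet brings the container back up after it is stopped):

# endpoints Kubernetes knows about for the service
kubectl get endpointslices -n api-service -o wide

# endpoints the Linkerd destination controller resolves for the same name
linkerd diagnostics endpoints api-service-0.api-service.api-service.svc.cluster.local:8080

# find and stop only the linkerd-proxy container on the node; the kubelet restarts it
crictl ps --name linkerd-proxy
crictl stop <container-id>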

How can it be reproduced?

  1. Deploy Linkerd using the Linkerd CLI as per the docs https://linkerd.io/2.15/getting-started/#step-1-install-the-cli (roughly the commands sketched after this list).
  2. Deploy a basic HTTP application and apply the Linkerd policies.
  3. Reboot the Kubernetes node where the application is running.
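
A minimal sketch of steps 1-2, with a placeholder namespace and deployment name (our actual manifests and policies differ):

linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -
linkerd check

# inject the proxy into the application workload (namespace/name are placeholders)
kubectl get deploy -n demo api-service -o yaml | linkerd inject - | kubectl apply -f -

# apply the Server / ServerAuthorization policies, then reboot the node the pod is scheduled on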

Logs, error output, etc

{"timestamp":"[ 34142.675035s]","level":"DEBUG","fields":{"message":"Remote proxy error","error":"client 100.127.166.59:34612: server: 100.127.166.1:5984: unauthorized request on route"},"target":"linkerd_app_outbound::http::handle_proxy_error_headers","spans":[{"addr":"100.127.166.1:5984","name":"forward"}],"threadId":"ThreadId(1)"}

output of linkerd check -o short

N/A

Environment

  • Kubernetes version: v1.28.9
  • Host O/S: Ubuntu Jammy

Possible solution

N/A

Additional context

No response

Would you like to work on fixing this bug?

maybe

@bc185174 bc185174 added the bug label Aug 6, 2024
@wmorgan
Member

wmorgan commented Aug 6, 2024

We will need the version of the Linkerd control plane (and the data plane, if different).

@bc185174
Author

bc185174 commented Aug 6, 2024

We will need the version of the Linkerd control plane (and the data plane, if different).

Linkerd 2.14.10 for both control plane and data plane.

@bc185174
Author

bc185174 commented Aug 8, 2024

Something to note: our application sends a HEAD request every 5s as a keep-alive. How does this interact with the destination caching? AFAIK, that cache's TTL is also 5s. Could this cause issues?
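
For context, the keep-alive is roughly equivalent to the following (the request path and exact timing are illustrative):

while true; do
  # curl -I issues a HEAD request against the service
  curl -sI -o /dev/null http://api-service-0.api-service.api-service.svc.cluster.local:8080/
  sleep 5
done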

@kflynn
Member

kflynn commented Aug 8, 2024

@bc185174, there have been a number of changes around destination selection after 2.14.10 -- does the latest edge release show this failure for you?

@bc185174
Author

@bc185174, there have been a number of changes around destination selection after 2.14.10 -- does the latest edge release show this failure for you?

We've just tried edge-24.2.4 and are still hitting the same issue. It seems reproducible with our builds, and restarting the client pod resolves the issue.

@DavidMcLaughlin
Contributor

A couple of clarifying questions:

  • Did you unsafely restart the node (i.e. turn the power switch off) or did you gracefully terminate it and drain the workloads (along the lines of the sketch after this list)?
  • Once you restart the node, is the wrong pod IP used until the application is restarted? How long did you let a pod run while it remained in the bad state?
  • Is edge-24.2.4 the right version you tried? That release is from February. A lot of work was done this year to improve this, so it would be interesting to try a newer (ideally the latest) edge.
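
For reference, a graceful restart would look roughly like this (node name is a placeholder), as opposed to cutting power:

kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# reboot the node, then bring it back into scheduling
kubectl uncordon <node-name>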
