🐛"stream #### canceled by remote with error code 0" connIndex=0 event=1 ingressRule=0 originService=" Started roughly 7/19 #1300

Open
dledfordcf opened this issue Jul 24, 2024 · 27 comments
Labels
Priority: Normal Minor issue impacting one or more users Type: Bug Something isn't working

Comments

@dledfordcf

dledfordcf commented Jul 24, 2024

Describe the bug
Opening this as a centralized place for this issue.

To Reproduce
I was unable to reproduce this directly on my own tunnel, but from what I have gathered from others:
The issue happens at random. Restarting the tunnel resolves it temporarily, but it resurfaces.

Unifying fact seems to be version 2024.6.1 and issues starting around 7/19

If it's an issue with Cloudflare Tunnel:
4. Tunnel ID : Multiple
5. cloudflared config:

Expected behavior
Tunnel would connect to edge and work

Environment and versions

  • OS: Multiple
  • Architecture: Multiple
  • Version: 2024.6.1

Logs and errors
I don't have any of my own logs for this

Additional context
No cloudflared release coincided with the onset: 2024.6.1 had been out for around a month when this started, yet a large group of people report the problem beginning around the same time, 7/19/2024.

If anyone from the Cloudflare team checks this bug as well, feel free to hit me up internally.

@dledfordcf dledfordcf added Priority: Normal Minor issue impacting one or more users Type: Bug Something isn't working labels Jul 24, 2024
@dledfordcf
Author

Please feel free to comment with the rough time the issue started, the cloudflared version used, and how the tunnel is deployed (Docker, VM, bare metal, etc.). I admittedly am not able to do one-on-one troubleshooting for everyone, but I've let people know internally that multiple people recently started reporting this error and linked this GitHub issue, so any details will help with gathering information.

@14wkinnersley

14wkinnersley commented Jul 24, 2024

The issue started for me around 20JUL24 00:21 MT, when one of my monitors first detected a problem. I have Cloudflare Tunnels deployed using the Docker image cloudflare/cloudflared:latest. Host: Ubuntu 22.04.3.
All of my instances run the latest version, 2024.6.1, and have done so since that release came out. I didn't experience issues until recently.
I have multiple Cloudflare Tunnel instances that have been running for a few years now, with no issues until recently. I added the flag "--protocol http2" to my Docker setup to rule out QUIC issues, but that did not fix anything.

Logs:

2024-07-23T22:09:50Z INF Unregistered tunnel connection connIndex=1 event=0 ip=198.41.200.113
2024-07-23T22:09:50Z WRN Failed to serve quic connection error="timeout: no recent network activity" connIndex=1 event=0 ip=198.41.200.113
2024-07-23T22:09:50Z WRN Serve tunnel error error="timeout: no recent network activity" connIndex=1 event=0 ip=198.41.200.113
2024-07-23T22:09:50Z INF Retrying connection in up to 1s connIndex=1 event=0 ip=198.41.200.113
2024-07-23T22:09:52Z WRN Connection terminated error="timeout: no recent network activity" connIndex=1
2024-07-23T22:09:55Z INF Registered tunnel connection connIndex=1 connection=061f0765-2e24-421a-98df-dd49bda31254 event=0 ip=198.41.200.63 location=slc01 protocol=quic
2024-07-23T22:21:44Z INF Unregistered tunnel connection connIndex=1 event=0 ip=198.41.200.63
2024-07-23T22:21:44Z WRN Failed to serve quic connection error="timeout: no recent network activity" connIndex=1 event=0 ip=198.41.200.63
2024-07-23T22:21:44Z WRN Serve tunnel error error="timeout: no recent network activity" connIndex=1 event=0 ip=198.41.200.63
2024-07-23T22:21:44Z INF Retrying connection in up to 1s connIndex=1 event=0 ip=198.41.200.63
2024-07-23T22:21:46Z WRN Connection terminated error="timeout: no recent network activity" connIndex=1
2024-07-23T22:21:54Z INF Unregistered tunnel connection connIndex=2 event=0 ip=198.41.200.73
2024-07-23T22:21:54Z WRN Failed to serve quic connection error="timeout: no recent network activity" connIndex=2 event=0 ip=198.41.200.73
2024-07-23T22:21:54Z WRN Serve tunnel error error="timeout: no recent network activity" connIndex=2 event=0 ip=198.41.200.73
2024-07-23T22:21:54Z INF Retrying connection in up to 1s connIndex=2 event=0 ip=198.41.200.73
2024-07-23T22:21:56Z WRN Connection terminated error="timeout: no recent network activity" connIndex=2
2024-07-23T22:22:02Z INF Registered tunnel connection connIndex=1 connection=53437d5d-008b-46ea-99ef-c9aa8ff8ca6c event=0 ip=198.41.200.23 location=slc01 protocol=quic
2024-07-23T22:22:02Z INF Registered tunnel connection connIndex=2 connection=a74ddd5c-0401-4d47-af87-4751d9cc5001 event=0 ip=198.41.200.43 location=slc01 protocol=quic

@clickbg

clickbg commented Jul 24, 2024

The issue started for us ~4 days ago, on 20.07.24. We are using multiple Docker-based tunnels distributed across various OSes, from Ubuntu to RPi OS. Most of them run 2024.6.1, but some systems run 2024.6.0. After 1-2 days of uptime the tunnel starts disconnecting intermittently: some calls to the backend work and some fail with error code 524 (Timeout). When this happens, 3 out of 10 HTTP calls fail. The workaround is to restart the tunnel. We have confirmed that it's not a network issue on our side. It also happens on multiple independent systems located in different datacenters. We have observed the issue at Hetzner Germany, OVH France and our own DC in Bulgaria on the same day, though not at the same time. Those providers have direct peering with Cloudflare and reported no network outages during that time.

Logs:

2024-07-22T19:31:40Z ERR  error="stream 301 canceled by remote with error code 0" connIndex=1 event=1 ingressRule=0 originService=https://REMOVED-nginx
2024-07-22T19:31:40Z ERR Request failed error="stream 301 canceled by remote with error code 0" connIndex=1 dest=https://REMOVED/apps/theming/favicon?v=0ade7c2c event=0 ip=198.41.200.33 type=http
2024-07-22T19:38:45Z ERR  error="stream 57 canceled by remote with error code 0" connIndex=3 event=1 ingressRule=0 originService=https://REMOVED-nginx
2024-07-22T19:38:45Z ERR Request failed error="stream 57 canceled by remote with error code 0" connIndex=3 dest=https://REMOVED/apps/theming/favicon?v=0ade7c2c event=0 ip=198.41.200.193 type=http
2024-07-22T19:38:45Z ERR  error="stream 313 canceled by remote with error code 0" connIndex=1 event=1 ingressRule=0 originService=https://REMOVED-nginx
2024-07-22T19:38:45Z ERR Request failed error="stream 313 canceled by remote with error code 0" connIndex=1 dest=https://REMOVED/apps/theming/icon?v=0ade7c2c event=0 ip=198.41.200.33 type=http
2024-07-22T19:38:55Z ERR  error="stream 61 canceled by remote with error code 0" connIndex=3 event=1 ingressRule=0 originService=https://REMOVED-nginx
2024-07-22T19:38:55Z ERR Request failed error="stream 61 canceled by remote with error code 0" connIndex=3 dest=https://REMOVED/apps/theming/favicon?v=0ade7c2c event=0 ip=198.41.200.193 type=http
2024-07-22T21:52:18Z ERR  error="stream 89 canceled by remote with error code 0" connIndex=3 event=1 ingressRule=0 originService=https://REMOVED-nginx
2024-07-22T21:52:18Z ERR Request failed error="stream 89 canceled by remote with error code 0" connIndex=3 dest=https://REMOVED/apps/theming/icon?v=0ade7c2c event=0 ip=198.41.200.193 type=http
2024-07-22T21:52:18Z ERR  error="stream 461 canceled by remote with error code 0" connIndex=1 event=1 ingressRule=0 originService=https://REMOVED-nginx
2024-07-22T21:52:18Z ERR Request failed error="stream 461 canceled by remote with error code 0" connIndex=1 dest=https://REMOVED/apps/theming/favicon?v=0ade7c2c event=0 ip=198.41.200.33 type=http

We are also seeing:

2024-07-22T15:22:31Z WRN Failed to serve quic connection error="failed to accept QUIC stream: timeout: no recent network activity" connIndex=1 event=0 ip=198.41.200.23
2024-07-22T15:22:31Z WRN Serve tunnel error error="failed to accept QUIC stream: timeout: no recent network activity" connIndex=1 event=0 ip=198.41.200.23
2024-07-22T15:22:31Z INF Retrying connection in up to 1s connIndex=1 event=0 ip=198.41.200.23
2024-07-22T15:22:31Z WRN Connection terminated error="failed to accept QUIC stream: timeout: no recent network activity" connIndex=1
2024-07-22T15:22:33Z INF Unregistered tunnel connection connIndex=3 event=0 ip=198.41.200.13
2024-07-22T15:22:33Z WRN Failed to serve quic connection error="timeout: no recent network activity" connIndex=3 event=0 ip=198.41.200.13
2024-07-22T15:22:33Z WRN Serve tunnel error error="timeout: no recent network activity" connIndex=3 event=0 ip=198.41.200.13
2024-07-22T15:22:33Z INF Retrying connection in up to 1s connIndex=3 event=0 ip=198.41.200.13
2024-07-22T15:22:34Z WRN Connection terminated error="timeout: no recent network activity" connIndex=3
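
To see which connections are affected and how often, log excerpts like the ones above can be tallied with a short script. This is a minimal sketch and not part of cloudflared: the names `parse_fields` and `count_stream_cancels` are hypothetical, and the key=value parsing assumes the log format shown in this thread.

```python
import re
from collections import Counter

# Matches key=value and key="quoted value" pairs as they appear in the logs above.
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line):
    """Return the key=value fields of one cloudflared log line as a dict."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        fields[key] = value.strip('"')
    return fields


def count_stream_cancels(lines):
    """Count 'canceled by remote' request failures per connIndex.

    In the excerpts above each failure is logged twice; only the
    'Request failed' line carries a dest field, so only those are counted.
    """
    counts = Counter()
    for line in lines:
        fields = parse_fields(line)
        if "canceled by remote" not in fields.get("error", ""):
            continue
        if "dest" not in fields:
            continue
        counts[fields.get("connIndex")] += 1
    return counts
```

Feeding it the lines of a saved log file (for example via `open(path)`, where the path is whatever your deployment uses) gives a per-connection count that can be compared before and after a restart.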

@firecow

firecow commented Jul 24, 2024

Our production and staging environments went down across two K8S clusters and eight Docker Swarm clusters in three different physical locations on the 19th of July and again this morning (24th of July).

Restarting our cloudflared system services and cloudflared containers helped.

The tunnel metrics /ready endpoint was exiting with code 0, indicating that there were no problems.

@DevinCarr @jcsf This needs to be addressed immediately.

I have also opened an enterprise support ticket, to make sure this gets some traction.

I don't think this is a problem with the cloudflared binary, since we see it across a wide array of cloudflared versions.

@jfarre20

same here across multiple tunnels.

@OcifferAction

I utilize a Cloudflare tunnel on my home lab and I initially ran into this issue while on vacation last week. Had to VPN into my home network to restart the tunnel. I'm running a tunnel on Unraid utilizing this Docker repo: https://github.com/AriaGomes/Unraid-Cloudflared-Tunnel. I'm receiving the same error messages as everyone else.

@jcsf
Contributor

jcsf commented Jul 24, 2024

We are investigating on our side and will let you know once we have more information. Sorry for not having more to share right now.

@leet4tari

Started Monday for us using multiple docker tunnels on version 2024.06.[0/1]

@joakimlemb

Same issue here, running cloudflared in Docker with the following config:
Debian 12.6/Proxmox with kernel: 6.8.8-2-pve
Docker version 27.0.3, build 7d4bcd8

  cloudflare-tunnel:
    image: cloudflare/cloudflared
    hostname: cloudflare-tunnel
    restart: unless-stopped
    mem_limit: 1g
    network_mode: "host"
    user: 1000:1000
    command: tunnel run --protocol http2
    environment:
      - "TUNNEL_TOKEN=REDACTED"
    logging:
      options:
        max-size: "5m"

@tamisoft

Same here. It started when the larger Cloudflare update was rolling out to different datacenters earlier this month. The phenomenon looks like this:
The tunnel connects to the nearest datacenters (hel01, tll01), then at random there will be:
WRN Failed to serve quic connection error="timeout: no recent network activity" connIndex=1 event=0 ip=198.41.200.63
for all connection indices. The client then frees up the unused connections and tries to connect again, BUT this time it never reconnects to the nearest datacenters. Lately it connects to dme01 and dme06, that fails the same way after a while, and then I normally land on rix01 for both. At that point data moves painfully slowly, if at all. It is as if the client ignores the nearest (previously used) datacenters and keeps moving away from its physical location.
I can imagine that if all the clients in the Nordic region behave the same way, rix01 would be pretty unhappy getting all that traffic. That connection has no second datacenter either, so all connection indices end up connected to rix01.
I hope this is helpful @jcsf
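
The drift between datacenters described above can be checked from the logs, since the `Registered tunnel connection` lines posted earlier in this thread carry a `location=` field. A minimal sketch, assuming that log format; `location_history` is a hypothetical helper, not a cloudflared feature.

```python
import re
from collections import defaultdict

# Captures timestamp, connIndex and location from "Registered tunnel connection" lines.
REGISTERED_RE = re.compile(
    r"^(\S+) INF Registered tunnel connection .*?connIndex=(\d+).*?location=(\S+)"
)


def location_history(lines):
    """Map each connIndex to the ordered (timestamp, location) pairs it registered with."""
    history = defaultdict(list)
    for line in lines:
        match = REGISTERED_RE.match(line.strip())
        if match:
            timestamp, conn_index, location = match.groups()
            history[int(conn_index)].append((timestamp, location))
    return dict(history)
```

A connection index whose location list changes from one entry to the next (for example from a nearby datacenter to a distant one) is a candidate for the behaviour reported here.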

@cageyv

cageyv commented Jul 25, 2024

Same here.
Works perfectly when connected to fra datacenters,
but super slow when connected to ham01 or arn02.
Tunnel versions: from 2023.12 to 2024.3
A support ticket was opened.

@cageyv

cageyv commented Jul 25, 2024

Looks better now. Tunnels were rerouted to other datacenters, and the issues have not returned.
Support helped. If you run into this, I recommend opening a ticket.

@daegalus

I have cloudflared running directly on a VM handling traffic to docker containers and services on other machines in the network. Over the last few days, I have been bombarded with alerts from my Uptime Kuma of 524 errors and timeouts on my external checkers.

When I check the logs, I just see a sea of stream closed with error 0 messages.

A reboot fixes it for a few hours, then it starts up again.

@825i

825i commented Jul 25, 2024

> I have cloudflared running directly on a VM handling traffic to docker containers and services on other machines in the network. Over the last few days, I have been bombarded with alerts from my Uptime Kuma of 524 errors and timeouts on my external checkers.
>
> When I check the logs, I just see a sea of stream closed with error 0 messages.
>
> A reboot fixes it for a few hours, then it starts up again.

Exact same issue. Seems this is hitting a LOT of people suddenly and probably thousands more who don't even know it's happening. This ticket should be changed to Priority "HIGH".

@danparisd

We're seeing a similar issue. 2024.6.1 for all clients for us. Started 7/22/24 for us.

@Mika-

Mika- commented Jul 26, 2024

My tunnel first went down on the 12th, and since then it has worked for a couple of days at a time. Yesterday it only worked for a couple of hours after restarting, so I tested reverting to an older version. Now, after almost a day on version 2024.4.1, I haven't (yet) seen any issues. Before these issues the tunnel had been working without any problems for well over a year.

@danparisd

Our connectors are set to use http2, anyone having this issue using quic?

@tholland15

> Our connectors are set to use http2, anyone having this issue using quic?

Yes I only use quic and have this issue.

@Siwus90

Siwus90 commented Jul 26, 2024

Same here, HA 12.4, Cloudflared version: 5.1.15

@jhult

jhult commented Jul 26, 2024

Running a NixOS 23.11 VM with version 2024.1.5 directly installed via nix.

We noticed issues as early as July 17 but possibly even a few days earlier.

@nperez0111

I've also noticed this issue. It is affecting the stability of my tunnels with no obvious sign of trouble. I wish there was a health check so that I could restart my tunnel when an issue like this occurs.
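
Earlier comments report that the metrics /ready check kept passing while roughly 3 in 10 requests failed, and that a restart clears the problem temporarily, so a watchdog would probably need to probe through the tunnel itself. A minimal sketch under those assumptions: `probe` and `should_restart` are hypothetical names, the thresholds are arbitrary, and the URL to probe and the restart mechanism (Docker, systemd, etc.) are left to the operator.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10.0):
    """Return True if an HTTP GET through the tunnel answers with a non-5xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; a 524 timeout counts as a failure.
        return err.code < 500
    except OSError:
        # Covers URLError, socket timeouts and connection errors.
        return False


def should_restart(results, max_failure_ratio=0.2, min_samples=10):
    """Decide whether recent probe results (list of bools) justify a restart."""
    if len(results) < min_samples:
        return False  # not enough evidence yet
    failures = sum(1 for ok in results if not ok)
    return failures / len(results) > max_failure_ratio
```

Collecting the last ten `probe(...)` results for a public hostname served by the tunnel and restarting only when `should_restart` returns True avoids reacting to a single dropped request.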

@jhult

jhult commented Jul 28, 2024

@dledfordcf, any news from the inside team(s)?

@DevinCarr
Contributor

At this time, the impact should no longer be visible. We had made a change on the edge that caused a small number of QUIC packets to be routed and dropped for some cloudflared tunnel connections. This is why many of your cloudflared logs mentioned timeouts and the remote/local side closing the stream connections.

This change has been rolled back and your tunnels should go back to normal without any change on your part.

However, please keep in mind that you may still occasionally see the error message in your cloudflared logs: # stream #### canceled by remote with error code 0. This can happen for a variety of reasons, such as:

  • The eyeball making the original request has left abruptly
  • Significant packet loss between cloudflared and Cloudflare
  • Origin service is unable to handle connection load (causing timeouts)

Thank you for your patience as we investigated this.
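
One way to tell the occasional, expected occurrences from a persistent problem is to look at how many of these errors appear per hour. A minimal sketch, assuming the log format posted in this thread; `cancels_per_hour` is a hypothetical helper and the judgment of what rate counts as "occasional" is left to the reader.

```python
from collections import Counter
from datetime import datetime


def cancels_per_hour(lines):
    """Count 'Request failed ... canceled by remote' log lines per UTC hour."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if "Request failed" not in line or "canceled by remote" not in line:
            continue
        stamp = line.split(" ", 1)[0]  # e.g. 2024-07-22T19:31:40Z
        when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        counts[when.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

A handful of entries spread over a day would match the causes listed above, while a steady count every hour that disappears after a restart looks more like the behaviour reported in this issue.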

@825i

825i commented Jul 30, 2024

I'm still seeing this problem. I realise that you said we'll still sometimes see it.

Does this really count as a fix then? At least I guess I'll have to wait and see.

@alby258

alby258 commented Jul 30, 2024

Same for me. The problem still occurs every hour.

aoirint added a commit to aoirint/mstdn.aoirint.com that referenced this issue Aug 1, 2024
@morpig

morpig commented Aug 2, 2024

> However, please keep in mind that you may still occasionally see the error message in your cloudflared logs: # stream #### canceled by remote with error code 0.

Per your last comment, @DevinCarr: is it possible to silence this message, or to show it only at more verbose log levels?

I don't think events like this (eyeball early disconnects, etc.) are shown at the default log level on web servers such as nginx. Please correct me if I'm wrong.

@LewisSpring

Hi.
Still getting this issue quite badly.

2024-08-27T22:37:48Z ERR Request failed error="stream 529 canceled by remote with error code 0" connIndex=3 dest=https://example.com/1080p.mp4 event=0 ip=198.41.192.107 type=http
2024-08-27T22:37:48Z ERR  error="stream 533 canceled by remote with error code 0" connIndex=3 event=1 ingressRule=1 originService=http://172.20.0.3:9005
2024-08-27T22:37:48Z ERR Request failed error="stream 533 canceled by remote with error code 0" connIndex=3 dest=https://example.com/1080p.mp4 event=0 ip=198.41.192.107 type=http
2024-08-27T22:37:54Z ERR  error="stream 537 canceled by remote with error code 0" connIndex=3 event=1 ingressRule=1 originService=http://172.20.0.3:9005
2024-08-27T22:37:54Z ERR Request failed error="stream 537 canceled by remote with error code 0" connIndex=3 dest=https://example.com/1080p.mp4 event=0 ip=198.41.192.107 type=http
2024-08-27T22:37:55Z ERR  error="stream 541 canceled by remote with error code 0" connIndex=3 event=1 ingressRule=1 originService=http://172.20.0.3:9005
2024-08-27T22:37:55Z ERR Request failed error="stream 541 canceled by remote with error code 0" connIndex=3 dest=https://example.com/1080p.mp4 event=0 ip=198.41.192.107 type=http
2024-08-27T22:38:01Z ERR  error="stream 545 canceled by remote with error code 0" connIndex=3 event=1 ingressRule=1 originService=http://172.20.0.3:9005
2024-08-27T22:38:01Z ERR Request failed error="stream 545 canceled by remote with error code 0" connIndex=3 dest=https://example.com/1080p.mp4 event=0 ip=198.41.192.107 type=http
2024-08-27T22:38:01Z ERR  error="stream 549 canceled by remote with error code 0" connIndex=3 event=1 ingressRule=1 originService=http://172.20.0.3:9005
2024-08-27T22:38:01Z ERR Request failed error="stream 549 canceled by remote with error code 0" connIndex=3 dest=https://example.com/1080p.mp4 event=0 ip=198.41.192.107 type=http

Anything I can provide for further investigation?
