Network traffic triggers watchdog #337

Closed
ryan-summers opened this issue Oct 11, 2023 · 16 comments · Fixed by #423
Labels
bug Something isn't working

Comments

@ryan-summers
Member

ryan-summers commented Oct 11, 2023

While testing some multicast UDP traffic, I observed Booster reboot twice. The service command then indicated that a watchdog reset had occurred. It's not clear whether this was related to the multicast traffic, but it seemed to occur only when executing the Stabilizer streaming HITL with a streaming IP of 224.192.168.1.

> service
Version             : Unspecified [release]
Hardware Revision   : v1.5
Rustc Version       : rustc 1.72.0 (5680fa18f 2023-08-23)
Features            :
Detected Phy        : W5500
Panic Info          : None
Watchdog Detected   : true

>
@ryan-summers ryan-summers added the bug Something isn't working label Oct 11, 2023
@ryan-summers
Member Author

This was running on 1ea1124

@jordens
Member

jordens commented Oct 11, 2023

n.b.
I'm also streaming multicast in the same net on 239.34.16.10.
I would not use multicast in 224.0.0.0/8. Let's stick to the administratively scoped 239.0.0.0/8.
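
The two ranges mentioned here differ in the first octet, so a stream address can be checked before it is configured. A minimal sketch using only the Rust standard library; the function name `is_admin_scoped` is hypothetical and not part of Booster or Stabilizer:

```rust
use std::net::Ipv4Addr;

/// Hypothetical helper: true if `addr` lies in the administratively
/// scoped multicast block 239.0.0.0/8.
fn is_admin_scoped(addr: Ipv4Addr) -> bool {
    addr.octets()[0] == 239
}

/// Hypothetical helper: true if `addr` is multicast but outside
/// 239.0.0.0/8, i.e. a range this thread suggests avoiding.
fn is_discouraged_multicast(addr: Ipv4Addr) -> bool {
    addr.is_multicast() && !is_admin_scoped(addr)
}
```

With these helpers, 239.34.16.10 and 239.192.168.1 count as administratively scoped, while 224.192.168.1 is flagged as multicast outside that block.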

@ryan-summers
Member Author

This appears pretty reproducible with 239.192.168.1 in my network. I'm looking into what's happening.

@ryan-summers
Member Author

Disabling the watchdog shows that Booster remains operational throughout the whole event and doesn't encounter a true lockup. What I believe may be happening is that smoltcp and the W5500 have to process so much incoming traffic on the PHY that the application as a whole slows down. I wonder if there's a way we can handle events like this, where there's excessive Ethernet traffic.
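
One general way to bound the cost of a traffic burst is to cap how much receive work is done per main-loop iteration, so that other tasks (such as feeding the watchdog) still get to run. A minimal sketch of that idea; `RxQueue` and `poll_bounded` are hypothetical names and are not part of smoltcp, the w5500 crate, or the Booster firmware. It uses std collections for brevity, which an embedded build would need to replace:

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a network device's queue of received frames.
struct RxQueue {
    pending: VecDeque<Vec<u8>>,
}

/// Handle at most `budget` pending frames and return how many were handled.
/// Frames beyond the budget stay queued for the next loop iteration.
fn poll_bounded(queue: &mut RxQueue, budget: usize) -> usize {
    let mut handled = 0;
    while handled < budget {
        match queue.pending.pop_front() {
            // A real network stack would parse and dispatch the frame here.
            Some(_frame) => handled += 1,
            // Queue drained before the budget was used up.
            None => break,
        }
    }
    handled
}
```

Whether a budget like this is the right remedy depends on where the time is actually spent; it is shown only as an illustration of the question raised above.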

@jordens
Member

jordens commented Oct 12, 2023

It should not be seeing that traffic at all. The switch doesn't flood.
It might see IGMP traffic but that's not much at all and always has been there.

@ryan-summers
Member Author

I see the green Ethernet LED remain consistently on during the Stabilizer stream period, indicating that Booster is indeed receiving excessive packets.

I opened smoltcp-rs/smoltcp#848 around this, but this could also be due to a cheap switch that isn't handling multicast properly.

@jordens
Member

jordens commented Oct 12, 2023

That doesn't seem to be the case here. I'm seeing close to zero traffic towards stabilizer (while another one is streaming multicast). As expected.

@jordens
Member

jordens commented Oct 12, 2023

This is then likely a W5500- or w5500-specific bug.

@ryan-summers
Member Author

Do you have a router and/or managed switch in between? The switch I'm using with my local network is just some cheap unmanaged unit. I suspect that it's forwarding the multicast traffic to all of the ports regardless of subscription to the multicast group.

@jordens
Member

jordens commented Oct 12, 2023

There is a Cisco 2960L in between. Whether that one can be called "managed" or not is debatable. But it does behave properly regarding non-flooding of multicast.

@jordens jordens changed the title from "Multicast UDP traffic appears to cause Booster to encounter watchdog resets" to "Network traffic triggers watchdog" Jun 13, 2024
@aferk

aferk commented Jul 8, 2024

We have found a similar issue when using the Booster on an unmanaged switch that carries other traffic. The Booster seems to reboot when it receives invalid packets on the MQTT port 1883. After that, when trying to enable channels, they only seem to change to powered instead of enabled and stay this way until the Booster is power cycled.
This behavior can be reproduced by sending invalid packets via netcat, e.g. nc -u <booster-ip> 1883 < /dev/random.

We will use a VLAN for now, but it would be nice if this could be fixed to make integration into existing network structures easier.

@jordens
Member

jordens commented Jul 8, 2024

@aferk Contributions welcome.

@ryan-summers
Member Author

@aferk It would be helpful if you could connect to the device while it's in this state, run the platform service command, and attach the output here.

That will tell us why the device is resetting.

@aferk

aferk commented Jul 12, 2024

Running service before sending packets:

> service
Version             : v0.5.0 [debug]
Hardware Revision   : v1.6
Rustc Version       : rustc 1.76.0 (07dca489a 2024-02-04)
Features            : 
Detected Phy        : W5500
Panic Info          : None
Watchdog Detected   : false

Sometimes I just get Watchdog Detected: true as above, but after retrying a few times, I also get the more informative panic info below:

> service
Version             : v0.5.0 [debug]
Hardware Revision   : v1.6
Rustc Version       : rustc 1.76.0 (07dca489a 2024-02-04)
Features            : 
Detected Phy        : W5500
Panic Info          : panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/w5500-0.4.1/src/raw_device.rs:92:13:
attempt to subtract with overflow

Watchdog Detected   : true
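
For context, "attempt to subtract with overflow" is the message Rust emits when an integer subtraction underflows in a build with overflow checks enabled, as debug builds have by default. The sketch below is not taken from the w5500 crate; the function names and the buffer-size scenario are hypothetical and only illustrate two standard-library ways to make such a subtraction non-panicking:

```rust
/// Hypothetical helper: bytes left after `used` bytes are consumed,
/// or `None` if `used` exceeds `capacity` (instead of panicking).
fn remaining_checked(capacity: u16, used: u16) -> Option<u16> {
    capacity.checked_sub(used)
}

/// Hypothetical helper: same computation, but clamped to zero on underflow.
fn remaining_saturating(capacity: u16, used: u16) -> u16 {
    capacity.saturating_sub(used)
}
```

A plain `capacity - used` with `used > capacity` would panic under overflow checks, which matches the kind of message shown in the panic info above.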

Booster HW is version 1.6 (non-HL) from Creotec. We tried to upgrade the firmware because of this issue sinara-hw/Booster#393, which both of our non-HL boosters regularly experience.

@jordens
Member

jordens commented Jul 12, 2024

@aferk The old firmware is dead and rotten. You will hopefully understand that this firmware here is bound to meet the same fate as long as people don't invest in it, contribute to it, or buy where the development is funded.

@ryan-summers
Member Author

The panic referenced above should be fixed in the latest release of the w5500 crate, which was updated recently (in #373). I'm currently testing a fix for the watchdog event in smoltcp.

3 participants