Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
fix(iroh, iroh-relay)!: Optimise the relay datagram path through the …
…MagicSock (#3062) ## Description This refactors how datagrams flow from the MagicSock (AsyncUdpSocket) to relay server and back. It also vastly simplifies the actors involved in communicating with a relay server. - The `RelayActor` managed all connections to relay servers. - It starts a new `ActiveRelayActor` for each relay server needed. - The `ActiveRelayActor` will exit when unused. - Unless it is for the home relay, this one never exits. - Each `ActiveRelayActor` uses a relay `Client`. - The relay `Client` is now a `Stream` and `Sink` directly connected to the `TcpStream` connected to the relay server. This eliminates several actors previously used here in the `Client` and `Conn`. - Each `ActiveRelayActor` will try and maintain a connection with the relay server. - If connections fail, exponential backoff is used for reconnections. - When `AsyncUdpSocket` needs to send datagrams: - It (now) puts them on a queue to the `RelayActor`. - The `RelayActor` ensures the correct `ActiveRelayActor` is running and forwards datagrams to it. - The `ActiveRelayActor` sends datagrams directly to the relay server. - The relay receive path is now: - Whenever `ActiveRelayActor` is connected it reads from the underlying `TcpStream`. - Received datagrams are placed on an mpsc channel that now bypasses the `RelayActor` and goes straight to the `AsyncUpdSocket` interface. Along the way many bugs are fixed. Some of them: - The relay datagrams send and receive queue now behave more correctly when they are full. So the `AsyncUdpSocket` behaves better. - Though there still is a bug with the send queue not waking up all the tasks that might be waiting to send. This needs a followup: #3067. - The `RelayActor` now avoids blocking. This means it can still react to events when the datagrams queues are full and reconnect relay servers etc as needed to unblock. - The `ActiveRelayActor` also avoids blocking. Allowing it to react to connection breakage and the need to reconnect at any time. - The `ActiveRelayActor` now correctly handles connection errors and retries with backoff. - The `ActiveRleayActor` will no longer queue unsent datagrams forever, but flush them every 400ms. - This also stops the send queue into the `RelayActor` from completely blocking. ## Breaking Changes ### iroh-relay - `Conn` is no longer public. - The `Client` is completely changed. See module docs. ## Notes & open questions - Potentially the relay `Conn` and `Client` don't need to be two separate things now? Though Client is convenient as it only implements one Sink interface, while Conn is also a Frame sink. This means on Conn you often have to use very annoying syntax when calling things like .flush() or .close() etc. - Maybe a few items from the `ActiveRelayActor` can be moved back into the relay `Client`, though that would probably require some gymnastics. The current structure of `ActiveRelayActor` is fairly reasonable and handles things correctly. Though it does have a lot of client knowledge baked in. Being able to reason about the client as a stream + sink is what enabled me to write the good `ActiveRelayActor` though, so I'm fairly happy that this code makes sense as it is. If all goes well this should: Closes #3008 Closes #2971 Closes #2951 ## Change checklist - [x] Self-review. - [x] Documentation updates following the [style guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text), if relevant. - [x] Tests if relevant. - [x] All breaking changes documented. --------- Co-authored-by: Friedel Ziegelmayer <me@dignifiedquire.com>
- Loading branch information