-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(iroh-net): allow the underlying UdpSockets to be rebound #2946
Conversation
Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/2946/docs/iroh/ Last updated: 2024-11-26T17:16:36Z |
net-tools/netwatch/src/udp.rs
Outdated
} | ||
|
||
/// Marks this socket as needing a rebind | ||
pub fn mark_broken(&self) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mark_broken is not called from anywhere.
What's the reason this doesn't just set the option to none? Fear of deadlocks? I was going to look if this is a concern, but since it is not called I could not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is all in progress, don't worry about the api too much for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also prefer to go with setting the Option
to None
since it makes this less vulnerable to toctou races.
Maybe there's something that prevents setting is_broken
in a non synchronized way with rebind
(which is called after is_broken()
) in the higher level calls, but since this is exposed as part of the public i think it would be better to have rebinding (the use part) and is_broken (the check part) as a single locked entity.
We currently use the option only for drop but we could simply do the fancy drop routine when there's actually a socket
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going a bit further we could have a enum like
enum InnerSock {
Active(tokio::net::UddSocket),
Rebind(std::net::SocketAddr),
}
should use a teeny tiny less memory, does not require storing the address when not needed, no need to think about races about whether being broken and actually rebinding are being updated in some incorrect way and uses a single lock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe there's something that prevents setting
is_broken
in a non synchronized way withrebind
(which is called afteris_broken()
) in the higher level calls, but since this is exposed as part of the public i think it would be better to have rebinding (the use part) and is_broken (the check part) as a single locked entity.
I understand the desire, but mark_broken
needs to work without taking a write lock, so it needs to be seperated into an atomic, or its own lock
0323e98
to
16d1e39
Compare
Current run
From the last merged PR
|
d90f32a
to
8d574ed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reviewed a small piece (what seemed more critical I think?) in any case I think this is the right approach. What we had before "replacing" some Arc
was totally wrong
net-tools/netwatch/src/udp.rs
Outdated
} | ||
|
||
/// Marks this socket as needing a rebind | ||
pub fn mark_broken(&self) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also prefer to go with setting the Option
to None
since it makes this less vulnerable to toctou races.
Maybe there's something that prevents setting is_broken
in a non synchronized way with rebind
(which is called after is_broken()
) in the higher level calls, but since this is exposed as part of the public i think it would be better to have rebinding (the use part) and is_broken (the check part) as a single locked entity.
We currently use the option only for drop but we could simply do the fancy drop routine when there's actually a socket
net-tools/netwatch/src/udp.rs
Outdated
} | ||
|
||
/// Marks this socket as needing a rebind | ||
pub fn mark_broken(&self) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going a bit further we could have a enum like
enum InnerSock {
Active(tokio::net::UddSocket),
Rebind(std::net::SocketAddr),
}
should use a teeny tiny less memory, does not require storing the address when not needed, no need to think about races about whether being broken and actually rebinding are being updated in some incorrect way and uses a single lock
net-tools/netwatch/src/udp.rs
Outdated
let mut guard = self.socket.write().unwrap(); | ||
{ | ||
let socket = guard.take().expect("not yet dropped"); | ||
drop(socket); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we pay here the penalty of the slow drop of the previous socket?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do, but I this needs to be sync because it is called in poll methods, and I don’t see a way around it
iroh-net/src/magicsock/udp_conn.rs
Outdated
// update socket state | ||
let new_state = self.io.with_socket(|socket| { | ||
quinn_udp::UdpSocketState::new(quinn_udp::UdpSockRef::from(socket)) | ||
})??; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this fails the UdpConn is left in an unknown and unusable state right? But this takes &self
so there is no real indicator that this is broken afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is? all the methods will return an error if this happens
iroh-net/src/magicsock/udp_conn.rs
Outdated
warn!("failed to rebind socket: {:?}", err); | ||
// TODO: improve error | ||
let err = | ||
std::io::Error::new(std::io::ErrorKind::NotConnected, err.to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear that quinn will swallow these errors and try to carry on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps rebind should directly return std::io::Error
? This branch returns NotConnected
while in try_send
it returns BrokenPipe
. Should these not be consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear that quinn will swallow these errors and try to carry on.
really? that would sucks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps rebind should directly return
std::io::Error
?
I thought about it, but that would be wrong, because these are send and recv calls, returning an error about binding would be..odd. Which is why I want to return an error which means that the connection is broken.
iroh-net/src/magicsock/udp_conn.rs
Outdated
count = meta.len / meta.stride, | ||
dst = %meta.dst_ip.map(|x| x.to_string()).unwrap_or_default(), | ||
"UDP recv" | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
top marks for logging call! :)
net-tools/netwatch/src/udp.rs
Outdated
let guard = self.socket.read().unwrap(); | ||
let Some(socket) = guard.as_ref() else { | ||
return Err(std::io::Error::new( | ||
std::io::ErrorKind::BrokenPipe, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how was this error chosen? i mean, why is this one the right error? might be worth documenting in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was the closest to me that indicates that the connection is broken, would love to hear better suggestions.
0b5966c
to
e9bcb40
Compare
net-tools/netwatch/src/udp.rs
Outdated
let mut guard = self.socket.write().unwrap(); | ||
{ | ||
let Some(socket) = guard.take() else { | ||
bail!("cannot rebind closed socket"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should state the reason: socket already closed or no socket to rebind or something like that.
net-tools/netwatch/src/udp.rs
Outdated
} | ||
Err(err) => { | ||
warn!("failed to rebind socket: {:?}", err); | ||
// TODO: improve error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What still needs to be added to the error? I think this should be resolved before merging as otherwise it'll be here forever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, I am just not sure what the best way here is
@flub cleaned up and dryed the code a bit, based on your comments |
to deal with broken pipes
f518925
to
605fd16
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still unresolved comments, but essentially this looks fine I think.
} | ||
|
||
#[derive(Debug)] | ||
enum SocketState { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay this is nice
(Also make the formatter happy though: cargo make format
)
Current branch last run:
Current main:
Seems like its about the same or slight improvement. |
Description
In order to handle supsension and exits on mobile. we need to rebind our UDP sockets when they break.
This PR adds the ability to rebind the socket on errors, and does so automatically on known suspension errors for iOS.
When reviewing this, please specifically look at the duration of lock holding, as this is the most sensitive part in this code.
Some references for these errors
TODOs
Closes #2939
Breaking Changes
The overall API for
netmon::UdpSocket
has changed entirely, everything else is the same.Notes & open questions
Change checklist