[refactor] improve synchronizer and bulk block processing #59
Conversation
-    syncer: Mutex<Syncer<H, S, C>>,
+    syncer: RwLock<Syncer<H, S, C>>,
What is the insight behind this change? Are we afraid that too many reads will starve the writes?
Yeah, I am not sure we need this change, since this is simulated code anyway and the simulator runs in a single thread (so there is no difference between `Mutex` and `RwLock`).
Yep, this is not really needed; it was carried over as part of some experimentation.
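For context, a minimal sketch of the behavioural difference under contention, using tokio's synchronization types as in the surrounding code (the counter and guard layout here are illustrative, not project code):

```rust
use std::sync::Arc;
use tokio::sync::RwLock;

#[tokio::main]
async fn main() {
    let state = Arc::new(RwLock::new(0u64));

    // Two read guards may be held concurrently; with tokio::sync::Mutex the
    // second lock().await would wait until the first guard is dropped.
    let r1 = state.read().await;
    let r2 = state.read().await;
    assert_eq!(*r1, *r2);
    drop(r1);
    drop(r2);

    // Write access is exclusive, exactly like a Mutex guard.
    *state.write().await += 1;
    assert_eq!(*state.read().await, 1);
}
```

In a single-threaded simulator there is no contention, so, as noted above, the two behave the same.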
mysticeti-core/src/metrics.rs (Outdated)
block_receive_latency: register_histogram_vec_with_registry!(
    "block_receive_latency",
    "The time it took for a block to reach our node",
    &["authority", "proc"],
Is the `authority` the author of the block? What is `proc`?
I've refactored that metric a bit. The `authority` is basically the block author.
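To illustrate, a hedged sketch of how a histogram with these labels is registered and recorded with the prometheus crate; the call site and the label values below ("0", "network") are hypothetical, not the actual mysticeti-core code:

```rust
use prometheus::{register_histogram_vec_with_registry, HistogramVec, Registry};
use std::time::Duration;

fn main() {
    let registry = Registry::new();

    // Same shape as the metric in the diff: label by the block author
    // ("authority") and the processing stage that saw the block ("proc").
    let block_receive_latency: HistogramVec = register_histogram_vec_with_registry!(
        "block_receive_latency",
        "The time it took for a block to reach our node",
        &["authority", "proc"],
        registry
    )
    .unwrap();

    // Hypothetical call site: authority index 0, observed at the network stage.
    let latency = Duration::from_millis(42);
    block_receive_latency
        .with_label_values(&["0", "network"])
        .observe(latency.as_secs_f64());
}
```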
@@ -158,15 +168,23 @@ impl<H: BlockHandler + 'static, C: CommitObserver + 'static> NetworkSyncer<H, C>
    block_fetcher: Arc<BlockFetcher>,
    block_verifier: Arc<impl BlockVerifier>,
    metrics: Arc<Metrics>,
    leader_timeout: Duration,
    parameters: SynchronizerParameters,
    cleanup_enabled: bool,
In which cases are we enabling/disabling cleanups?
That was mainly for experimentation; I found it convenient when investigating possible interference of cleanup with latencies. I would suggest keeping it for now, and we can always remove it later once closer to production. What do you think?
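For illustration, a minimal sketch of how such a flag can gate the cleanup task; the field inside SynchronizerParameters and the helper names are guesses, not the actual mysticeti-core definitions:

```rust
use std::time::Duration;

// Hypothetical shape of the new constructor arguments; the real
// SynchronizerParameters may carry different fields.
#[derive(Clone, Debug)]
struct SynchronizerParameters {
    /// How long to wait for a round leader before timing out.
    leader_timeout: Duration,
}

fn start_synchronizer(parameters: SynchronizerParameters, cleanup_enabled: bool) {
    // The flag discussed above: skip the periodic cleanup entirely while
    // investigating whether cleanup interferes with latency.
    if cleanup_enabled {
        spawn_cleanup_task(parameters.leader_timeout);
    }
}

fn spawn_cleanup_task(_interval: Duration) {
    // Placeholder for the real cleanup loop.
}

fn main() {
    let parameters = SynchronizerParameters {
        leader_timeout: Duration::from_secs(2),
    };
    // Disabled here, as one might do for a latency experiment.
    start_synchronizer(parameters, false);
}
```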
let (_al_sender, al_receiver) = tokio::sync::watch::channel(Duration::from_secs(0));
let (_bl_sender, bl_receiver) = tokio::sync::watch::channel(Duration::from_secs(0));
Is there an easy way to indicate what these channels are for (e.g. comments or more explicit names)?
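One possible reading of this suggestion, as a sketch; the expanded names below ("authority latency" / "block latency") are guesses at what `al`/`bl` abbreviate, not confirmed by the source:

```rust
use std::time::Duration;

fn main() {
    // Watch channels hold a single latest value; receivers can be cloned and
    // handed to tasks that want the most recent latency measurement.
    let (_authority_latency_sender, authority_latency_receiver) =
        tokio::sync::watch::channel(Duration::from_secs(0));
    let (_block_latency_sender, block_latency_receiver) =
        tokio::sync::watch::channel(Duration::from_secs(0));

    assert_eq!(*authority_latency_receiver.borrow(), Duration::ZERO);
    assert_eq!(*block_latency_receiver.borrow(), Duration::ZERO);
}
```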
if to_send.len() >= CHUNK_SIZE {
    self.send(peer, NetworkMessage::Blocks(std::mem::take(&mut to_send)))?;
}
If we make these kinds of changes, are we running benchmarks to ensure we are not regressing?
You mean on AWS, right? These changes were introduced as part of our integration and experimentation on the private testnet, where we already see better results and more sustainable behaviour. We are also planning to run AWS experiments.
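For reference, a self-contained sketch of the batching pattern in the diff; `CHUNK_SIZE` mirrors the snippet, while `send_in_chunks` is an illustrative helper rather than project code:

```rust
const CHUNK_SIZE: usize = 32;

// Accumulate items and flush whenever the buffer reaches CHUNK_SIZE, so a
// single network message never carries more than a bounded number of blocks.
fn send_in_chunks<T>(items: impl IntoIterator<Item = T>, mut send: impl FnMut(Vec<T>)) {
    let mut to_send = Vec::with_capacity(CHUNK_SIZE);
    for item in items {
        to_send.push(item);
        if to_send.len() >= CHUNK_SIZE {
            // std::mem::take moves the full buffer out and leaves an empty
            // Vec in its place, so `to_send` can keep being used.
            send(std::mem::take(&mut to_send));
        }
    }
    // Flush the final partial chunk.
    if !to_send.is_empty() {
        send(to_send);
    }
}

fn main() {
    let mut batches = 0;
    send_in_chunks(0..100, |chunk: Vec<i32>| {
        assert!(chunk.len() <= CHUNK_SIZE);
        batches += 1;
    });
    assert_eq!(batches, 4); // 32 + 32 + 32 + 4
}
```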
    .senders
    .iter()
-    .filter(|&(index, _)| !except.contains(index))
+    .filter(|&(index, (_, latency_receiver))| {
+        !except.contains(index) && latency_receiver.borrow().as_millis() <= CUT_OFF_MILLIS
Should we also implement an anti-DoS check on the helper nodes to ensure they do not allocate more than a set amount of resources per peer?
That's something we definitely need to do, but I would suggest doing it in a later iteration, as it's a separate domain in its own right.
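Separately from the DoS question, a runnable sketch of the latency-cutoff filter introduced in the diff; the data shapes (the unit placeholder standing in for a sender handle, the cutoff value, and the sample latencies) are simplified stand-ins:

```rust
use std::time::Duration;
use tokio::sync::watch;

const CUT_OFF_MILLIS: u128 = 1_000;

fn main() {
    // Keep only peers that are not explicitly excluded and whose last
    // observed latency is at or below the cutoff.
    let except = vec![2usize];
    let senders: Vec<(usize, ((), watch::Receiver<Duration>))> = (0..4)
        .map(|index| {
            // 0ms, 500ms, 1000ms, 1500ms across the four peers.
            let (_tx, rx) = watch::channel(Duration::from_millis(500 * index as u64));
            (index, ((), rx))
        })
        .collect();

    let eligible: Vec<usize> = senders
        .iter()
        .filter(|(index, (_, latency_receiver))| {
            !except.contains(index)
                && latency_receiver.borrow().as_millis() <= CUT_OFF_MILLIS
        })
        .map(|(index, _)| *index)
        .collect();

    // Peer 2 is excluded by `except`; peer 3 (1500ms) exceeds the cutoff.
    assert_eq!(eligible, vec![0, 1]);
}
```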
Force-pushed from 2d2dea9 to 9c63296.
…eer latency. Also send and process blocks at bulk.
Force-pushed from f4ca0ed to cb8f5b2.
This PR is: