
[refactor] improve synchronizer and bulk block processing #59

Merged
merged 3 commits into main from akichidis/synchronizer-improvements
Dec 20, 2023

Conversation

akichidis (Collaborator)

This PR:

  • makes the synchronizer continuously send blocks to peers, rather than attempting a send only when a new block is produced.
  • samples peers based on latency so that lower-latency peers are prioritised (there may be some exceptions to this); see the sketch after this list.
  • makes the send operations on the block disseminator non-blocking, so we avoid clogging the network loops.
  • sends multiple blocks at once, processes them in bulk, and pre-checks whether they have already been processed, so we avoid any penalties from verification etc.
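A rough sketch of the latency-based sampling idea, purely for illustration: `PeerInfo`, `sample_peers`, and the concrete cut-off value are hypothetical stand-ins, not the types used in this PR (the real code keeps per-peer latency in watch channels and filters on `CUT_OFF_MILLIS`, as shown further down in the diff).

```rust
/// Hypothetical peer record; the real code tracks per-peer latency via watch channels.
struct PeerInfo {
    index: usize,
    latency_ms: u128,
}

/// Illustrative cut-off, mirroring the CUT_OFF_MILLIS filter quoted later in this thread.
const CUT_OFF_MILLIS: u128 = 1_000;

/// Pick up to `count` peers, skipping `except` and anything above the latency cut-off,
/// preferring the lowest-latency peers among the rest.
fn sample_peers(peers: &[PeerInfo], except: &[usize], count: usize) -> Vec<usize> {
    let mut eligible: Vec<&PeerInfo> = peers
        .iter()
        .filter(|p| !except.contains(&p.index) && p.latency_ms <= CUT_OFF_MILLIS)
        .collect();
    eligible.sort_by_key(|p| p.latency_ms);
    eligible.iter().take(count).map(|p| p.index).collect()
}

fn main() {
    let peers = vec![
        PeerInfo { index: 0, latency_ms: 90 },
        PeerInfo { index: 1, latency_ms: 2_500 }, // filtered out by the cut-off
        PeerInfo { index: 2, latency_ms: 40 },
    ];
    // Lowest-latency eligible peers come first.
    assert_eq!(sample_peers(&peers, &[], 2), vec![2, 0]);
}
```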

syncer: Mutex<Syncer<H, S, C>>,
syncer: RwLock<Syncer<H, S, C>>,
Collaborator

What is the insight behind this change? Are we afraid that too many reads will starve the writes?

Collaborator

Yeah, I am not sure we need this change, since this is simulated code anyway and the simulator runs in a single thread (so there is no difference between Mutex and RwLock).

Collaborator Author (akichidis)

Yep, this is not really needed; it was carried over as part of some experimentation.
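For readers unfamiliar with the distinction the thread is debating, a minimal standalone illustration (not code from this PR): RwLock admits many concurrent readers while Mutex serialises every access, which only matters when the code actually runs on multiple threads.

```rust
use std::sync::{Arc, Mutex, RwLock};
use std::thread;

fn main() {
    // With RwLock, multiple reader threads may hold the lock at the same time.
    let shared = Arc::new(RwLock::new(0u64));
    let readers: Vec<_> = (0..2)
        .map(|_| {
            let s = Arc::clone(&shared);
            thread::spawn(move || *s.read().unwrap())
        })
        .collect();
    for r in readers {
        r.join().unwrap();
    }

    // With Mutex, every access is exclusive, even if it is only a read.
    let exclusive = Arc::new(Mutex::new(0u64));
    let _value = *exclusive.lock().unwrap();
}
```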

block_receive_latency: register_histogram_vec_with_registry!(
"block_receive_latency",
"The time it took for a block to reach our node",
&["authority", "proc"],
Collaborator

Is the authority the author of the block? What is proc?

Collaborator Author (akichidis)

I've refactored that metric a bit. The authority is basically the block author.
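For illustration, this is roughly how such a histogram is registered and observed with the prometheus crate. The label set used here ("authority" only) and the sample values are assumptions, since the author mentions the metric was refactored.

```rust
use prometheus::{register_histogram_vec_with_registry, HistogramVec, Registry};

fn main() {
    let registry = Registry::new();

    // Mirrors the registration quoted in the diff; label names may differ
    // after the refactor the author mentions.
    let block_receive_latency: HistogramVec = register_histogram_vec_with_registry!(
        "block_receive_latency",
        "The time it took for a block to reach our node",
        &["authority"],
        registry
    )
    .unwrap();

    // Record the receive latency of a block, labelled by its author.
    // "authority_0" and the 0.25s value are placeholders.
    block_receive_latency
        .with_label_values(&["authority_0"])
        .observe(0.25);
}
```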

@@ -158,15 +168,23 @@ impl<H: BlockHandler + 'static, C: CommitObserver + 'static> NetworkSyncer<H, C>
block_fetcher: Arc<BlockFetcher>,
block_verifier: Arc<impl BlockVerifier>,
metrics: Arc<Metrics>,
leader_timeout: Duration,
parameters: SynchronizerParameters,
cleanup_enabled: bool,
Collaborator

In which cases are we enabling/disabling cleanups?

Collaborator Author (akichidis)

That was mainly added during some experimentation; I found it convenient when investigating possible interference of cleanup with latencies. I would suggest keeping it for now, and we can always remove it later once we are closer to production. What do you think?

Comment on lines +58 to +59
let (_al_sender, al_receiver) = tokio::sync::watch::channel(Duration::from_secs(0));
let (_bl_sender, bl_receiver) = tokio::sync::watch::channel(Duration::from_secs(0));
Collaborator

Is there an easy way to indicate what these channels are for (e.g. comments or more explicit names)?
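One possible way to address this, sketched with hypothetical names (the thread does not spell out what `al` and `bl` stand for, so the naming below is only an illustration of the reviewer's suggestion):

```rust
use std::time::Duration;
use tokio::sync::watch;

fn main() {
    // Hypothetical descriptive naming for one of the latency watch channels;
    // the real meaning of `al`/`bl` is not stated in this thread.
    let (block_latency_sender, block_latency_receiver) =
        watch::channel(Duration::from_secs(0));

    // Producer side: publish the latest measured latency.
    block_latency_sender
        .send(Duration::from_millis(120))
        .unwrap();

    // Consumer side: read the most recent value without awaiting.
    assert_eq!(*block_latency_receiver.borrow(), Duration::from_millis(120));
}
```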

Comment on lines +92 to +94
if to_send.len() >= CHUNK_SIZE {
self.send(peer, NetworkMessage::Blocks(std::mem::take(&mut to_send)))?;
}
Collaborator

If we make these kinds of changes, are we running benchmarks to ensure we are not regressing?

Collaborator Author (akichidis)

You mean on AWS, right? Basically, those changes were introduced as part of our integration and experimentation on the private testnet, where we already see better results and more sustainable behaviour. We are also willing to run AWS experiments.
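For context, the chunking pattern in the quoted lines can be sketched in isolation as below; `send_in_chunks`, the callback, and the chunk size are illustrative stand-ins for the real `self.send(peer, NetworkMessage::Blocks(..))` path.

```rust
/// Illustrative chunk size; the real value lives in the synchronizer code.
const CHUNK_SIZE: usize = 10;

/// Drain `blocks` into chunks of at most CHUNK_SIZE and hand each chunk to
/// `send_chunk`, flushing whatever remains at the end. This mirrors the
/// `std::mem::take(&mut to_send)` pattern in the quoted diff.
fn send_in_chunks<B>(blocks: Vec<B>, mut send_chunk: impl FnMut(Vec<B>)) {
    let mut to_send = Vec::new();
    for block in blocks {
        to_send.push(block);
        if to_send.len() >= CHUNK_SIZE {
            send_chunk(std::mem::take(&mut to_send));
        }
    }
    if !to_send.is_empty() {
        send_chunk(to_send);
    }
}

fn main() {
    let mut chunks = 0;
    send_in_chunks((0..25).collect(), |chunk: Vec<i32>| {
        chunks += 1;
        assert!(chunk.len() <= CHUNK_SIZE);
    });
    assert_eq!(chunks, 3); // 10 + 10 + 5
}
```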

.senders
.iter()
.filter(|&(index, _)| !except.contains(index))
.filter(|&(index, (_, latency_receiver))| {
!except.contains(index) && latency_receiver.borrow().as_millis() <= CUT_OFF_MILLIS
Collaborator

Should we also implement an anti-DoS check on the helper nodes to ensure they do not allocate more than a set amount of resources per peer?

Collaborator Author (akichidis)

That's something we definitely need to do, but I would suggest doing it in a later iteration, as it's a separate domain of its own.
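Purely as an illustration of the kind of per-peer cap being discussed (not part of this PR), a naive in-flight request budget might look like the following; all names and the limit are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical per-peer limit on concurrently served sync requests.
const MAX_INFLIGHT_PER_PEER: usize = 16;

#[derive(Default)]
struct PeerBudget {
    /// peer index -> requests currently in flight
    inflight: HashMap<usize, usize>,
}

impl PeerBudget {
    /// Reserve a slot for `peer`, returning false if it has hit its limit.
    fn try_acquire(&mut self, peer: usize) -> bool {
        let count = self.inflight.entry(peer).or_insert(0);
        if *count >= MAX_INFLIGHT_PER_PEER {
            return false;
        }
        *count += 1;
        true
    }

    /// Release the slot once the request has been served.
    fn release(&mut self, peer: usize) {
        if let Some(count) = self.inflight.get_mut(&peer) {
            *count = count.saturating_sub(1);
        }
    }
}

fn main() {
    let mut budget = PeerBudget::default();
    assert!(budget.try_acquire(7));
    budget.release(7);
}
```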

@akichidis akichidis force-pushed the akichidis/improve-commit-performance-2 branch from 2d2dea9 to 9c63296 Compare December 11, 2023 17:51
…eer latency. Also send and process blocks at bulk.
@akichidis akichidis force-pushed the akichidis/synchronizer-improvements branch from f4ca0ed to cb8f5b2 Compare December 15, 2023 13:45
@akichidis akichidis changed the base branch from akichidis/improve-commit-performance-2 to main December 15, 2023 13:45
@akichidis akichidis merged commit f4bae7f into main Dec 20, 2023
1 check passed