Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(consensus): test receiver to mock network failures during simulations #166

Merged
merged 1 commit into from
Aug 1, 2024

Conversation

matan-starkware
Copy link
Contributor

@matan-starkware matan-starkware commented Jul 29, 2024

This change is Reviewable

@matan-starkware
Copy link
Contributor Author

matan-starkware commented Jul 29, 2024

@matan-starkware matan-starkware marked this pull request as ready for review July 29, 2024 08:50
@matan-starkware matan-starkware force-pushed the matan/consensus/m3/test_network_receiver branch from 73b6681 to 2cf7254 Compare July 29, 2024 08:52
Copy link

graphite-app bot commented Jul 29, 2024

Graphite Automations

"Request reviewers once CI passes" took an action on this PR • (07/29/24)

1 reviewer was added to this PR based on 's automation.

Copy link
Contributor

@asmaastarkware asmaastarkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @dorimedini-starkware, @matan-starkware, and @ShahakShama)


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 195 at r2 (raw file):

        drop(sender);

        while let Some(_) = receiver.next().await {

while receiver.next().await.is_some() {

Code quote:

while let Some(_) = receiver.next().await {

Copy link

codecov bot commented Jul 30, 2024

Codecov Report

Attention: Patch coverage is 95.77465% with 3 lines in your changes missing coverage. Please review.

Project coverage is 76.84%. Comparing base (293466d) to head (f5bb03a).
Report is 4 commits behind head on main.

Files Patch % Lines
...pyrus_consensus/src/simulation_network_receiver.rs 95.77% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #166      +/-   ##
==========================================
+ Coverage   76.76%   76.84%   +0.07%     
==========================================
  Files         311      312       +1     
  Lines       34356    34427      +71     
  Branches    34356    34427      +71     
==========================================
+ Hits        26375    26455      +80     
+ Misses       5692     5687       -5     
+ Partials     2289     2285       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@matan-starkware matan-starkware force-pushed the matan/consensus/m3/test_network_receiver branch from 0bfd2bb to b138c57 Compare July 31, 2024 07:31
Copy link
Contributor

@asmaastarkware asmaastarkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 of 4 files reviewed, 1 unresolved discussion (waiting on @dorimedini-starkware, @matan-starkware, and @ShahakShama)


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 70 at r4 (raw file):

        invalid_probability: u64,
        drop_probability: u64,
        cache_size: usize,

to keep things consistent, let's keep the order of params as in the struct

Suggestion:

    pub fn new(
        receiver: ReceiverT,
        cache_size: usize,
        seed: u64,
        drop_probability: u64,
        invalid_probability: u64,
        

@matan-starkware matan-starkware force-pushed the matan/consensus/m3/test_network_receiver branch from b138c57 to 582a446 Compare July 31, 2024 12:29
Copy link
Contributor

@ShahakShama ShahakShama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r3, 1 of 1 files at r5, all commit messages.
Reviewable status: all files reviewed, 14 unresolved discussions (waiting on @asmaastarkware, @dorimedini-starkware, and @matan-starkware)


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 13 at r5 (raw file):

/// A simple cache used to count the occurrences of a key. It is constant size and simply overwrites
/// keys when they overlap (resetting their count).
pub struct Cache {

What's the goal of this cache? Why not use an existing cache implementation like LruCache (which we use in central sync)


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 14 at r5 (raw file):

/// keys when they overlap (resetting their count).
pub struct Cache {
    data: Vec<Option<(u64, u32)>>,

It's very not clear what each value stands for here, and what does none mean.
Either document it or define more structs with named fields to pass the meaning (I prefer the latter)


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 18 at r5 (raw file):

impl Cache {
    fn new(size: usize) -> Self {

Why do you limit the size of the cache and not just rename it to counter? Are there too many messages in a test?


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 58 at r5 (raw file):

    pub seed: u64,
    pub drop_probability: u64,
    pub invalid_probability: u64,

Clarify that invalid probability is for the case that the message wasn't dropped (the final invalid probability is (1 - drop_probability) * invalid_probability)


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 72 at r5 (raw file):

        invalid_probability: u64,
    ) -> Self {
        assert!(invalid_probability <= 100);

It's very unclear that 100 means 1. Why not f64 between 0 and 1?


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 91 at r5 (raw file):

        }

        let randint = self.calculate_msg_hash(&msg) % 100;

Extract this to a helper method that returns a bool


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 153 at r5 (raw file):

#[cfg(test)]
mod test {

this entire file is only declared as a module if cfg(test), so you can remove this two lines and unindent everything below


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 158 at r5 (raw file):

    use test_case::test_case;

    use super::*;

This is never good. Specify what you need


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 163 at r5 (raw file):

    #[test_case(false; "repeat_messages")]
    #[tokio::test]
    async fn test_invalid(distinct_messages: bool) {

This is a test for the NetworkReceiver, which is a test utility, right?
If yes, mention it explicitly in a comment
Same for the test below


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 165 at r5 (raw file):

    async fn test_invalid(distinct_messages: bool) {
        let (mut sender, receiver) = futures::channel::mpsc::unbounded();
        let mut receiver = NetworkReceiver::new(receiver, 10, 123, 0, 50);

Extract all these numbers to constants so the reader can understand what they mean (unfortunately rust is not python and we don't have kwargs)


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 166 at r5 (raw file):

        let (mut sender, receiver) = futures::channel::mpsc::unbounded();
        let mut receiver = NetworkReceiver::new(receiver, 10, 123, 0, 50);
        let mut invalid = 0;

invalid -> invalid_messages


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 174 at r5 (raw file):

            let report_sender = futures::channel::oneshot::channel().0;
            let msg = ConsensusMessage::Proposal(proposal.clone());
            sender.send((Ok(msg.clone()), report_sender)).await.unwrap();

@eitanm-starkware it's worth noting that the network's users also create channels when they "mock the network" in tests


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 179 at r5 (raw file):

            }
        }
        assert!(40 <= invalid && invalid <= 60, "num_invalid={invalid}");

That sounds harsh. maybe 30/70? Could you remind me what's the probability this will happen?

@matan-starkware matan-starkware force-pushed the matan/consensus/m3/test_network_receiver branch from 582a446 to d166e52 Compare July 31, 2024 14:22
Copy link
Contributor Author

@matan-starkware matan-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 of 6 files reviewed, 14 unresolved discussions (waiting on @asmaastarkware, @dorimedini-starkware, and @ShahakShama)


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 70 at r4 (raw file):

Previously, asmaastarkware (asmaa-starkware) wrote…

to keep things consistent, let's keep the order of params as in the struct

Done.


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 13 at r5 (raw file):

Previously, ShahakShama wrote…

What's the goal of this cache? Why not use an existing cache implementation like LruCache (which we use in central sync)

I switched to LruCache. The comments on the struct definition of NetworkReceiever should make it clear why I'm using the cache. Can you tell me if that helps or what's missing?


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 14 at r5 (raw file):

Previously, ShahakShama wrote…

It's very not clear what each value stands for here, and what does none mean.
Either document it or define more structs with named fields to pass the meaning (I prefer the latter)

LruCache makes this obsolete.


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 18 at r5 (raw file):

Previously, ShahakShama wrote…

Why do you limit the size of the cache and not just rename it to counter? Are there too many messages in a test?

Well I want this to be useful even for a longevity test, so I thought I have to cap the size at some point.


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 58 at r5 (raw file):

Previously, ShahakShama wrote…

Clarify that invalid probability is for the case that the message wasn't dropped (the final invalid probability is (1 - drop_probability) * invalid_probability)

I put this on fn filter_msg


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 72 at r5 (raw file):

Previously, ShahakShama wrote…

It's very unclear that 100 means 1. Why not f64 between 0 and 1?

Done.


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 91 at r5 (raw file):

Previously, ShahakShama wrote…

Extract this to a helper method that returns a bool

Done.


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 153 at r5 (raw file):

Previously, ShahakShama wrote…

this entire file is only declared as a module if cfg(test), so you can remove this two lines and unindent everything below

I prefer to keep the tests in their own module.


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 158 at r5 (raw file):

Previously, ShahakShama wrote…

This is never good. Specify what you need

Done.


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 163 at r5 (raw file):

Previously, ShahakShama wrote…

This is a test for the NetworkReceiver, which is a test utility, right?
If yes, mention it explicitly in a comment
Same for the test below

added at the mod level for tests.


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 165 at r5 (raw file):

Previously, ShahakShama wrote…

Extract all these numbers to constants so the reader can understand what they mean (unfortunately rust is not python and we don't have kwargs)

Done.


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 166 at r5 (raw file):

Previously, ShahakShama wrote…

invalid -> invalid_messages

Done.


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 179 at r5 (raw file):

Previously, ShahakShama wrote…

That sounds harsh. maybe 30/70? Could you remind me what's the probability this will happen?

I used a binomial calculator to check and you make a good point. The odds of failure with properly working code (100 msgs, 50% probability):

  1. 40-60: 3.5%
  2. 30-70: 0.004%

Copy link
Contributor

@ShahakShama ShahakShama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r6, all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @asmaastarkware, @dorimedini-starkware, and @matan-starkware)


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 153 at r5 (raw file):

Previously, matan-starkware wrote…

I prefer to keep the tests in their own module.

Our convention is not to have mod foo { inside our code. If you want to keep them separate, please create a separate file


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 179 at r5 (raw file):

Previously, matan-starkware wrote…

I used a binomial calculator to check and you make a good point. The odds of failure with properly working code (100 msgs, 50% probability):

  1. 40-60: 3.5%
  2. 30-70: 0.004%

0.004% is still high (2^-14.6)
On the other side, changing it to 20/80 allows the test to pass if there's a bug.
Maybe increase the number of messages? could you send a link to your calculator

@matan-starkware matan-starkware force-pushed the matan/consensus/m3/test_network_receiver branch from d166e52 to f5bb03a Compare August 1, 2024 08:55
Copy link
Contributor Author

@matan-starkware matan-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 2 of 7 files reviewed, 3 unresolved discussions (waiting on @asmaastarkware, @dorimedini-starkware, and @ShahakShama)


crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 153 at r5 (raw file):

Previously, ShahakShama wrote…

Our convention is not to have mod foo { inside our code. If you want to keep them separate, please create a separate file

I/Asmaa actually realized that we want this for simulations, which are tests but we don't compile them with the flag for tests. So I am refactoring this:

  1. The file will be simulation_network_receiver
  2. I will move the tests to their own file.

crates/sequencing/papyrus_consensus/src/test_network_receiver.rs line 179 at r5 (raw file):

Previously, ShahakShama wrote…

0.004% is still high (2^-14.6)
On the other side, changing it to 20/80 allows the test to pass if there's a bug.
Maybe increase the number of messages? could you send a link to your calculator

https://stattrek.com/online-calculator/binomial

I'll just switch to 1000 messages. This way, the odds of being outside (400-600) are too low for the calculator to show anything.

Copy link
Contributor

@ShahakShama ShahakShama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 5 files at r7, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @asmaastarkware and @dorimedini-starkware)

@matan-starkware matan-starkware dismissed asmaastarkware’s stale review August 1, 2024 10:18

She resolved everything on the PR and together we can't find the blocking comment...

@matan-starkware matan-starkware force-pushed the matan/consensus/m3/test_network_receiver branch from f5bb03a to cff4d46 Compare August 1, 2024 10:18
Copy link

github-actions bot commented Aug 1, 2024

Benchmark movements:
tree_computation_flow performance regressed!
tree_computation_flow time: [34.589 ms 35.156 ms 35.800 ms]
change: [+3.1239% +4.9344% +6.8786%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
6 (6.00%) high mild
6 (6.00%) high severe

@matan-starkware matan-starkware merged commit e9b0b9c into main Aug 1, 2024
21 checks passed
@matan-starkware matan-starkware deleted the matan/consensus/m3/test_network_receiver branch August 1, 2024 10:27
@github-actions github-actions bot locked and limited conversation to collaborators Aug 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants