Add timeouts for builds #345

zaneduffield · 2024-04-21T03:22:49Z

Fixes #322.

I mostly followed the suggested steps given in #322 (comment), with a couple of differences:

I didn't find any existing unit tests for CLI options, so I didn't add any CLI unit tests for the new options
I had to modify the rounding behaviour for timeout durations so that a very short build timeout wouldn't get rounded to zero (e.g. in the integration test of the well_tested tree)
I set the default build-timeout-multiplier value to 2, rather than the suggested 5. I figure that since incremental builds are expected to be substantially faster than the initial build, we shouldn't expect to be waiting very long for a build.
I didn't add a new ScenarioOutcome for a build timeout, mostly because in the default terminal output you can tell whether it was the build or the test that timed out by looking at the phase at the end of the line:
- a build timeout looks like
```
TIMEOUT  src/lib.rs:12:5: replace should_stop_const -> bool with false in 1.0s build
```
- a test timeout looks like
```
TIMEOUT  src/lib.rs:25:5: replace should_stop -> bool with false in 0.2s build + 1.0s test
```

In updating the book/src documentation, I felt like I would be repeating a lot of text if I were to create a new section just for build timeouts, so I rewrote the whole page to talk about both timeouts simultaneously.

zaneduffield · 2024-04-21T03:41:38Z

I forgot to test after rebasing. I'll fix it later today.

zaneduffield · 2024-04-21T04:44:53Z

The failing test passes on my system. I might need to implement a minimum for the build timeout to make the CI tests less flaky.

The reason I didn't do this at first was it made the well_tested_tree_with_short_build_timeout test difficult to implement. For that test, we really do want a short timeout that's less than any reasonable minimum. We can't use the plain --build-timeout option (which wouldn't be affected by the minimum timeout) to achieve this though, because that would cause the baseline build to timeout and we just want the incremental builds to timeout.

I could perhaps copy the --minimum-test-timeout option (and potentially the associated environment variable) for the build timeout, but it didn't feel right to be adding new CLI options just to make testing easier.

On the other hand, I could just delete the well_tested_tree_with_short_build_timeout test. I do have a test case that exercises this feature in the hang_when_mutated test tree.

What do you think @sourcefrog?

sourcefrog · 2024-04-21T22:19:13Z

Fixes #322.

I mostly followed the suggested steps given in #322 (comment), with a couple of differences:

I didn't find any existing unit tests for CLI options, so I didn't add any CLI unit tests for the new options

There are some like this at the bottom of options.rs but since this is probably a pretty simple option covered by other things it may not really be necessary

cargo-mutants/src/options.rs

Lines 297 to 308 in 9cf2216

    
               #[test] 
        
               fn test_tool_from_config() { 
        
                   let config = indoc! { r#" 
        
                       test_tool = "nextest" 
        
                   "#}; 
        
                   let mut config_file = NamedTempFile::new().unwrap(); 
        
                   config_file.write_all(config.as_bytes()).unwrap(); 
        
                   let args = Args::parse_from(["mutants"]); 
        
                   let config = Config::read_file(config_file.path()).unwrap(); 
        
                   let options = Options::new(&args, &config).unwrap(); 
        
                   assert_eq!(options.test_tool, TestTool::Nextest); 
        
               }

I had to modify the rounding behaviour for timeout durations so that a very short build timeout wouldn't get rounded to zero (e.g. in the integration test of the well_tested tree)

I set the default build-timeout-multiplier value to 2, rather than the suggested 5. I figure that since incremental builds are expected to be substantially faster than the initial build, we shouldn't expect to be waiting very long for a build.

Makes sense.

I didn't add a new ScenarioOutcome for a build timeout, mostly because in the default terminal output you can tell whether it was the build or the test that timed out by looking at the phase at the end of the line:
a build timeout looks like
TIMEOUT  src/lib.rs:12:5: replace should_stop_const -> bool with false in 1.0s build
a test timeout looks like
TIMEOUT  src/lib.rs:25:5: replace should_stop -> bool with false in 0.2s build + 1.0s test

In general I want to make sure it's visible in structured json output too, but it probably is, by looking at which phase failed, so this seems fine. Anyhow, your motivating case of a const function shows that it's really fairly similar to a timeout in running the tests.

In updating the book/src documentation, I felt like I would be repeating a lot of text if I were to create a new section just for build timeouts, so I rewrote the whole page to talk about both timeouts simultaneously.

Sounds good.

sourcefrog · 2024-04-21T22:35:05Z

src/lab.rs

+fn build_timeout(baseline_duration: Option<Duration>, options: &Options) -> Duration {
+    phase_timeout(
+        options.build_timeout,
+        Duration::MAX,


This seems to say that with --check, i.e. running just cargo check --tests on each mutant, the build timeout is unlimited. I haven't checked yet but I suspect a const fn that hangs the compiler might also hang the checks? Maybe we should check that it doesn't.

And, maybe the logic for the check timeout should be the same as for non-check builds, i.e. a multiplier of the baseline?

And, the book seems to say that there is a check timeout.

My intent was for the check timeouts to always be the same as the build. It's definitely true that the check can hang when a const function is divergent.

I didn't really test that the check works like this, though. Might be an area to improve the testing in.

This code here was a little backwards, the code change here
https://github.com/sourcefrog/cargo-mutants/pull/345/files/9cf221652a74ac8dff8fbe95a01b5909a56b3e0d#diff-43aa7803b24379ea66e0395da5f5baf8ea6eef4bd298b984bfa7c992f0de312dR309
made sure that the timeout used for check phases is whatever the final build timeout is.

I'll delete the special handing here, and add some tests to assert how it works.

sourcefrog · 2024-04-21T23:02:13Z

Thanks, that looks good!

tests for options

Oh I see you have some in lab.rs, that's fine.

I realized that one complication will be from --in-place: the tree may have been freshly build and so the baseline build will be very fast.

One option there would be to treat it analogous to --baseline=skip and just assume a fairly generous default of say 300s, and print a warning.

sourcefrog · 2024-04-21T22:36:46Z

src/lab.rs

@@ -306,6 +356,18 @@ mod test {
        );
    }

+    #[test]
+    fn build_timeout_multiplier_from_option() {


Maybe these should also test --build-timeout.

sourcefrog · 2024-04-21T22:40:39Z

src/lab.rs

+fn build_timeout(baseline_duration: Option<Duration>, options: &Options) -> Duration {
+    phase_timeout(
+        options.build_timeout,
+        Duration::MAX,


And, the book seems to say that there is a check timeout.

sourcefrog · 2024-04-21T22:41:42Z

src/lab.rs

+        let options = Options::new(&args, &Config::default()).unwrap();
+
+        assert_eq!(options.build_timeout_multiplier, None);
+        assert_eq!(build_timeout(None, &options), Duration::from_secs(300),);


It's not very obvious why this ends up at 300s. I presume this passes but it looks like the build timeout doesn't have a minimum, even if the baseline is skipped? And, the book doesn't seem to talk about this 300s minimum?

The 300s fallback is used when

the explicit_timeout (i.e. --timeout/--build-timeout) is None, and

check_only is false, and

there is no baseline_duration

I was just following the original code here. I feel like the message printed when this 300s value is chosen is pretty clear.

I'll add a line about this in the book if that helps.

sourcefrog · 2024-04-21T22:42:58Z

src/options.rs

+            build_timeout: args.build_timeout.map(Duration::from_secs_f64),
+            build_timeout_multiplier: config
+                .build_timeout_multiplier
+                .or(args.build_timeout_multiplier),


I realized they existing code is backwards: the args, if set, should have priority over the config, but I wrote the opposite. Can you make it args...or(config....)?

Sure, I'll change that in a subsequent commit.

sourcefrog · 2024-04-21T22:43:48Z

testdata/hang_when_mutated/src/lib.rs

+
+/// If `should_stop_const` is mutated to return false, then this const block
+/// will hang and block compilation.
+pub const VAL: i32 = loop {


Nice. I think it would be good to test cargo mutants --check on this tree too.

tests/main.rs

sourcefrog · 2024-04-21T22:52:23Z

src/main.rs

+    build_timeout: Option<f64>,
+
+    /// build timeout multiplier (relative to base build time).
+    #[arg(long, help_heading = "Execution")]


We could make these conflict with each other, https://docs.rs/clap/latest/clap/struct.Arg.html#method.conflicts_with, although I did not yet do that on the existing arguments.

Sure, I'll also put that in a subsequent commit.

sourcefrog · 2024-04-21T22:55:44Z

src/lab.rs

        );
+
+        // Round large timeouts to a nice even number, but round smaller timeouts to 2DP


It sounds from your post like this is here only so that you can test very short timeouts? I think I'd rather not have this wrinkle in the behavior just for that, when we could probably test it reasonably on the tree that's known to hang.

Will remove.

sourcefrog · 2024-04-21T22:56:40Z

src/lab.rs

+
+        // Round large timeouts to a nice even number, but round smaller timeouts to 2DP
+        timeout = if timeout.as_secs() >= 1 {
+            Duration::from_secs(timeout.as_secs_f64().round() as u64)


Also, I know the existing code has round but especially when we're multiplying by a smaller number we had better use ceil so that it can't round to zero if the build is fast.

ceil would be more appropriate here! That would have been a much simpler way to do it. I'll replace the original code, with round substituted for ceil.

sourcefrog · 2024-04-21T23:01:45Z

src/lab.rs

-                .round() as u64,
-            ),
+        check_only_timeout
+    } else if let Some(baseline_duration) = baseline_duration {


I think somewhere in here we want to say, "if Options.in_place && phase == Build|Check" then don't use the baseline, give a warning and use the minimum.

That might be best done by giving this a phase enum parameter and folding the build/test_timeout functions into this, but see what works.

Good point, I didn't know that the baseline can be computed in-place.

sourcefrog · 2024-04-21T23:11:43Z

Regarding the test failing with an unexpected timeout, https://github.com/sourcefrog/cargo-mutants/actions/runs/8770143687/job/24066400701


command=`env -u CARGO_MUTANTS_MINIMUM_TEST_TIMEOUT -u CARGO_TERM_COLOR "/home/runner/work/cargo-mutants/cargo-mutants/target/debug/cargo-mutants" "mutants" "--no-times" "--no-shuffle" "-d" "/tmp/.tmpRJLpMJ"`
code=3
stdout=```
Found 6 mutants to test
ok       Unmutated baseline
MISSED   src/b.rs:2:5: replace one_untested -> i32 with 0
MISSED   src/b.rs:2:5: replace one_untested -> i32 with -1
TIMEOUT  src/c.rs:2:5: replace one -> i32 with 0
6 mutants tested: 2 missed, 3 caught, 1 timeouts

I can't immediately see why that would timeout, but I guess it is just the build is fairly fast on the baseline and so the timeout is set low enough that it's slightly flaky.

I think you could add -Ltrace to the arguments, so that we can see more in the stderr of the failure what was going on.

This might be something that could happen in user's CI on their own tree so I think we should change something to avoid it: maybe a minimum build timeout hardcoded to 60s?

These should be a lot less common that mutants causing hangs in runtime code so I think we could afford to set the minimum fairly high.

zaneduffield · 2024-04-22T10:45:49Z

I haven't finished responding to your review, but I've done as much as I can for today and pushed it. I'll return later this week and try and finish it.

zaneduffield · 2024-04-23T11:37:20Z

The test failure now is a bit weird...
https://github.com/sourcefrog/cargo-mutants/actions/runs/8799621313/job/24149040400?pr=345#step:9:149

---- output::test::create_output_dir stdout ----
thread 'output::test::create_output_dir' panicked at src/output.rs:275:51:
called `Result::unwrap()` on an `Err` value: run cargo locate-project in "/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/.tmpqosAJJ"
error: test failed, to rerun pass `-p cargo-mutants --bin cargo-mutants`

Caused by:
    ["/Users/runner/.rustup/toolchains/1.74-x86_64-apple-darwin/bin/cargo", "locate-project", "--workspace"] failed with status Signaled(4):

I'm not really sure what that's about, but I think I've addressed all the review comments now.

sourcefrog · 2024-04-24T13:03:43Z

cargo seems to be failing with signal 4, SIGILL, "illegal instruction". I see this sometimes on the GitHub actions and I don't know a cause for it yet, but it seems to be intermittent and fixed by a rebuild. Maybe it's some issue in their mac virtualization (if any).

With that fixed it looks like there is one missed mutant

MISSED src/lab.rs:225:46: replace != with == in phase_timeout in 6.8s build + 93.0s test

Please see if you can add a reasonable test for it; if not just let me know.

…er options

zaneduffield · 2024-04-24T13:57:46Z

With that fixed it looks like there is one missed mutant

MISSED src/lab.rs:225:46: replace != with == in phase_timeout in 6.8s build + 93.0s test

Please see if you can add a reasonable test for it; if not just let me know.

Done

zaneduffield · 2024-04-25T03:53:33Z

Also, this PR fixes #323.

sourcefrog · 2024-04-25T16:04:46Z

Thanks, I'll look on the weekend if not before.

zaneduffield force-pushed the buildTimeout branch from cb581fc to 3c2ab1a Compare April 21, 2024 03:27

zaneduffield force-pushed the buildTimeout branch from 3c2ab1a to 9cf2216 Compare April 21, 2024 04:33

sourcefrog reviewed Apr 21, 2024

View reviewed changes

sourcefrog requested changes Apr 21, 2024

View reviewed changes

zaneduffield force-pushed the buildTimeout branch from 9cf2216 to 494ac06 Compare April 22, 2024 10:44

zaneduffield force-pushed the buildTimeout branch 2 times, most recently from 8cfe7ae to 3640f00 Compare April 23, 2024 11:28

zaneduffield requested a review from sourcefrog April 23, 2024 11:37

zaneduffield added 3 commits April 24, 2024 23:57

Add timeouts for builds

282611f

Make CLI timeout multipliers have precedence over config equivalents

cb7ac6e

Make CLI timeout options conflict with corresponding timeout-multipli…

da6443d

…er options

zaneduffield force-pushed the buildTimeout branch from 3640f00 to da6443d Compare April 24, 2024 13:57

sourcefrog approved these changes Apr 29, 2024

View reviewed changes

sourcefrog merged commit 9789d53 into sourcefrog:main Apr 29, 2024
36 checks passed

zaneduffield deleted the buildTimeout branch April 29, 2024 03:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add timeouts for builds #345

Add timeouts for builds #345

zaneduffield commented Apr 21, 2024

zaneduffield commented Apr 21, 2024

zaneduffield commented Apr 21, 2024 •

edited

Loading

sourcefrog commented Apr 21, 2024

sourcefrog Apr 21, 2024

sourcefrog Apr 21, 2024

zaneduffield Apr 21, 2024 •

edited

Loading

zaneduffield Apr 22, 2024

sourcefrog commented Apr 21, 2024

sourcefrog Apr 21, 2024

sourcefrog Apr 21, 2024

sourcefrog Apr 21, 2024

zaneduffield Apr 22, 2024

sourcefrog Apr 21, 2024

zaneduffield Apr 22, 2024

sourcefrog Apr 21, 2024

sourcefrog Apr 21, 2024

zaneduffield Apr 22, 2024

sourcefrog Apr 21, 2024

zaneduffield Apr 22, 2024

sourcefrog Apr 21, 2024

zaneduffield Apr 22, 2024

sourcefrog Apr 21, 2024

zaneduffield Apr 22, 2024

sourcefrog commented Apr 21, 2024

zaneduffield commented Apr 22, 2024

zaneduffield commented Apr 23, 2024

sourcefrog commented Apr 24, 2024

zaneduffield commented Apr 24, 2024

zaneduffield commented Apr 25, 2024

sourcefrog commented Apr 25, 2024

		);

		// Round large timeouts to a nice even number, but round smaller timeouts to 2DP

Add timeouts for builds #345

Add timeouts for builds #345

Conversation

zaneduffield commented Apr 21, 2024

zaneduffield commented Apr 21, 2024

zaneduffield commented Apr 21, 2024 • edited Loading

sourcefrog commented Apr 21, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zaneduffield Apr 21, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sourcefrog commented Apr 21, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sourcefrog commented Apr 21, 2024

zaneduffield commented Apr 22, 2024

zaneduffield commented Apr 23, 2024

sourcefrog commented Apr 24, 2024

zaneduffield commented Apr 24, 2024

zaneduffield commented Apr 25, 2024

sourcefrog commented Apr 25, 2024

zaneduffield commented Apr 21, 2024 •

edited

Loading

zaneduffield Apr 21, 2024 •

edited

Loading