Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add timeouts for builds #345

Merged
merged 3 commits into from
Apr 29, 2024
Merged

Conversation

zaneduffield
Copy link
Contributor

Fixes #322.

I mostly followed the suggested steps given in #322 (comment), with a couple of differences:

  • I didn't find any existing unit tests for CLI options, so I didn't add any CLI unit tests for the new options
  • I had to modify the rounding behaviour for timeout durations so that a very short build timeout wouldn't get rounded to zero (e.g. in the integration test of the well_tested tree)
  • I set the default build-timeout-multiplier value to 2, rather than the suggested 5. I figure that since incremental builds are expected to be substantially faster than the initial build, we shouldn't expect to be waiting very long for a build.
  • I didn't add a new ScenarioOutcome for a build timeout, mostly because in the default terminal output you can tell whether it was the build or the test that timed out by looking at the phase at the end of the line:
    • a build timeout looks like
      TIMEOUT  src/lib.rs:12:5: replace should_stop_const -> bool with false in 1.0s build
      
    • a test timeout looks like
      TIMEOUT  src/lib.rs:25:5: replace should_stop -> bool with false in 0.2s build + 1.0s test
      

In updating the book/src documentation, I felt like I would be repeating a lot of text if I were to create a new section just for build timeouts, so I rewrote the whole page to talk about both timeouts simultaneously.

@zaneduffield
Copy link
Contributor Author

I forgot to test after rebasing. I'll fix it later today.

@zaneduffield
Copy link
Contributor Author

zaneduffield commented Apr 21, 2024

The failing test passes on my system. I might need to implement a minimum for the build timeout to make the CI tests less flaky.

The reason I didn't do this at first was it made the well_tested_tree_with_short_build_timeout test difficult to implement. For that test, we really do want a short timeout that's less than any reasonable minimum. We can't use the plain --build-timeout option (which wouldn't be affected by the minimum timeout) to achieve this though, because that would cause the baseline build to timeout and we just want the incremental builds to timeout.

I could perhaps copy the --minimum-test-timeout option (and potentially the associated environment variable) for the build timeout, but it didn't feel right to be adding new CLI options just to make testing easier.

On the other hand, I could just delete the well_tested_tree_with_short_build_timeout test. I do have a test case that exercises this feature in the hang_when_mutated test tree.

What do you think @sourcefrog?

@sourcefrog
Copy link
Owner

Fixes #322.

I mostly followed the suggested steps given in #322 (comment), with a couple of differences:

  • I didn't find any existing unit tests for CLI options, so I didn't add any CLI unit tests for the new options

There are some like this at the bottom of options.rs but since this is probably a pretty simple option covered by other things it may not really be necessary

#[test]
fn test_tool_from_config() {
let config = indoc! { r#"
test_tool = "nextest"
"#};
let mut config_file = NamedTempFile::new().unwrap();
config_file.write_all(config.as_bytes()).unwrap();
let args = Args::parse_from(["mutants"]);
let config = Config::read_file(config_file.path()).unwrap();
let options = Options::new(&args, &config).unwrap();
assert_eq!(options.test_tool, TestTool::Nextest);
}

  • I had to modify the rounding behaviour for timeout durations so that a very short build timeout wouldn't get rounded to zero (e.g. in the integration test of the well_tested tree)
  • I set the default build-timeout-multiplier value to 2, rather than the suggested 5. I figure that since incremental builds are expected to be substantially faster than the initial build, we shouldn't expect to be waiting very long for a build.

Makes sense.

  • I didn't add a new ScenarioOutcome for a build timeout, mostly because in the default terminal output you can tell whether it was the build or the test that timed out by looking at the phase at the end of the line:

    • a build timeout looks like
      TIMEOUT  src/lib.rs:12:5: replace should_stop_const -> bool with false in 1.0s build
      
    • a test timeout looks like
      TIMEOUT  src/lib.rs:25:5: replace should_stop -> bool with false in 0.2s build + 1.0s test
      

In general I want to make sure it's visible in structured json output too, but it probably is, by looking at which phase failed, so this seems fine. Anyhow, your motivating case of a const function shows that it's really fairly similar to a timeout in running the tests.

In updating the book/src documentation, I felt like I would be repeating a lot of text if I were to create a new section just for build timeouts, so I rewrote the whole page to talk about both timeouts simultaneously.

Sounds good.

src/lab.rs Outdated
fn build_timeout(baseline_duration: Option<Duration>, options: &Options) -> Duration {
phase_timeout(
options.build_timeout,
Duration::MAX,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to say that with --check, i.e. running just cargo check --tests on each mutant, the build timeout is unlimited. I haven't checked yet but I suspect a const fn that hangs the compiler might also hang the checks? Maybe we should check that it doesn't.

And, maybe the logic for the check timeout should be the same as for non-check builds, i.e. a multiplier of the baseline?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, the book seems to say that there is a check timeout.

Copy link
Contributor Author

@zaneduffield zaneduffield Apr 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent was for the check timeouts to always be the same as the build. It's definitely true that the check can hang when a const function is divergent.

I didn't really test that the check works like this, though. Might be an area to improve the testing in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code here was a little backwards, the code change here
https://github.com/sourcefrog/cargo-mutants/pull/345/files/9cf221652a74ac8dff8fbe95a01b5909a56b3e0d#diff-43aa7803b24379ea66e0395da5f5baf8ea6eef4bd298b984bfa7c992f0de312dR309
made sure that the timeout used for check phases is whatever the final build timeout is.

I'll delete the special handing here, and add some tests to assert how it works.

@sourcefrog
Copy link
Owner

Thanks, that looks good!

tests for options

Oh I see you have some in lab.rs, that's fine.

I realized that one complication will be from --in-place: the tree may have been freshly build and so the baseline build will be very fast.

One option there would be to treat it analogous to --baseline=skip and just assume a fairly generous default of say 300s, and print a warning.

@@ -306,6 +356,18 @@ mod test {
);
}

#[test]
fn build_timeout_multiplier_from_option() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe these should also test --build-timeout.

src/lab.rs Outdated
fn build_timeout(baseline_duration: Option<Duration>, options: &Options) -> Duration {
phase_timeout(
options.build_timeout,
Duration::MAX,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, the book seems to say that there is a check timeout.

let options = Options::new(&args, &Config::default()).unwrap();

assert_eq!(options.build_timeout_multiplier, None);
assert_eq!(build_timeout(None, &options), Duration::from_secs(300),);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not very obvious why this ends up at 300s. I presume this passes but it looks like the build timeout doesn't have a minimum, even if the baseline is skipped? And, the book doesn't seem to talk about this 300s minimum?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 300s fallback is used when

  • the explicit_timeout (i.e. --timeout/--build-timeout) is None, and
  • check_only is false, and
  • there is no baseline_duration

I was just following the original code here. I feel like the message printed when this 300s value is chosen is pretty clear.

I'll add a line about this in the book if that helps.

src/options.rs Outdated
build_timeout: args.build_timeout.map(Duration::from_secs_f64),
build_timeout_multiplier: config
.build_timeout_multiplier
.or(args.build_timeout_multiplier),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized they existing code is backwards: the args, if set, should have priority over the config, but I wrote the opposite. Can you make it args...or(config....)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll change that in a subsequent commit.


/// If `should_stop_const` is mutated to return false, then this const block
/// will hang and block compilation.
pub const VAL: i32 = loop {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I think it would be good to test cargo mutants --check on this tree too.

tests/main.rs Show resolved Hide resolved
src/main.rs Outdated
build_timeout: Option<f64>,

/// build timeout multiplier (relative to base build time).
#[arg(long, help_heading = "Execution")]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make these conflict with each other, https://docs.rs/clap/latest/clap/struct.Arg.html#method.conflicts_with, although I did not yet do that on the existing arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll also put that in a subsequent commit.

src/lab.rs Outdated
);

// Round large timeouts to a nice even number, but round smaller timeouts to 2DP
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds from your post like this is here only so that you can test very short timeouts? I think I'd rather not have this wrinkle in the behavior just for that, when we could probably test it reasonably on the tree that's known to hang.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove.

src/lab.rs Outdated

// Round large timeouts to a nice even number, but round smaller timeouts to 2DP
timeout = if timeout.as_secs() >= 1 {
Duration::from_secs(timeout.as_secs_f64().round() as u64)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I know the existing code has round but especially when we're multiplying by a smaller number we had better use ceil so that it can't round to zero if the build is fast.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ceil would be more appropriate here! That would have been a much simpler way to do it. I'll replace the original code, with round substituted for ceil.

src/lab.rs Outdated
.round() as u64,
),
check_only_timeout
} else if let Some(baseline_duration) = baseline_duration {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think somewhere in here we want to say, "if Options.in_place && phase == Build|Check" then don't use the baseline, give a warning and use the minimum.

That might be best done by giving this a phase enum parameter and folding the build/test_timeout functions into this, but see what works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I didn't know that the baseline can be computed in-place.

@sourcefrog
Copy link
Owner

Regarding the test failing with an unexpected timeout, https://github.com/sourcefrog/cargo-mutants/actions/runs/8770143687/job/24066400701


command=`env -u CARGO_MUTANTS_MINIMUM_TEST_TIMEOUT -u CARGO_TERM_COLOR "/home/runner/work/cargo-mutants/cargo-mutants/target/debug/cargo-mutants" "mutants" "--no-times" "--no-shuffle" "-d" "/tmp/.tmpRJLpMJ"`
code=3
stdout=```
Found 6 mutants to test
ok       Unmutated baseline
MISSED   src/b.rs:2:5: replace one_untested -> i32 with 0
MISSED   src/b.rs:2:5: replace one_untested -> i32 with -1
TIMEOUT  src/c.rs:2:5: replace one -> i32 with 0
6 mutants tested: 2 missed, 3 caught, 1 timeouts

I can't immediately see why that would timeout, but I guess it is just the build is fairly fast on the baseline and so the timeout is set low enough that it's slightly flaky.

I think you could add -Ltrace to the arguments, so that we can see more in the stderr of the failure what was going on.

This might be something that could happen in user's CI on their own tree so I think we should change something to avoid it: maybe a minimum build timeout hardcoded to 60s?

These should be a lot less common that mutants causing hangs in runtime code so I think we could afford to set the minimum fairly high.

@zaneduffield
Copy link
Contributor Author

I haven't finished responding to your review, but I've done as much as I can for today and pushed it. I'll return later this week and try and finish it.

@zaneduffield zaneduffield force-pushed the buildTimeout branch 2 times, most recently from 8cfe7ae to 3640f00 Compare April 23, 2024 11:28
@zaneduffield
Copy link
Contributor Author

The test failure now is a bit weird...
https://github.com/sourcefrog/cargo-mutants/actions/runs/8799621313/job/24149040400?pr=345#step:9:149

---- output::test::create_output_dir stdout ----
thread 'output::test::create_output_dir' panicked at src/output.rs:275:51:
called `Result::unwrap()` on an `Err` value: run cargo locate-project in "/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/.tmpqosAJJ"
error: test failed, to rerun pass `-p cargo-mutants --bin cargo-mutants`

Caused by:
    ["/Users/runner/.rustup/toolchains/1.74-x86_64-apple-darwin/bin/cargo", "locate-project", "--workspace"] failed with status Signaled(4): 

I'm not really sure what that's about, but I think I've addressed all the review comments now.

@zaneduffield zaneduffield requested a review from sourcefrog April 23, 2024 11:37
@sourcefrog
Copy link
Owner

cargo seems to be failing with signal 4, SIGILL, "illegal instruction". I see this sometimes on the GitHub actions and I don't know a cause for it yet, but it seems to be intermittent and fixed by a rebuild. Maybe it's some issue in their mac virtualization (if any).

With that fixed it looks like there is one missed mutant

MISSED src/lab.rs:225:46: replace != with == in phase_timeout in 6.8s build + 93.0s test

Please see if you can add a reasonable test for it; if not just let me know.

@zaneduffield
Copy link
Contributor Author

With that fixed it looks like there is one missed mutant

MISSED src/lab.rs:225:46: replace != with == in phase_timeout in 6.8s build + 93.0s test

Please see if you can add a reasonable test for it; if not just let me know.

Done

@zaneduffield
Copy link
Contributor Author

Also, this PR fixes #323.

@sourcefrog
Copy link
Owner

Thanks, I'll look on the weekend if not before.

@sourcefrog sourcefrog merged commit 9789d53 into sourcefrog:main Apr 29, 2024
36 checks passed
@zaneduffield zaneduffield deleted the buildTimeout branch April 29, 2024 03:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mutating const functions can cause the build to hang
2 participants