Firecracker Integration Tests

The tests herein are meant to uphold the security, quality, and performance contracts of Firecracker.

Running

The testing system is built around pytest. Our tools/devtool script is a convenience wrapper which automatically downloads necessary test artifacts from S3, before invoking pytest inside a docker container. For detailed help on usage, see tools/devtool help.

To run all available tests that would also run as part of our PR CI (i.e. excluding tests marked with pytest.mark.nonci):

tools/devtool -y test

To run only tests from specific directories and/or files:

tools/devtool -y test -- integration_tests/performance/test_boottime.py

To run a single specific test from a file:

tools/devtool -y test -- integration_tests/performance/test_boottime.py::test_boottime

Note that all paths should be specified relative to the tests directory, not the repository root.

Alternatively, pytest provides the option to run all tests where the test name contains some substring via the -k option:

tools/devtool -y test -- -k 1024 integration_tests/performance/test_boottime.py::test_boottime

This is particularly useful for specifying parameters of test functions. For example, the above command will run all boottime tests with a microVM size of 1024MB.
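A bare substring such as 1024 can select parameter values because pytest appends each test case's parameter IDs to the test name in square brackets. The following standard-library sketch mimics that expansion and a plain substring filter. The test name and parameter values are hypothetical, and real -k expressions are richer (they also accept and, or and not).

```python
from itertools import product


def expand_ids(test_name, **params):
    """Build pytest-style IDs such as 'test_x[1-1024]', one per parameter combination."""
    combos = product(*params.values())  # cartesian product, similar to stacked parametrize
    return [f"{test_name}[{'-'.join(str(v) for v in combo)}]" for combo in combos]


def select(test_ids, substring):
    """Keep IDs containing the substring; pytest's -k matching is case-insensitive."""
    return [tid for tid in test_ids if substring.lower() in tid.lower()]


# Hypothetical parametrization: vCPU count x guest memory size in MiB.
ids = expand_ids("test_boottime", vcpus=[1, 2], mem_mib=[128, 1024])
print(select(ids, "1024"))
```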

If you are not interested in the capabilities of devtool, use pytest directly, either from inside the container:

tools/devtool -y shell -p
pytest [<pytest argument>...]

or natively on your dev box:

python3 -m pytest [<pytest argument>...]

Output

Output, including testrun results, goes to stdout. Errors go to stderr. By default, stdout and stderr are captured while tests are running and are printed in the final failure report only if they fail. To print them while running regardless of success or failure, pass the -s flag, e.g. tools/devtool -y test -- -s.

Dependencies

  • A bare-metal Linux host with uname -r >= 4.14 and KVM enabled (/dev/kvm device node exists)
  • Docker
  • awscli version 2
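Before running the suite on a new host, a small preflight script can check most of these requirements. The following is a hypothetical helper, not part of devtool. It parses the major and minor kernel version from platform.release(), checks that /dev/kvm exists (this alone does not prove the current user may open it), and looks up the docker and aws executables on PATH without verifying that awscli is version 2.

```python
import os
import platform
import re
import shutil


def kernel_at_least(release, minimum=(4, 14)):
    """Compare the 'major.minor' prefix of a kernel release string to a minimum."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= minimum


def preflight_problems():
    """Return human-readable problems; an empty list means the basic checks passed."""
    problems = []
    if not kernel_at_least(platform.release()):
        problems.append(f"kernel {platform.release()} is not >= 4.14")
    if not os.path.exists("/dev/kvm"):
        problems.append("/dev/kvm does not exist (is KVM enabled?)")
    for tool in ("docker", "aws"):  # the awscli major version is not checked
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems():
        print(problem)
```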

Rust Integration Tests

The pytest-powered integration tests rely on Firecracker's HTTP API for configuring and communicating with the VMM. Alongside these, the vmm crate also includes several native-Rust integration tests, which exercise its programmatic API without the HTTP integration. Cargo automatically picks up these tests when cargo test is issued. They also count towards code coverage.

To run only the Rust integration tests:

cargo test --test integration_tests --all

Unlike unit tests, Rust integration tests are each run in a separate process. cargo also packages them in a new crate. This has several known side effects:

  1. Only the pub functions can be called. This is fine, as it allows the VMM to be consumed as a programmatic user would. If any function is necessary but not pub, please consider carefully whether it conceptually needs to be in the public interface before making it so.

  2. When the VMM functions correctly, it exits with code 0, which is necessary for proper resource cleanup. However, cargo does not expect a test process to terminate itself, so it cannot properly collect the test output. In the example below, cargo announces three tests but reports a result for only the first one.

    Example:

    cargo test --test integration_tests
    running 3 tests
    test test_setup_serial_device ... ok

To learn more about Rust integration tests, see the Rust book.

A/B-Tests

A/B-Testing is a testing strategy where a test function is executed twice in different environments (the A and B environments), and the overall test result depends on a comparison of the function's outputs in these two environments. The advantage of A/B-testing is that it does not require a ground truth to be specified up front; instead, the ground truth is generated dynamically by running the test function in environment A. Firecracker's A/B-testing generally compares Firecracker binaries compiled from two separate commits (e.g. an A binary compiled from the HEAD of the main branch, and a B binary compiled from the HEAD of a pull request opened against main).
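The core idea fits in a few lines of Python. All names below are hypothetical, and the acceptance rule is a single-value toy; Firecracker's performance A/B-tests compare whole data series statistically rather than single numbers.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def ab_test(test_fn: Callable[[str], T], env_a: str, env_b: str,
            accept: Callable[[T, T], bool]) -> bool:
    """Run test_fn in both environments; B passes if accept(A's result, B's result)."""
    baseline = test_fn(env_a)   # the A run generates the ground truth dynamically
    candidate = test_fn(env_b)  # the B run is judged relative to that baseline
    return accept(baseline, candidate)


# Hypothetical stand-in for "run the test against the binary built from a revision".
fake_boot_ms = {"main": 120.0, "my-pr": 118.5}

# Accept B if it is at most 5% slower than A (an arbitrary example threshold).
passed = ab_test(fake_boot_ms.get, "main", "my-pr", lambda a, b: b <= a * 1.05)
```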

We use this testing approach if a test's ground truth...

  • ...can change due to influence external to the code base (e.g. a security test that fails if a CVE is published for one of our dependencies), or
  • ...is too complex/changes too often to reasonably be contained in the code base (e.g. extensive performance benchmark results).

For examples of how to utilize A/B-testing inside an integration test, have a look at our A/B-Testing module or our cargo audit test.

If such an A/B-Test is executed outside of the context of a PR (meaning there is no canonical choice of A and B to be made), it will simply try to assert the state of the environment in which it was executed (e.g. the cargo audit test above when run on a PR will fail iff a newly added dependency has a known open RustSec advisory. If run outside a PR, it will fail if any existing dependency has an open RustSec advisory).
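The two modes of the cargo audit example can be expressed as a set operation. This sketch assumes each run yields the set of dependency names that have an open RustSec advisory; that input format is hypothetical, and the real test is not reproduced here.

```python
from typing import Optional, Set


def failing_dependencies(flagged_a: Optional[Set[str]], flagged_b: Set[str]) -> Set[str]:
    """Dependencies that should fail the test.

    flagged_a is None when there is no PR context (no canonical A revision).
    """
    if flagged_a is None:
        # Outside a PR: any flagged dependency in the current environment fails.
        return set(flagged_b)
    # On a PR: only advisories present in B but not in A (i.e. on newly added
    # dependencies) fail; advisories affecting both revisions are not blamed on the PR.
    return flagged_b - flagged_a


print(failing_dependencies({"old-crate"}, {"old-crate", "new-crate"}))
```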

Performance A/B-Tests

Firecracker has a special framework for orchestrating long-running A/B-tests which run outside the pre-PR CI; instead, these tests are scheduled to run post-merge. Specific tests, such as our snapshot restore latency tests, contain no assertions themselves, but rather emit data series using the aws_embedded_metrics library. When executed by the tools/ab_test.py orchestration script, these data series are collected. The orchestration script executes each test twice with different Firecracker binaries, and then matches up corresponding data series from the A and B runs. For each data series, it performs a non-parametric test, and for each data series where the difference between the A and B runs is considered statistically significant, it prints out the associated metric. See tools/ab_test.py --help for information on how to configure what the script considers significant.
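This README does not name the non-parametric test that tools/ab_test.py uses. As an illustration of what such a test on two data series looks like, here is a two-sided permutation test on the difference of means, using only the standard library. The choice of test, the resample count, the seed, the sample values and the 0.01 threshold are all assumptions for the example.

```python
import math
import random
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    # math.fsum is exactly rounded, so the result does not depend on element order.
    return math.fsum(values) / len(values)


def permutation_p_value(a: Sequence[float], b: Sequence[float],
                        n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation test: how often does randomly relabeling the pooled
    samples produce a mean difference at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = abs(_mean(b) - _mean(a))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[len(a):]) - _mean(pooled[:len(a)]))
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself and keep p strictly above 0.
    return (hits + 1) / (n_resamples + 1)


boot_a = [100.0, 101.0, 99.5, 100.5, 100.2, 99.8]    # hypothetical A samples (ms)
boot_b = [110.0, 111.0, 109.5, 110.5, 110.2, 109.8]  # hypothetical B samples (ms)
p = permutation_p_value(boot_a, boot_b)
significant = p < 0.01  # the significance threshold is also an assumption
```

Because the test only relabels the observed samples, it makes no assumption about the shape of the underlying distribution, which is what makes it non-parametric.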

Writing your own A/B-Test is easy: Simply write a test that outputs a data series and has no functional assertions. Then, when this test is run under the A/B-Test orchestrator, all data series emitted will be picked up automatically for statistical analysis.

To add a new A/B-Test to our post-PR test suite, add the corresponding test function to .buildkite/pipeline_perf.py. To manually run an A/B-Test, use:

tools/devtool -y test --ab [optional arguments to ab_test.py] run <revision A> <revision B> --test <test specification>

Here, revision A and revision B can be arbitrary git objects, such as commit SHAs, branches or tags. For example, to compare boottime of microVMs between Firecracker binaries compiled from the main branch and the HEAD of your current branch, run:

tools/devtool -y test --ab run main HEAD --test integration_tests/performance/test_boottime.py::test_boottime

How to Write an A/B-Compatible Test and Common Pitfalls

First, A/B-Compatible tests need to emit more than one data point for each metric for which they wish to support A/B-testing. This is because non-parametric tests operate on data series instead of individual data points.

When emitting metrics with aws_embedded_metrics, each metric (data series) is associated with a set of dimensions. The tools/ab_test.py script uses these dimensions to match up data series between two test runs. It only matches up two data series with the same name if their dimensions match.

Special care needs to be taken when pytest expands the argument passed to tools/ab_test.py's --test option into multiple individual test cases. If two test cases use the same dimensions for different data series, the script will fail and print out the names of the violating data series. For this reason, A/B-Compatible tests should include a performance_test key in their dimension set whose value is set to the name of the test.

In addition to the above, care should be taken that the dimensions of the data series emitted by some test case are unique to that test case. For example, if we have a boottime test parameterized by number of vcpus, but the emitted boottime data series' dimension set is just {"performance_test": "test_boottime"}, then tools/ab_test.py will not be able to tell apart data series belonging to different microVM sizes, and instead combine them (which is probably not desired). For this reason A/B-Compatible tests should always include all pytest parameters in their dimension set.
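The matching rules in this section can be sketched as follows. Series from each run are keyed by metric name plus dimension set, and only keys present in both runs are paired. The input format (tuples of name, dimensions and values) is hypothetical, and this sketch rejects any duplicate key outright, which makes both of the problems described above visible instead of silently merging series.

```python
from typing import Dict, Iterable, List, Mapping, Tuple

SeriesKey = Tuple[str, frozenset]
Record = Tuple[str, Mapping[str, str], List[float]]


def index_series(records: Iterable[Record]) -> Dict[SeriesKey, List[float]]:
    """Key each data series by (metric name, dimension set)."""
    index: Dict[SeriesKey, List[float]] = {}
    for name, dims, values in records:
        key = (name, frozenset(dims.items()))
        if key in index:
            raise ValueError(f"ambiguous data series {name!r} with dimensions {dict(dims)}")
        index[key] = list(values)
    return index


def match_series(run_a: Iterable[Record], run_b: Iterable[Record]):
    """Pair up series that have the same name and identical dimensions in both runs."""
    a, b = index_series(run_a), index_series(run_b)
    return {key: (a[key], b[key]) for key in a.keys() & b.keys()}


def boottime(vcpus, values):
    # Dimensions include the test name and every pytest parameter.
    return ("boottime", {"performance_test": "test_boottime", "vcpus": str(vcpus)}, values)


run_a = [boottime(1, [10.1, 10.3]), boottime(2, [12.0, 12.4])]
run_b = [boottime(1, [10.0, 10.2]), boottime(2, [12.9, 13.1])]
pairs = match_series(run_a, run_b)  # two pairs, one per vCPU count
```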

Lastly, performance A/B-Testing through tools/ab_test.py can only detect performance differences that are present in the Firecracker binary. The tools/ab_test.py script checks out the revisions it is given only to run cargo build and produce a Firecracker binary for each; it does not run the integration tests in the context of the checked-out revision. In particular, both the A and the B run are triggered from within the same docker container and use the same revision of the integration test code. This means orchestrated A/B-Testing cannot assess the impact of changing only Python code (such as enabling logging); only Rust code can be A/B-Tested. The exception is toolchain differences: if both specified revisions have rust-toolchain.toml files, then tools/ab_test.py compiles each revision using the toolchain that revision specifies, instead of the toolchain installed in the docker container from which the script is executed.

A/B-Testing in Buildkite

We run automated A/B-Tests on every pull request after merge, if the pull request touches any Rust code. The pipeline is generated by the pipeline_perf.py script. To manually schedule an A/B-Test in Buildkite, set the REVISION_A and REVISION_B environment variables in the "Environment Variables" field under "Options" in Buildkite's "New Build" modal.

Beyond commit comparisons

While our automated A/B-Testing suite only supports A/B-Tests across commit ranges, you can also use the scripts to manually run A/B-comparisons for arbitrary environments (such as comparing how the same Firecracker binary behaves on different hosts).

For this, run the desired tests in your environments using devtool as you would for a non-A/B test. The only difference from a normal test run is that you should set two environment variables, AWS_EMF_ENVIRONMENT=local and AWS_EMF_NAMESPACE=local:

AWS_EMF_ENVIRONMENT=local AWS_EMF_NAMESPACE=local tools/devtool -y test -- integration_tests/performance/test_boottime.py::test_boottime

This instructs aws_embedded_metrics to dump all data series that our A/B-Test orchestration would analyze to stdout, and pytest will capture this output into a file stored at ./test_results/test-report.json.

The tools/ab_test.py script can consume these test reports, so next copy your two test report files to your local machine and run:

tools/ab_test.py analyze <first test-report.json> <second test-report.json>

This will then print the same analysis described in the previous sections.

Adding Python Tests

Tests can be added in any (existing or new) sub-directory of tests/, in files named test_*.py.

Fixtures

By default, pytest makes all fixtures in conftest.py available to all test functions. You can also create conftest.py in sub-directories containing tests, or define fixtures directly in test files. See the pytest documentation for details.

Most integration tests use fixtures that abstract away the creation and teardown of Firecracker processes. The following fixtures spawn Firecracker processes that are pre-initialized with specific guest kernels and rootfs:

  • uvm_plain_any is parametrized by the guest kernels supported by Firecracker and a read-only Ubuntu 22.04 squashfs as rootfs,
  • uvm_plain yields a Firecracker process pre-initialized with a 5.10 kernel and the same Ubuntu 22.04 squashfs.

Generally, use the former when testing some interaction between the guest and Firecracker, and the latter when testing Firecracker functionality unrelated to the guest.

Markers

Firecracker uses two special pytest markers to determine which tests are run in which context:

  • Tests marked as nonci are not run in the PR CI pipelines. Instead, they run in separate pipelines according to various cron schedules.
  • Tests marked as no_block_pr are run in the "optional" PR CI pipeline. This pipeline is not required to pass for merging a PR.

All tests without markers are run for every pull request, and are required to pass for the PR to be merged.

Adding Rust Tests

Add a new function annotated with #[test] in integration_tests.rs.

Working With Guest Files

There are helper methods for writing to and reading from a guest filesystem. For example, to overwrite the guest init process and later extract a log:

def test_with_any_microvm_and_my_init(test_microvm_any):
    # [...]
    test_microvm_any.slot.fsfiles['mounted_root_fs'].copy_to(my_init, 'sbin/')
    # [...]
    test_microvm_any.slot.fsfiles['mounted_root_fs'].copy_from('logs/', 'log')

copy_to() source paths are relative to the host root and destination paths are relative to the mounted_root_fs root. Vice versa for copy_from().
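The path rules above can be made concrete with pathlib. This sketch only computes host-side paths, assuming the guest root filesystem is mounted on the host at a directory passed as rootfs_mount (a hypothetical parameter). It does not reproduce the framework's own copy_to()/copy_from() implementation.

```python
from pathlib import PurePosixPath


def _under(root, path):
    """Join path onto root, treating a leading '/' in path as relative to root."""
    return PurePosixPath(root) / str(path).lstrip("/")


def copy_to_paths(src, dst, rootfs_mount):
    """copy_to(): src is relative to the host root, dst to the rootfs root."""
    return _under("/", src), _under(rootfs_mount, dst)


def copy_from_paths(src, dst, rootfs_mount):
    """copy_from(): the reverse, src inside the rootfs and dst on the host."""
    return _under(rootfs_mount, src), _under("/", dst)


print(copy_to_paths("home/me/my_init", "sbin/", "/mnt/rootfs"))
print(copy_from_paths("logs/", "log", "/mnt/rootfs"))
```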

Copying files to/from a guest file system while the guest is running results in undefined behavior.

Example Manual Testrun

Running on an EC2 .metal instance with an Amazon Linux 2 AMI:

# Get firecracker
yum install -y git
git clone https://github.com/firecracker-microvm/firecracker.git

# Run all tests
cd firecracker
tools/devtool test

CI Environment

In our CI, integration tests are run on EC2 .metal instances. We list the instance types and host operating systems we test in our README. Multiple test runs can share a .metal instance, meaning it is possible to observe noisy neighbor effects when running the integration test suite (and in particular, tests should not assume the ability to configure host-global resources). The exception is the integration tests found in integration_tests/performance. These tests are always executed single-tenant, and additionally tweak various host-level settings to achieve consistent performance. See the test section of tools/devtool help for more information.

Terminology

  • Testrun: A sandboxed run of all (or a selection of) integration tests.
  • Test Session: A pytest testing session. One per testrun. A Testrun will start a Test Session once the sandbox is created.
  • Test: A function named test_ from this tree, that ensures a feature, functional parameter, or quality metric of Firecracker. Should assert or raise an exception if it fails.
  • Fixture: A function that returns an object that makes it easy to add Tests, e.g. a spawned Firecracker microVM. Fixtures are functions marked with @pytest.fixture in files named conftest.py or in the files where tests are found. See the pytest documentation on fixtures.
  • Test Case: An element from the cartesian product of a Test and all possible states of its parameters (including its fixtures).

FAQ

Q1: I have a shell script that runs my tests and I don't want to rewrite it.
A1: Insofar as it makes sense, you should write it as a Python test function. However, you can always call the script from a shim Python test function. You can also add it as a microvm image resource in the S3 bucket (and it will be made available under microvm.slot.path) or copy it over to a guest filesystem as part of your test.

Q2: I want to add more tests that I don't want to commit to the Firecracker repository.
A2: Before a testrun or test session, just add your test directory under tests/. pytest will discover all tests in this tree.

Q3: I want to have my own test fixtures, and not commit them in the repo.
A3: Add a conftest.py file in your test directory, and place your fixtures there. pytest will bring them into scope for all your tests.

Q4: I want to use more/other microvm test images, but I don't want to add them to the common S3 bucket.
A4: Add your custom images to the build/img subdirectory in the Firecracker source tree. This directory is bind-mounted in the container and used as a local image cache.

Q5: How can I get live logger output from the tests?
A5: Edit pytest.ini to change the logger settings; for example, pytest's log_cli option enables live log output while tests run.

Q6: Is there a way to speed up integration tests execution time?
A6: You can narrow down the test selection as described in the Running section. For example:

  1. Pass the -k substring option to pytest to only run a subset of tests by specifying a part of their name.
  2. Only run the tests contained in a file or directory.

Implementation Goals

  • Easily run tests manually on a development/test machine, and in continuous integration environments.
  • Each test should be independent and self-contained. Tests will time out, expect a clean environment, and leave a clean environment behind.
  • Always run with the latest dependencies and resources.

Choice of Pytest & Dependencies

Pytest was chosen because:

  • Python makes it easy to work in the cloud.
  • Python has built-in sandbox (virtual environment) support.
  • pytest has great test discovery and allows for simple, function-like tests.
  • pytest has powerful test fixture support.

Test System TODOs

Note: The below TODOs are also mentioned in their respective code files.

Features

  • Use the Firecracker OpenAPI spec to populate Microvm API resource URLs.
  • Event-based monitoring of microvm socket file creation to avoid while spins.
  • Self-tests (e.g., Tests that test the testing system).

Implementation

  • Look into pytest-ordering to ensure test order.
  • Create an integrated, layered say system across the test runner and pytest (probably based on an environment variable).
  • Per test function dependency installation would make tests easier to write.
  • Type hinting is used sparsely across the tests/* Python modules. The code would be easier to understand with consistent type hints everywhere.

Bug fixes

Further Reading

Contributing to this testing system requires a deep dive into pytest.

Troubleshooting tests

When troubleshooting tests, it is important to narrow the run down to only the tests of interest. The --last-failed parameter reruns only the tests that failed in the previous run, which is useful when several tests fail after making large changes.

Run tests from within the container

To avoid having to enter/exit Docker every test run, you can run the tests directly within a Docker session:

tools/devtool -y shell --privileged
tools/test.sh integration_tests/functional/test_api.py

How to use the Python debugger (pdb) for debugging

Just append --pdb, and when a test fails it will drop you in pdb, where you can examine local variables and the stack, and can use the normal Python REPL.

tools/devtool -y test -- -k 1024 integration_tests/performance/test_boottime.py::test_boottime --pdb

How to use ipython's ipdb instead of pdb

tools/devtool -y shell --privileged
export PYTEST_ADDOPTS=--pdbcls=IPython.terminal.debugger:TerminalPdb
tools/test.sh -k 1024 integration_tests/performance/test_boottime.py::test_boottime

There is a helper command in devtool that does just that, and is easier to type:

tools/devtool -y test_debug -k 1024 integration_tests/performance/test_boottime.py::test_boottime

How to connect to the console interactively

There is a helper to enable the console, but it has to be run before spawning the Firecracker process:

uvm.help.enable_console()
uvm.spawn()
uvm.basic_config()
uvm.start()
...

Once that is done, if you get dropped into pdb, you can do this to open a tmux tab connected to the console (via screen).

uvm.help.tmux_console()

How to reproduce intermittent (aka flaky) tests

Just run the test in a loop, and make it drop you into pdb when it fails.

while true; do
    tools/devtool -y test -- integration_tests/functional/test_balloon.py::test_deflate_on_oom -k False --pdb
done
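If you prefer a loop that stops by itself, the following sketch reruns a command until it fails or a run budget is exhausted, and reports which attempt failed. It is an illustrative helper, not part of devtool; pass it the same devtool command as above (leave out --pdb if it should run unattended).

```python
import subprocess
from typing import Optional, Sequence


def run_until_failure(cmd: Sequence[str], max_runs: int = 100) -> Optional[int]:
    """Return the 1-based attempt number of the first failing run, or None if
    all max_runs attempts exit with status 0."""
    for attempt in range(1, max_runs + 1):
        if subprocess.run(list(cmd), check=False).returncode != 0:
            return attempt
    return None


# Example (run from the repository root):
# run_until_failure(["tools/devtool", "-y", "test", "--",
#                    "integration_tests/functional/test_balloon.py::test_deflate_on_oom",
#                    "-k", "False"])
```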

How to run tests in parallel with -n

We can run the tests in parallel via pytest-xdist. Not all tests can run in parallel (the ones in build and performance are not supposed to run in parallel).

By default, the tests run sequentially. Use the -n option to control the parallelism. -n auto starts as many workers as there are CPUs, which may be too many; as a rough heuristic, use half the available CPUs. I use -n4 on my 8-CPU (HT-enabled) laptop. On .metal instances, 8 is a good number; more than that gives diminishing returns.

tools/devtool -y test -- integration_tests/functional -n$(expr $(nproc) / 2) --dist worksteal
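The rule of thumb above (half the CPUs, and no more than about 8 workers) can be computed rather than hard-coded. This is a sketch; the cap of 8 comes from the observation about .metal instances, and whether it suits your host is an assumption.

```python
import os


def suggested_workers(cpu_count=None, cap=8):
    """Half of the available CPUs, at least 1 and at most `cap`."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, min(cap, cpus // 2))


# Build the pytest-xdist arguments used in the command above.
xdist_args = [f"-n{suggested_workers()}", "--dist", "worksteal"]
```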

How to attach gdb to a running uvm

First, make the test fail and drop you into PDB. For example:

tools/devtool -y test_debug integration_tests/functional/test_api.py::test_api_happy_start --pdb

Then,

ipdb> test_microvm.help.gdbserver()

This prints instructions on how to run GDB and attach it to the gdbserver.

Sandbox

tools/devtool -y sandbox

That should drop you in an IPython REPL, where you can interact with a microvm:

uvm.help.print_log()
uvm.get_all_metrics()
uvm.ssh.run("ls")
snap = uvm.snapshot_full()
uvm.help.tmux_ssh()

It supports a number of options; list them with devtool sandbox -- --help.

Running outside of Docker

source /etc/os-release
case $ID-$VERSION_ID in
amzn-2)
    sudo yum remove -y python3
    sudo amazon-linux-extras install -y python3.8
    sudo ln -sv /usr/bin/python3.8 /usr/bin/python3
    sudo ln -sv /usr/bin/pip3.8 /usr/bin/pip3
esac

sudo pip3 install pytest ipython requests psutil tenacity filelock "urllib3<2.0" requests_unixsocket

sudo env PYTHONPATH=tests HOME=$HOME ~/.local/bin/ipython3 -i tools/sandbox.py -- --binary-dir ../repro/v1.4.1

Warning

Notice this runs as root!