-
Notifications
You must be signed in to change notification settings - Fork 31
Fuzzing ruamel yaml (Python) project with sydr fuzz (Atheris backend)
In this article I'll share my experience of fuzzing Python projects. For this purpose I use sydr-fuzz with Atheris backend. Sydr-fuzz was originally designed as a hybrid fuzzer that combines Sydr (DSE tool) and top world fuzzers like AFLplusplus and libFuzzer. Also, sydr-fuzz supports some useful features like crash triage by casr, ability to check security predicates, and some convenient subcommands for corpus minimization and code coverage collection. Atheris is a coverage-guided Python fuzzing engine. It supports fuzzing of Python code and also native extensions written for CPython. Atheris is based on libFuzzer. Atheris looks and works like libFuzzer, so we decided to support it in sydr-fuzz, why not? Though we don't have symbolic execution for Python code but we still could do fuzzing, crash triage, corpus minimization, and coverage collection using sydr-fuzz interface.
Atheris github page has a nice instruction about installing and using it. We will fuzz yaml project from it's examples. There is a docker container already prepared for building with all needed fuzzing environment. I'll use it for my fuzzing experiments, but for now let's look more precisely at fuzz target and build script.
import atheris
with atheris.instrument_imports():
from ruamel import yaml as ruamel_yaml
import sys
import warnings
# Suppress all warnings.
warnings.simplefilter("ignore")
ryaml = ruamel_yaml.YAML(typ="safe", pure=True)
ryaml.allow_duplicate_keys = True
@atheris.instrument_func
def TestOneInput(input_bytes):
fdp = atheris.FuzzedDataProvider(input_bytes)
data = fdp.ConsumeUnicode(sys.maxsize)
try:
iterator = ryaml.load_all(data)
for _ in iterator:
pass
except ruamel_yaml.error.YAMLError:
return
except Exception:
input_type = str(type(data))
codepoints = [hex(ord(x)) for x in data]
sys.stderr.write(
"Input was {input_type}: {data}\nCodepoints: {codepoints}".format(
input_type=input_type, data=data, codepoints=codepoints))
raise
def main():
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
if __name__ == "__main__":
main()
First, we define which modules we want to instrument. There is also an ability to instrument everything: atheris.instrument_all()
. It might be useful when your project has many dependencies. Then we need to implement def TestOneInput(input_bytes)
, this is similar to int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
for C/C++. It is important to catch exceptions that are thrown by the target function. But you need to catch only those exceptions that are specified by developers or that the function throws directly. For example, IndexError doesn't need to be caught if it is not specified in documentation. Atheris catches it as a crash. At last, we need to write some code to start fuzzing process:
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
As for build, we could just use pip install .
from project directory and install instrumented project in our fuzzing environment. Ok, let's build docker container and start fuzzing!
Before we begin, let's look at yaml_fuzzer.toml:
exit-on-time = 3600
[atheris]
path = "/yaml_fuzzer.py"
args = "/corpus -dict=yaml.dict -jobs=1000 -workers=4"
It's pretty simple.
exit-on-time - is an optional parameter that takes time in seconds. If during this time (1 hour in our case) the coverage does not increase, fuzzing is automatically terminated.
I'll use 4 workers for fuzzing till 1000 crashes are found or exit-on-time is triggered. Let's start fuzzing with this command:
# sydr-fuzz -c yaml_fuzzer.toml run
[2023-01-11 17:22:47] [INFO] #3582 RELOAD cov: 1178 ft: 5252 corp: 478/64Kb lim: 487 exec/s: 275 rss: 713Mb
[2023-01-11 17:22:48] [INFO] Uncaught Python exception: KeyError: (0, 1) /fuzz/yaml_fuzzer-out/crashes/crash-a0acd109aef7675ce2268eec4e0901759f4e1edc
[2023-01-11 17:22:50] [INFO] #17540 REDUCE cov: 1178 ft: 5257 corp: 511/86Kb lim: 481 exec/s: 343 rss: 677Mb L: 13/481 MS: 2 CrossOver-EraseBytes-
[2023-01-11 17:22:50] [INFO] #17573 REDUCE cov: 1178 ft: 5257 corp: 511/86Kb lim: 481 exec/s: 344 rss: 677Mb L: 58/481 MS: 3 ChangeBit-ManualDict-EraseBytes- DE: "'"-
[2023-01-11 17:22:50] [INFO] Uncaught Python exception: KeyError: (1, 5) /fuzz/yaml_fuzzer-out/crashes/crash-4230d57dcf9dce49804ffd9abbc43a751068c6a2
[2023-01-11 17:22:55] [INFO] #1024 pulse cov: 1171 ft: 4926 corp: 413/36Kb exec/s: 204 rss: 710Mb
[2023-01-11 17:22:55] [INFO] [ATHERIS] run time : 0 days, 0 hrs, 0 min, 57 sec
[2023-01-11 17:22:55] [INFO] [ATHERIS] last new find : 0 days, 0 hrs, 0 min, 8 sec
[2023-01-11 17:22:57] [INFO] #1268 INITED cov: 1178 ft: 5265 corp: 477/60Kb exec/s: 181 rss: 710Mb
After some amount of time we have found some crashes. Let's wait till fuzzing is finished.
[2023-01-11 19:21:27] [INFO] Uncaught Python exception: KeyError: (2, 1) /fuzz/yaml_fuzzer-out/crashes/crash-1f71bdb8cbba856923a45f50c7873bdb7ef64e2d
[2023-01-11 19:21:28] [INFO] Uncaught Python exception: KeyError: (1, 5) /fuzz/yaml_fuzzer-out/crashes/crash-c3152f019e73c4c01925ae1533f47583fe3006df
[2023-01-11 19:21:28] [INFO] Uncaught Python exception: KeyError: (2, 1) /fuzz/yaml_fuzzer-out/crashes/crash-fed9747bec6197c9c8cbc0cf10c051c8f807d407
[2023-01-11 19:21:28] [INFO] Uncaught Python exception: KeyError: (1, 0) /fuzz/yaml_fuzzer-out/crashes/crash-01ae8831870d95a5be5898dd17457235b851bfdf
[2023-01-11 19:21:41] [INFO] EXIT_ON_TIME: No new coverage (cov) for 3600 secs.
[2023-01-11 19:21:42] [INFO] EXIT_ON_TIME: No new coverage (cov) for 3600 secs.
[2023-01-11 19:21:42] [INFO] [RESULTS] Fuzzing corpus is saved in /fuzz/yaml_fuzzer-out/corpus
[2023-01-11 19:21:42] [INFO] [RESULTS] oom/leak/timeout/crash: 0/0/0/407
[2023-01-11 19:21:42] [INFO] [RESULTS] Fuzzing results are saved in /fuzz/yaml_fuzzer-out/crashes
Nice, our fuzzing experiment is ended by exit-on-time. We've got 407 crashes to analyze! This is a job for casr.
Let's minimize corpus first:
# sydr-fuzz -c yaml_fuzzer.toml cmin
[2023-01-11 20:30:08] [INFO] Original fuzzing corpus saved as /fuzz/yaml_fuzzer-out/corpus-old
[2023-01-11 20:30:08] [INFO] Minimizing corpus /fuzz/yaml_fuzzer-out/corpus
[2023-01-11 20:30:08] [INFO] Using LD_PRELOAD="/usr/local/lib/python3.8/dist-packages/asan_with_fuzzer.so"
[2023-01-11 20:30:08] [INFO] ASAN_OPTIONS="abort_on_error=1,detect_leaks=0,malloc_context_size=0,symbolize=0,allocator_may_return_null=1"
[2023-01-11 20:30:08] [INFO] Launching atheris: "/yaml_fuzzer.py" "-merge=1" "-artifact_prefix=/fuzz/yaml_fuzzer-out/crashes/" "-close_fd_mask=2" "-verbosity=2" "-detect_leaks=0" "-dict=/fuzz/yaml.dict" "/fuzz/yaml_fuzzer-out/corpus" "/fuzz/yaml_fuzzer-out/corpus-old"
[2023-01-11 20:30:10] [INFO] MERGE-OUTER: 8719 files, 0 in the initial corpus, 0 processed earlier
[2023-01-11 20:30:10] [INFO] MERGE-OUTER: attempt 1
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: successful in 1 attempt(s)
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: the control file has 982127 bytes
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: consumed 0Mb (120Mb rss) to parse the control file
[2023-01-11 20:31:04] [INFO] MERGE-OUTER: 913 new files with 7301 new features added; 1249 new coverage edges
We've narrowed 8719 files to 913 files, nice! Now we can collect the code coverage!
For code coverage we use well-known coverage python module and this instruction from Atheris GitHub. Of course, we've wrapped it into sydr-fuzz pycov
subcommand. Let's get html coverage report:
# sydr-fuzz -c yaml_fuzzer.toml pycov html
[2023-01-11 20:37:47] [INFO] Running pycov html "/fuzz/yaml_fuzzer.toml"
[2023-01-11 20:37:47] [INFO] Collecting coverage data for each file in corpus: /fuzz/yaml_fuzzer-out/corpus
[2023-01-11 20:37:47] [INFO] Saving coverage data to /fuzz/yaml_fuzzer-out/coverage/html/.coverage
[2023-01-11 20:37:47] [INFO] Using LD_PRELOAD="/usr/local/lib/python3.8/dist-packages/asan_with_fuzzer.so"
[2023-01-11 20:37:47] [INFO] ASAN_OPTIONS="abort_on_error=1,detect_leaks=0,malloc_context_size=0,symbolize=0,allocator_may_return_null=1"
[2023-01-11 20:37:47] [INFO] Collecting coverage: "coverage" "run" "/yaml_fuzzer.py" "-atheris_runs=914"
[2023-01-11 20:37:51] [INFO] Running coverage html: "coverage" "html" "-d" "/fuzz/yaml_fuzzer-out/coverage/html" "--data-file=/fuzz/yaml_fuzzer-out/coverage/html/.coverage"
Wrote HTML report to /fuzz/yaml_fuzzer-out/coverage/html/index.html
Good, we've got the coverage, let's look at it and move on further!
As I said before, I'll use casr via sydr-fuzz casr
subcommand for crash triage:
# sydr-fuzz -c yaml_fuzzer.toml casr
You can learn more about casr from it's repository or from my other fuzzing tutorial.
Let's look at casr output:
[2023-01-11 20:47:14] [INFO] Casr-cluster: deduplication of casr reports...
[2023-01-11 20:47:16] [INFO] Reports before deduplication: 407; after: 16
[2023-01-11 20:47:16] [INFO] Casr-cluster: clustering casr reports...
[2023-01-11 20:47:16] [INFO] Reports before clustering: 16. Clusters: 8
[2023-01-11 20:47:16] [INFO] Copying inputs...
[2023-01-11 20:47:16] [INFO] Done!
[2023-01-11 20:47:16] [INFO] ==> <cl1>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl1/crash-bf5829959ccf0211640314bb30de19bc9bafdeb3
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO] Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 1
[2023-01-11 20:47:16] [INFO] ==> <cl2>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl2/crash-e126eb63b0bc1aefac72c3f56dea8484577f1007
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: RecursionError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/events.py:78
[2023-01-11 20:47:16] [INFO] Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> RecursionError: 1
[2023-01-11 20:47:16] [INFO] ==> <cl3>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl3/crash-017ee5d1bb2bee51263f083eb12a60711a3c84f1
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO] Similar crashes: 4
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 4
[2023-01-11 20:47:16] [INFO] ==> <cl4>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl4/crash-3637416d80df3c5961e05b0bd459b79009e2a182
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO] Similar crashes: 2
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 2
[2023-01-11 20:47:16] [INFO] ==> <cl5>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl5/crash-01ae8831870d95a5be5898dd17457235b851bfdf
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: KeyError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/resolver.py:361
[2023-01-11 20:47:16] [INFO] Similar crashes: 4
[2023-01-11 20:47:16] [INFO] Cluster summary -> KeyError: 4
[2023-01-11 20:47:16] [INFO] ==> <cl6>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl6/crash-0ea90a02b95f99e850b036e49419a43103a54149
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: ValueError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:533
[2023-01-11 20:47:16] [INFO] Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl6/crash-988f305721849b6a75af3b3f424b4593901630c3
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: ValueError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:498
[2023-01-11 20:47:16] [INFO] Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> ValueError: 2
[2023-01-11 20:47:16] [INFO] ==> <cl7>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl7/crash-3f369c580ac61eded9d05eb06bc1ad6d0e90bfe1
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: ValueError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:498
[2023-01-11 20:47:16] [INFO] Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> ValueError: 1
[2023-01-11 20:47:16] [INFO] ==> <cl8>
[2023-01-11 20:47:16] [INFO] Crash: /fuzz/yaml_fuzzer-out/casr/cl8/crash-05451dc00f42aa97a064d2e08153bb84af113717
[2023-01-11 20:47:16] [INFO] casr-python: UNDEFINED: TypeError: /usr/local/lib/python3.8/dist-packages/ruamel/yaml/constructor.py:273
[2023-01-11 20:47:16] [INFO] Similar crashes: 1
[2023-01-11 20:47:16] [INFO] Cluster summary -> TypeError: 1
[2023-01-11 20:47:16] [INFO] SUMMARY -> RecursionError: 1 KeyError: 11 ValueError: 3 TypeError: 1
[2023-01-11 20:47:16] [INFO] Crashes and Casr reports are saved in /fuzz/yaml_fuzzer-out/casr
After deduplication we have 16 crashes splitted into 8 clusters. Nice, now we can get down to manual analysis. Let's look at some report, for example from cl6
:
An unhandled exception has occurred while converting string to float. Looks like an issue:).
In conclusion I want to say that Atheris is a cool fuzzer for Python code. Sydr-fuzz interface is neat. And of course casr, that can triage crashes for Python, helps a lot!
Andrey Fedotov