DistFuzz is a research fuzzing tool for finding bugs in distributed systems. It supports several kinds of distributed system inputs, including regular inputs, faults, and timing intervals, and uses network message sequences with symmetry-based pruning as fuzzing feedback. It uses checkpointing to minimize initialization time and offline record/replay to reproduce bugs.
- Braft
- NuRaft
- Dqlite
- Redis
- RethinkDB
- AerospikeDB
- ClickHouse
- etcd
- ZooKeeper
- HDFS
We recommend running the tool using the Docker image we provide.
docker run -it --cap-add=NET_ADMIN --device=/dev/net/tun --cap-add=SYS_PTRACE --cap-add=SYS_NICE --cap-add=IPC_LOCK --security-opt seccomp=unconfined zouyonghao/distfuzz:artifact
If you want to build the Docker container or install the tool and dependencies yourself, please refer to INSTALL.md.
For each system, there is a folder called ${SYSTEM}_test, e.g., braft_test. You need to enter the folder ${SYSTEM}_test/bin and use the script fuzz.sh to run the fuzzing test.
# For the systems evaluated in the paper:
# braft
cd DistFuzz/braft_test/bin && ./fuzz.sh
# NuRaft
cd DistFuzz/nuraft_test/bin/ && sudo ./fuzz.sh
# AerospikeDB
cd DistFuzz/aerospike_test/bin && ./fuzz.sh
# Dqlite
cd DistFuzz/dqlite_test/bin && ./fuzz.sh
# Redis
cd DistFuzz/redisraft_test/bin && ./fuzz.sh
# RethinkDB
cd DistFuzz/rethinkdb_test/bin && ./fuzz.sh
# ClickHouse
cd DistFuzz/clickhouse_test/bin && ./fuzz.sh
# etcd
cd DistFuzz/etcd_test/bin && ./fuzz.sh
# ZooKeeper
cd DistFuzz/zookeeper_test/bin && ./fuzz.sh
# HDFS
cd DistFuzz/hdfs_test/bin && ./fuzz.sh
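The commands above all follow the same ${SYSTEM}_test/bin pattern, so they can be wrapped in a small helper. The following is only a sketch: the function names fuzz_dir and run_fuzz and the DISTFUZZ_ROOT variable are hypothetical and not part of DistFuzz. Note that the folder prefix is not always the system's display name (e.g., redisraft for Redis), and that the NuRaft command above uses sudo, which this helper does not.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not shipped with DistFuzz.
# DISTFUZZ_ROOT is an assumed variable pointing at the DistFuzz checkout.

# Print the bin folder for a folder prefix such as "braft" or "redisraft".
fuzz_dir() {
    local system="$1"
    local root="${DISTFUZZ_ROOT:-DistFuzz}"
    printf '%s/%s_test/bin\n' "$root" "$system"
}

# Enter the bin folder and start fuzz.sh, failing early if it is missing.
run_fuzz() {
    local dir
    dir="$(fuzz_dir "$1")"
    if [ ! -x "$dir/fuzz.sh" ]; then
        echo "fuzz.sh not found or not executable in $dir" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$dir" && ./fuzz.sh)
}
```

With these definitions, `run_fuzz braft` would be equivalent to the braft command above.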
After fuzzing for a while, you can check error cases under DistFuzz/${SYSTEM}_test/bin/test_cases and fuzzing status in the file DistFuzz/${SYSTEM}_test/bin/output/fuzzer1/plot-curve (bc: for coverage).
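A quick status check could look like the sketch below. It assumes that each error case is stored as a subfolder of test_cases (as in the test_cases/100 example) and that plot-curve is a text file whose lines contain the bc: field; the exact file format is not documented here, so treat the grep as an assumption. The function name fuzz_status is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not shipped with DistFuzz.
# Usage: fuzz_status DistFuzz/${SYSTEM}_test/bin
fuzz_status() {
    local bin_dir="$1"
    local cases=0
    # Assumption: every error case is a subfolder of test_cases.
    if [ -d "$bin_dir/test_cases" ]; then
        cases="$(find "$bin_dir/test_cases" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' ')"
    fi
    echo "error cases: $cases"
    # Assumption: plot-curve is plain text and coverage lines contain "bc:".
    local curve="$bin_dir/output/fuzzer1/plot-curve"
    if [ -f "$curve" ]; then
        grep 'bc:' "$curve" | tail -n 1
    fi
}
```

For example, `fuzz_status DistFuzz/braft_test/bin` would print the number of error case folders followed by the most recent line mentioning bc:, if any.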
To reproduce a bug, copy the init_random.txt file from your target bug folder, e.g., test_cases/100, to the current folder. Then use rr to reproduce the bug. The time needed to reproduce a bug is not guaranteed.
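The copy step can be sketched as follows. The function name prepare_repro is hypothetical, and the assumption that recordings live in rr_rec_* subfolders of the bug folder is inferred from the paths in the Docker examples (e.g., rr_rec_1_0); your layout may differ. The sketch only prints candidate `rr replay -a` commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not shipped with DistFuzz.
# Usage (from ${SYSTEM}_test/bin): prepare_repro test_cases/100
prepare_repro() {
    local case_dir="$1"
    if [ ! -f "$case_dir/init_random.txt" ]; then
        echo "no init_random.txt in $case_dir" >&2
        return 1
    fi
    # Copy the recorded random input into the current folder.
    cp "$case_dir/init_random.txt" .
    # Assumption: rr recordings are stored as rr_rec_* subfolders.
    local rec
    for rec in "$case_dir"/rr_rec_*/; do
        [ -d "$rec" ] || continue
        echo "rr replay -a $rec"
    done
}
```

Printing the commands instead of executing them leaves the choice of which recording to replay to the user.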
We provide example bug reproduction Docker images that we used during our communication with developers. Please note that rr has CPU requirements; the CPUs we tested most recently are Intel i9-10980XE and Intel(R) Xeon(R) Gold 6248R.
# For etcd-1 https://github.com/etcd-io/etcd/issues/13493
docker run -it --rm --cap-add=SYS_PTRACE --security-opt seccomp=unconfined zouyonghao/etcd-13493 rr replay -a /root/test_cases/3169/rr_rec_1_0/
# For etcd-2 https://github.com/etcd-io/raft/issues/18
docker run -it --rm --cap-add=SYS_PTRACE --security-opt seccomp=unconfined zouyonghao/etcd-10166 rr replay -a /root/10166/rr_rec_2_0/
# For ZooKeeper-[1-8]
docker run -it --rm --cap-add=SYS_PTRACE --security-opt seccomp=unconfined zouyonghao/distfuzz:zookeeper-rr rr replay -a /home/zyh/zookeeper-[1-8]
- core contains the core libraries of DistFuzz, including shared memory, common events (such as killing a node), the node manager, and the main test logic.
- fuzz contains the fuzzer that manages seeds and shared memory.
- strace is the default fault injector and network sequence collector.
- rr is the modified version of rr-debugger. It is used for record/replay, but it can now also be used directly as the fault injector and network sequence collector, so it can replace strace in the new version. However, it is slower than strace.
- bpf can also replace strace and has better performance because it uses BPF as the fault injector. However, its fault injection feature is limited. For example, it can only delay network messages or file operations within ~50ms. This is due to the limitations of BPF.
- ${SYSTEM}_test/src and ${SYSTEM}_test/include contain the custom events for the specific system.
- ${SYSTEM}_test/bin contains the scripts for the specific system.
- general_test works as a template for users to understand how to add a new target system to DistFuzz.
Please check FOUND_BUGS.txt for the bugs found.