This repository has been archived by the owner on Dec 13, 2021. It is now read-only.

*: Release v0.2.0 (#149)
Release Note: http://osrg.github.io/namazu/post/release-0-2-0/

Changes from v0.1.2:

 * New project name (Earthquake --> Namazu)
 * New feature: process inspector (useful for reproducing flaky xUnit test failures)
 * New feature: filesystem inspector (found YARN-4301)
 * New feature: Container CLI (Docker-like, human-friendly CLI)
 * New experimental feature: Semi-deterministic replaying API

Changes from v0.2.0-rc2:

 * #146: Reduce CPU consumption
 * #142: New project name (Earthquake --> Namazu)
 * #140: Integrate `earthquake-container` to `earthquake`
 * #139: Support static build
 * #137: New experimental feature: Semi-deterministic replaying API
 * #131: New minor feature: Standalone orchestrator
 * Improvement on docs
 * Some minor fixes
AkihiroSuda committed May 20, 2016
1 parent c1d2e65 commit 2b34cff
Showing 6 changed files with 172 additions and 61 deletions.
120 changes: 79 additions & 41 deletions README.md
@@ -17,31 +17,59 @@ So Namazu can be also used for testing standalone multi-threaded software.

Basically, Namazu permutes events in a random order, but you can write your [own state exploration policy](doc/arch.md) (in Golang) for finding deep bugs efficiently.

[Namazu (鯰) means a catfish in Japanese](https://en.wiktionary.org/wiki/%E9%AF%B0).
[Namazu (鯰) means a catfish :fish: in Japanese](https://en.wiktionary.org/wiki/%E9%AF%B0).

Blog: [http://osrg.github.io/namazu/](http://osrg.github.io/namazu/)

Twitter: [@NamazuFuzzTest](https://twitter.com/NamazuFuzzTest)

## Found/Reproduced Bugs
* ZooKeeper:
* Found [ZOOKEEPER-2212](https://issues.apache.org/jira/browse/ZOOKEEPER-2212) (race): [blog article](http://osrg.github.io/namazu/post/zookeeper-2212/) ([repro code](example/zk-found-2212.ryu))
* Reproduced [ZOOKEEPER-2080](https://issues.apache.org/jira/browse/ZOOKEEPER-2080) (race): [blog article](http://osrg.github.io/namazu/post/zookeeper-2080/) ([repro code](example/zk-repro-2080.nfqhook))
* etcd:
* Found an etcd command line client (etcdctl) bug [#3517](https://github.com/coreos/etcd/issues/3517) (timing specification), fixed in [#3530](https://github.com/coreos/etcd/pull/3530): ([repro code](example/etcd/3517-reproduce)). The fix also provided a hint for [#3611](https://github.com/coreos/etcd/pull/3611).
* Reproduced flaky tests {[#4006](https://github.com/coreos/etcd/pull/4006), [#4039](https://github.com/coreos/etcd/issues/4039)} ([repro instruction](http://www.slideshare.net/AkihiroSuda/tackling-nondeterminism-in-hadoop-testing-and-debugging-distributed-systems-with-earthquake-57866497/42))
* YARN:
* Found [YARN-4301](https://issues.apache.org/jira/browse/YARN-4301) (fault tolerance): ([repro code](example/yarn/4301-reproduce))
* Reproduced flaky tests YARN-{[1978](https://issues.apache.org/jira/browse/YARN-1978), [4168](https://issues.apache.org/jira/browse/YARN-4168), [4543](https://issues.apache.org/jira/browse/YARN-4543), [4548](https://issues.apache.org/jira/browse/YARN-4548), [4556](https://issues.apache.org/jira/browse/YARN-4556)} ([repro instruction](http://www.slideshare.net/AkihiroSuda/tackling-nondeterminism-in-hadoop-testing-and-debugging-distributed-systems-with-earthquake-57866497/42))

## Installation
## Found and Reproduced Bugs

:new:=Found, :repeat:=Reproduced

### Flaky integration tests

Issue|Reproducibility<br>(traditional)|Reproducibility<br>(Namazu)|Note
---|---|---|---
:new: [ZOOKEEPER-2212](https://issues.apache.org/jira/browse/ZOOKEEPER-2212)<br>(race)|0%|21.8%|In traditional testing, we could not reproduce the issue in 5,000 runs (60 hours). We discovered the issue and improved its reproducibility using the Namazu Ethernet inspector. Note that the reproducibility improvement depends on the configuration (see also [#137](https://github.com/osrg/namazu/pull/137)).<br>[Blog article](http://osrg.github.io/namazu/post/zookeeper-2212/) and repro code ([Ryu SDN version](example/zk-found-2212.ryu) and [Netfilter version](example/zk-found-2212.nfqhook)) are available.

### Flaky xUnit tests (picked out, please see also [#125](https://github.com/osrg/namazu/issues/125))

Issue|Reproducibility<br>(traditional)|Reproducibility<br>(Namazu)|Note
---|---|---|---
:repeat: [YARN-4548](https://issues.apache.org/jira/browse/YARN-4548)|11%|82%|Used Namazu process inspector.
:repeat: [YARN-4556](https://issues.apache.org/jira/browse/YARN-4556)|2%|44%|Used Namazu process inspector.
:repeat: [ZOOKEEPER-2080](https://issues.apache.org/jira/browse/ZOOKEEPER-2080)|14%|62%|Used Namazu Ethernet inspector. [Blog article](http://osrg.github.io/namazu/post/zookeeper-2080/) and [repro code](example/zk-repro-2080.nfqhook) are available.
:repeat: [ZOOKEEPER-2137](https://issues.apache.org/jira/browse/ZOOKEEPER-2137)|2%|16%|Used Namazu process inspector.

We also improved reproducibility of some flaky etcd tests (to be documented).

### Others

Issue|Note
---|---
:new: [YARN-4301](https://issues.apache.org/jira/browse/YARN-4301)<br>(fault tolerance)|Used Namazu filesystem inspector and Namazu API. [Repro code](example/yarn/4301-reproduce) is available.
:new: etcd command line client (etcdctl) [#3517](https://github.com/coreos/etcd/issues/3517)<br>(timing specification)|Used the Namazu Ethernet inspector. [Repro code](example/etcd/3517-reproduce) is available.<br>The issue has been fixed in [#3530](https://github.com/coreos/etcd/pull/3530), and the fix also provided a hint for [#3611](https://github.com/coreos/etcd/pull/3611).

## Talks

* [ApacheCon Core North America](http://sched.co/6OJU) (May 11-13, 2016, Vancouver) [[slide](http://www.slideshare.net/AkihiroSuda/flaky-tests-and-bugs-in-apache-software-eg-hadoop)]
* [CoreOS Fest](http://sched.co/6Szb) (May 9-10, 2016, Berlin) [[slide](http://www.slideshare.net/mitakeh/namazu-a-debugger-for-distributed-systems-specific-bugs/1)]
* [FOSDEM](https://fosdem.org/2016/schedule/event/nondeterminism_in_hadoop/) (January 30-31, 2016, Brussels) [[slide](http://www.slideshare.net/AkihiroSuda/tackling-nondeterminism-in-hadoop-testing-and-debugging-distributed-systems-with-earthquake-57866497)]
* The poster session of [ACM Symposium on Cloud Computing (SoCC)](http://acmsocc.github.io/2015/) (August 27-29, 2015, Hawaii) [[poster](http://acmsocc.github.io/2015/posters/socc15posters-final18.pdf)]

## Getting Started
### Installation
The installation process is very simple:

    $ sudo apt-get install libzmq3-dev libnetfilter-queue-dev
    $ go get github.com/osrg/namazu/nmz

Currently, Namazu is tested with [Go 1.6](https://golang.org/dl/).

## Quick Start (Container mode)
You can also download the latest binary from [here](https://github.com/osrg/namazu/releases).

### Container Mode
The following instructions show how to start *Namazu Container*, the simplified, Docker-like CLI for Namazu.

    $ sudo nmz container run -it --rm -v /foo:/foo ubuntu bash
@@ -93,9 +121,9 @@ explorePolicy = "random"
For other parameters, please refer to [`config.go`](nmz/util/config/config.go) and [`randompolicy.go`](nmz/explorepolicy/random/randompolicy.go).


## Quick Start (Non-container mode)
### Non-container Mode

### Process inspector
#### Process inspector

    $ sudo nmz inspectors proc -pid $TARGET_PID -watch-interval 1s

@@ -111,10 +139,10 @@ Note that the process inspector may be not effective for reproducing short-runni
The guide for reproducing flaky Hadoop tests (please use `nmz` instead of `microearthquake`): [FOSDEM slide 42](http://www.slideshare.net/AkihiroSuda/tackling-nondeterminism-in-hadoop-testing-and-debugging-distributed-systems-with-earthquake-57866497/42).


### Filesystem inspector (FUSE)
#### Filesystem inspector (FUSE)

    $ mkdir /tmp/{nmzfs-orig,nmzfs}
    $ sudo nmz inspectors fs -original-dir /tmp/nmzfs-orig -mount-point /tmp/nmzfs
    $ sudo nmz inspectors fs -original-dir /tmp/nmzfs-orig -mount-point /tmp/nmzfs -autopilot config.toml
    $ $TARGET_PROGRAM_WHICH_ACCESSES_TMP_NMZFS
    $ sudo fusermount -u /tmp/nmzfs

@@ -124,7 +152,7 @@ By default, all the `read`, `mkdir`, and `rmdir` accesses to the files under `/t

You can also inject faults (currently it just injects `-EIO`) by setting `explorePolicyParam.faultActionProbability` in the config file.

### Ethernet inspector (Linux netfilter_queue)
#### Ethernet inspector (Linux netfilter_queue)

    $ iptables -A OUTPUT -p tcp -m owner --uid-owner $(id -u johndoe) -j NFQUEUE --queue-num 42
    $ sudo nmz inspectors ethernet -nfq-number 42
@@ -135,7 +163,7 @@ By default, all the packets for `johndoe` are randomly scheduled (with some opti

You can also inject faults (currently just drop packets) by setting `explorePolicyParam.faultActionProbability` in the config file.

### Ethernet inspector (Openflow 1.3)
#### Ethernet inspector (Openflow 1.3)

You have to install [ryu](https://github.com/osrg/ryu) and [hookswitch](https://github.com/osrg/hookswitch) for this feature.

@@ -145,10 +173,29 @@ You have to install [ryu](https://github.com/osrg/ryu) and [hookswitch](https://

Please also refer to [doc/how-to-setup-env-full.md](doc/how-to-setup-env-full.md) for this feature.

### Java inspector (AspectJ, byteman)
#### Java inspector (AspectJ, byteman)

To be documented

## How to Contribute
We welcome your contribution to Namazu.
Please feel free to send your pull requests on github!

    $ git clone https://github.com/osrg/namazu.git
    $ cd namazu
    $ git checkout -b your-branch
    $ ./build
    $ your-editor foo.go
    $ ./clean && ./build && go test -race ./...
    $ git commit -a -s

## Copyright
Copyright (C) 2015 [Nippon Telegraph and Telephone Corporation](http://www.ntt.co.jp/index_e.html).

Released under [Apache License 2.0](LICENSE).

---------------------------------------
## Advanced Guide
### Distributed execution

Please follow these examples: [example/zk-found-2212.ryu](example/zk-found-2212.ryu), [example/zk-found-2212.nfqhook](example/zk-found-2212.nfqhook)
@@ -205,25 +252,7 @@ If you have [JaCoCo](http://eclemma.org/jacoco/) coverage data, you can run `jav

![doc/img/exec-pattern.png](doc/img/exec-pattern.png)

## Talks

* [ApacheCon Core North America](http://sched.co/6OJU) (May 11-13, 2016, Vancouver) [[slide](http://www.slideshare.net/AkihiroSuda/flaky-tests-and-bugs-in-apache-software-eg-hadoop)]
* [CoreOS Fest](http://sched.co/6Szb) (May 9-10, 2016, Berlin) [[slide](http://www.slideshare.net/mitakeh/namazu-a-debugger-for-distributed-systems-specific-bugs/1)]
* [FOSDEM](https://fosdem.org/2016/schedule/event/nondeterminism_in_hadoop/) (January 30-31, 2016, Brussels) [[slide](http://www.slideshare.net/AkihiroSuda/tackling-nondeterminism-in-hadoop-testing-and-debugging-distributed-systems-with-earthquake-57866497)]
* The poster session of [ACM Symposium on Cloud Computing (SoCC)](http://acmsocc.github.io/2015/) (August 27-29, 2015, Hawaii) [[poster](http://acmsocc.github.io/2015/posters/socc15posters-final18.pdf)]

## How to Contribute
We welcome your contribution to Namazu.
Please feel free to send your pull requests on github!

## Copyright
Copyright (C) 2015 [Nippon Telegraph and Telephone Corporation](http://www.ntt.co.jp/index_e.html).

Released under [Apache License 2.0](LICENSE).

---------------------------------------

## API for your own exploration policy
### API for your own exploration policy

```go
// implements nmz/explorepolicy/ExplorePolicy interface
@@ -273,6 +302,15 @@ func main(){
```
Please refer to [example/template](example/template) for further information.

## Known Limitation
### Semi-deterministic replay
If an event structure has a `replay_hint` hash string (one that contains nothing time-dependent or random),
you can semi-deterministically replay a scenario by delaying each event by `time.Duration(hash(seed,replay_hint) % maxInterval)`.
No record is required for replaying.

We have a PoC for ZOOKEEPER-2212. Please refer to [#137](https://github.com/osrg/namazu/pull/137).

We also implemented a similar thing for Go: [go-replay](https://github.com/AkihiroSuda/go-replay).

### Known Limitation
After running Namazu (process inspector with `exploreParam.procPolicyParam="dirichlet"`) many times, `sched_setattr(2)` can fail with `EBUSY`.
This seems to be a kernel bug; we're looking into it.
22 changes: 5 additions & 17 deletions doc/blog/content/_index.md
@@ -1,5 +1,5 @@
+++
date = "2016-04-27"
date = "2016-05-20"
tags = ["document"]
title = "_index"

@@ -19,27 +19,15 @@ but we believe that one of the most important reasons is lacking of a good debug
![Overview](/namazu/images/namazu.png)

# Found/Reproduced Bugs
* ZooKeeper:
* Found [ZOOKEEPER-2212](https://issues.apache.org/jira/browse/ZOOKEEPER-2212) (race): [(blog article)]({{< relref "post/zookeeper-2212.md" >}})
* Reproduced [ZOOKEEPER-2080](https://issues.apache.org/jira/browse/ZOOKEEPER-2080) (race): [(blog article)]({{< relref "post/zookeeper-2080.md" >}})

* Etcd:
* Found an etcd command line client (etcdctl) bug [#3517](https://github.com/coreos/etcd/issues/3517) (timing specification), fixed in [#3530](https://github.com/coreos/etcd/pull/3530). The fix also provided a hint for [#3611](https://github.com/coreos/etcd/issues/3611): To Be Documented
* Reproduced flaky tests {[#4006](https://github.com/coreos/etcd/pull/4006), [#4039](https://github.com/coreos/etcd/issues/4039)}


* YARN:
* Found [YARN-4301](https://issues.apache.org/jira/browse/YARN-4301) (fault tolerance): To Be Documented
* Reproduced flaky tests YARN-{[1978](https://issues.apache.org/jira/browse/YARN-1978), [4168](https://issues.apache.org/jira/browse/YARN-4168), [4543](https://issues.apache.org/jira/browse/YARN-4543), [4548](https://issues.apache.org/jira/browse/YARN-4548), [4556](https://issues.apache.org/jira/browse/YARN-4556)}

The repro code is located in [namazu/example](https://github.com/osrg/namazu/tree/master/example).
Please refer to [README file](https://github.com/osrg/namazu/blob/master/README.md).

# How to use?
Please refer to [README file](https://github.com/osrg/namazu/blob/master/README.md).

[This article]({{< relref "post/zookeeper-2212.md" >}}) is also a good starting point.
[This article](https://github.com/osrg/namazu/tree/master/example/zk-found-2212.ryu) is also a good starting point.

[The slides for the presentation at FOSDEM](http://www.slideshare.net/AkihiroSuda/tackling-nondeterminism-in-hadoop-testing-and-debugging-distributed-systems-with-earthquake-57866497/42) might also be helpful.
The slides for recent talks might also be helpful:
[ApacheCon](http://www.slideshare.net/AkihiroSuda/flaky-tests-and-bugs-in-apache-software-eg-hadoop), [CoreOS Fest](http://www.slideshare.net/mitakeh/namazu-a-debugger-for-distributed-systems-specific-bugs/1), and [FOSDEM](http://www.slideshare.net/AkihiroSuda/tackling-nondeterminism-in-hadoop-testing-and-debugging-distributed-systems-with-earthquake-57866497/42).

# Contact
The project is managed on [github](https://github.com/osrg/namazu).
85 changes: 85 additions & 0 deletions doc/blog/content/post/release-0-2-0.md
@@ -0,0 +1,85 @@
+++
categories = ["blog"]
date = "2016-05-20"
tags = ["document"]
title = "Release Namazu v0.2.0"

+++
Note: we recently renamed _Earthquake_ to _Namazu_.

We are glad to announce the release of [Namazu](https://github.com/osrg/namazu) v0.2.0.

![Overview](/namazu/images/namazu.png)

Namazu v0.2.0 includes many new features: Process inspector, Filesystem inspector, Container CLI, Semi-deterministic replaying API...

These new features made Namazu much more powerful and simple.
Now, no configuration is needed to get started with Namazu.

You can download the Namazu v0.2.0 binary release from [github](https://github.com/osrg/namazu/releases/tag/v0.2.0).

Alternatively, you can build Namazu manually:

    $ sudo apt-get install libzmq3-dev libnetfilter-queue-dev
    $ go get github.com/osrg/namazu/nmz

## New features
### Process inspector
The process inspector assigns random scheduling priorities to the threads of a specific Linux process:

    $ sudo nmz inspectors proc -pid $TARGET_PID -watch-interval 1s

The process inspector is sometimes useful when you want to reproduce flaky xUnit tests.

The experimental results for Hadoop tests are available in the slides we presented at ApacheCon:
{{< slideshare key="esat324HuI0vud" slide="41" >}}

### Filesystem inspector

The filesystem inspector provides randomized scheduling and fault injection for filesystem operations using FUSE:

    $ mkdir /tmp/{nmzfs-orig,nmzfs}
    $ sudo nmz inspectors fs -original-dir /tmp/nmzfs-orig -mount-point /tmp/nmzfs -autopilot config.toml
    $ $TARGET_PROGRAM_WHICH_ACCESSES_TMP_NMZFS
    $ sudo fusermount -u /tmp/nmzfs

Using the filesystem inspector, we successfully found [YARN-4301](https://issues.apache.org/jira/browse/YARN-4301).

{{< slideshare key="esat324HuI0vud" slide="55" >}}

### Container CLI

We introduced *Namazu Container*, a new human-friendly, Docker-like CLI:

    $ sudo nmz container run -it --rm -v /foo:/foo ubuntu bash

In *Namazu Container*, you can run arbitrary commands that might be *flaky*.
JUnit tests are interesting to try.

    nmzc$ git clone something
    nmzc$ cd something
    nmzc$ for f in $(seq 1 1000);do mvn test; done
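To put a number on flakiness, as in the Reproducibility columns of the README tables, you can count failing runs instead of eyeballing the loop's output. A minimal Go sketch, assuming a POSIX `sh` and treating any non-zero exit status except 127 (command not found) as a failed run; `failureRate` is a hypothetical helper, not part of Namazu, and needs Go 1.13+ for `errors.As`.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// failureRate runs a shell command n times and returns the fraction of runs
// that exited with a non-zero status.
func failureRate(command string, n int) (float64, error) {
	if n <= 0 {
		return 0, fmt.Errorf("n must be positive, got %d", n)
	}
	failures := 0
	for i := 0; i < n; i++ {
		err := exec.Command("sh", "-c", command).Run()
		if err == nil {
			continue // the run passed
		}
		var ee *exec.ExitError
		if errors.As(err, &ee) {
			if ee.ExitCode() == 127 {
				return 0, fmt.Errorf("command not found: %q", command)
			}
			failures++ // the command ran and failed
			continue
		}
		return 0, err // the shell itself could not be started
	}
	return float64(failures) / float64(n), nil
}

func main() {
	rate, err := failureRate("mvn -q test", 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("reproducibility: %.1f%%\n", rate*100)
}
```

Running the same measurement inside and outside *Namazu Container* gives the two columns compared in the README tables.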


By default, only the process inspector is enabled in *Namazu Container*.
Please refer to [README file](https://github.com/osrg/namazu/blob/master/README.md) for configuration.

### Semi-deterministic replaying API

Semi-deterministic replayer is an experimental feature:
it determines a delay for an event using a seed value and the hash of the event, rather than just using a random value.

It does not guarantee full determinism, but we believe it is sometimes enough for debugging.

{{< slideshare key="esat324HuI0vud" slide="58" >}}

### Miscellaneous improvements

* Renamed Earthquake to Namazu, and introduced a logo image
* Support static build
* Unit tests for Namazu itself

## Talks
Recently, we made presentations at two events: [CoreOS Fest](http://sched.co/6Szb) (May 10) and [ApacheCon Core NA](http://sched.co/6OJU) (May 12).

Please refer to the [article]({{< relref "post/coreosfest2016-and-apachecon2016.md" >}}) in our blog.
2 changes: 1 addition & 1 deletion misc/analyzer/java/base/pom.xml
@@ -4,7 +4,7 @@

<groupId>net.osrg</groupId>
<artifactId>namazu</artifactId>
<version>0.2.0-SNAPSHOT</version>
<version>0.2.0</version>
<packaging>jar</packaging>

<name>Namazu Analyzer (Java)</name>
2 changes: 1 addition & 1 deletion misc/inspector/java/base/pom.xml
@@ -4,7 +4,7 @@

<groupId>net.osrg</groupId>
<artifactId>namazu</artifactId>
<version>0.2.0-SNAPSHOT</version>
<version>0.2.0</version>
<packaging>jar</packaging>

<name>Namazu Inspector (Java)</name>
2 changes: 1 addition & 1 deletion nmz/util/core/coreutil.go
@@ -26,7 +26,7 @@ import (
logutil "github.com/osrg/namazu/nmz/util/log"
)

const NamazuVersion = "0.2.0-SNAPSHOT"
const NamazuVersion = "0.2.0"

// Returns true if NMZ_DEBUG is set
func DebugMode() bool {