Simplified Merge-All #33

Merged
merged 43 commits into from
Jan 24, 2018
855b60f
Support copy pasta
Aergonus Oct 29, 2017
1e958a1
KISS
Aergonus Oct 29, 2017
ed47ef1
Clarifying CM Namespace
Aergonus Oct 29, 2017
46c90e5
Including apiserver overload info
Aergonus Oct 29, 2017
0390bc1
Todo --> Contribute
Aergonus Oct 30, 2017
73d6e3e
Updated after logging merge
Aergonus Nov 9, 2017
bdebdee
Spelling Fix
Aergonus Nov 15, 2017
d824e55
Deployment identifier example
Aergonus Dec 20, 2017
c7976f9
Client-go update
Aergonus Dec 27, 2017
970ed42
Glide and vendor update
Aergonus Dec 27, 2017
dcfff06
Add timezone to example config
Aergonus Dec 27, 2017
05412e0
Merge branch 'master' into newbie
Aergonus Dec 27, 2017
a75c85d
Update Logging Conventions
Aergonus Dec 27, 2017
655399e
Refactor CreateClient
Aergonus Jan 1, 2018
ac12218
Templatize for victims
Aergonus Jan 1, 2018
84ca6da
Move deployments to victims factory
Aergonus Jan 1, 2018
166a83d
Convert deployment to victim template
Aergonus Jan 1, 2018
b768751
go fmt codebase
Aergonus Jan 1, 2018
37b4e42
Print time fixes
Aergonus Jan 1, 2018
801e7b0
Upgraded client-go
Aergonus Jan 2, 2018
0d348f0
Add new contribution types
Aergonus Jan 2, 2018
e43f906
Refactor deployments --> victims
Aergonus Jan 2, 2018
dc6ab73
Copy deployments --> statefulsets
Aergonus Jan 2, 2018
9ffa53c
Add statefulsets
Aergonus Jan 2, 2018
fa00c5e
Add Whitelisting Namespaces
Aergonus Jan 2, 2018
a112f81
Update docs for whitelist
Aergonus Jan 2, 2018
c70db49
Added statefulsets
Aergonus Jan 3, 2018
a1ec881
Merge pull request #5 from Spellchaser/log_improvements
Aergonus Jan 3, 2018
1f7cc77
Merge branch 'master' into k8_upgrade
Aergonus Jan 3, 2018
251dcd0
Merge branch 'master' into k8_upgrade
Aergonus Jan 3, 2018
d85c41b
Merge pull request #6 from Spellchaser/k8_upgrade
Aergonus Jan 3, 2018
6bb53b2
Merge branch 'master' into templatize
Aergonus Jan 3, 2018
973d618
Merge pull request #7 from Spellchaser/templatize
Aergonus Jan 3, 2018
f04edfe
Merge pull request #8 from Spellchaser/timezone-print
Aergonus Jan 3, 2018
6475c6b
Merge pull request #9 from Spellchaser/statefulsets
Aergonus Jan 3, 2018
99d1de8
Merge pull request #10 from Spellchaser/whitelist
Aergonus Jan 3, 2018
95a06f0
Merge branch 'master' into newbie
Aergonus Jan 3, 2018
d94a5fe
Merge pull request #11 from Spellchaser/newbie
Aergonus Jan 3, 2018
ac6db8f
version++
Aergonus Jan 3, 2018
811c3ca
Merge pull request #12 from Spellchaser/version++
Aergonus Jan 3, 2018
c3b6f52
glide up --v && v0.2.0
Aergonus Jan 3, 2018
e48eecd
Merge pull request #13 from Spellchaser/version++
Aergonus Jan 3, 2018
eeb2aae
Merge branch 'master' of https://github.com/asobti/kube-monkey
Aergonus Jan 24, 2018
5 changes: 4 additions & 1 deletion Makefile
@@ -1,7 +1,7 @@
all: build

ENVVAR = GOOS=linux GOARCH=amd64 CGO_ENABLED=0
TAG = v0.1.0
TAG = v0.2.0

.PHONY: all build container clean

@@ -28,5 +28,8 @@ endif
endif
endif

gofmt:
find . -path ./vendor -prune -o -name '*.go' -print | xargs -L 1 -I % gofmt -s -w %

clean:
rm -f kube-monkey
125 changes: 90 additions & 35 deletions README.md
@@ -1,29 +1,27 @@
## kube-monkey
kube-monkey is an implementation of [Netflix's Chaos Monkey](https://github.com/Netflix/chaosmonkey) for [Kubernetes](http://kubernetes.io/)
clusters. It randomly deletes Kubernetes pods in the cluster encouraging and validating the development of failure-resilient
services.
# kube-monkey
kube-monkey is an implementation of [Netflix's Chaos Monkey](https://github.com/Netflix/chaosmonkey) for [Kubernetes](http://kubernetes.io/) clusters. It randomly deletes Kubernetes pods in the cluster encouraging and validating the development of failure-resilient services.

--
---

kube-monkey runs at a pre-configured hour (`run_hour`, defaults to 8am) on weekdays, and builds a schedule of deployments that will face a random
Pod death sometime during the same day. The time-range during the day when the random pod Death might occur is configurable and
defaults to 10am to 4pm.
Pod death sometime during the same day. The time-range during the day when the random pod Death might occur is configurable and defaults to 10am to 4pm.

kube-monkey can be configured with a list of namespaces to blacklist - any deployments within a blacklisted namespace will not
be touched.
kube-monkey can be configured with a list of namespaces
* to blacklist (any deployments within a blacklisted namespace will not be touched)
* to whitelist (only deployments within a whitelisted namespace that are not blacklisted will be scheduled)
The blacklist overrides the whitelist. The config will be populated with default behavior (blacklist `kube-system` and whitelist `default`). To disable either the blacklist or the whitelist, provide `[""]` to the respective config param.
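
The namespace rules above can be sketched in Go. This is a minimal illustration, not kube-monkey's actual code: `namespaceFilter`, `newNamespaceFilter`, and `eligible` are hypothetical names, and the sketch assumes that a disabled whitelist (`[""]`) means every namespace that is not blacklisted is allowed.

```go
package main

import "fmt"

// namespaceFilter is a hypothetical helper for illustration only;
// it is not part of kube-monkey's API.
type namespaceFilter struct {
	blacklist map[string]bool
	whitelist map[string]bool
}

// toSet converts a config list into a set. Empty strings are skipped,
// so a list of [""] produces an empty set, i.e. a disabled list.
func toSet(items []string) map[string]bool {
	set := make(map[string]bool)
	for _, item := range items {
		if item != "" {
			set[item] = true
		}
	}
	return set
}

func newNamespaceFilter(blacklist, whitelist []string) namespaceFilter {
	return namespaceFilter{blacklist: toSet(blacklist), whitelist: toSet(whitelist)}
}

// eligible reports whether apps in the namespace may be scheduled.
// The blacklist is checked first, so it overrides the whitelist.
func (f namespaceFilter) eligible(namespace string) bool {
	if f.blacklist[namespace] {
		return false
	}
	if len(f.whitelist) == 0 {
		// Assumption: a disabled whitelist allows all namespaces.
		return true
	}
	return f.whitelist[namespace]
}

func main() {
	// Defaults described above: blacklist kube-system, whitelist default.
	f := newNamespaceFilter([]string{"kube-system"}, []string{"default"})
	for _, ns := range []string{"default", "kube-system", "staging"} {
		fmt.Printf("%s eligible: %v\n", ns, f.eligible(ns))
	}
}
```

Checking the blacklist before the whitelist is what makes the blacklist win when a namespace appears in both lists.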

## Opting-In to Chaos

kube-monkey works on an opt-in model and will only schedule terminations for Deployments that have explicitly agreed
to have their pods terminated by kube-monkey.
kube-monkey works on an opt-in model and will only schedule terminations for k8 apps that have explicitly agreed to have their pods terminated by kube-monkey.

Opt-in is done by setting the following labels on a Kubernetes Deployment:
Opt-in is done by setting the following labels on a k8 app:

**`kube-monkey/enabled`**: Set to **`"enabled"`** to opt-in to kube-monkey
**`kube-monkey/mtbf`**: Mean time between failure (in days). For example, if set to **`"3"`**, the Deployment can expect to have a Pod
**`kube-monkey/mtbf`**: Mean time between failure (in days). For example, if set to **`"3"`**, the k8 app can expect to have a Pod
killed approximately every third weekday.
**`kube-monkey/identifier`**: A unique identifier for the deployment (eg. the deployment's name). This is used to identify the pods
that belong to a Deployment as Pods inherit labels from their Deployment.
**`kube-monkey/identifier`**: A unique identifier for the k8 app (eg. the k8 app's name). This is used to identify the pods
that belong to a k8 app as Pods inherit labels from their k8 app.
**`kube-monkey/kill-all`**: Set this label's value to `"kill-all"` if you want kube-monkey to kill ALL of your pods. Default behavior in the absence of this label is to kill only ONE pod. **Use this label carefully.**


@@ -41,70 +39,127 @@ spec:
metadata:
labels:
kube-monkey/enabled: enabled
kube-monkey/identifier: monkey-victim
kube-monkey/identifier: monkey-victim-pods
kube-monkey/mtbf: '2'
[... omitted ...]
```

For newer versions of kubernetes you may need to add the labels to the k8 app metadata as well.

```yaml
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: monkey-victim
namespace: app-namespace
labels:
kube-monkey/enabled: enabled
kube-monkey/identifier: monkey-victim
kube-monkey/mtbf: '2'
spec:
template:
metadata:
labels:
kube-monkey/enabled: enabled
kube-monkey/identifier: monkey-victim
[... omitted ...]
```

### Overriding the apiserver
#### Use cases:
* Since client-go does not support [cluster dns](https://github.com/kubernetes/client-go/blob/master/rest/config.go#L336) explicitly with a `// TODO: switch to using cluster DNS.` note in the code, you may need to override the apiserver.
* If you are running an unauthenticated system, you may need to force the http apiserver endpoint.

#### To override the apiserver specify in the config.toml file
```toml
[kubernetes]
host="https://your-apiserver-url.com"
```

## How kube-monkey works

#### Scheduling time
Scheduling happens once a day on Weekdays - this is when a schedule for terminations for the current day is generated.
During scheduling, kube-monkey will:
1. Generate a list of eligible deployments (deployments that have opted-in and are not blacklisted)
2. For each eligible deployment, flip a biased coin (bias determined by `kube-monkey/mtbf`) to determine if a pod for that deployment should be killed today
1. Generate a list of eligible k8 apps (k8 apps that have opted-in and are not blacklisted)
2. For each eligible k8 app, flip a biased coin (bias determined by `kube-monkey/mtbf`) to determine if a pod for that k8 app should be killed today
3. For each victim, calculate a random time when a pod will be killed

#### Termination time
This is the randomly generated time during the day when a victim Deployment will have a pod killed.
This is the randomly generated time during the day when a victim k8 app will have a pod killed.
At termination time, kube-monkey will:
1. Check if the deployment is still eligible (has not opted-out or been blacklisted since scheduling)
2. Get a list of running pods for the deployment
1. Check if the k8 app is still eligible (has not opted-out or been blacklisted since scheduling)
2. Get a list of running pods for the k8 app
3. Select one random pod and delete it

## Building

Clone the repository and build the container.

```
$ go get github.com/asobti/kube-monkey
$ cd $GOPATH/src/github.com/asobti/kube-monkey
$ make container
```bash
go get github.com/asobti/kube-monkey
cd $GOPATH/src/github.com/asobti/kube-monkey
make container
```

## Configuring
kube-monkey is configured by a toml file placed at `/etc/kube-monkey/config.toml`.
Configuration keys and descriptions can be found in [`config/param/param.go`](https://github.com/asobti/kube-monkey/blob/master/config/param/param.go)
kube-monkey is configured by a toml file placed at `/etc/kube-monkey/config.toml` and expects the configmap to exist before the kube-monkey deployment is created.

#### Example config file
Configuration keys and descriptions can be found in [`config/param/param.go`](https://github.com/asobti/kube-monkey/blob/master/config/param/param.go)

#### Example config.toml file
```toml
[kubemonkey]
dry_run = true # Terminations are only logged
run_hour = 8 # Run scheduling at 8am on weekdays
start_hour = 10 # Don't schedule any pod deaths before 10am
end_hour = 16 # Don't schedule any pod deaths after 4pm
blacklisted_namespaces = ["kube-system"] # Critical deployments live here
blacklisted_namespaces = ["kube-system"] # Critical apps live here
time_zone = "America/New_York" # Set tzdata timezone example. Note the field is time_zone not timezone
```
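
A few sanity checks on these values can be sketched in Go. The `config` struct and `validate` method below are hypothetical and only mirror the keys in the example file. The specific checks (hours between 0 and 23, `start_hour` before `end_hour`, a `time_zone` that `time.LoadLocation` can resolve) are assumptions about what a reasonable validation would cover, not a description of kube-monkey's own validation.

```go
package main

import (
	"fmt"
	"time"
)

// config mirrors the example config.toml keys (hypothetical struct).
type config struct {
	RunHour   int
	StartHour int
	EndHour   int
	TimeZone  string
}

// validate applies the assumed sanity checks and returns the first
// problem it finds, or nil if the values look usable.
func (c config) validate() error {
	hours := map[string]int{
		"run_hour":   c.RunHour,
		"start_hour": c.StartHour,
		"end_hour":   c.EndHour,
	}
	for name, hour := range hours {
		if hour < 0 || hour > 23 {
			return fmt.Errorf("%s must be between 0 and 23, got %d", name, hour)
		}
	}
	if c.StartHour >= c.EndHour {
		return fmt.Errorf("start_hour (%d) must be before end_hour (%d)", c.StartHour, c.EndHour)
	}
	// LoadLocation needs tzdata to be available on the host or in the image.
	if _, err := time.LoadLocation(c.TimeZone); err != nil {
		return fmt.Errorf("time_zone %q: %v", c.TimeZone, err)
	}
	return nil
}

func main() {
	cfg := config{RunHour: 8, StartHour: 10, EndHour: 16, TimeZone: "America/New_York"}
	if err := cfg.validate(); err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("config looks valid")
}
```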

## Deploying

Run kube-monkey as a Deployment within the Kubernetes cluster, in a namespace that has permissions to kill Pods
in other namespaces (eg. `kube-system`).
1. First deploy the expected `kube-monkey-config-map` configmap in the namespace you intend to run kube-monkey in (for example, the `kube-system` namespace). Make sure to define the keyname as `config.toml`

> For example `kubectl create configmap km-config --from-file=config.toml=km-config.toml`

2. Run kube-monkey as a k8 app within the Kubernetes cluster, in a namespace that has permissions to kill Pods in other namespaces (eg. `kube-system`).

See dir [`examples/`](https://github.com/asobti/kube-monkey/tree/master/examples) for example Kubernetes yaml files.

## Logging

kube-monkey uses glog and supports all command-line features for glog. To specify a custom v level or a custom log directory on the pod, see `args: ["-v=5", "-log_dir=/path/to/custom/log"]` in the [example deployment file](https://github.com/asobti/kube-monkey/tree/master/examples/deployment.yaml)

> **Standardized glog levels `grep -r V\([0-9]\) *`**
>
> L0: None
>
> L1: Highest Level current status info and Errors with Terminations
>
> L2: Successful terminations
>
> L3: More detailed schedule status info
>
> L4: Debugging verbose schedule and config info
>
> L5: Auto-resolved inconsequential issues

More resources: See the [k8 logging page](https://kubernetes.io/docs/concepts/cluster-administration/logging/) suggesting [community conventions for logging severity](https://github.com/kubernetes/community/blob/master/contributors/devel/logging.md)

## Compatibility with Kubernetes

kube-monkey is built using v1.5 of [kubernetes/client-go](https://github.com/kubernetes/client-go). Refer to the
kube-monkey is built using v6.0 of [kubernetes/client-go](https://github.com/kubernetes/client-go). Refer to the
[Compatibility Matrix](https://github.com/kubernetes/client-go#compatibility-matrix) to see which
versions of Kubernetes are compatible.

## To do
## Ways to contribute

- Add tests
- Use a logging library like [glog](https://github.com/golang/glog)
- Add unit [tests](https://golang.org/pkg/testing/)
- Support more k8 types
- ~~deployments~~
- ~~statefulsets~~
- daemonsets
- etc
10 changes: 5 additions & 5 deletions calendar/calendar.go
@@ -1,9 +1,9 @@
package calendar

import (
"time"
"math/rand"

"time"

"github.com/golang/glog"
)

@@ -17,7 +17,7 @@ func isWeekday(t time.Time) bool {
}

glog.Fatalf("Unrecognized day of the week: %s", t.Weekday().String())

panic("Explicit Panic to avoid compiler error: missing return at end of function")
}

@@ -50,7 +50,7 @@ func NextRuntime(loc *time.Location, r int) time.Time {
}

// Returns a random time within the range specified by startHour and endHour
func RandomTimeInRange(startHour int, endHour int, location *time.Location) time.Time {
func RandomTimeInRange(startHour int, endHour int, loc *time.Location) time.Time {
// calculate the number of minutes in the range
minutesInRange := (endHour - startHour) * 60

@@ -62,6 +62,6 @@ func RandomTimeInRange(startHour int, endHour int, location *time.Location) time
// Add the minute offset to the start of the range to get a random
// time within the range
year, month, date := time.Now().Date()
rangeStart := time.Date(year, month, date, startHour, 0, 0, 0, location)
rangeStart := time.Date(year, month, date, startHour, 0, 0, 0, loc)
return rangeStart.Add(offsetDuration)
}