A Kubernetes Operator for managing and recovering replicated workflows

beekhof/rss-operator

Replicated Stateful Set (RSS) Operator

Project status: beta

Most major planned features have been completed. No breaking API changes are currently planned, but we reserve the right to fix bugs and change the API in a backwards-incompatible way before the project is declared stable. See the upgrade guide for a safe upgrade process.

We expect to consider the operator stable soon; backwards-incompatible changes will not be made once the project reaches stability.

See the roadmap.

Overview

The replication operator manages application clusters deployed to Kubernetes and automates tasks related to seeding and operating a cluster. It was adapted from the CoreOS etcd operator.

There are examples of different applications and specs for driving them.

Read why the operator exists and how replication is managed.

Read the RBAC docs for how to set up RBAC rules for the replication operator if RBAC is in place.

Read the Developer Guide to set up a development environment if you want to contribute.

See the Resources and Labels doc for an overview of the resources created by the rss-operator.

Requirements

  • Kubernetes 1.9+

Demo - Replicated Database

(asciicast recording of the demo)

Getting started

Build the image

The demo uses a CentOS 7 based image which contains a number of scripts that wrap the galera OCF resource agent.

You will need to create an image from https://github.com/beekhof/galera-container and push it to docker/quay.

Once complete, modify example/cluster.yaml to point to the image you just published.
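
The image edit can also be scripted. The following is a minimal sketch, not part of the rss-operator repository: set_image is a hypothetical helper, and the registry path in the usage comment is a placeholder. It assumes the spec uses a plain "image: ..." line as in the example spec, and it rewrites every such line in the file.

```shell
#!/bin/sh
# set_image FILE IMAGE
# Hypothetical helper: rewrite the value of every "image:" key in FILE to IMAGE.
# Indentation and any leading "- " list marker are preserved.
# Assumes IMAGE contains no "|" or "&" characters.
set_image() {
    file=$1
    image=$2
    tmp=$(mktemp) || return 1
    sed "s|^\([[:space:]-]*image:\)[[:space:]].*|\1 ${image}|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (placeholder image name):
#   set_image example/cluster.yaml quay.io/example-user/galera:latest
```

Review the result with a diff before creating the cluster, since the helper does not distinguish between containers.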

Deploy the replication operator

See the instructions on how to install/uninstall the replication operator.

Create and destroy a cluster

$ kubectl create -f example/cluster.yaml

A 3-member cluster will be created.

$ kubectl get pods
NAME            READY     STATUS    RESTARTS   AGE
rss-example-0   1/1       Running   0          3m
rss-example-1   1/1       Running   0          3m
rss-example-2   1/1       Running   0          3m
rss-operator    1/1       Running   0          6m

In addition to the pod status, we can check the state of the application by examining the Kubernetes Custom Resource representing the cluster:

$ kubectl -n dummy get rss/example -o=jsonpath='{"Primaries: "}{.status.members.primary}{"\n"}{"Members:   "}{.status.members.ready}{"\n"}'
Primaries: [rss-example-0 rss-example-2]
Members:   [rss-example-0 rss-example-1 rss-example-2]
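
The same status fields can be used to wait for a cluster to come up. This is a rough sketch under assumptions: count_members and wait_for_members are hypothetical helpers, the list is assumed to be printed in the bracketed, space-separated form shown above (other kubectl versions may print it differently), and the namespace and resource name are copied from the command above.

```shell
#!/bin/sh
# count_members LIST
# Hypothetical helper: count the entries in a list such as
# "[rss-example-0 rss-example-1]" as printed by the jsonpath query above.
count_members() {
    # Strip the brackets, then let the shell split the names on whitespace.
    list=$(printf '%s' "$1" | tr -d '[]')
    set -- $list
    echo "$#"
}

# wait_for_members EXPECTED
# Hypothetical helper: poll the custom resource every 5 seconds until at
# least EXPECTED members are reported as ready. Loops until that happens.
wait_for_members() {
    expected=$1
    while :; do
        ready=$(kubectl -n dummy get rss/example -o=jsonpath='{.status.members.ready}')
        [ "$(count_members "$ready")" -ge "$expected" ] && break
        sleep 5
    done
}

# Example: wait_for_members 3
```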

See client service for how to access clusters created by the replication operator.

Destroy a replicated cluster:

$ kubectl delete -f example/cluster.yaml

Resize a cluster

Create a cluster:

$ kubectl create -f example/cluster.yaml

In example/cluster.yaml the initial cluster size is 3. Modify the file and change replicas from 3 to 5.

$ cat example/cluster.yaml
apiVersion: clusterlabs.org/v1alpha1
kind: ReplicatedStatefulSet
metadata:
  name: example
spec:
  replicas: 5
  reconcileInterval: 30s
  pod:
    antiAffinity: true
    commands:
      sequence: 
        command: ["/sequence.sh"]
      primary: 
        command: ["/start.sh"]
      seed: 
        command: ["/seed.sh"]
      status: 
        timeout: 60s
        command: ["/check.sh"]
      stop: 
        command: ["/stop.sh"]
    containers:
    - name: dummy
      image: quay.io/beekhof/dummy:latest

Apply the size change to the cluster CR:

$ kubectl apply -f example/cluster.yaml

The replicated cluster will scale to 5 members (5 pods):

$ kubectl get pods
NAME            READY     STATUS    RESTARTS   AGE
rss-example-0   1/1       Running   0          5m
rss-example-1   1/1       Running   0          5m
rss-example-2   1/1       Running   0          5m
rss-example-3   1/1       Running   0          2m
rss-example-4   1/1       Running   0          2m
rss-operator    1/1       Running   0          9m
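
As an alternative to editing the file, the replica count can probably be changed with a JSON merge patch. This is a sketch rather than a documented workflow: replicas_patch and scale_cluster are hypothetical helpers, and the rss resource name is taken from the status command earlier in this README. A later kubectl apply of example/cluster.yaml would reset the count to the value in the file.

```shell
#!/bin/sh
# replicas_patch N
# Hypothetical helper: print a JSON merge patch that sets spec.replicas to N.
replicas_patch() {
    printf '{"spec":{"replicas":%d}}\n' "$1"
}

# scale_cluster NAME N
# Hypothetical helper: apply the patch to the named cluster resource.
# Custom resources need --type merge; strategic merge patches do not apply to them.
scale_cluster() {
    kubectl patch rss "$1" --type merge -p "$(replicas_patch "$2")"
}

# Example: scale_cluster example 5
```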

Similarly, we can decrease the size of the cluster from 5 back to 3 by changing the replicas field again and reapplying the change.

$ cat example/cluster.yaml
apiVersion: clusterlabs.org/v1alpha1
kind: ReplicatedStatefulSet
metadata:
  name: example
spec:
  replicas: 3
  reconcileInterval: 30s
  pod:
    antiAffinity: true
    commands:
      sequence: 
        command: ["/sequence.sh"]
      primary: 
        command: ["/start.sh"]
      seed: 
        command: ["/seed.sh"]
      status: 
        timeout: 60s
        command: ["/check.sh"]
      stop: 
        command: ["/stop.sh"]
    containers:
    - name: dummy
      image: quay.io/beekhof/dummy:latest

Apply the change:

$ kubectl apply -f example/cluster.yaml

We should see that the cluster will eventually reduce to 3 pods:

$ kubectl get pods
NAME            READY     STATUS    RESTARTS   AGE
rss-example-0   1/1       Running   0          8m
rss-example-1   1/1       Running   0          8m
rss-example-2   1/1       Running   0          8m
rss-operator    1/1       Running   0          11m

Failover

If any member crashes, the replication operator will automatically recover from the failure. The following steps walk through an example.

Create a cluster:

$ kubectl create -f example/cluster.yaml

Wait until all three members are up. Simulate a member failure by deleting a pod:

$ kubectl delete pod rss-example-0 --now

The replication operator will recover from the failure by recreating the pod rss-example-0:

$ kubectl get pods
NAME            READY     STATUS    RESTARTS   AGE
rss-example-0   1/1       Running   0          41s
rss-example-1   1/1       Running   0          10m
rss-example-2   1/1       Running   0          10m
rss-operator    1/1       Running   0          13m
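
While recovery is in progress it can help to list only the pods that are not yet healthy. The filter below is a minimal sketch: not_ready_pods is a hypothetical helper that assumes the default kubectl get pods column layout shown above (NAME, READY, STATUS, ...).

```shell
#!/bin/sh
# not_ready_pods
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# name of every pod whose READY count is incomplete or whose STATUS is not Running.
not_ready_pods() {
    awk 'NR > 1 {
        split($2, ready, "/")
        if (ready[1] != ready[2] || $3 != "Running") print $1
    }'
}

# Example: kubectl get pods | not_ready_pods
```

An empty result means every listed pod, including the operator, reports Running with all containers ready.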

Destroy the cluster:

$ kubectl delete -f example/cluster.yaml

Replication operator recovery

If the replication operator restarts, it can recover its previous state. The following steps walk through an example.

$ kubectl create -f example/cluster.yaml

Wait until all three members are up. Then delete the operator and one member pod:

$ kubectl delete -f example/operator.yaml
pod "rss-operator" deleted

$ kubectl delete pod rss-example-0 --now
pod "rss-example-0" deleted

Then restart the replication operator. It should recover itself and the clusters it manages.

$ kubectl create -f example/operator.yaml
pod "rss-operator" created

$ kubectl get pods
NAME            READY     STATUS    RESTARTS   AGE
rss-example-0   1/1       Running   0          4m
rss-example-1   1/1       Running   0          13m
rss-example-2   1/1       Running   0          13m
rss-operator    1/1       Running   0          4m

Limitations

  • The replication operator only manages clusters created in the same namespace. Users need to create multiple operators in different namespaces to manage clusters in different namespaces.

  • Lights-out recovery of the replication operator currently requires shared storage. Backup and restore capability will be added in the future if there is sufficient interest.
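
For the single-namespace limitation, one way to run an operator per namespace is to repeat the install command for each namespace. The sketch below only prints the commands so they can be reviewed first. It assumes that example/operator.yaml does not hard-code a namespace and that any required RBAC rules already exist in each namespace; operator_commands and the namespace names are hypothetical.

```shell
#!/bin/sh
# operator_commands NAMESPACE...
# Hypothetical helper: print one operator install command per namespace.
operator_commands() {
    for ns in "$@"; do
        printf 'kubectl -n %s create -f example/operator.yaml\n' "$ns"
    done
}

# Example (placeholder namespaces); pipe to sh only after reviewing the output:
#   operator_commands team-a team-b
#   operator_commands team-a team-b | sh
```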
