docs: Add architecture document #359

Merged
merged 3 commits into from
Oct 14, 2023
4 changes: 4 additions & 0 deletions CONTRIBUTING.md
@@ -7,6 +7,10 @@ Run tests via:
make test
```

# Architecture

For a high-level overview of the architecture, see [docs/architecture.md](./docs/architecture.md).

# Release Process

Prereq: Write access to the repo.
64 changes: 0 additions & 64 deletions controllers/suite_test.go
Contributor Author

This test wasn't doing anything and, frankly, an e2e test with cosmos nodes is going to be really tough.

Contributor Author

Better to have a staging/dev CosmosFullNode that's continuously delivered off main. And we monitor it for any strangeness.

This file was deleted.

163 changes: 163 additions & 0 deletions docs/architecture.md
@@ -0,0 +1,163 @@
# Cosmos Operator Architecture

This is a high-level overview of the architecture of the Cosmos Operator. It is intended to be a reference for
developers.

## Overview

The operator was written with the [kubebuilder](https://github.com/kubernetes-sigs/kubebuilder) framework.

Kubebuilder provides abstractions that simplify creating Kubernetes controllers.

In a nutshell, an operator watches custom resources defined by a
[CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/). Its job is to make the
cluster's actual state match the desired state declared in those resources. It continually watches for changes and
updates the cluster accordingly: the "control loop" pattern.
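As a rough illustration of the control loop pattern, the sketch below uses only the Go standard library, not kubebuilder. The `cluster` and `spec` types and the `reconcile` function are hypothetical stand-ins: each pass compares desired state to observed state, takes one corrective step, and reports whether another pass is needed (similar in spirit to a requeue).

```go
package main

import "fmt"

// cluster is a hypothetical stand-in for observed cluster state.
type cluster struct {
	pods []string
}

// spec is a hypothetical stand-in for the desired state declared in a custom resource.
type spec struct {
	replicas int
}

// reconcile moves observed state one step toward desired state and reports
// whether another pass is needed.
func reconcile(desired spec, c *cluster) (requeue bool) {
	switch {
	case len(c.pods) < desired.replicas:
		c.pods = append(c.pods, fmt.Sprintf("pod-%d", len(c.pods)))
		return true
	case len(c.pods) > desired.replicas:
		c.pods = c.pods[:len(c.pods)-1]
		return true
	default:
		return false // converged: nothing to do
	}
}

func main() {
	c := &cluster{}
	desired := spec{replicas: 3}
	// Keep looping until observed state matches desired state.
	for reconcile(desired, c) {
	}
	fmt.Println(c.pods) // [pod-0 pod-1 pod-2]
}
```

A real controller is driven by watch events rather than a tight loop, but the idea is the same: every pass is idempotent and converges toward the declared state.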

Each controller implements a Reconcile method:

```go
Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error)
```

Unlike built-in controllers, such as the ones that manage Deployments or StatefulSets and run as part of the Kubernetes
control plane, the operator's controllers are visible in the cluster: they run in a single pod, backed by a Deployment, in
the `cosmos-operator-system` namespace.

A controller can also watch resources other than the CRD it manages. For example, the CosmosFullNode controller watches
for pod deletions so it can create a replacement pod if a user deletes one manually.

Each controller configures which resources it watches in this method:

```go
SetupWithManager(ctx context.Context, mgr ctrl.Manager) error
```
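To show why watching a secondary resource helps, the following standard-library sketch (not controller-runtime's API) maps a pod deletion event to a reconcile request for the owning CosmosFullNode. The `podEvent` and `request` types and the `mapPodToOwner` function are hypothetical; `OwnerName` stands in for an owner reference. In controller-runtime, this kind of owner mapping is commonly configured with the builder's `Owns` method, though this sketch does not claim that is how the operator wires it.

```go
package main

import "fmt"

// request mirrors the shape of a reconcile request: the namespaced name of the
// custom resource to reconcile. Hypothetical type for illustration.
type request struct {
	Namespace, Name string
}

// podEvent is a hypothetical watch event for a pod.
type podEvent struct {
	Type      string // "ADDED", "MODIFIED", or "DELETED"
	Namespace string
	PodName   string
	OwnerName string // name of the owning CosmosFullNode, "" if unowned
}

// mapPodToOwner enqueues a reconcile of the owning resource when one of its
// pods is deleted, so the controller can recreate the pod.
func mapPodToOwner(ev podEvent) (request, bool) {
	if ev.Type != "DELETED" || ev.OwnerName == "" {
		return request{}, false
	}
	return request{Namespace: ev.Namespace, Name: ev.OwnerName}, true
}

func main() {
	queue := make(chan request, 10)
	events := []podEvent{
		{Type: "MODIFIED", Namespace: "default", PodName: "node-0", OwnerName: "node"},
		{Type: "DELETED", Namespace: "default", PodName: "node-1", OwnerName: "node"},
		{Type: "DELETED", Namespace: "default", PodName: "unrelated"},
	}
	for _, ev := range events {
		if req, ok := mapPodToOwner(ev); ok {
			queue <- req
		}
	}
	close(queue)
	for req := range queue {
		fmt.Printf("reconcile %s/%s\n", req.Namespace, req.Name) // reconcile default/node
	}
}
```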

Refer to the kubebuilder documentation for more information.

### Makefile

Kubebuilder generated much of the Makefile. It contains common tasks for developers.

### `api` directory

This directory contains the Go types that define the CRDs.

You should run `make generate manifests` each time you change CRDs.

A CI job should fail if you forget to run this command after modifying the api structs.

### `config` directory

The config directory contains kustomize files generated by Kubebuilder.
Strangelove uses these files to deploy the operator (instead of a Helm chart).
A Helm chart is on the roadmap, but keeping the kustomize and Helm code in sync is a challenge.

### `controllers` directory

The controllers directory contains every controller.

This directory is not unit tested. The code in controllers should act like `main()` functions: it mostly wires up
dependencies from `internal`.

### `internal` directory

Almost all business logic lives in `internal`, along with the unit and integration tests.

## CosmosFullNode

This is the flagship CRD of the Cosmos Operator and contains the most complexity.

### Builder, Diff, and Control Pattern

Each resource has its own builder and controller (referred to as a "control" in this context). For example,
`pvc_builder.go` and `pvc_control.go` deal only with PVCs. All builder files should have the suffix `_builder.go`
and all control files the suffix `_control.go`.

The most complex builder is `pod_builder.go`. There may be opportunities to refactor it.

The "control" pattern was loosely inspired by Kubernetes source code.

Within the controller's `Reconcile(...)` method, the controller determines the order in which the separate Control
objects run.

On process start, each Control is initialized with a Diff and a Builder.

On each reconcile loop:

1. The Builder builds the desired resources from the CRD.
2. Control fetches a list of existing resources.
3. Control uses Diff to compute the difference between the existing and desired resources.
4. Control makes changes based on what Diff reports.
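The four steps above can be sketched with Go generics. This is a simplified, standard-library-only model under stated assumptions: resources are identified by name and expose a `Revision()` string, and `resource`, `diffResult`, `computeDiff`, and `pvc` are hypothetical names. The operator's actual `diff.Resource[T]` and `Diff[T]` types differ.

```go
package main

import "fmt"

// resource is a hypothetical minimal resource: a name plus a revision string.
type resource interface {
	Name() string
	Revision() string
}

// diffResult groups resources by the action Control must take.
type diffResult[T resource] struct {
	Creates, Updates, Deletes []T
}

// computeDiff compares existing resources to desired ones by name and revision.
func computeDiff[T resource](existing, desired []T) diffResult[T] {
	var d diffResult[T]
	current := make(map[string]T, len(existing))
	for _, r := range existing {
		current[r.Name()] = r
	}
	wanted := make(map[string]bool, len(desired))
	for _, want := range desired {
		wanted[want.Name()] = true
		have, ok := current[want.Name()]
		switch {
		case !ok:
			d.Creates = append(d.Creates, want)
		case have.Revision() != want.Revision():
			d.Updates = append(d.Updates, want)
		}
	}
	for _, r := range existing {
		if !wanted[r.Name()] {
			d.Deletes = append(d.Deletes, r)
		}
	}
	return d
}

// pvc is a hypothetical stand-in for a PersistentVolumeClaim.
type pvc struct{ name, rev string }

func (p pvc) Name() string     { return p.name }
func (p pvc) Revision() string { return p.rev }

func main() {
	// Step 1: the Builder builds desired resources from the CRD (hard-coded here).
	desired := []pvc{{"pvc-0", "b"}, {"pvc-1", "a"}}
	// Step 2: Control fetches existing resources (hard-coded here).
	existing := []pvc{{"pvc-0", "a"}, {"pvc-2", "a"}}
	// Step 3: Control uses Diff to compare existing to desired.
	d := computeDiff(existing, desired)
	// Step 4: Control acts on the result (printing stands in for API calls).
	fmt.Println("create:", d.Creates, "update:", d.Updates, "delete:", d.Deletes)
}
```

Because the Builder and Diff are pure functions of their inputs, only the fetch and apply steps touch the Kubernetes API, which is what makes it practical to mock just that API in tests.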

The Control tests are *integration tests* where we mock out the Kubernetes API, but not the Builder or Diff. The
tests run quickly (like unit tests) because we do not make any network calls.

The Diff object (`type Diff[T client.Object] struct`) took several iterations to get right. There is probably little
need to tweak it further.

The hardest problem with diffing is determining updates. Essentially, Diff calls a `Revision() string` method on each
resource and stores the result in a revision annotation. The revision is a simple fnv hash. Diff compares the desired
resource's `Revision()` to the annotation on the existing resource; if they differ, the resource needs an update. We
cannot compare existing and desired resources for equality directly because Kubernetes adds its own annotations and fields.

Builders return a `diff.Resource[T]`, which Diff consumes directly, so Control does not need to adapt resources.

The fnv hash is computed from a resource's JSON representation, which has proven to be stable.
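A minimal sketch of the revision scheme, assuming 32-bit FNV-1a from Go's `hash/fnv` package (the text says "fnv" but not which variant or width) applied to the output of `encoding/json`. The annotation key `example.com/revision`, the `podSpec` type, and the `revision` and `needsUpdate` functions are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// revisionAnnotation is a hypothetical annotation key used for illustration.
const revisionAnnotation = "example.com/revision"

// podSpec is a hypothetical, simplified desired spec.
type podSpec struct {
	Image    string            `json:"image"`
	Replicas int               `json:"replicas"`
	Labels   map[string]string `json:"labels,omitempty"`
}

// revision hashes the JSON representation of v with 32-bit FNV-1a.
func revision(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	h := fnv.New32a()
	h.Write(b) // writes to a hash.Hash never return an error
	return fmt.Sprintf("%x", h.Sum32()), nil
}

// needsUpdate compares the desired revision to the annotation on the existing
// object, ignoring any other fields or annotations the cluster may have added.
func needsUpdate(existingAnnotations map[string]string, desiredRev string) bool {
	return existingAnnotations[revisionAnnotation] != desiredRev
}

func main() {
	spec := podSpec{Image: "example/node:v1", Replicas: 1}
	rev, err := revision(spec)
	if err != nil {
		panic(err)
	}
	// The existing object carries our annotation plus others added by the cluster.
	existing := map[string]string{
		revisionAnnotation:             rev,
		"example.com/added-by-cluster": "ignored",
	}
	fmt.Println("update needed:", needsUpdate(existing, rev))

	spec.Image = "example/node:v2"
	rev2, err := revision(spec)
	if err != nil {
		panic(err)
	}
	fmt.Println("update needed:", needsUpdate(existing, rev2))
}
```

One reason JSON works well as hash input: `encoding/json` emits struct fields in declaration order and sorts map keys, so the same value always marshals to the same bytes.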

### Special Note on Updating Status

Several controllers update a
CosmosFullNode's [status subresource](https://book-v1.book.kubebuilder.io/basics/status_subresource):

* CosmosFullNode
* ScheduledVolumeSnapshot
* SelfHealing

Each update to the status subresource triggers another reconcile loop. We found that multiple controllers updating status
caused race conditions: updates were either not applied or applied incorrectly.
Some controllers read the status to take action, so it's important to preserve the integrity of the status.

Therefore, you must use the special `SyncUpdate(...)` method from `fullnode.StatusClient`. It ensures updates are
performed serially per CosmosFullNode.

### Sentries

Sentries are special: they should not have a readiness probe, because of the way Tendermint/Comet remote signing works.

The remote signer reaches out to the sentry on the privval port. This is the inverse of what you might expect (the sentry
reaching out to the remote signer).

If the sentry does not detect a remote signer connection, it crashes. The stable way to connect to a pod is through a
Kubernetes Service, which by default only routes traffic to ready pods. So we have a chicken-and-egg problem: the sentry
must be "ready" to be added to the Service, but the remote signer must connect to the sentry through the Service so the
sentry doesn't crash.

Therefore, the CosmosFullNode controller inspects Tendermint/Comet status as part of its rolling update strategy, not
just pod readiness.
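A hypothetical sketch of such a rollout gate. A pod without a readiness probe is reported Ready as soon as its containers run, so for sentries the Comet check carries the real signal. The `cometStatus` fields (loosely modeled on Comet's `sync_info.catching_up`), the health rule, and the `maxUnavailable` budget are assumptions for illustration; the text does not specify which Comet fields the controller inspects.

```go
package main

import "fmt"

// cometStatus is a hypothetical subset of a node's Comet status.
type cometStatus struct {
	Reachable  bool // the status RPC responded
	CatchingUp bool // loosely mirrors sync_info.catching_up
}

type pod struct {
	Name  string
	Ready bool // Kubernetes readiness; always true for a running sentry without a probe
	Comet cometStatus
}

// healthy decides whether a pod counts as available during a rolling update,
// combining pod readiness with Comet status.
func healthy(p pod) bool {
	return p.Ready && p.Comet.Reachable && !p.Comet.CatchingUp
}

// podsToUpdate returns the pods that may be restarted now without exceeding
// maxUnavailable, counting already-unhealthy pods against the budget.
func podsToUpdate(pods []pod, maxUnavailable int) []string {
	unavailable := 0
	for _, p := range pods {
		if !healthy(p) {
			unavailable++
		}
	}
	var out []string
	for _, p := range pods {
		if unavailable >= maxUnavailable {
			break
		}
		if healthy(p) {
			out = append(out, p.Name)
			unavailable++
		}
	}
	return out
}

func main() {
	pods := []pod{
		{Name: "sentry-0", Ready: true, Comet: cometStatus{Reachable: true}},
		{Name: "sentry-1", Ready: true, Comet: cometStatus{Reachable: true}},
		{Name: "sentry-2", Ready: true, Comet: cometStatus{Reachable: true, CatchingUp: true}},
	}
	fmt.Println(podsToUpdate(pods, 2)) // [sentry-0]
}
```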

### CacheController

The CacheController is special in that it does not manage a CRD.

It periodically polls every pod for its Tendermint/Comet status, such as block height. Polling happens in the
background. It is a controller because it uses the reconcile loop to update the set of pods it polls.

The CacheController keeps reconcile loops fast. Previously, we queried this status on every reconcile loop, which slowed
them down.

Other controllers always read Comet status from the CacheController rather than querying pods directly.

## Scheduled Volume Snapshot

Scheduled Volume Snapshot takes periodic backups.

To preserve data integrity, it temporarily deletes a pod so it can capture a snapshot of the pod's PVC while no process
is writing to it.

It uses a finite state machine pattern in the main reconcile loop.
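A hedged sketch of such a state machine. The phase names, the `observed` fields, and the transition rules are hypothetical, inferred only from the description above (delete a pod, snapshot its PVC, bring the pod back); they are not the operator's actual phases.

```go
package main

import "fmt"

type phase string

// Hypothetical phases for illustration.
const (
	phaseWaiting          phase = "WaitingForNext"
	phaseDeletingPod      phase = "DeletingPod"
	phaseCreatingSnapshot phase = "CreatingSnapshot"
	phaseRestoringPod     phase = "RestoringPod"
)

// observed is a hypothetical summary of the facts each reconcile checks.
type observed struct {
	Due           bool // the schedule says a snapshot should be taken
	PodGone       bool // the pod writing to the PVC is currently deleted
	SnapshotReady bool // the VolumeSnapshot is ready to use
}

// next advances at most one step per reconcile. Persisting the current phase
// (assumed here to live in the resource's status) lets the controller resume
// where it left off after a restart.
func next(p phase, o observed) phase {
	switch p {
	case phaseWaiting:
		if o.Due {
			return phaseDeletingPod
		}
	case phaseDeletingPod:
		if o.PodGone {
			return phaseCreatingSnapshot
		}
	case phaseCreatingSnapshot:
		if o.SnapshotReady {
			return phaseRestoringPod
		}
	case phaseRestoringPod:
		if !o.PodGone {
			return phaseWaiting
		}
	}
	return p // conditions not met yet: stay in the same phase and requeue
}

func main() {
	p := phaseWaiting
	steps := []observed{
		{Due: true},
		{PodGone: true},
		{PodGone: true, SnapshotReady: true},
		{},
	}
	for _, o := range steps {
		p = next(p, o)
		fmt.Println(p)
	}
}
```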

## StatefulJob

StatefulJob runs a job on a fixed interval (crontab syntax is not supported yet). The purpose is to run a job that
attaches to a PVC created from a VolumeSnapshot.

It's the least developed of the CRDs.