docs: Add architecture document #359
Merged

# Cosmos Operator Architecture

This is a high-level overview of the architecture of the Cosmos Operator. It is intended to be a reference for developers.

## Overview

The operator was written with the [kubebuilder](https://github.com/kubernetes-sigs/kubebuilder) framework.

Kubebuilder simplifies and provides abstractions for creating a Kubernetes controller.

In a nutshell, an operator observes a [CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/). Its job is to match cluster state with the desired state in the CRD. It continually watches for changes and updates the cluster accordingly - a "control loop" pattern.

Each controller implements a Reconcile method:

```go
Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error)
```
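
As a rough illustration, a Reconcile implementation typically fetches the custom resource and converges the cluster toward its spec. This is only a hedged sketch: the reconciler type, the `cosmosv1` import path, and the surrounding wiring are assumptions, not the operator's actual code.

```go
// Hypothetical sketch of a reconcile loop; names and import paths are assumed.
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"

	cosmosv1 "github.com/strangelove-ventures/cosmos-operator/api/v1" // assumed path
)

type CosmosFullNodeReconciler struct {
	client.Client
}

func (r *CosmosFullNodeReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Fetch the CRD instance that triggered this reconcile.
	var crd cosmosv1.CosmosFullNode
	if err := r.Get(ctx, req.NamespacedName, &crd); err != nil {
		// The resource may have been deleted; nothing to do.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Compare the desired state in crd.Spec with cluster state and create,
	// update, or delete resources accordingly (omitted here).

	return ctrl.Result{}, nil
}
```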

Unlike "built-in" controllers like Deployments or StatefulSets, operator controllers are visible in the cluster - one pod backed by a Deployment under the cosmos-operator-system namespace.

A controller can watch resources outside of the CRD it manages. For example, CosmosFullNode watches for pod deletions, so it can spin up new pods if a user deletes one manually.

The watching of resources is set up in this method for each controller:

```go
SetupWithManager(ctx context.Context, mgr ctrl.Manager) error
```
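
A hedged sketch of that wiring, continuing the hypothetical reconciler above (`corev1` is `k8s.io/api/core/v1`); the operator's real setup may watch additional resources:

```go
// Hypothetical wiring sketch. Owning Pods means pod changes and deletions
// (including manual ones) enqueue a reconcile for the parent CosmosFullNode.
func (r *CosmosFullNodeReconciler) SetupWithManager(ctx context.Context, mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&cosmosv1.CosmosFullNode{}). // reconcile on changes to the CRD itself
		Owns(&corev1.Pod{}).             // reconcile on changes to pods it owns
		Complete(r)
}
```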

Refer to kubebuilder docs for more info.

### Makefile

Kubebuilder generated much of the Makefile. It contains common tasks for developers.

### `api` directory

This directory contains the different CRDs.

You should run `make generate manifests` each time you change CRDs.

A CI job should fail if you forget to run this command after modifying the api structs.

### `config` directory

The config directory contains kustomize files generated by Kubebuilder. Strangelove uses these files to deploy the operator (instead of a helm chart). A helm chart is on the road map but presents challenges in keeping the kustomize and helm code in sync.

### `controllers` directory

The controllers directory contains every controller.

This directory is not unit tested. The code in controllers should act like `main()` functions, mostly wiring up dependencies from `internal`.

### `internal` directory

Almost all of the business logic lives in `internal`, which also houses the unit and integration tests.

# CosmosFullNode

This is the flagship CRD of the Cosmos Operator and contains the most complexity.

### Builder, Diff, and Control Pattern

Each resource has its own builder and controller (referred to as "control" in this context). For example, see `pvc_builder.go` and `pvc_control.go`, which manage only PVCs. All builders should have the file suffix `_builder.go` and all control objects `_control.go`.

The most complex builder is `pod_builder.go`. There may be opportunities to refactor it.

The "control" pattern was loosely inspired by Kubernetes source code.

Within the controller's `Reconcile(...)` method, the controller determines the order of operations of the separate Control objects.

On process start, each Control is initialized with a Diff and a Builder.

On each reconcile loop (a simplified sketch follows the list):

1. The Builder builds the desired resources from the CRD.
2. Control fetches a list of existing resources.
3. Control uses Diff to compute a diff of the existing resources against the desired ones.
4. Control makes changes based on what Diff reports.
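
A minimal, self-contained sketch of the diff computation (steps 3 and 4), using a simplified stand-in `resource` type rather than real Kubernetes objects; the operator's actual Diff works on `client.Object` values with revisions, as described below:

```go
// Simplified stand-in types; the real implementation diffs client.Object values.
package main

import "fmt"

type resource struct {
	Name     string
	Revision string
}

type diff struct {
	Creates, Updates, Deletes []resource
}

// computeDiff categorizes desired vs. existing resources into creates,
// updates (revision mismatch), and deletes.
func computeDiff(existing, desired []resource) diff {
	var d diff
	current := make(map[string]resource, len(existing))
	for _, have := range existing {
		current[have.Name] = have
	}
	wanted := make(map[string]bool, len(desired))
	for _, want := range desired {
		wanted[want.Name] = true
		have, ok := current[want.Name]
		switch {
		case !ok:
			d.Creates = append(d.Creates, want) // missing from the cluster
		case have.Revision != want.Revision:
			d.Updates = append(d.Updates, want) // exists but is stale
		}
	}
	for _, have := range existing {
		if !wanted[have.Name] {
			d.Deletes = append(d.Deletes, have) // no longer desired
		}
	}
	return d
}

func main() {
	existing := []resource{{Name: "pod-0", Revision: "a1"}, {Name: "pod-2", Revision: "old"}}
	desired := []resource{{Name: "pod-0", Revision: "a1"}, {Name: "pod-1", Revision: "b2"}, {Name: "pod-2", Revision: "new"}}
	// Creates pod-1, updates pod-2, leaves pod-0 alone.
	fmt.Printf("%+v\n", computeDiff(existing, desired))
}
```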

The Control tests are *integration tests* where we mock out the Kubernetes API, but not the Builder or Diff. The tests run quickly (like unit tests) because we do not make any network calls.

The Diff object (`type Diff[T client.Object] struct`) took several iterations to get right. There is probably little need to tweak it further.

The hardest problem with diffing is determining updates. Essentially, Diff looks for a `Revision() string` method on the resource and sets a revision annotation. The revision is a simple fnv hash. Diff compares `Revision` to the existing annotation; if they differ, we know it's an update. We cannot compare existing resources for equality directly because Kubernetes adds additional annotations and fields.

Builders return a `diff.Resource[T]` which Diff can use. Therefore, Control does not need to adapt resources.

The fnv hash is computed from a resource's JSON representation, which has proven to be stable.
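
A hedged sketch of how such a revision could be computed and compared; the function names and annotation key here are hypothetical, not the operator's actual identifiers:

```go
// Hypothetical revision helpers; the annotation key and names are illustrative.
package revision

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

const revisionAnnotation = "example.com/revision" // hypothetical annotation key

// compute returns a short fnv hash of the object's JSON representation.
func compute(obj any) (string, error) {
	data, err := json.Marshal(obj)
	if err != nil {
		return "", err
	}
	h := fnv.New32()
	_, _ = h.Write(data)
	return fmt.Sprintf("%x", h.Sum32()), nil
}

// needsUpdate reports whether the existing resource's recorded revision
// differs from the desired revision.
func needsUpdate(existingAnnotations map[string]string, desiredRevision string) bool {
	return existingAnnotations[revisionAnnotation] != desiredRevision
}
```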

### Special Note on Updating Status

There are several controllers that update a CosmosFullNode's [status subresource](https://book-v1.book.kubebuilder.io/basics/status_subresource):

* CosmosFullNode
* ScheduledVolumeSnapshot
* SelfHealing

Each update to the status subresource triggers another reconcile loop. We found that multiple controllers updating status caused race conditions: updates were not applied or were applied incorrectly. Some controllers read the status to take action, so it's important to preserve the integrity of the status.

Therefore, you must use the special `SyncUpdate(...)` method from `fullnode.StatusClient`. It ensures updates are performed serially per CosmosFullNode.
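
The implementation details of `SyncUpdate` aren't covered here; the sketch below only illustrates the general idea of serializing updates per object with a per-key lock (all names are assumptions, not the operator's code):

```go
// Illustrative only: serialize status updates per NamespacedName with a mutex per key.
package status

import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

type serialUpdater struct {
	mu    sync.Mutex
	locks map[types.NamespacedName]*sync.Mutex
}

func newSerialUpdater() *serialUpdater {
	return &serialUpdater{locks: make(map[types.NamespacedName]*sync.Mutex)}
}

// SyncUpdate runs fn while holding the lock for the given object, so concurrent
// callers updating the same CosmosFullNode are applied one at a time.
func (s *serialUpdater) SyncUpdate(key types.NamespacedName, fn func()) {
	s.mu.Lock()
	lock, ok := s.locks[key]
	if !ok {
		lock = &sync.Mutex{}
		s.locks[key] = lock
	}
	s.mu.Unlock()

	lock.Lock()
	defer lock.Unlock()
	fn()
}
```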

### Sentries

Sentries are special because you should not include a readiness probe, due to the way Tendermint/Comet remote signing works.

The remote signer reaches out to the sentry on the privval port. This is the inverse of what you'd expect (the sentry reaching out to the remote signer).

If the sentry does not detect a remote signer connection, it crashes. The stable way to connect to a pod is through a Kube Service, so we have a chicken-and-egg problem: the sentry must be "ready" to be added to the Service, but the remote signer must connect to the sentry through the Service so it doesn't crash.

Therefore, the CosmosFullNode controller inspects Tendermint/Comet as part of its rolling update strategy - not just pod readiness state.

### CacheController

The CacheController is special in that it does not manage a CRD.

It periodically polls every pod for its Tendermint/Comet status, such as block height. The polling is done in the background. It's a controller because it needs the reconcile loop to update which pods it needs to poll.

The CacheController prevents slow reconcile loops; previously, we queried this status on every reconcile loop.

When other controllers want Comet status, they always hit the cache controller.
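
A hedged sketch of the kind of cache this implies: a background poller writes into a mutex-guarded map keyed by pod, and other controllers read from it instead of querying Comet directly (types and names are assumptions):

```go
// Illustrative in-memory status cache; not the operator's actual types.
package cometcache

import (
	"sync"

	"k8s.io/apimachinery/pkg/types"
)

// Status is a hypothetical subset of the Comet status we care about.
type Status struct {
	Height     uint64
	CatchingUp bool
}

type Cache struct {
	mu       sync.RWMutex
	statuses map[types.NamespacedName]Status
}

func New() *Cache {
	return &Cache{statuses: make(map[types.NamespacedName]Status)}
}

// Set is called by the background poller after querying a pod.
func (c *Cache) Set(pod types.NamespacedName, s Status) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.statuses[pod] = s
}

// Get is called by other controllers during reconcile; it never blocks on the network.
func (c *Cache) Get(pod types.NamespacedName) (Status, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.statuses[pod]
	return s, ok
}
```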

# Scheduled Volume Snapshot

Scheduled Volume Snapshot takes periodic backups.

To preserve data integrity, it temporarily deletes a pod so it can capture a PVC snapshot without any process writing to it.

It uses a finite state machine pattern in the main reconcile loop.
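
As an illustration of the finite state machine idea (the phase names and progression below are assumptions, not the CRD's actual values), the reconcile loop can switch on a phase recorded in status and advance one step at a time:

```go
// Hypothetical phases; the real CRD's status fields and names may differ.
package snapshot

const (
	phaseWaitingForNext   = "WaitingForNext"
	phaseDeletingPod      = "DeletingPod"
	phaseCreatingSnapshot = "CreatingSnapshot"
	phaseRestoringPod     = "RestoringPod"
)

// nextPhase advances the state machine by one step, assuming the work for the
// current phase has completed.
func nextPhase(current string) string {
	switch current {
	case phaseWaitingForNext:
		// Time for a snapshot: remove the pod so nothing writes to the PVC.
		return phaseDeletingPod
	case phaseDeletingPod:
		// Pod is gone; take the VolumeSnapshot of the now-quiescent PVC.
		return phaseCreatingSnapshot
	case phaseCreatingSnapshot:
		// Snapshot is ready; recreate the pod.
		return phaseRestoringPod
	default:
		// Pod restored; wait for the next scheduled snapshot.
		return phaseWaitingForNext
	}
}
```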

# StatefulJob

StatefulJob runs a job on an interval (crontab is not supported yet). Its purpose is to run a job that attaches to a PVC created from a VolumeSnapshot.

It's the least developed of the CRDs.

Review comments:

> This test wasn't doing anything and, frankly, an e2e test with cosmos nodes is going to be really tough.

> Better to have a staging/dev CosmosFullNode that's continuously delivered off `main`. And we monitor it for any strangeness.