diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 801d0767..97078ee0 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -7,6 +7,10 @@ Run tests via: make test ``` +# Architecture + +For a high-level overview of the architecture, see [docs/architecture.md](./docs/architecture.md). + # Release Process Prereq: Write access to the repo. diff --git a/controllers/suite_test.go b/controllers/suite_test.go deleted file mode 100644 index 3e9d77eb..00000000 --- a/controllers/suite_test.go +++ /dev/null @@ -1,64 +0,0 @@ -/* -Copyright 2022 Strangelove Ventures LLC. - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package controllers - -import ( - "path/filepath" - "testing" - - "github.com/stretchr/testify/require" - "k8s.io/client-go/kubernetes/scheme" - "sigs.k8s.io/controller-runtime/pkg/client" - "sigs.k8s.io/controller-runtime/pkg/envtest" - - cosmosv1 "github.com/strangelove-ventures/cosmos-operator/api/v1" - cosmosv1alpha1 "github.com/strangelove-ventures/cosmos-operator/api/v1alpha1" - //+kubebuilder:scaffold:imports -) - -func TestAPIs(t *testing.T) { - t.Skip("TODO: Implement test. Always skipping because of dependency issues.") - // unable to start control plane itself: failed to start the controlplane. retried 5 times: fork/exec /usr/local/kubebuilder/bin/etcd: no such file or directory - - testEnv := &envtest.Environment{ - CRDDirectoryPaths: []string{filepath.Join("..", "config", "crd", "bases")}, - ErrorIfCRDPathMissing: true, - } - - t.Cleanup(func() { - if err := testEnv.Stop(); err != nil { - t.Errorf("failed to stop test environment: %v", err) - } - }) - - cfg, err := testEnv.Start() - - require.NoError(t, err) - require.NotNil(t, cfg) - - err = cosmosv1.AddToScheme(scheme.Scheme) - require.NoError(t, err) - - err = cosmosv1alpha1.AddToScheme(scheme.Scheme) - require.NoError(t, err) - - //+kubebuilder:scaffold:scheme - - k8sClient, err := client.New(cfg, client.Options{Scheme: scheme.Scheme}) - require.NoError(t, err) - require.NotNil(t, k8sClient) -} diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 00000000..4e4f0958 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,163 @@ +# Cosmos Operator Architecture + +This is a high-level overview of the architecture of the Cosmos Operator. It is intended to be a reference for +developers. + +## Overview + +The operator was written with the [kubebuilder](https://github.com/kubernetes-sigs/kubebuilder) framework. + +Kubebuilder simplifies and provides abstractions for creating a Kubernetes controller. + +In a nutshell, an operator observes +a [CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/). Its job is to match +cluster state with the desired state in the CRD. It +continually watches for changes and updates the cluster accordingly - a "control loop" pattern. + +Each controller implements a Reconcile method: + +```go +Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) +``` + +Unlike "built-in" controllers like Deployments or StatefulSets, operator controllers are visible in the cluster - one pod +backed by a Deployment under the cosmos-operator-system namespace. + +A controller can watch resources outside of the CRD it manages. For example, CosmosFullNode watches for pod deletions, +so it can spin up new pods if a user deletes one manually. + +The watching of resources is in this method for each controller: + +```go +SetupWithManager(ctx context.Context, mgr ctrl.Manager) error +``` + +Refer to kubebuilder docs for more info. + +### Makefile + +Kubebuilder generated much of the Makefile. It contains common tasks for developers. + +### `api` directory + +This directory contains the different CRDs. + +You should run `make generate manifests` each time you change CRDs. + +A CI job should fail if you forget to run this command after modifying the api structs. + +### `config` directory + +The config directory contains kustomize files generated by Kubebuilder. +Strangelove uses these files to deploy the operator (instead of a helm chart). +A helm chart is on the road map but presents challenges in keeping the kustomize and helm code in sync. + +### `controllers` directory + +The controllers directory contains every controller. + +This directory is not unit tested. The code in controllers should act like `main()` functions where it's mostly wiring +up of dependencies from `internal`. + +### `internal` directory + +Almost all the business logic lives in `internal` and houses the unit and integration tests. + +# CosmosFullNode + +This is the flagship CRD of the Cosmos Operator and contains the most complexity. + +### Builder, Diff, and Control Pattern + +Each resource has its own builder and controller (referred as "control" in this context). For example, +see `pvc_builder.go` and `pvc_control.go` which only manages PVCs. All builders should have file suffix `_builder.go` +and all control objects `_control.go`. + +The most complex builder is `pod_builder.go`. There may be opportunities to refactor it. + +The "control" pattern was loosely inspired by Kubernetes source code. + +Within the controller's `Reconcile(...)` method, the controller determines the order of operations of the separate +Control objects. + +On process start, each Control is initialized with a Diff and a Builder. + +On each reconcile loop: + +1. The Builder builds the desired resources from the CRD. +2. Control fetches a list of existing resources. +3. Control uses Diff to compute a diff of the existing to the desired. +4. Control makes changes based on what Diff reports. + +The Control tests are *integration tests* where we mock out the Kubernetes API, but not the Builder or Diff. The +tests run quickly (like unit tests) because we do not make any network calls. + +The Diff object (`type Diff[T client.Object] struct`) took several iterations to get right. There is probably little +need to tweak it further. + +The hardest problem with diffing is determining updates. Essentially, Diff looks for a `Revision() string` method on the +resource and sets a revision annotation. The revision is a simple fnv hash. It compares `Revision` to the existing annotation. +If different, we know it's an update. We cannot compare equality of existing resources directly because Kubernetes adds additional +annotations and fields. + +Builders return a `diff.Resource[T]` which Diff can use. Therefore, Control does not need to adapt resources. + +The fnv hash is computed from a resource's JSON representation, which has proven to be stable. + +### Special Note on Updating Status + +There are several controllers that update a +CosmosFullNode's [status subresource](https://book-v1.book.kubebuilder.io/basics/status_subresource): + +* CosmosFullNode +* ScheduledVolumeSnapshot +* SelfHealing + +Each update to the status subresource triggers another reconcile loop. We found multiple controllers updating status +caused race conditions. Updates were not applied or applied incorrectly. +Some controllers read the status to take action, so it's important to preserve the integrity of the status. + +Therefore, you must use the special `SyncUpdate(...)` method from `fullnode.StatusClient`. It ensures updates are +performed serially per CosmosFullNode. + +### Sentries + +Sentries are special because you should not include a readiness probe due to the way Tendermint/Comet remote +signing works. + +The remote signer reaches out to the sentry on the privval port. This is the inverse of what you'd expect, the sentry +reaching out to the remote signer. + +If the sentry does not detect a remote signer connection, it crashes. And the stable way to connect to a pod is through +a Kube Service. So we have a chicken or egg problem. The sentry must be "ready" to be added to the Service, but the +remote signer must connect to the sentry through the Service so it doesn't crash. + +Therefore, the CosmosFullNode controller inspects Tendermint/Comet as part of its rolling update strategy - not just +pod readiness state. + +### CacheController + +The CacheController is special in that it does not manage a CRD. + +It periodically polls every pod for its Tendermint/Comet status such as block height. The polling is done in +the background. It's a controller because it needs the reconcile loop to update which pods it needs to poll. + +The CacheController prevents slow reconcile loops. Previously, we queried this status on every reconcile loop. + +When other controllers want Comet status, they always hit the cache controller. + +# Scheduled Volume Snapshot + +Scheduled Volume Snapshot takes periodic backups. + +To preserve data integrity, it will temporarily delete a pod, so it can capture a PVC snapshot without any process +writing to it. + +It uses a finite state machine pattern in the main reconcile loop. + +# StatefulJob + +StatefulJob periodically runs a job on an interval (crontab not supported yet). The purpose is to run a job that +attaches to a PVC created from a VolumeSnapshot. + +It's the least developed of the CRDs.