From 6746d1f04707a6fb95ed92fbc8033090166b7e1f Mon Sep 17 00:00:00 2001 From: David Nix Date: Thu, 21 Sep 2023 16:30:45 -0600 Subject: [PATCH 1/3] docs: Add architecture document --- docs/architecture.md | 135 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 135 insertions(+) create mode 100644 docs/architecture.md diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 00000000..53fca5d8 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,135 @@ +# Cosmos Operator Architecture + +This is a high-level overview of the architecture of the Cosmos Operator. It is intended to be a reference for +developers. + +## Overview + +The operator was written with the [kubebuilder](https://github.com/kubernetes-sigs/kubebuilder) framework. + +Kubebuilder simplifies and provides abstractions for creating a controller. + +In a nutshell, an operator observes +a [CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/). Its job is to match +cluster state with the desired state in the CRD. It +continually watches for changes and updates the cluster accordingly - a "control loop" pattern. + +Each controller implements a Reconcile method: + +```go +Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) +``` + +Unlike "built-in" controllers like Deployments or StatefulSets, operators are visible in the cluster. It is one pod +backed by a Deployment under the cosmos-operator-system namespace. + +A controller can watch resources outside of the CRD it manages. For example, CosmosFullNode watches for pod deletions, +so +it can spin up new pods if a user deletes one manually. + +The watching of resources happens in this method for each controller: + +```go +SetupWithManager(ctx context.Context, mgr ctrl.Manager) error +``` + +Refer to kubebuilder docs for more info. + +### Makefile + +Kubebuilder generated much of the Makefile. It contains common tasks for developers. + +### `api` directory + +This directory contains the different CRDs. + +You should run `make generate manifests` each time you change CRDs. + +### `config` directory + +The config directory contains kustomize files. Strangelove uses these files to deploy the operator (instead of a helm +chart). A helm chart is still pending but presents challenges in keeping the kustomize and helm code in sync. + +### `controllers` directory + +The controllers directory contains every controller. + +This directory is not unit tested. The code in controllers should act like `main()` functions where it's mostly wiring +up of dependencies from `internal`. + +Kubebuilder includes an integration test suite which you can see in `controllers/suite_test.go`. So far we ignore it. +Integration or e2e tests within the context of the test suite will be challenging given the dependency on a +blockchain network, such as syncing from peers, downloading snapshot, etc. Instead, I recommend a monitored staging +environment +which has continuous delivery. + +### `internal` directory + +Almost all the business logic lives in `internal` and contains unit tests. + +# CosmosFullNode + +This is the flagship CRD of the Cosmos Operator and contains the most complexity. + +### Builder, Diff, and Control Pattern + +Each resource has its own builder and controller (referred as "control" in this context). For example, +see `pvc_builder.go` and `pvc_control.go` which only manages PVCs. All builders should have file suffix `_builder.go` +and all control objects `_control.go`. + +The "control" +pattern was loosely inspired by Kubernetes source code. + +On each reconcile loop: + +1. (On process start) control is initialized with a Diff and a Builder. +2. The builder builds the desired resources from the CRD. +3. Control fetches a list of existing resources. +4. Control uses Diff to compute a diff of the existing to the desired. +5. Control makes changes based on what Diff reports. + +The "control" tests are **integration tests** where we mock out the Kubernetes API, but not the Builder or Diff. The +tests run quickly (like unit tests) because we do not make any network calls. + +The Diff object (`type Diff[T client.Object] struct`) took several iterations to get right. There is probably little +need to tweak it further. + +The hardest problem with diffing is determining updates. Essentially, Diff looks for a `Revision() string` method on the +resource and sets a revision annotation. The revision is a simple fnv hash. It compares `Revision` to the existing annotation. +If different, we know it's an update. + +Builders return a `diff.Resource[T]` which Diff can use. Therefore, Control does not need to adapt resources. + +The fnv hash is computed from a resource's JSON representation, which has proven to be stable. + +### Special Note on Updating Status + +There are several controllers that update a +CosmosFullNode's [status subresource](https://book-v1.book.kubebuilder.io/basics/status_subresource): + +* CosmosFullNode +* ScheduledVolumeSnapshot +* SelfHealing + +Each update to the status subresource triggers another reconcile loop. We found multiple controllers updating status +caused race conditions. Updates were not applied or applied incorrectly. +Some controllers read the status to take action, so it's important to preserve the integrity of the status. + +Therefore, you must use the special `SyncUpdate(...)` method from `fullnode.StatusClient`. It ensures updates are +performed serially per CosmosFullNode. + +# Scheduled Volume Snapshot + +Scheduled Volume Snapshot takes periodic backups. + +To preserve data integrity, it will temporarily delete a pod, so it can capture a PVC snapshot without any process +writing to it. + +It uses a finite state machine pattern in the main reconcile loop. + +# StatefulJob + +StatefulJob periodically runs a job on an interval (crontab not supported yet). The purpose is to run a job that +attaches to a PVC created from a VolumeSnapshot. + +It's the least developed of the CRDs. From 4fb574420a6069b785c5b80b12767625d37fc5a8 Mon Sep 17 00:00:00 2001 From: David Nix Date: Fri, 22 Sep 2023 12:26:52 -0600 Subject: [PATCH 2/3] Delete e2e test --- controllers/suite_test.go | 64 --------------------------------------- 1 file changed, 64 deletions(-) delete mode 100644 controllers/suite_test.go diff --git a/controllers/suite_test.go b/controllers/suite_test.go deleted file mode 100644 index 3e9d77eb..00000000 --- a/controllers/suite_test.go +++ /dev/null @@ -1,64 +0,0 @@ -/* -Copyright 2022 Strangelove Ventures LLC. - -Licensed under the Apache License, Version 2.0 (the "License"); -you may not use this file except in compliance with the License. -You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - -Unless required by applicable law or agreed to in writing, software -distributed under the License is distributed on an "AS IS" BASIS, -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -See the License for the specific language governing permissions and -limitations under the License. -*/ - -package controllers - -import ( - "path/filepath" - "testing" - - "github.com/stretchr/testify/require" - "k8s.io/client-go/kubernetes/scheme" - "sigs.k8s.io/controller-runtime/pkg/client" - "sigs.k8s.io/controller-runtime/pkg/envtest" - - cosmosv1 "github.com/strangelove-ventures/cosmos-operator/api/v1" - cosmosv1alpha1 "github.com/strangelove-ventures/cosmos-operator/api/v1alpha1" - //+kubebuilder:scaffold:imports -) - -func TestAPIs(t *testing.T) { - t.Skip("TODO: Implement test. Always skipping because of dependency issues.") - // unable to start control plane itself: failed to start the controlplane. retried 5 times: fork/exec /usr/local/kubebuilder/bin/etcd: no such file or directory - - testEnv := &envtest.Environment{ - CRDDirectoryPaths: []string{filepath.Join("..", "config", "crd", "bases")}, - ErrorIfCRDPathMissing: true, - } - - t.Cleanup(func() { - if err := testEnv.Stop(); err != nil { - t.Errorf("failed to stop test environment: %v", err) - } - }) - - cfg, err := testEnv.Start() - - require.NoError(t, err) - require.NotNil(t, cfg) - - err = cosmosv1.AddToScheme(scheme.Scheme) - require.NoError(t, err) - - err = cosmosv1alpha1.AddToScheme(scheme.Scheme) - require.NoError(t, err) - - //+kubebuilder:scaffold:scheme - - k8sClient, err := client.New(cfg, client.Options{Scheme: scheme.Scheme}) - require.NoError(t, err) - require.NotNil(t, k8sClient) -} From 8ee43d5e15ab1db593afd90de7dac94eec78cec6 Mon Sep 17 00:00:00 2001 From: David Nix Date: Fri, 22 Sep 2023 12:26:58 -0600 Subject: [PATCH 3/3] wip --- CONTRIBUTING.md | 4 +++ docs/architecture.md | 74 ++++++++++++++++++++++++++++++-------------- 2 files changed, 55 insertions(+), 23 deletions(-) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 801d0767..97078ee0 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -7,6 +7,10 @@ Run tests via: make test ``` +# Architecture + +For a high-level overview of the architecture, see [docs/architecture.md](./docs/architecture.md). + # Release Process Prereq: Write access to the repo. diff --git a/docs/architecture.md b/docs/architecture.md index 53fca5d8..4e4f0958 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -7,7 +7,7 @@ developers. The operator was written with the [kubebuilder](https://github.com/kubernetes-sigs/kubebuilder) framework. -Kubebuilder simplifies and provides abstractions for creating a controller. +Kubebuilder simplifies and provides abstractions for creating a Kubernetes controller. In a nutshell, an operator observes a [CRD](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/). Its job is to match @@ -20,14 +20,13 @@ Each controller implements a Reconcile method: Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) ``` -Unlike "built-in" controllers like Deployments or StatefulSets, operators are visible in the cluster. It is one pod +Unlike "built-in" controllers like Deployments or StatefulSets, operator controllers are visible in the cluster - one pod backed by a Deployment under the cosmos-operator-system namespace. A controller can watch resources outside of the CRD it manages. For example, CosmosFullNode watches for pod deletions, -so -it can spin up new pods if a user deletes one manually. +so it can spin up new pods if a user deletes one manually. -The watching of resources happens in this method for each controller: +The watching of resources is in this method for each controller: ```go SetupWithManager(ctx context.Context, mgr ctrl.Manager) error @@ -45,10 +44,13 @@ This directory contains the different CRDs. You should run `make generate manifests` each time you change CRDs. +A CI job should fail if you forget to run this command after modifying the api structs. + ### `config` directory -The config directory contains kustomize files. Strangelove uses these files to deploy the operator (instead of a helm -chart). A helm chart is still pending but presents challenges in keeping the kustomize and helm code in sync. +The config directory contains kustomize files generated by Kubebuilder. +Strangelove uses these files to deploy the operator (instead of a helm chart). +A helm chart is on the road map but presents challenges in keeping the kustomize and helm code in sync. ### `controllers` directory @@ -57,15 +59,9 @@ The controllers directory contains every controller. This directory is not unit tested. The code in controllers should act like `main()` functions where it's mostly wiring up of dependencies from `internal`. -Kubebuilder includes an integration test suite which you can see in `controllers/suite_test.go`. So far we ignore it. -Integration or e2e tests within the context of the test suite will be challenging given the dependency on a -blockchain network, such as syncing from peers, downloading snapshot, etc. Instead, I recommend a monitored staging -environment -which has continuous delivery. - ### `internal` directory -Almost all the business logic lives in `internal` and contains unit tests. +Almost all the business logic lives in `internal` and houses the unit and integration tests. # CosmosFullNode @@ -77,18 +73,23 @@ Each resource has its own builder and controller (referred as "control" in this see `pvc_builder.go` and `pvc_control.go` which only manages PVCs. All builders should have file suffix `_builder.go` and all control objects `_control.go`. -The "control" -pattern was loosely inspired by Kubernetes source code. +The most complex builder is `pod_builder.go`. There may be opportunities to refactor it. + +The "control" pattern was loosely inspired by Kubernetes source code. + +Within the controller's `Reconcile(...)` method, the controller determines the order of operations of the separate +Control objects. + +On process start, each Control is initialized with a Diff and a Builder. On each reconcile loop: -1. (On process start) control is initialized with a Diff and a Builder. -2. The builder builds the desired resources from the CRD. -3. Control fetches a list of existing resources. -4. Control uses Diff to compute a diff of the existing to the desired. -5. Control makes changes based on what Diff reports. +1. The Builder builds the desired resources from the CRD. +2. Control fetches a list of existing resources. +3. Control uses Diff to compute a diff of the existing to the desired. +4. Control makes changes based on what Diff reports. -The "control" tests are **integration tests** where we mock out the Kubernetes API, but not the Builder or Diff. The +The Control tests are *integration tests* where we mock out the Kubernetes API, but not the Builder or Diff. The tests run quickly (like unit tests) because we do not make any network calls. The Diff object (`type Diff[T client.Object] struct`) took several iterations to get right. There is probably little @@ -96,7 +97,8 @@ need to tweak it further. The hardest problem with diffing is determining updates. Essentially, Diff looks for a `Revision() string` method on the resource and sets a revision annotation. The revision is a simple fnv hash. It compares `Revision` to the existing annotation. -If different, we know it's an update. +If different, we know it's an update. We cannot compare equality of existing resources directly because Kubernetes adds additional +annotations and fields. Builders return a `diff.Resource[T]` which Diff can use. Therefore, Control does not need to adapt resources. @@ -118,6 +120,32 @@ Some controllers read the status to take action, so it's important to preserve t Therefore, you must use the special `SyncUpdate(...)` method from `fullnode.StatusClient`. It ensures updates are performed serially per CosmosFullNode. +### Sentries + +Sentries are special because you should not include a readiness probe due to the way Tendermint/Comet remote +signing works. + +The remote signer reaches out to the sentry on the privval port. This is the inverse of what you'd expect, the sentry +reaching out to the remote signer. + +If the sentry does not detect a remote signer connection, it crashes. And the stable way to connect to a pod is through +a Kube Service. So we have a chicken or egg problem. The sentry must be "ready" to be added to the Service, but the +remote signer must connect to the sentry through the Service so it doesn't crash. + +Therefore, the CosmosFullNode controller inspects Tendermint/Comet as part of its rolling update strategy - not just +pod readiness state. + +### CacheController + +The CacheController is special in that it does not manage a CRD. + +It periodically polls every pod for its Tendermint/Comet status such as block height. The polling is done in +the background. It's a controller because it needs the reconcile loop to update which pods it needs to poll. + +The CacheController prevents slow reconcile loops. Previously, we queried this status on every reconcile loop. + +When other controllers want Comet status, they always hit the cache controller. + # Scheduled Volume Snapshot Scheduled Volume Snapshot takes periodic backups.