[SCHEMATIC-138] SigNoz cold storage and backups #47

Merged
merged 136 commits into from
Nov 21, 2024
e000497
Reset back to working config
BryanFauble Nov 6, 2024
53732c4
Move stack to development space
BryanFauble Nov 7, 2024
7cfa68a
Try with import after manually creating spacelift space
BryanFauble Nov 7, 2024
21f8651
Move admin stack to root
BryanFauble Nov 7, 2024
cd45c11
change smtp outputs to empty strings
BWMac Nov 7, 2024
1c4c0c3
Deploy fluxcd as an alternative to ArgoCD for kubernetes deployments
BryanFauble Nov 7, 2024
3f0325c
Deploy flux to stack
BryanFauble Nov 7, 2024
7ea14bc
correct directories
BryanFauble Nov 7, 2024
cc17c33
correct versions
BryanFauble Nov 7, 2024
82e10cc
authenticate to github oci
BryanFauble Nov 7, 2024
7b2de59
Update variables file for ghcr token
BryanFauble Nov 7, 2024
59f9532
Correct helm chart version
BryanFauble Nov 7, 2024
37c8e43
Swap away from OCI for flux
BryanFauble Nov 7, 2024
7fb70f1
Deploy kustomization resource separately
BryanFauble Nov 7, 2024
5e3e694
Correct my flipped logic
BryanFauble Nov 7, 2024
8454113
Attempt deployment of signoz through fluxcd
BryanFauble Nov 7, 2024
4644aa4
relative file reference
BryanFauble Nov 7, 2024
ea30927
TRY ANOTHER FILE FORMAT
BryanFauble Nov 7, 2024
981c863
Deploy weave and correct signoz values
BryanFauble Nov 7, 2024
0e6e868
deploy configmaps to namespace
BryanFauble Nov 7, 2024
33c4255
Create temp password hash
BryanFauble Nov 7, 2024
54d45de
adds clickhouse backup w/ s3 bucket
BWMac Nov 8, 2024
0e1f95e
specify region
BWMac Nov 8, 2024
4a32bf5
assign smtp_ vars
BWMac Nov 8, 2024
045c78f
sets variables in yaml
BWMac Nov 8, 2024
b24df34
revert first attempt - start fresh
BWMac Nov 11, 2024
0997a75
adds s3-bucket stack
BWMac Nov 11, 2024
c541d39
fixes module path
BWMac Nov 11, 2024
70770ce
move clickhouse backup to k8s deployments
BWMac Nov 11, 2024
8b0791a
fixes module path
BWMac Nov 11, 2024
be39eb1
give bucket unique name
BWMac Nov 11, 2024
9c50599
adds IAM role for s3 access
BWMac Nov 11, 2024
87519d6
adds clickhouse backup job
BWMac Nov 11, 2024
63b50b4
removes redundant aws secrets
BWMac Nov 11, 2024
82b3956
passes through aws account id
BWMac Nov 11, 2024
8aae4dd
adds aws account id var to signoz-fluxcd module
BWMac Nov 11, 2024
7d8d4b6
updates cronjob config
BWMac Nov 12, 2024
61c5855
try adding default
BWMac Nov 12, 2024
d294250
create clickhouse config
BWMac Nov 12, 2024
ae851ca
try combining config maps
BWMac Nov 13, 2024
d0ba693
creates test pod for S3 connection
BWMac Nov 13, 2024
c500a62
fixes service account name
BWMac Nov 13, 2024
7121fdc
updates iam role access
BWMac Nov 13, 2024
ac90e5b
updates IAM role access
BWMac Nov 13, 2024
575d9f6
revert role resource name change
BWMac Nov 13, 2024
d0f57cf
adds OIDC provider
BWMac Nov 13, 2024
0ec3d46
uses existing OIDC provider
BWMac Nov 13, 2024
9de97da
creates oidc provider
BWMac Nov 13, 2024
76e0d7e
hard-code OIDC url for testing
BWMac Nov 13, 2024
c45595b
try using eks module output
BWMac Nov 13, 2024
85aa485
try sage-aws-eks module
BWMac Nov 13, 2024
936319c
adds cluster_id to deployment for OIDC provider
BWMac Nov 13, 2024
9da27e3
changes to using oidc provider arn directly (infra)
BWMac Nov 13, 2024
d923645
changes to using OIDC arn directly (deployments)
BWMac Nov 13, 2024
2f45ef4
remove oidc creation
BWMac Nov 13, 2024
0e62084
cleans up code
BWMac Nov 14, 2024
72cf224
revert bucket name
BWMac Nov 14, 2024
08a11c2
reapply new bucket name
BWMac Nov 14, 2024
015bb44
test clickhouse backup deploy
BWMac Nov 14, 2024
63f836d
adds resource patch
BWMac Nov 14, 2024
3bb64cd
adds missing - indicator
BWMac Nov 14, 2024
eb52f79
fix patch indenting
BWMac Nov 14, 2024
02fd043
updates clickhouse bucket
BWMac Nov 14, 2024
c31de32
reverts clickhouse backup - try to get back to stable
BWMac Nov 14, 2024
6525d08
updates test command
BWMac Nov 14, 2024
ac745b9
Try deploying out lovely-plugin as argocd side car
BryanFauble Nov 14, 2024
62e1c1c
Revert "Try deploying out lovely-plugin as argocd side car"
BryanFauble Nov 14, 2024
82caef2
revert aws command change
BWMac Nov 14, 2024
e26e827
test deployment update
BWMac Nov 14, 2024
a97c5fc
revert service change
BWMac Nov 14, 2024
ed0a9d1
try adding sidecar with postrenders
BWMac Nov 14, 2024
04392c6
removes comments
BWMac Nov 14, 2024
eca6bc3
try add deploy
BWMac Nov 14, 2024
96e8704
fixes indent
BWMac Nov 14, 2024
615e056
updates target to statefulset
BWMac Nov 14, 2024
cc6eb7c
targets ClickHouseInstallation
BWMac Nov 15, 2024
72ecbb7
try adding in service account patch
BWMac Nov 15, 2024
a1c6214
removes s3-test pod
BWMac Nov 15, 2024
fc9a494
fixes pvc labelling
BWMac Nov 15, 2024
4e15386
removes service account for now
BWMac Nov 15, 2024
6fc9812
reverts to last successful HelmRelease
BWMac Nov 15, 2024
7961869
change pvc name
BWMac Nov 15, 2024
8199163
frees up ports
BWMac Nov 15, 2024
387efd4
removes additional volume patch
BWMac Nov 15, 2024
5c6ef4a
simplifies patch
BWMac Nov 15, 2024
d4eadc1
removes password overwrite
BWMac Nov 15, 2024
eb9808a
remove incorrect command
BWMac Nov 15, 2024
4e78698
updates backup config
BWMac Nov 15, 2024
7abaf9d
moves annotation to helmrelease block
BWMac Nov 15, 2024
3bc4934
sets WATCH_INTERVAL = 8h
BWMac Nov 15, 2024
8b5d11b
Remove weave gitops in favor of a simpler deployment of capacitor
BryanFauble Nov 15, 2024
f1283e9
configures cold storage
BWMac Nov 15, 2024
42bbf6c
enable role for coldStorage
BWMac Nov 15, 2024
9928204
Swap to revision strategy
BryanFauble Nov 15, 2024
ed4f3a4
Try flipped off helm hooks
BryanFauble Nov 15, 2024
291643a
adds --watch flag
BWMac Nov 18, 2024
3cd8635
adds CopyObject permissions to IAM role
BWMac Nov 18, 2024
6337f7c
Add victoria metrics service scrape on clickhouse operator
BryanFauble Nov 18, 2024
1b43ece
Correct ref
BryanFauble Nov 18, 2024
40f3fe4
Convert over related SMTP config to FluxCD deployment
BryanFauble Nov 18, 2024
35a0cc5
revert permission update
BWMac Nov 18, 2024
1b41132
Point to use east
BryanFauble Nov 19, 2024
30f42d5
adds region to storage config
BWMac Nov 19, 2024
9e8d682
patches xml file
BWMac Nov 19, 2024
9ae1528
updates path
BWMac Nov 19, 2024
11b2d63
try replacing storage.xml
BWMac Nov 19, 2024
98aa032
remove region from endpoint
BWMac Nov 19, 2024
b6fca8b
Patch kustomize path
BryanFauble Nov 19, 2024
7c6208a
updates coldstorage folder structure, adds S3_OBJECT_DISK_PATH env
BWMac Nov 19, 2024
831f6df
sets BACKUPS_TO_KEEP_LOCAL to 1
BWMac Nov 20, 2024
6ac4780
updates signoz readme with backup info
BWMac Nov 20, 2024
64b854e
Remove unused moved blocks
BryanFauble Nov 6, 2024
46dce57
[IBCDPE-1095] Use scope based authorization on telemetry upload route…
BryanFauble Nov 19, 2024
a2240c1
Default to empty string
BryanFauble Nov 20, 2024
29bd314
updates documentation for fluxcd
BWMac Nov 20, 2024
a835490
Remove stack destructors as they're broken on the free tier
BryanFauble Nov 20, 2024
4959fd2
adds s3-bucket readme
BWMac Nov 20, 2024
2ab1a75
Delete temp stack, make s3 module generic, migrate signoz module
BryanFauble Nov 20, 2024
530186a
updates s3 readme
BWMac Nov 20, 2024
e5f7869
Correct resource references
BryanFauble Nov 20, 2024
1495779
Correct arn reference
BryanFauble Nov 20, 2024
bcc95d9
Shorten iam role name
BryanFauble Nov 20, 2024
a676d85
Correct ingress patches
BryanFauble Nov 20, 2024
6a0bc34
Point all to local branch
BryanFauble Nov 20, 2024
c577362
Remove extra service account that is not needed
BryanFauble Nov 20, 2024
50ad21f
Support disabled bucket versioning
BryanFauble Nov 20, 2024
142b78f
Notes for setup
BryanFauble Nov 20, 2024
4f92063
Point back to main
BryanFauble Nov 20, 2024
92031b9
Update comment
BryanFauble Nov 20, 2024
4997198
[IBCDPE-1095] Use scope based authorization on telemetry upload route…
BryanFauble Nov 19, 2024
4da45d6
Merge branch 'signoz-testing' into schematic-138-cold-storage-and-bac…
BryanFauble Nov 20, 2024
27097fa
Generic policy name
BryanFauble Nov 20, 2024
8b725ab
Default bucket tags
BryanFauble Nov 20, 2024
99c45ff
Correct path
BryanFauble Nov 20, 2024
6204892
fixes minor typos
BWMac Nov 21, 2024
ab12fc9
updates s3 bucket docs
BWMac Nov 21, 2024
2 changes: 1 addition & 1 deletion README.md
@@ -172,7 +172,7 @@ allow us to review for any security advisories.

### Deploying an application to the kubernetes cluster
Deployment of applications to the kubernetes cluster is handled through the combination
of terraform (.tf) scripts, spacelift (CICD tool), and ArgoCd (Declarative definitions
of terraform (.tf) scripts, spacelift (CICD tool), and ArgoCd or Flux CD (Declarative definitions
for applications).

To start off the deployment journey the first step is to create a new terraform module
32 changes: 1 addition & 31 deletions deployments/spacelift/dpe-k8s/main.tf
@@ -45,6 +45,7 @@ locals {
pod_to_node_dns_sg_id = "TF_VAR_pod_to_node_dns_sg_id"
smtp_user = "TF_VAR_smtp_user"
smtp_password = "TF_VAR_smtp_password"
cluster_oidc_provider_arn = "TF_VAR_cluster_oidc_provider_arn"
}
}

@@ -178,31 +179,6 @@ resource "spacelift_stack_dependency_reference" "cluster-name" {
# stack_id = spacelift_stack.k8s-stack.id
# }

resource "spacelift_stack_destructor" "k8s-stack-deployments-destructor" {
depends_on = [
spacelift_stack.k8s-stack,
spacelift_aws_integration_attachment.k8s-deployments-aws-integration-attachment,
spacelift_context_attachment.k8s-kubeconfig-hooks,
spacelift_stack_dependency_reference.cluster-name,
spacelift_stack_dependency_reference.region-name,
spacelift_environment_variable.k8s-stack-deployments-environment-variables
]

stack_id = spacelift_stack.k8s-stack-deployments.id
}

resource "spacelift_stack_destructor" "k8s-stack-destructor" {
depends_on = [
spacelift_aws_integration_attachment.k8s-aws-integration-attachment,
spacelift_context_attachment.k8s-kubeconfig-hooks,
spacelift_stack_dependency_reference.cluster-name,
spacelift_stack_dependency_reference.region-name,
spacelift_environment_variable.k8s-stack-environment-variables
]

stack_id = spacelift_stack.k8s-stack.id
}

resource "spacelift_aws_integration_attachment" "k8s-aws-integration-attachment" {
integration_id = var.aws_integration_id
stack_id = spacelift_stack.k8s-stack.id
@@ -244,12 +220,6 @@ resource "spacelift_stack" "auth0" {
]
}

resource "spacelift_stack_destructor" "auth0-stack-destructor" {
count = var.deploy_auth0 ? 1 : 0
stack_id = spacelift_stack.auth0[0].id
}


resource "spacelift_environment_variable" "auth0-stack-environment-variables" {
depends_on = [
spacelift_stack.auth0
67 changes: 43 additions & 24 deletions deployments/stacks/dpe-k8s-deployments/main.tf
@@ -1,3 +1,6 @@
locals {
git_revision = var.git_revision
}
module "sage-aws-eks-autoscaler" {
source = "spacelift.io/sagebionetworks/sage-aws-eks-autoscaler/aws"
version = "0.9.0"
@@ -26,13 +29,19 @@ module "argo-cd" {
source = "../../../modules/argo-cd"
}

module "flux-cd" {
depends_on = [module.sage-aws-eks-autoscaler]
source = "../../../modules/flux-cd"
}

module "victoria-metrics" {
depends_on = [module.argo-cd]
source = "spacelift.io/sagebionetworks/victoria-metrics/aws"
version = "0.4.8"
depends_on = [module.argo-cd]
# source = "spacelift.io/sagebionetworks/victoria-metrics/aws"
# version = "0.4.8"
source = "../../../modules/victoria-metrics"
auto_deploy = var.auto_deploy
auto_prune = var.auto_prune
git_revision = var.git_revision
git_revision = local.git_revision
}

module "trivy-operator" {
@@ -41,7 +50,7 @@ module "trivy-operator" {
version = "0.3.2"
auto_deploy = var.auto_deploy
auto_prune = var.auto_prune
git_revision = var.git_revision
git_revision = local.git_revision
}

module "airflow" {
@@ -50,7 +59,7 @@ module "airflow" {
version = "0.4.0"
auto_deploy = var.auto_deploy
auto_prune = var.auto_prune
git_revision = var.git_revision
git_revision = local.git_revision
namespace = "airflow"
}

@@ -60,7 +69,7 @@ module "postgres-cloud-native-operator" {
version = "0.4.0"
auto_deploy = var.auto_deploy
auto_prune = var.auto_prune
git_revision = var.git_revision
git_revision = local.git_revision
}

module "postgres-cloud-native-database" {
@@ -69,30 +78,40 @@ module "postgres-cloud-native-database" {
version = "0.5.0"
auto_deploy = var.auto_deploy
auto_prune = var.auto_prune
git_revision = var.git_revision
git_revision = local.git_revision
namespace = "airflow"
argo_deployment_name = "airflow-postgres-cloud-native"
}

module "clickhouse-backup-bucket" {
source = "../../../modules/s3-bucket"
bucket_name = "clickhouse-backup-${var.aws_account_id}-${var.cluster_name}"
enable_versioning = false
aws_account_id = var.aws_account_id
cluster_name = var.cluster_name
cluster_oidc_provider_arn = var.cluster_oidc_provider_arn
}

module "signoz" {
depends_on = [module.argo-cd]
# source = "spacelift.io/sagebionetworks/postgres-cloud-native-database/aws"
# version = "0.5.0"
source = "../../../modules/signoz"
auto_deploy = var.auto_deploy
auto_prune = var.auto_prune
git_revision = var.git_revision
namespace = "signoz"
argo_deployment_name = "signoz"
enable_otel_ingress = var.enable_otel_ingress && var.enable_cluster_ingress
gateway_namespace = "envoy-gateway"
cluster_name = var.cluster_name
auth0_jwks_uri = var.auth0_jwks_uri
smtp_password = var.smtp_password
smtp_user = var.smtp_user
smtp_from = var.smtp_from
auth0_identifier = var.auth0_identifier
source = "../../../modules/signoz"
auto_deploy = var.auto_deploy
auto_prune = var.auto_prune
git_revision = local.git_revision
namespace = "signoz"
argo_deployment_name = "signoz"
enable_otel_ingress = var.enable_otel_ingress && var.enable_cluster_ingress
gateway_namespace = "envoy-gateway"
cluster_name = var.cluster_name
auth0_jwks_uri = var.auth0_jwks_uri
smtp_password = var.smtp_password
smtp_user = var.smtp_user
smtp_from = var.smtp_from
auth0_identifier = var.auth0_identifier
s3_backup_bucket_name = module.clickhouse-backup-bucket.bucket_name
s3_access_role_arn = module.clickhouse-backup-bucket.access_role_arn
}

module "envoy-gateway" {
@@ -103,7 +122,7 @@ module "envoy-gateway" {
source = "../../../modules/envoy-gateway"
auto_deploy = var.auto_deploy
auto_prune = var.auto_prune
git_revision = var.git_revision
git_revision = local.git_revision
namespace = "envoy-gateway"
argo_deployment_name = "envoy-gateway"
cluster_issuer_name = "lets-encrypt-prod"
@@ -118,7 +137,7 @@ module "cert-manager" {
source = "../../../modules/cert-manager"
auto_deploy = var.auto_deploy
auto_prune = var.auto_prune
git_revision = var.git_revision
git_revision = local.git_revision
namespace = "cert-manager"
argo_deployment_name = "cert-manager"
}
5 changes: 5 additions & 0 deletions deployments/stacks/dpe-k8s-deployments/variables.tf
@@ -40,6 +40,11 @@ variable "cluster_name" {
type = string
}

variable "cluster_oidc_provider_arn" {
description = "ARN of the EKS cluster's OIDC provider"
type = string
}

variable "spotinst_account" {
description = "Spot.io account"
type = string
8 changes: 6 additions & 2 deletions deployments/stacks/dpe-k8s/outputs.tf
@@ -38,11 +38,15 @@ output "cluster_name" {
value = module.sage-aws-eks.cluster_name
}

output "cluster_oidc_provider_arn" {
value = module.sage-aws-eks.cluster_oidc_provider_arn
}

output "smtp_user" {
value = length(module.sage-aws-ses) > 0 ? module.sage-aws-ses[0].smtp_user : null
value = length(module.sage-aws-ses) > 0 ? module.sage-aws-ses[0].smtp_user : ""
}

output "smtp_password" {
sensitive = true
value = length(module.sage-aws-ses) > 0 ? module.sage-aws-ses[0].smtp_password : null
value = length(module.sage-aws-ses) > 0 ? module.sage-aws-ses[0].smtp_password : ""
}
56 changes: 56 additions & 0 deletions modules/flux-cd/README.md
@@ -0,0 +1,56 @@
# Purpose
This module deploys the `Flux CD` [helm chart](https://fluxcd-community.github.io/helm-charts) to the cluster. [`Flux CD`](https://fluxcd.io/) is a GitOps tool used to manage the application lifecycle on a Kubernetes cluster. It was originally deployed because, unlike `Argo CD`, it supports `postRenderers`, which apply additional changes to a chart's rendered manifests before they are applied to the cluster. We needed them to add the `clickhouse-backup` sidecar container to the `signoz` helm release. We do not plan to move all existing applications to `Flux CD` at this time, but it is available and is the preferred option for any new applications added to the cluster.
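
As a rough illustration of what a post-renderer looks like, the sketch below adds a sidecar container to a chart's `Deployment` through a `kustomize` patch. It is not the patch used by the `signoz` module: the release name `example-app`, the `example` namespace, the chart and repository names, and the sidecar image are all hypothetical, and it assumes the `kubectl` provider and a matching `HelmRepository` are already configured.

```hcl
# Hypothetical example: every name, namespace, and image below is a placeholder.
resource "kubectl_manifest" "example-helm-release" {
  yaml_body = <<YAML
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: example-app
  namespace: example
spec:
  interval: 10m
  chart:
    spec:
      chart: example-app
      sourceRef:
        kind: HelmRepository
        name: example-repo
        namespace: example
  # Post-renderers run against the rendered manifests, before they are applied.
  postRenderers:
    - kustomize:
        patches:
          - target:
              kind: Deployment
              name: example-app
            # JSON 6902 patch: append one container to the pod template.
            patch: |
              - op: add
                path: /spec/template/spec/containers/-
                value:
                  name: example-sidecar
                  image: example.org/example-sidecar:1.0.0
YAML
}
```

Keeping the patch inside the `HelmRelease` means the upstream chart stays untouched and the extra container is re-applied on every reconciliation.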

## What resources are being deployed through this module
In addition to a `helm_release` which deploys the `Flux CD` helm chart, this module also creates a `capacitor` resource which is used as the frontend for `Flux CD`.

## Accessing the Flux CD UI
To access the `Flux CD` UI, you only need to port-forward the `capacitor` pod and access it in your browser.

# Deploying an application with Flux CD
To deploy an application with `Flux CD`, you will need to create a `HelmRepository` resource which points to the helm repository that hosts the chart you want to deploy. In that resource definition, you will set the `apiVersion` to `source.toolkit.fluxcd.io/v1` and the `kind` to `HelmRepository`. For example (code from the `signoz` module):

```
resource "kubectl_manifest" "signoz-helm-repo" {
  depends_on = [kubernetes_namespace.signoz]

  yaml_body = <<YAML
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: signoz
  namespace: ${var.namespace}
spec:
  interval: 24h
  url: https://charts.signoz.io
YAML
}
```

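In your `HelmRelease` resource you will need a similar configuration: set the `apiVersion` to `helm.toolkit.fluxcd.io/v2` and the `kind` to `HelmRelease`, and point `sourceRef` at the `HelmRepository` created above. For example (again from the `signoz` module):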
```
resource "kubectl_manifest" "signoz-helm-release" {
  depends_on = [kubernetes_namespace.signoz]

  yaml_body = <<YAML
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: signoz
  namespace: ${var.namespace}
spec:
  interval: 10m
  chart:
    spec:
      chart: signoz
      version: '0.55.1'
      sourceRef:
        kind: HelmRepository
        name: signoz
        namespace: ${var.namespace}
      interval: 10m
      reconcileStrategy: Revision
...
YAML
}
```
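
Because `HelmRepository` and `HelmRelease` are custom resources, their definitions have to exist in the cluster before Terraform can apply them. One way to express that ordering is a `depends_on` on the `flux-cd` module, sketched here with a hypothetical `my-app` module (the module name, its path, and the `namespace` input are illustrative and not part of this repository):

```hcl
# Hypothetical application module; only module.flux-cd comes from this repository.
module "my-app" {
  # Wait for the Flux CD helm release (and its CRDs) before applying
  # the HelmRepository and HelmRelease manifests inside this module.
  depends_on = [module.flux-cd]

  source    = "../../../modules/my-app"
  namespace = "my-app"
}
```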
56 changes: 56 additions & 0 deletions modules/flux-cd/main.tf
@@ -0,0 +1,56 @@
resource "kubernetes_namespace" "flux-system" {
  metadata {
    name = "flux-system"
  }
}

resource "helm_release" "fluxcd" {
  name       = "flux2"
  repository = "https://fluxcd-community.github.io/helm-charts"
  chart      = "flux2"
  namespace  = "flux-system"
  version    = "2.14.0"
  depends_on = [kubernetes_namespace.flux-system]

  values = [templatefile("${path.module}/templates/values.yaml", {})]
}

resource "kubectl_manifest" "capacitor" {
  depends_on = [helm_release.fluxcd]

  yaml_body = <<YAML
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: OCIRepository
metadata:
  name: capacitor
  namespace: flux-system
spec:
  interval: 12h
  url: oci://ghcr.io/gimlet-io/capacitor-manifests
  ref:
    semver: ">=0.1.0"
YAML
}

resource "kubectl_manifest" "capacitor-kustomization" {
  depends_on = [helm_release.fluxcd]

  yaml_body = <<YAML
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: capacitor
  namespace: flux-system
spec:
  targetNamespace: flux-system
  interval: 1h
  retryInterval: 2m
  timeout: 5m
  wait: true
  prune: true
  path: "./"
  sourceRef:
    kind: OCIRepository
    name: capacitor
YAML
}