Signoz alert manager setup for SMTP #41

Merged on Nov 5, 2024 (33 commits)
Commits
f6670a8
Set up SMTP settings with test email
BryanFauble Oct 9, 2024
e01bc24
Try wrapping port as string
BryanFauble Oct 9, 2024
bb98de7
Correct port
BryanFauble Oct 9, 2024
7879aa4
Set temporary data
BryanFauble Oct 9, 2024
1f483c0
Adding SES tf configuration
BryanFauble Oct 25, 2024
74ce072
Move module import
BryanFauble Oct 25, 2024
4df7140
Remove comma
BryanFauble Oct 25, 2024
a602155
Remove tag ref
BryanFauble Oct 25, 2024
d1aa349
Comment out import for now
BryanFauble Oct 25, 2024
f571fb0
Add imports back
BryanFauble Oct 25, 2024
0c528fd
Create smtp user and pass
BryanFauble Oct 25, 2024
5e1ca66
Pass along smtp user and password
BryanFauble Oct 25, 2024
229693b
Move to local
BryanFauble Oct 25, 2024
1762ba2
Correct var
BryanFauble Oct 25, 2024
80aa670
figure out what is empty string
BryanFauble Oct 25, 2024
27e3067
Correct passing along user id
BryanFauble Oct 25, 2024
2e28ad4
remove output
BryanFauble Oct 25, 2024
3e2295d
Check content of var
BryanFauble Oct 25, 2024
0a6dbb5
Correct ref
BryanFauble Oct 25, 2024
622e3d6
Correct variable passing
BryanFauble Oct 25, 2024
63aa1cd
Remove reading secret manager
BryanFauble Oct 25, 2024
77b002c
Correct to use STARTTLS port
BryanFauble Oct 25, 2024
d2d976c
Correct boolean/string
BryanFauble Oct 25, 2024
da51b70
Add to readme for aws ses
BryanFauble Oct 25, 2024
3ea4803
Update documentation on directory structure
BryanFauble Oct 25, 2024
b8c21e1
Remove self from SES
BryanFauble Oct 25, 2024
71699e9
Merge branch 'ibcdpe-1095-cluster-ingress-signoz' into signoz-alert-m…
BryanFauble Oct 30, 2024
b63caf2
[IBCDPE-1111] upgrade airflow (#42)
BryanFauble Oct 31, 2024
8586b1d
Point to updated config value for airflow regex deserialization (#43)
BryanFauble Oct 31, 2024
e77153e
Merge branch 'main' into signoz-alert-manager
BryanFauble Nov 4, 2024
9d18e49
Remove the need for email domains to be set
BryanFauble Nov 5, 2024
d616dc7
Move token lifecycle to 1.25 days from 1 day
BryanFauble Nov 5, 2024
556a4ba
[IBCDPE-1095] Signoz move to lets encrypt (#45)
BryanFauble Nov 5, 2024
41 changes: 33 additions & 8 deletions README.md
@@ -10,24 +10,32 @@ This repo is used to deploy an EKS cluster to AWS. CI/CD is managed through Spac
│ └── policies: Rego policies that can be attached to 0..* spacelift stacks
├── dev: Development/sandbox environment
│ ├── spacelift: Terraform scripts to manage spacelift resources
│ │ └── dpe-sandbox: Spacelift specific resources to manage the CI/CD pipeline
│ │ └── dpe-k8s/dpe-sandbox: Spacelift specific resources to manage the CI/CD pipeline
│ └── stacks: The deployable cloud resources
│ ├── dpe-auth0: Stack used to provision and setup auth0 IDP (Identity Provider) settings
│ ├── dpe-sandbox-k8s: K8s + supporting AWS resources
│ └── dpe-sandbox-k8s-deployments: Resources deployed inside of a K8s cluster
└── modules: Templatized collections of terraform resources that are used in a stack
├── apache-airflow: K8s deployment for apache airflow
│ └── templates: Resources used during deployment of airflow
├── argo-cd: K8s deployment for Argo CD, a declarative, GitOps continuous delivery tool for Kubernetes.
│ └── templates: Resources used during deployment of this helm chart
├── trivy-operator: K8s deployment for trivy, along with a few supporting charts for security scanning
│ └── templates: Resources used during deployment of these helm charts
├── victoria-metrics: K8s deployment for victoria metrics, a Prometheus-like tool for cluster metric collection
│ └── templates: Resources used during deployment of these helm charts
├── cert-manager: Handles provisioning TLS certificates for the cluster
├── envoy-gateway: API Gateway that secures traffic into the cluster
├── postgres-cloud-native: Used to provision a postgres instance
├── postgres-cloud-native-operator: Operator that manages the lifecycle of postgres instances on the cluster
├── demo-network-policies: K8s deployment for a demo showcasing how to use network policies
├── demo-pod-level-security-groups-strict: K8s deployment for a demo showcasing how to use pod level security groups in strict mode
├── sage-aws-eks: Sage specific EKS cluster for AWS
├── sage-aws-eks-addons: Sets up additional resources that need to be installed post creation of the EKS cluster
├── sage-aws-k8s-node-autoscaler: K8s node autoscaler using spotinst ocean
└── sage-aws-vpc: Sage specific VPC for AWS
├── sage-aws-ses: AWS SES (Simple Email Service) setup
├── sage-aws-vpc: Sage specific VPC for AWS
├── signoz: SigNoz provides APM, logs, traces, metrics, exceptions, & alerts in a single tool
├── trivy-operator: K8s deployment for trivy, along with a few supporting charts for security scanning
│ └── templates: Resources used during deployment of these helm charts
├── victoria-metrics: K8s deployment for victoria metrics, a Prometheus-like tool for cluster metric collection
│ └── templates: Resources used during deployment of these helm charts
```

This root `main.tf` contains all the "Things" that are going to be deployed.
@@ -283,10 +291,27 @@ This document describes the abbreviated process below:
"iam:*PolicyVersion",
"iam:*OpenIDConnectProvider",
"iam:*InstanceProfile",
"iam:ListPolicyVersions"
"iam:ListPolicyVersions",
"iam:ListGroupsForUser",
"iam:ListAttachedUserPolicies"
],
"Resource": "*"
}
},
{
"Effect": "Allow",
"Action": [
"iam:CreateUser",
"iam:AttachUserPolicy",
"iam:ListPolicies",
"iam:TagUser",
"iam:GetUser",
"iam:DeleteUser",
"iam:CreateAccessKey",
"iam:ListAccessKeys",
"iam:DeleteAccessKey"
],
"Resource": "arn:aws:iam::{{AWS ACCOUNT ID}}:user/smtp_user"
}
]
}
```
7 changes: 6 additions & 1 deletion deployments/main.tf
@@ -69,8 +69,11 @@ module "dpe-sandbox-spacelift-development" {

enable_cluster_ingress = true
enable_otel_ingress = true
ssl_hostname = "a09a38cc5a8d6497ea69c6bf6318701b-1974793757.us-east-1.elb.amazonaws.com"
ssl_hostname = "dev.sagedpe.org"
auth0_jwks_uri = "https://dev-sage-dpe.us.auth0.com/.well-known/jwks.json"
ses_email_identities = ["aws-dpe-dev@sagebase.org"]
# Defines the email address that will be used as the sender of the email alerts
smtp_from = "aws-dpe-dev@sagebase.org"
}

module "dpe-sandbox-spacelift-production" {
@@ -115,4 +118,6 @@ module "dpe-sandbox-spacelift-production" {
enable_otel_ingress = false
ssl_hostname = ""
auth0_jwks_uri = ""

ses_email_identities = []
}
12 changes: 8 additions & 4 deletions deployments/spacelift/dpe-k8s/main.tf
@@ -11,6 +11,7 @@ locals {
private_subnet_cidrs_eks_worker_nodes = var.private_subnet_cidrs_eks_worker_nodes
azs_eks_control_plane = var.azs_eks_control_plane
azs_eks_worker_nodes = var.azs_eks_worker_nodes
ses_email_identities = var.ses_email_identities
}

k8s_stack_deployments_variables = {
@@ -25,12 +26,13 @@ locals {
enable_otel_ingress = var.enable_otel_ingress
ssl_hostname = var.ssl_hostname
auth0_jwks_uri = var.auth0_jwks_uri
smtp_from = var.smtp_from
}

auth0_stack_variables = {
cluster_name = var.cluster_name
auth0_domain = var.auth0_domain
auth0_clients = var.auth0_clients
cluster_name = var.cluster_name
auth0_domain = var.auth0_domain
auth0_clients = var.auth0_clients
}

# Variables to be passed from the k8s stack to the deployments stack
@@ -39,6 +41,8 @@ locals {
private_subnet_ids_eks_worker_nodes = "TF_VAR_private_subnet_ids_eks_worker_nodes"
node_security_group_id = "TF_VAR_node_security_group_id"
pod_to_node_dns_sg_id = "TF_VAR_pod_to_node_dns_sg_id"
smtp_user = "TF_VAR_smtp_user"
smtp_password = "TF_VAR_smtp_password"
}
}

@@ -250,4 +254,4 @@ resource "spacelift_environment_variable" "auth0-stack-environment-variables" {
name = "TF_VAR_${each.key}"
value = try(tostring(each.value), jsonencode(each.value))
write_only = false
}
}
25 changes: 24 additions & 1 deletion deployments/spacelift/dpe-k8s/variables.tf
@@ -180,4 +180,27 @@ variable "auth0_clients" {
description = string
app_type = string
}))
}
}

variable "ses_email_identities" {
type = list(string)
description = "List of email identities to be added to SES"
}

variable "smtp_user" {
description = "The SMTP user. Must be set together with smtp_password and smtp_from"
type = string
default = ""
}

variable "smtp_password" {
description = "The SMTP password. Must be set together with smtp_user and smtp_from"
type = string
default = ""
}

variable "smtp_from" {
description = "The SMTP from address. Must be set together with smtp_user and smtp_password"
type = string
default = ""
}
7 changes: 5 additions & 2 deletions deployments/stacks/dpe-auth0/main.tf
@@ -4,8 +4,11 @@ resource "auth0_resource_server" "k8s-cluster-telemetry" {
identifier = "${var.cluster_name}-telemetry"
signing_alg = "RS256"

allow_offline_access = false
token_lifetime = 86400
allow_offline_access = false
# 108000 seconds = 1.25 days (1 day is 86400 seconds)
# The extra quarter day of lifetime lets a simple daily cronjob refresh the token
# before it expires for the services that use the token
token_lifetime = 108000
skip_consent_for_verifiable_first_party_clients = true
# https://registry.terraform.io/providers/auth0/auth0/latest/docs/resources/resource_server_scopes
# Says to use the following, however it errors out:
10 changes: 5 additions & 5 deletions deployments/stacks/dpe-k8s-deployments/main.tf
@@ -89,6 +89,9 @@ module "signoz" {
gateway_namespace = "envoy-gateway"
cluster_name = var.cluster_name
auth0_jwks_uri = var.auth0_jwks_uri
smtp_password = var.smtp_password
smtp_user = var.smtp_user
smtp_from = var.smtp_from
}

module "envoy-gateway" {
@@ -102,11 +105,8 @@ module "envoy-gateway" {
git_revision = var.git_revision
namespace = "envoy-gateway"
argo_deployment_name = "envoy-gateway"
cluster_issuer_name = "selfsigned"
# To determine more elegant ways to fill in these values, for example, if we have
# a pre-defined DNS name for the cluster (https://sagebionetworks.jira.com/browse/IT-3931)
ssl_hostname = var.ssl_hostname
auth0_jwks_uri = var.auth0_jwks_uri
cluster_issuer_name = "lets-encrypt-prod"
ssl_hostname = var.ssl_hostname
}

module "cert-manager" {
18 changes: 18 additions & 0 deletions deployments/stacks/dpe-k8s-deployments/variables.tf
@@ -85,3 +85,21 @@ variable "auth0_jwks_uri" {
description = "The JWKS URI for Auth0"
type = string
}

variable "smtp_user" {
description = "The SMTP user. Must be set together with smtp_password and smtp_from"
type = string
default = ""
}

variable "smtp_password" {
description = "The SMTP password. Must be set together with smtp_user and smtp_from"
type = string
default = ""
}

variable "smtp_from" {
description = "The SMTP from address. Must be set together with smtp_user and smtp_password"
type = string
default = ""
}
6 changes: 6 additions & 0 deletions deployments/stacks/dpe-k8s/main.tf
@@ -36,3 +36,9 @@ module "sage-aws-eks" {
private_subnet_ids_eks_control_plane = module.sage-aws-vpc.private_subnet_ids_eks_control_plane
private_subnet_ids_eks_worker_nodes = module.sage-aws-vpc.private_subnet_ids_eks_worker_nodes
}

module "sage-aws-ses" {
source = "../../../modules/sage-aws-ses"

email_identities = var.ses_email_identities
}
9 changes: 9 additions & 0 deletions deployments/stacks/dpe-k8s/outputs.tf
@@ -37,3 +37,12 @@ output "region" {
output "cluster_name" {
value = module.sage-aws-eks.cluster_name
}

output "smtp_user" {
value = module.sage-aws-ses.smtp_user
}

output "smtp_password" {
sensitive = true
value = module.sage-aws-ses.smtp_password
}
5 changes: 5 additions & 0 deletions deployments/stacks/dpe-k8s/variables.tf
@@ -54,3 +54,8 @@ variable "azs_eks_worker_nodes" {
type = list(string)
description = "Availability Zones for the EKS worker nodes"
}

variable "ses_email_identities" {
type = list(string)
description = "List of email identities to be added to SES"
}
18 changes: 17 additions & 1 deletion modules/apache-airflow/README.md
@@ -65,4 +65,20 @@ YAML
## Accessing the web UI
An `admin` user is created for airflow via the `airflow-admin-user-secret` secret that
is added to the namespace. Decode the base64 encoded password/username and use it for
the UI.
the UI.
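
As a rough sketch, the credentials could be read with `kubectl` as shown below. The `airflow` namespace and the `username`/`password` key names are assumptions, not taken from this repo, so check the secret for its actual namespace and keys first. The helper name `decode_secret_field` is hypothetical.

```shell
# Print one base64-decoded field of a Kubernetes secret.
# Usage: decode_secret_field <namespace> <secret-name> <field>
decode_secret_field() {
  kubectl get secret "$2" --namespace "$1" -o "jsonpath={.data.$3}" | base64 --decode
}

# Example calls (the namespace and key names are assumptions):
#   decode_secret_field airflow airflow-admin-user-secret username
#   decode_secret_field airflow airflow-admin-user-secret password
```

Wrapping the call in a function keeps the namespace, secret name, and field explicit, so the same helper works if the secret uses different key names.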

## Building a new image for airflow
The deployment of our airflow instance depends on a custom Apache Airflow image being
built and pushed to a publicly available GHCR URL. The image is created from the
`orca-recipes` git repo: <https://github.com/Sage-Bionetworks-Workflows/orca-recipes/tree/main>

1. Update the Dockerfile within the orca-recipes repo
2. Build the new image `docker build .`
3. Tag the built image with the tag you want to use `docker tag sha256:... ghcr.io/sage-bionetworks-workflows/orca-recipes:0.0.1`
4. Push to GHCR `docker push ghcr.io/sage-bionetworks-workflows/orca-recipes:0.0.1` (May require an admin of the repo to push this)
5. Update the `values.yaml` file in this `modules/apache-airflow/templates` directory.

Transitive dependencies may also need to be updated when building a new image for
airflow. For example, `py-orca` was updated in this PR: <https://github.com/Sage-Bionetworks-Workflows/py-orca/pull/45>.
Additionally, this PR covers what was done to update the
requirements and Dockerfile: <https://github.com/Sage-Bionetworks-Workflows/orca-recipes/pull/71>.
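
Steps 2 through 4 of the list above could be scripted roughly as follows. This is a sketch under assumptions, not the maintainers' process: it builds and tags in a single `docker build -t` call instead of tagging by digest, and the `run` helper, the `build_and_push` function, and the `DRY_RUN` variable are hypothetical names introduced here. Pushing assumes you have already authenticated with `docker login ghcr.io` and have permission to push to the repository.

```shell
# Image repository taken from the steps above.
IMAGE_REPO="ghcr.io/sage-bionetworks-workflows/orca-recipes"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the image from the current directory, tag it, and push it.
# Usage: build_and_push <tag>
build_and_push() {
  run docker build -t "${IMAGE_REPO}:$1" . &&
    run docker push "${IMAGE_REPO}:$1"
}
```

Running `DRY_RUN=1; build_and_push 0.0.1` from the directory containing the Dockerfile prints the two docker commands without executing them, which is a cheap way to check the tag before pushing.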
2 changes: 1 addition & 1 deletion modules/apache-airflow/main.tf
@@ -66,7 +66,7 @@ spec:
sources:
- repoURL: 'https://airflow.apache.org'
chart: airflow
targetRevision: 1.11.0
targetRevision: 1.15.0
helm:
releaseName: airflow
valueFiles: