[IBCDPE-1095] Secure cluster ingress for telemetry data #40

Merged
merged 82 commits into from
Nov 5, 2024
Changes from 70 commits
Commits
b141b73
Shrink VPC size and create subnets specifically for worker nodes that…
BryanFauble Oct 17, 2024
948004d
Add back var
BryanFauble Oct 17, 2024
5449ea5
Correct cidr block
BryanFauble Oct 17, 2024
750d79f
Update cidr blocks
BryanFauble Oct 17, 2024
dd796a1
Correct node lengths
BryanFauble Oct 18, 2024
6f7ab29
Correct array slicing
BryanFauble Oct 18, 2024
9e1c36f
Correct indexing
BryanFauble Oct 18, 2024
ac3303e
Update default eks cluster version
BryanFauble Oct 18, 2024
55344c3
Shrink EKS control plane subnet range
BryanFauble Oct 21, 2024
a38e7bc
Set range back
BryanFauble Oct 21, 2024
877e506
Enable otel collector ingress with Auth0
BryanFauble Oct 21, 2024
8da96b6
Set ssl hostname
BryanFauble Oct 21, 2024
da9b28a
Try a reference grant with http route
BryanFauble Oct 21, 2024
4a365ad
Try without url re-write
BryanFauble Oct 21, 2024
8da3838
Try exact url matching
BryanFauble Oct 21, 2024
41488c7
Correct service name in reference grant
BryanFauble Oct 21, 2024
370a4b7
Set up merged gateway deployment
BryanFauble Oct 22, 2024
c2a4bf8
Correct vars
BryanFauble Oct 22, 2024
7d32e8f
Remove namespaced deployment of http-route
BryanFauble Oct 22, 2024
5f77316
Move ingress to signoz specific deployment
BryanFauble Oct 22, 2024
b4b9814
Reference paramref by ns
BryanFauble Oct 22, 2024
ef64277
Point to updated hostname
BryanFauble Oct 22, 2024
6807d1c
Test deploying a temp merged gateway
BryanFauble Oct 22, 2024
cc3006d
Try with url rewrite for /telemetry
BryanFauble Oct 22, 2024
f83e9c7
Go back to referenceGrant approach
BryanFauble Oct 22, 2024
8c0c3a4
Leave filter off for now
BryanFauble Oct 22, 2024
42689d4
Deploy public subnet to all AZs
BryanFauble Oct 22, 2024
2d90747
juggle cidrs
BryanFauble Oct 22, 2024
b12f897
Move resources
BryanFauble Oct 22, 2024
883cbea
Try import (dev)
BryanFauble Oct 22, 2024
360bfa8
Update subnet id
BryanFauble Oct 22, 2024
ce6181f
Update import format
BryanFauble Oct 22, 2024
7803ea1
let rtbs be re-created
BryanFauble Oct 22, 2024
a4aa6b5
Update rtbassoc imports
BryanFauble Oct 22, 2024
ae6941d
import migration for dpe prod stack
BryanFauble Oct 22, 2024
6aa1d87
Update to reference service in another namespace
BryanFauble Oct 22, 2024
bae392a
Centralize the envoy gateway module
BryanFauble Oct 22, 2024
9d9a0d4
Update dns name
BryanFauble Oct 22, 2024
8e3cbbe
url rewrite, moving configuration up, adding to readme
BryanFauble Oct 22, 2024
bc0d932
Update url-rewrite
BryanFauble Oct 22, 2024
89205fc
Cleanup
BryanFauble Oct 22, 2024
7b4ee55
Add count for cert-manager
BryanFauble Oct 22, 2024
629a87c
Test example oauth tf provider integration
BryanFauble Oct 22, 2024
9205aab
moved blocks
BryanFauble Oct 22, 2024
551e708
remove moved blocks
BryanFauble Oct 22, 2024
902cfe8
Test other resource
BryanFauble Oct 22, 2024
b957e93
add client
BryanFauble Oct 22, 2024
07b27dd
Set up resource server
BryanFauble Oct 22, 2024
de5546d
Create client grant to resource server
BryanFauble Oct 22, 2024
2f58a9d
Remove count
BryanFauble Oct 22, 2024
b7bdb04
use resource scopes
BryanFauble Oct 22, 2024
39daf08
remove clients
BryanFauble Oct 22, 2024
82719ac
correct name
BryanFauble Oct 22, 2024
1d6a9bf
Create separate auth0 stack
BryanFauble Oct 23, 2024
bebd8c0
Add scopes
BryanFauble Oct 23, 2024
824f496
Add version back to support removal of resources
BryanFauble Oct 23, 2024
736afb7
Correct stack id
BryanFauble Oct 23, 2024
ea1c577
Correct reference
BryanFauble Oct 23, 2024
a10152b
Correct reference
BryanFauble Oct 23, 2024
d8ef6b5
comment out
BryanFauble Oct 23, 2024
3907416
correct array
BryanFauble Oct 23, 2024
a0485d6
Create security policy to enforce jwt scopes
BryanFauble Oct 23, 2024
e1b1d28
Apply to all cidrs
BryanFauble Oct 23, 2024
db314d7
Revert "Apply to all cidrs"
BryanFauble Oct 23, 2024
d1c5470
Revert "Create security policy to enforce jwt scopes"
BryanFauble Oct 23, 2024
e99fae8
Swap to audience filtering over scope filtering for now
BryanFauble Oct 23, 2024
cb1ae33
Correct variable reference
BryanFauble Oct 23, 2024
7df3de0
Output to sensitive
BryanFauble Oct 23, 2024
7d037bd
Remove outputs
BryanFauble Oct 23, 2024
c1bee35
Correct mistake
BryanFauble Oct 23, 2024
a083e0f
Update policy name
BryanFauble Oct 24, 2024
7133f17
Correct node lengths
BryanFauble Oct 18, 2024
3011207
Correct indexing
BryanFauble Oct 18, 2024
ff180d6
Merge branch 'signoz-testing' into ibcdpe-1095-cluster-ingress-signoz
BryanFauble Oct 28, 2024
64035b4
Point to updated auth0 domain
BryanFauble Oct 29, 2024
547bc9e
Updated auth0 jwks uri
BryanFauble Oct 29, 2024
0cfe041
Add notes to readme
BryanFauble Oct 29, 2024
3e31d87
Remove auth0 from versions
BryanFauble Oct 29, 2024
957109e
Move token lifecycle to 1.25 days from 1 day
BryanFauble Nov 5, 2024
dfc12c0
[IBCDPE-1111] upgrade airflow (#42)
BryanFauble Oct 31, 2024
4161baa
Point to updated config value for airflow regex deserialization (#43)
BryanFauble Oct 31, 2024
903bc50
Signoz alert manager setup for SMTP (#41)
BryanFauble Nov 5, 2024
60 changes: 52 additions & 8 deletions deployments/main.tf
@@ -31,16 +31,46 @@ module "dpe-sandbox-spacelift-development" {
   k8s_stack_deployments_name = "DPE DEV Kubernetes Deployments"
   k8s_stack_deployments_project_root = "deployments/stacks/dpe-k8s-deployments"

+  auth0_stack_name = "DPE DEV Auth0"
+  auth0_stack_project_root = "deployments/stacks/dpe-auth0"
+  auth0_domain = "dev-57n3awu5je6q653y.us.auth0.com"
+  auth0_clients = [
+    {
+      name = "bfauble - automation"
+      description = "App for testing signoz"
+      app_type = "non_interactive"
+    },
+    {
+      name = "schematic - Github Actions"
+      description = "Client for Github Actions to export telemetry data"
+      app_type = "non_interactive"
+    },
+    {
+      name = "schematic - Dev"
+      description = "Client for schematic deployed to AWS DEV to export telemetry data"
+      app_type = "non_interactive"
+    },
+  ]
+
Comment from the PR author:

This uses the Auth0 free tier. Since it works for our purposes, I will submit a request for a paid license so that we can apply this to prod as well.

   aws_account_id = "631692904429"
   region = "us-east-1"

   cluster_name = "dpe-k8-sandbox"
   vpc_name = "dpe-sandbox"

-  vpc_cidr_block = "10.51.0.0/16"
-  public_subnet_cidrs = ["10.51.1.0/24", "10.51.2.0/24"]
-  private_subnet_cidrs = ["10.51.4.0/24", "10.51.5.0/24"]
-  azs = ["us-east-1a", "us-east-1b"]
+  vpc_cidr_block = "10.52.16.0/20"
+  # A public subnet is required for each AZ in which the worker nodes are deployed
+  public_subnet_cidrs = ["10.52.16.0/24", "10.52.17.0/24", "10.52.19.0/24"]
+  private_subnet_cidrs_eks_control_plane = ["10.52.18.0/28", "10.52.18.16/28"]
+  azs_eks_control_plane = ["us-east-1a", "us-east-1b"]
+
+  private_subnet_cidrs_eks_worker_nodes = ["10.52.28.0/22", "10.52.24.0/22", "10.52.20.0/22"]
+  azs_eks_worker_nodes = ["us-east-1c", "us-east-1b", "us-east-1a"]
+
+  enable_cluster_ingress = true
+  enable_otel_ingress = true
+  ssl_hostname = "a09a38cc5a8d6497ea69c6bf6318701b-1974793757.us-east-1.elb.amazonaws.com"
+  auth0_jwks_uri = "https://dev-57n3awu5je6q653y.us.auth0.com/.well-known/jwks.json"
 }

 module "dpe-sandbox-spacelift-production" {
@@ -61,14 +91,28 @@ module "dpe-sandbox-spacelift-production" {
   k8s_stack_deployments_name = "DPE Kubernetes Deployments"
   k8s_stack_deployments_project_root = "deployments/stacks/dpe-k8s-deployments"

+  auth0_stack_name = "DPE Auth0"
+  auth0_stack_project_root = "deployments/stacks/dpe-auth0"
+  auth0_domain = ""
+  auth0_clients = []
+
   aws_account_id = "766808016710"
   region = "us-east-1"

   cluster_name = "dpe-k8"
   vpc_name = "dpe-k8"

-  vpc_cidr_block = "10.52.0.0/16"
-  public_subnet_cidrs = ["10.52.1.0/24", "10.52.2.0/24"]
-  private_subnet_cidrs = ["10.52.4.0/24", "10.52.5.0/24"]
-  azs = ["us-east-1a", "us-east-1b"]
+  vpc_cidr_block = "10.52.0.0/20"
+  # A public subnet is required for each AZ in which the worker nodes are deployed
+  public_subnet_cidrs = ["10.52.0.0/24", "10.52.1.0/24", "10.52.3.0/24"]
+  private_subnet_cidrs_eks_control_plane = ["10.52.2.0/28", "10.52.2.16/28"]
+  azs_eks_control_plane = ["us-east-1a", "us-east-1b"]
+
+  private_subnet_cidrs_eks_worker_nodes = ["10.52.12.0/22", "10.52.8.0/22", "10.52.4.0/22"]
+  azs_eks_worker_nodes = ["us-east-1c", "us-east-1b", "us-east-1a"]
+
+  enable_cluster_ingress = false
+  enable_otel_ingress = false
+  ssl_hostname = ""
+  auth0_jwks_uri = ""
 }
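
The new dev subnet layout can be sanity-checked against the VPC block with the built-in `cidrsubnet` function. This is a minimal sketch and not part of the PR: the local names mirror the module inputs for readability, and only the CIDR values come from the diff.

```hcl
locals {
  # Dev VPC block taken from the diff.
  vpc_cidr_block = "10.52.16.0/20"

  # Public subnets: 4 extra prefix bits turn the /20 into /24 blocks.
  public_subnet_cidrs = [
    cidrsubnet(local.vpc_cidr_block, 4, 0), # 10.52.16.0/24
    cidrsubnet(local.vpc_cidr_block, 4, 1), # 10.52.17.0/24
    cidrsubnet(local.vpc_cidr_block, 4, 3), # 10.52.19.0/24
  ]

  # Control plane subnets: 8 extra bits give /28 blocks inside 10.52.18.0/24.
  private_subnet_cidrs_eks_control_plane = [
    cidrsubnet(local.vpc_cidr_block, 8, 32), # 10.52.18.0/28
    cidrsubnet(local.vpc_cidr_block, 8, 33), # 10.52.18.16/28
  ]

  # Worker node subnets: 2 extra bits give /22 blocks,
  # listed in the same order as azs_eks_worker_nodes.
  private_subnet_cidrs_eks_worker_nodes = [
    cidrsubnet(local.vpc_cidr_block, 2, 3), # 10.52.28.0/22
    cidrsubnet(local.vpc_cidr_block, 2, 2), # 10.52.24.0/22
    cidrsubnet(local.vpc_cidr_block, 2, 1), # 10.52.20.0/22
  ]
}
```

Every range falls inside the /20 and none overlap. The first /22 (10.52.16.0/22) is the space shared by the public and control plane subnets, and the remaining three /22 blocks go to the worker nodes.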
92 changes: 72 additions & 20 deletions deployments/spacelift/dpe-k8s/main.tf
@@ -1,32 +1,44 @@
 locals {
   k8s_stack_environment_variables = {
-    aws_account_id = var.aws_account_id
-    region = var.region
-    pod_security_group_enforcing_mode = var.pod_security_group_enforcing_mode
-    cluster_name = var.cluster_name
-    vpc_name = var.vpc_name
-    vpc_cidr_block = var.vpc_cidr_block
-    public_subnet_cidrs = var.public_subnet_cidrs
-    private_subnet_cidrs = var.private_subnet_cidrs
-    azs = var.azs
+    aws_account_id = var.aws_account_id
+    region = var.region
+    pod_security_group_enforcing_mode = var.pod_security_group_enforcing_mode
+    cluster_name = var.cluster_name
+    vpc_name = var.vpc_name
+    vpc_cidr_block = var.vpc_cidr_block
+    public_subnet_cidrs = var.public_subnet_cidrs
+    private_subnet_cidrs_eks_control_plane = var.private_subnet_cidrs_eks_control_plane
+    private_subnet_cidrs_eks_worker_nodes = var.private_subnet_cidrs_eks_worker_nodes
+    azs_eks_control_plane = var.azs_eks_control_plane
+    azs_eks_worker_nodes = var.azs_eks_worker_nodes
   }

   k8s_stack_deployments_variables = {
-    spotinst_account = var.spotinst_account
-    vpc_cidr_block = var.vpc_cidr_block
-    cluster_name = var.cluster_name
-    auto_deploy = var.auto_deploy
-    auto_prune = var.auto_prune
-    git_revision = var.git_branch
-    aws_account_id = var.aws_account_id
+    spotinst_account = var.spotinst_account
+    vpc_cidr_block = var.vpc_cidr_block
+    cluster_name = var.cluster_name
+    auto_deploy = var.auto_deploy
+    auto_prune = var.auto_prune
+    git_revision = var.git_branch
+    aws_account_id = var.aws_account_id
+    enable_cluster_ingress = var.enable_cluster_ingress
+    enable_otel_ingress = var.enable_otel_ingress
+    ssl_hostname = var.ssl_hostname
+    auth0_jwks_uri = var.auth0_jwks_uri
   }

+  auth0_stack_variables = {
+    cluster_name = var.cluster_name
+    auth0_domain = var.auth0_domain
+    auth0_clients = var.auth0_clients
+  }
+
   # Variables to be passed from the k8s stack to the deployments stack
   k8s_stack_to_deployment_variables = {
-    vpc_id = "TF_VAR_vpc_id"
-    private_subnet_ids = "TF_VAR_private_subnet_ids"
-    node_security_group_id = "TF_VAR_node_security_group_id"
-    pod_to_node_dns_sg_id = "TF_VAR_pod_to_node_dns_sg_id"
+    vpc_id = "TF_VAR_vpc_id"
+    private_subnet_ids_eks_worker_nodes = "TF_VAR_private_subnet_ids_eks_worker_nodes"
+    node_security_group_id = "TF_VAR_node_security_group_id"
+    pod_to_node_dns_sg_id = "TF_VAR_pod_to_node_dns_sg_id"
   }
 }

@@ -199,3 +211,43 @@ resource "spacelift_aws_integration_attachment" "k8s-deployments-aws-integration
   read = true
   write = true
 }
+
+
+resource "spacelift_stack" "auth0" {
+  github_enterprise {
+    namespace = "Sage-Bionetworks-Workflows"
+    id = "sage-bionetworks-workflows-gh"
+  }
+
+  depends_on = [
+    spacelift_space.dpe-space
+  ]
+
+  administrative = false
+  autodeploy = var.auto_deploy
+  branch = var.git_branch
+  description = "Stack used to create and manage Auth0 for authentication"
+  name = var.auth0_stack_name
+  project_root = var.auth0_stack_project_root
+  repository = "eks-stack"
+  terraform_version = var.opentofu_version
+  terraform_workflow_tool = "OPEN_TOFU"
+  space_id = spacelift_space.dpe-space.id
+  additional_project_globs = [
+    "deployments/"
+  ]
+}
+
+resource "spacelift_stack_destructor" "auth0-stack-destructor" {
+  stack_id = spacelift_stack.auth0.id
+}
+
+
+resource "spacelift_environment_variable" "auth0-stack-environment-variables" {
+  for_each = local.auth0_stack_variables
+
+  stack_id = spacelift_stack.auth0.id
+  name = "TF_VAR_${each.key}"
+  value = try(tostring(each.value), jsonencode(each.value))
+  write_only = false
+}
Comment from the PR author:

This creates a new Spacelift stack that is responsible only for the Auth0 resources, which narrows the set of concerns each stack has to handle.
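
The `value` expression in `auth0-stack-environment-variables` has to turn both plain strings and the `auth0_clients` list into a single environment variable string. The following is a minimal sketch of that behavior, using hypothetical local and output names and sample values copied from the dev module: `tostring` succeeds for primitive values, and `try` falls back to `jsonencode` when the value is a list or object.

```hcl
locals {
  # Hypothetical sample shaped like local.auth0_stack_variables.
  example_values = {
    cluster_name = "dpe-k8-sandbox"
    auth0_clients = [
      {
        name        = "schematic - Dev"
        description = "Client for schematic deployed to AWS DEV to export telemetry data"
        app_type    = "non_interactive"
      },
    ]
  }

  # Strings pass through unchanged; the list is serialized as JSON.
  encoded = {
    for key, value in local.example_values :
    "TF_VAR_${key}" => try(tostring(value), jsonencode(value))
  }
}

output "encoded_example" {
  value = local.encoded
}
```

On the receiving stack, the `auth0_clients` variable is declared as `list(object({...}))`, so the JSON string is expected to be parsed back into a list of objects when the variable is read from the environment.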

62 changes: 58 additions & 4 deletions deployments/spacelift/dpe-k8s/variables.tf
@@ -118,12 +118,66 @@ variable "public_subnet_cidrs" {
   description = "Public Subnet CIDR values"
 }

-variable "private_subnet_cidrs" {
+variable "private_subnet_cidrs_eks_control_plane" {
   type = list(string)
-  description = "Private Subnet CIDR values"
+  description = "Private Subnet CIDR values for the EKS control plane"
 }

-variable "azs" {
+variable "private_subnet_cidrs_eks_worker_nodes" {
   type = list(string)
-  description = "Availability Zones"
+  description = "Private Subnet CIDR values for the EKS worker nodes"
 }
+
+variable "azs_eks_control_plane" {
+  type = list(string)
+  description = "Availability Zones for the EKS control plane"
+}
+
+variable "azs_eks_worker_nodes" {
+  type = list(string)
+  description = "Availability Zones for the EKS worker nodes"
+}
+
+variable "enable_cluster_ingress" {
+  description = "Enable cluster ingress"
+  type = bool
+}
+
+variable "enable_otel_ingress" {
+  description = "Enable OpenTelemetry ingress, used to send traces to SigNoz"
+  type = bool
+}
+
+variable "ssl_hostname" {
+  description = "The hostname to use for the SSL certificate"
+  type = string
+}
+
+variable "auth0_jwks_uri" {
+  description = "The JWKS URI for Auth0"
+  type = string
+}
+
+variable "auth0_stack_name" {
+  description = "Name of the auth0 stack"
+  type = string
+}
+
+variable "auth0_stack_project_root" {
+  description = "Project root of the auth0 stack"
+  type = string
+}
+
+variable "auth0_domain" {
+  description = "Auth0 domain"
+  type = string
+}
+
+variable "auth0_clients" {
+  description = "List of clients to create in Auth0."
+  type = list(object({
+    name = string
+    description = string
+    app_type = string
+  }))
+}
43 changes: 43 additions & 0 deletions deployments/stacks/dpe-auth0/main.tf
@@ -0,0 +1,43 @@
+# Used to create the Auth0 resources for the DPE stack
+resource "auth0_resource_server" "k8s-cluster-telemetry" {
+  name = "${var.cluster_name}-telemetry"
+  identifier = "${var.cluster_name}-telemetry"
+  signing_alg = "RS256"
+
+  allow_offline_access = false
+  token_lifetime = 86400
+  skip_consent_for_verifiable_first_party_clients = true
+  # https://registry.terraform.io/providers/auth0/auth0/latest/docs/resources/resource_server_scopes
+  # Says to use the following, however it errors out:
+  # This object has no argument, nested block, or exported attribute named "scopes".
+  # lifecycle {
+  #   ignore_changes = [scopes]
+  # }
+}
+
+resource "auth0_client" "oauth2_clients" {
+  for_each = { for client in var.auth0_clients : client.name => client }
+
+  name = each.value.name
+  description = each.value.description
+  app_type = each.value.app_type
+
+  jwt_configuration {
+    alg = "RS256"
+  }
+}
+
+resource "auth0_client_credentials" "client_secrets" {
+  for_each = { for client in auth0_client.oauth2_clients : client.name => client }
+
+  client_id = auth0_client.oauth2_clients[each.key].id
+  authentication_method = "client_secret_post"
+}
+
+resource "auth0_client_grant" "access_to_k8s_cluster" {
+  for_each = { for client in var.auth0_clients : client.name => client }
+
+  client_id = auth0_client.oauth2_clients[each.key].id
+  audience = auth0_resource_server.k8s-cluster-telemetry.identifier
+  scopes = []
+}
Comment from the PR author:

This creates the following items in Auth0.

An API that each client is granted access to:
[image]

The applications, each with a unique client ID and client secret. These credentials will be added to the machines that export data so that they can perform the client credentials exchange:
[image]

10 changes: 10 additions & 0 deletions deployments/stacks/dpe-auth0/provider.tf
@@ -0,0 +1,10 @@
+# Requires manually setting id and secret in the stack environment variables in the Spacelift UI
+# These come from auth0 > Applications > Applications > API Explorer Application > Settings
+# TF_VAR_auth0_client_id
+# TF_VAR_auth0_client_secret
+# TF_VAR_auth0_domain
+provider "auth0" {
+  domain = var.auth0_domain
+  client_id = var.auth0_client_id
+  client_secret = var.auth0_client_secret
+}
28 changes: 28 additions & 0 deletions deployments/stacks/dpe-auth0/variables.tf
@@ -0,0 +1,28 @@
+variable "cluster_name" {
+  description = "EKS cluster name"
+  type = string
+}
+
+variable "auth0_domain" {
+  description = "Auth0 domain"
+  type = string
+}
+
+variable "auth0_client_id" {
+  description = "Auth0 client ID"
+  type = string
+}
+
+variable "auth0_client_secret" {
+  description = "Auth0 client secret"
+  type = string
+}
+
+variable "auth0_clients" {
+  description = "List of clients to create in Auth0."
+  type = list(object({
+    name = string
+    description = string
+    app_type = string
+  }))
+}
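
Because `auth0_client_secret` is supplied through the Spacelift stack environment, one option is to mark the variable as sensitive so that the value is redacted from plan and apply output. This is a sketch of a possible variation, not what the PR merged:

```hcl
variable "auth0_client_secret" {
  description = "Auth0 client secret"
  type        = string
  sensitive   = true # Redacts the value in plan and apply output.
}
```

The provider block can keep referencing `var.auth0_client_secret` unchanged; only the way the value is displayed differs.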
8 changes: 8 additions & 0 deletions deployments/stacks/dpe-auth0/versions.tf
@@ -0,0 +1,8 @@
+terraform {
+  required_providers {
+    auth0 = {
+      source = "auth0/auth0"
+      version = "1.7.1"
+    }
+  }
+}