[IBCDPE-1095] Secure cluster ingress for telemetry data #40

Merged
merged 82 commits into from
Nov 5, 2024


@BryanFauble BryanFauble commented Oct 22, 2024

Problem:

  1. Previously, ingesting data into the OTel collector running in the EKS cluster required either running inside the cluster or port-forwarding to the service in front of the OTel collectors.

Solution:

  1. Adopt Envoy Gateway (https://gateway.envoyproxy.io/docs/concepts/concepts_overview/) to create an ingress for the OTel collector.
  2. Use Auth0 for JWT verification, so that a valid JWT is required to submit data to the endpoint.
  3. Adopt the Auth0 Terraform provider to configure Auth0 resources automatically: https://registry.terraform.io/providers/auth0/auth0/latest/docs

Testing:

  1. I verified that, by setting a few environment variables in SchematicAPI, I could export telemetry data with an auth header over HTTPS/TLS 1.3.
  2. I also verified that sending data without a JWT returns the expected HTTP 401 error.
  3. I verified that sending data with a valid JWT but an invalid audience returns the expected HTTP 403 error.
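
When a request comes back with the 403 described above, one quick way to debug it is to look at the `aud` claim inside the token. The following is a minimal sketch, not part of this PR: it decodes the JWT payload without verifying the signature (so it is for local debugging only), and the helper names `decode_jwt_claims`, `audience_matches`, and `make_demo_token` are hypothetical. The audience value assumes the `${var.cluster_name}-telemetry` identifier with the sandbox cluster name.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature (debugging only)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # base64url payloads usually have their '=' padding stripped; restore it.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def audience_matches(token: str, expected_audience: str) -> bool:
    """Check whether the token's 'aud' claim (string or list) contains the audience."""
    aud = decode_jwt_claims(token).get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return expected_audience in audiences


def make_demo_token(claims: dict) -> str:
    """Build an UNSIGNED, fake token purely to exercise the helpers above."""
    def b64url(data: bytes) -> str:
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

    header = b64url(json.dumps({"alg": "none"}).encode("utf-8"))
    payload = b64url(json.dumps(claims).encode("utf-8"))
    return f"{header}.{payload}.signature"


if __name__ == "__main__":
    # Assumed audience: "${var.cluster_name}-telemetry" for the sandbox cluster.
    token = make_demo_token({"aud": "dpe-k8-sandbox-telemetry"})
    print(audience_matches(token, "dpe-k8-sandbox-telemetry"))
    print(audience_matches(token, "some-other-api"))
```

A token whose audience does not match is still a well-formed JWT, which is consistent with the gateway answering 403 rather than 401 in that case.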

TODO:

@spacelift-int-sagebionetworks spacelift-int-sagebionetworks bot temporarily deployed to spacelift/dpe-dev-kubernetes-deployments October 22, 2024 17:32 Inactive
Comment on lines 1 to 43
# Used to create the Auth0 resources for the DPE stack
resource "auth0_resource_server" "k8s-cluster-telemetry" {
  name        = "${var.cluster_name}-telemetry"
  identifier  = "${var.cluster_name}-telemetry"
  signing_alg = "RS256"

  allow_offline_access                            = false
  token_lifetime                                  = 86400
  skip_consent_for_verifiable_first_party_clients = true
  # https://registry.terraform.io/providers/auth0/auth0/latest/docs/resources/resource_server_scopes
  # Says to use the following, however it errors out:
  # This object has no argument, nested block, or exported attribute named "scopes".
  # lifecycle {
  #   ignore_changes = [scopes]
  # }
}

resource "auth0_client" "oauth2_clients" {
  for_each = { for client in var.auth0_clients : client.name => client }

  name        = each.value.name
  description = each.value.description
  app_type    = each.value.app_type

  jwt_configuration {
    alg = "RS256"
  }
}

resource "auth0_client_credentials" "client_secrets" {
  for_each = { for client in auth0_client.oauth2_clients : client.name => client }

  client_id             = auth0_client.oauth2_clients[each.key].id
  authentication_method = "client_secret_post"
}

resource "auth0_client_grant" "access_to_k8s_cluster" {
  for_each = { for client in var.auth0_clients : client.name => client }

  client_id = auth0_client.oauth2_clients[each.key].id
  audience  = auth0_resource_server.k8s-cluster-telemetry.identifier
  scopes    = []
}

This creates a few items in Auth0:

An API that each client is granted access to.

An application per client, each with a unique client ID and client secret. These credentials will be added to the machines that export data, so that they can perform the client credentials exchange.
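
For reference, the client credentials exchange on an exporting machine could look roughly like the sketch below. It is an illustration under assumptions rather than the code used by any exporter here: it posts to Auth0's standard `/oauth/token` endpoint with the secret in the request body (matching `client_secret_post`), and the function names, the placeholder client ID and secret, and the choice of `urllib` are all hypothetical.

```python
import json
import urllib.request


def build_token_request(domain: str, client_id: str, client_secret: str,
                        audience: str) -> urllib.request.Request:
    """Build (but do not send) the client credentials request for an Auth0 tenant."""
    body = json.dumps({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,  # sent in the body: client_secret_post
        "audience": audience,            # the API identifier the client was granted
    }).encode("utf-8")
    return urllib.request.Request(
        f"https://{domain}/oauth/token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_access_token(request: urllib.request.Request) -> str:
    """Send the request and return the access token from the JSON response."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["access_token"]


def otlp_headers(access_token: str) -> dict:
    """Headers to attach to each telemetry export request."""
    return {"Authorization": f"Bearer {access_token}"}


if __name__ == "__main__":
    # Placeholder values; real ones come from the Auth0 application settings.
    request = build_token_request(
        domain="dev-57n3awu5je6q653y.us.auth0.com",
        client_id="EXAMPLE_CLIENT_ID",
        client_secret="EXAMPLE_CLIENT_SECRET",
        audience="dpe-k8-sandbox-telemetry",
    )
    print(request.get_method(), request.full_url)
```

Since `token_lifetime` is 86400 seconds, an exporter would need to repeat this exchange at least once a day.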

Comment on lines 105 to 109
cluster_issuer_name = "selfsigned"
# To determine more elegant ways to fill in these values, for example, if we have
# a pre-defined DNS name for the cluster (https://sagebionetworks.jira.com/browse/IT-3931)
ssl_hostname = var.ssl_hostname
auth0_jwks_uri = var.auth0_jwks_uri

Moving away from self-signed certs depends on https://sagebionetworks.jira.com/browse/IT-3931.

@BryanFauble BryanFauble requested a review from zaro0508 October 23, 2024 19:42
@BryanFauble BryanFauble marked this pull request as ready for review October 23, 2024 19:44
@BryanFauble BryanFauble requested a review from a team as a code owner October 23, 2024 19:44
Base automatically changed from ibcdpe-1097-shrink-vpc-subnets to signoz-testing October 25, 2024 16:26
Comment on lines 40 to 74
- vpc_cidr_block = "10.52.16.0/20"
- public_subnet_cidrs = ["10.52.16.0/24", "10.52.17.0/24"]
+ vpc_cidr_block = "10.52.16.0/20"
+ # A public subnet is required for each AZ in which the worker nodes are deployed
+ public_subnet_cidrs = ["10.52.16.0/24", "10.52.17.0/24", "10.52.19.0/24"]
private_subnet_cidrs_eks_control_plane = ["10.52.18.0/28", "10.52.18.16/28"]
azs_eks_control_plane = ["us-east-1a", "us-east-1b"]

- private_subnet_cidrs_eks_worker_nodes = ["10.52.20.0/22", "10.52.24.0/22", "10.52.28.0/22"]
- azs_eks_worker_nodes = ["us-east-1a", "us-east-1b", "us-east-1c"]
+ private_subnet_cidrs_eks_worker_nodes = ["10.52.28.0/22", "10.52.24.0/22", "10.52.20.0/22"]
+ azs_eks_worker_nodes = ["us-east-1c", "us-east-1b", "us-east-1a"]

enable_cluster_ingress = true
enable_otel_ingress = true
ssl_hostname = "a09a38cc5a8d6497ea69c6bf6318701b-1974793757.us-east-1.elb.amazonaws.com"
auth0_jwks_uri = "https://dev-57n3awu5je6q653y.us.auth0.com/.well-known/jwks.json"
}

The juggling here is an unfortunate side effect of how the VPC module creates subnets in specific AZs. It allows us to:

  1. Create 3 public subnets, one in each AZ.
  2. Create 2 private subnets, in AZs a and b, for the EKS control plane.
  3. Create 3 private subnets, one in each AZ, for the private worker nodes.

Why this is needed:

  1. For the AWS load balancer to forward traffic to a private subnet, there must be a public subnet in the same AZ.
  2. Worker nodes can be deployed to all 3 AZs, so when a pod was scheduled into the subnet in AZ c, the load balancer could not forward traffic to that instance.
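
The constraints above can be sanity-checked before applying. This is a rough sketch using Python's standard `ipaddress` module, not something the VPC module does itself: it checks that every subnet sits inside the VPC CIDR, that no two subnets overlap, and that there are at least as many public subnets as worker-node AZs. The function name `validate_layout` is hypothetical, and because the snippet does not show which AZ each public subnet lands in, the public-subnet check is only a count.

```python
import ipaddress
from itertools import combinations


def validate_layout(vpc_cidr, public, control_plane, workers, worker_azs):
    """Return a list of human-readable problems; an empty list means the layout passes."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = [ipaddress.ip_network(cidr) for cidr in public + control_plane + workers]
    problems = []

    # Every subnet must be carved out of the VPC CIDR block.
    for net in nets:
        if not net.subnet_of(vpc):
            problems.append(f"{net} is outside the VPC block {vpc}")

    # No two subnets may share addresses.
    for a, b in combinations(nets, 2):
        if a.overlaps(b):
            problems.append(f"{a} overlaps {b}")

    # Each worker subnet is paired with one AZ, in list order.
    if len(workers) != len(worker_azs):
        problems.append("worker subnets and worker AZs differ in length")

    # The load balancer needs a public subnet in every AZ that hosts workers.
    if len(public) < len(set(worker_azs)):
        problems.append(
            f"{len(public)} public subnets for {len(set(worker_azs))} worker AZs"
        )
    return problems


if __name__ == "__main__":
    issues = validate_layout(
        vpc_cidr="10.52.16.0/20",
        public=["10.52.16.0/24", "10.52.17.0/24", "10.52.19.0/24"],
        control_plane=["10.52.18.0/28", "10.52.18.16/28"],
        workers=["10.52.28.0/22", "10.52.24.0/22", "10.52.20.0/22"],
        worker_azs=["us-east-1c", "us-east-1b", "us-east-1a"],
    )
    print(issues or "layout looks consistent")
```

Run against the earlier layout with only two public subnets, the same check reports the mismatch with the three worker AZs.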

aws_account_id = "631692904429"
region = "us-east-1"

cluster_name = "dpe-k8-sandbox"
vpc_name = "dpe-sandbox"

vpc_cidr_block = "10.52.16.0/20"

My comment keeps getting deleted, so here is a link explaining why this change was made: #40 (comment)

@BryanFauble BryanFauble merged commit 5c5654f into signoz-testing Nov 5, 2024
2 of 10 checks passed
@BryanFauble BryanFauble deleted the ibcdpe-1095-cluster-ingress-signoz branch November 5, 2024 20:38
@BryanFauble

Merging this PR to collapse this mess of PR dependencies
