
[IBCDPE-938] Deploy Signoz (OTEL Visualization) to kubernetes cluster #35

Merged: 86 commits, merged on Nov 21, 2024

BryanFauble (Contributor) commented on Oct 1, 2024

Design document describing the overall solution, with network diagrams at the bottom:
https://sagebionetworks.jira.com/wiki/spaces/DPE/pages/3351773274/IBCDPE-938+OpenTelemetry+Data+collection+Visualization

Problem:

  1. We did not have a way to view telemetry data produced by services like schematic.
  2. We were deploying EC2 instances to a single AZ, and public subnets were not deployed to all AZs.
  3. We did not have a north-south gateway to handle traffic into the cluster (from the internet into the cluster).
  4. We did not have AWS SES set up to send emails.
  5. We did not have any TLS set up for secure messaging into the cluster.
  6. We did not have authentication mechanisms set up to prevent unauthenticated traffic from being ingested into the cluster.

Solution:

  1. Deploy the SigNoz Helm chart (https://github.com/SigNoz/charts/tree/main/charts/signoz), which includes the ClickHouse operator for data storage.
  2. Availability zones: deploy worker nodes across 3 private subnets, create 3 public subnets, and deploy the EKS control plane across 2 AZs.
  3. envoy-gateway is added to the cluster to handle north-south traffic (https://gateway.envoyproxy.io/docs/). This also gives us an easy way to secure endpoints with auth filters and to expose future services to the internet.
  4. cert-manager is added to the cluster to handle TLS certificate generation.
  5. An AWS SES module is created to configure SES so that the SigNoz alert manager component can send emails.
  6. Auth0 is set up so that clients can request an access token from Auth0, which is then verified at the gateway within the k8s cluster.
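For illustration, a minimal sketch of how items 3, 4 and 6 can fit together, assuming Envoy Gateway v1.1+ (which uses the `targetRefs` list on SecurityPolicy) and cert-manager with its Gateway API support enabled. The Gateway/GatewayClass name `eg`, the namespaces, the hostname `telemetry.example.com`, the HTTPRoute `signoz-otel-collector`, the Auth0 tenant domain, the audience and the email are hypothetical placeholders, not values from this PR. The ClusterIssuer obtains Let's Encrypt certificates through HTTP-01 challenges routed via the Gateway, and the SecurityPolicy requires a JWT signed by the Auth0 tenant on the targeted route.

```yaml
# All names below are hypothetical placeholders, not taken from this PR.

# cert-manager: Let's Encrypt issuer that answers HTTP-01 challenges
# through the Envoy Gateway (requires cert-manager Gateway API support).
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          gatewayHTTPRoute:
            parentRefs:
              - name: eg
                namespace: envoy-gateway
                kind: Gateway
---
# Gateway API Gateway served by Envoy Gateway (north-south entry point).
# The annotation asks cert-manager to issue the cert into the referenced Secret.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: eg
  namespace: envoy-gateway
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  gatewayClassName: eg
  listeners:
    - name: http                # plain HTTP, used for HTTP-01 challenges
      protocol: HTTP
      port: 80
    - name: https               # TLS terminated at the gateway
      protocol: HTTPS
      port: 443
      hostname: telemetry.example.com
      allowedRoutes:
        namespaces:
          from: All             # let routes in other namespaces attach
      tls:
        mode: Terminate
        certificateRefs:
          - kind: Secret
            name: telemetry-example-com-tls
---
# Envoy Gateway SecurityPolicy: requests on the targeted HTTPRoute must carry
# a JWT issued by the Auth0 tenant, validated against the tenant's public JWKS.
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: SecurityPolicy
metadata:
  name: auth0-jwt
  namespace: signoz
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: HTTPRoute
      name: signoz-otel-collector   # assumed to already exist in this namespace
  jwt:
    providers:
      - name: auth0
        issuer: https://example-tenant.us.auth0.com/
        audiences:
          - https://telemetry.example.com
        remoteJWKS:
          uri: https://example-tenant.us.auth0.com/.well-known/jwks.json
```

With something like this applied, a client first obtains an access token from Auth0 (for example via the client-credentials grant) and sends it as `Authorization: Bearer <token>`; by default Envoy Gateway rejects requests to the route that lack a valid token with a 401.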

Testing:

  1. I have tested the deployment of SigNoz to the sandbox cluster in the DNT dev account.
  2. I have tested the north-south traffic provided through envoy-gateway.
  3. I have tested with Auth0 that JWT verification works through envoy-gateway.
  4. I tested sending data to a backend service in the k8s cluster to verify that traffic reaches the pod.
  5. I tested provisioning a certificate from Let's Encrypt with cert-manager for HTTPS/TLS.
  6. I tested that pods are started on on-demand instances when they have the nodeSelector set.
  7. I tested AWS SES and successfully requested production-level access for the SES account used by our sandbox cluster.
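Testing item 6 relies on the `spotinst.io/node-lifecycle: "od"` node selector that appears in the chart values further down. A minimal sketch of a Pod pinned to on-demand nodes, written in block-style YAML; the Pod name, container name and image are hypothetical placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: on-demand-example          # hypothetical name
spec:
  # Only schedule onto nodes labelled as on-demand ("od") by Spot.
  nodeSelector:
    spotinst.io/node-lifecycle: "od"
  containers:
    - name: app                    # hypothetical container
      image: busybox:1.36
      command: ["sleep", "3600"]
```

`kubectl get pod on-demand-example -o wide` shows which node the Pod landed on, and `kubectl get nodes -l spotinst.io/node-lifecycle=od` lists the on-demand nodes to compare against.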


Follow-up tickets:

@@ -66,7 +66,7 @@ spec:
sources:
- repoURL: 'https://airflow.apache.org'
chart: airflow
-targetRevision: 1.11.0
+targetRevision: 1.15.0
Comment from the PR author (BryanFauble):

These are not new changes (Git is showing them in the diff for some reason). Please ignore changes here.

@@ -25,6 +25,20 @@ fullnameOverride: ""
# Provide a name to substitute for the name of the chart
nameOverride: ""

# Use standard naming for all resources using airflow.fullname template
Comment from the PR author (BryanFauble):

These are not new changes (Git is showing them in the diff for some reason). Please ignore changes here.

Comment on lines +236 to +238
nodeSelector: {
spotinst.io/node-lifecycle: "od"
}
Comment from the PR author (BryanFauble):

These are not new changes (Git is showing them in the diff for some reason). Please ignore changes here.

BryanFauble requested a review from zaro0508 on November 5, 2024 21:01
modules/signoz/main.tf (outdated, resolved review thread)
spacelift-int-sagebionetworks bot temporarily deployed to spacelift/dpe-dev-kubernetes-infrastructure on November 6, 2024 22:25 (inactive)
spacelift-int-sagebionetworks bot temporarily deployed to spacelift/root-spacelift-administrative-stack on November 6, 2024 22:59 (inactive)
BryanFauble merged commit 66acc7b into main on Nov 21, 2024
6 of 14 checks passed
BryanFauble deleted the signoz-testing branch on November 21, 2024 19:06
4 participants