[IBCDPE-938] Deploy Signoz (OTEL Visualization) to kubernetes cluster #35
I am leaving this on a 2 AZ deployment (since it cannot be changed after the k8s cluster is created). I did set the single AZ part of deployments/stacks/dpe-k8s-deployments/main.tf to false so that the Spot.io cluster autoscaler can deploy EC2 instances to either private subnet.
Is this an issue for Airflow?
No. This should work in conjunction with this change, which limits the deployment to nodes in a specific zone while still allowing nodes to spin up in either zone.
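For illustration only, one common way to pin a workload to a single zone is a node selector on the standard Kubernetes zone label. This is a hypothetical values fragment, not the actual change from this PR; the zone name is an assumption.

```yaml
# Hypothetical Helm values fragment (not taken from this PR).
# Schedules the pods only on nodes in one zone; nodes in other
# zones can still be provisioned for other workloads.
nodeSelector:
  topology.kubernetes.io/zone: us-east-1a  # assumed zone name, for illustration
```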
I also created https://sagebionetworks.jira.com/browse/IBCDPE-1097 as a follow-up here: Move worker node subnet off EKS cluster subnets
From these docs: https://aws.github.io/aws-eks-best-practices/networking/subnets/#vpc-configurations
I am liking the usage of this relative source import, especially for testing.
Note the changes in this values.yaml file to update replicas from 2 to 1. Running these services with on-demand instances will get us towards the stability we wanted. Running these workloads on spot instances was not the right decision.
This node selector defines how the Spot.io cluster autoscaler requests EC2 instances to spin up for the cluster. It makes sure that an on-demand instance is created and that the pods run on it.
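As a rough sketch of the kind of selector described here, assuming the cluster is managed by Spot Ocean and uses its node lifecycle label (the exact key and placement in this PR's values.yaml may differ):

```yaml
# Sketch only; assumes the Spot Ocean lifecycle label is in use.
# "od" requests an on-demand node, "spot" would request a spot node.
nodeSelector:
  spotinst.io/node-lifecycle: od
```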
Maybe later we can think about other spot configurations that would make Airflow more resilient, so that we actually gain the benefit of spot instances.
For Airflow, I think we can take advantage of this if we move over to Kubernetes workers instead of Celery workers. The reason is that the individual tasks can run on spot instances, but the components that kick off those tasks do not handle being interrupted on spot instances well.
When I was testing some Helm charts that use the oci:// prefix, this was needed.