
Computing 930 2021 Tracks


Goals

  1. Burst scheduling support
    • 40 QPS per 10K-node cluster
    • 1TP/1RP: QPS >= 40
    • 50K cluster at QPS 200: 5TP/5RP; 3TP/4RP possible at QPS < 200
  2. Minimal management cost for 50K cluster
    • Number of TP <= 5, number of RP <= 5
  3. Service Support in scale out cluster
  4. Daemonset handling in RP (on hold in favor of service support)
    • System tenant only
    • Multi-tenancy daemonset out of scope
  5. System partition pod handling raw design

Current status

Release 0.8

  1. 5TP/5RP 50K cluster 20 QPS, pod start up latency <= 6s (p99)

Current Work in Progress (9/16)

1. Burst scheduling support & minimal management cost for 50K cluster

a. 1TP/1RP maximal nodes - 1x30K

| Date | Cluster Size | QPS (saturation/latency) | p50 (s) | p90 (s) | p99 (s) | Changes / Notes |
|------|--------------|--------------------------|---------|---------|---------|-----------------|
| 8/26 | 1x25K | 100/5 | 1.82 | 2.65 | 5.00 | Reduced list/watch of pods in perf test |
| 9/01 | 1x25K | 100/5 | 1.82 | 2.65 | 4.88 | Use event receiving time as watch time in perf test |
| 9/08 | 1x25K | 100/5 | 1.81 | 2.62 | 4.51 | Index pods by label selector in perf test |
| 9/14 | 1x25K | 150/25 | 1.82 | 2.66 | 5.53 | Increased cache size for 25K cluster (previous cache size was for 10K cluster); send bookmark events to clients; saturation pod QPS 100 -> 150, latency pod QPS 5 -> 25 |
| 9/15 | 1x30K | 150/30 | 1.83 | 2.76 | 6.96 | Cache size increased for 30K cluster; latency pod QPS 30 |
| 9/16 | 1x35K | 150/35 | 1.87 | 2.89 | 8.63 | Cache size increased for 35K cluster; latency pod QPS 35; KCM 500 error |

b. 2TP/2RP 2x25K = 50K nodes

| Date | Cluster Size | QPS | p50 (s, per TP) | p90 (s, per TP) | p99 (s, per TP) | Changes |
|------|--------------|-----|-----------------|-----------------|-----------------|---------|
| 8/17 | 2x25K | 2x100 | 1.87 / 1.87 | 2.74 / 2.73 | 6.59 / 6.29 | Possibly inaccurate: a misconfiguration caused the number of watchers to be much lower than usual (use as a reference only) |
| 9/02 | 2x25K | 2x100 | 1.87 / 1.87 | 2.90 / 2.84 | 9.14 / 8.27 | Same as the 9/1 1x25K cluster run |
| 9/09 | 2x25K | 2x100 | 1.83 / 1.85 | 2.81 / 2.80 | 9.89 / 9.01 | 9/8 + increased pod cache size to accommodate a 25K cluster |
| 9/13 | 2x25K | 2x100 | 1.79 / 1.79 | 2.54 / 2.54 | 2.99 / 2.96 | 9/9 + send bookmark events to clients |

c. 930 release test plan

  • Scale-up and scale-out configurations: 1TP/1RP and 2TP/2RP at 50K nodes
  • Density / Load tests
  • Cluster QPS / latency pod QPS

d. 50K cluster tp99 improvement thoughts

  • Reduce the number of secret watchers - Yunwen (TBD)

2. Service support in arktos - Hongwei/Carl

  1. Implemented components, need to enable & verify (WIP)
    1. kubernetes service entries: must be network-specific instead of global to the cluster (done)
    2. kube-dns (in kube-system namespace) service entries: must be network-specific; each network should have its own deployment (done)
    3. Make flannel work in arktos
      1. Scale up (done)
      2. Scale out (WIP)
    4. Start dns pods in arktos
      1. Scale up (done)
      2. Scale out (WIP)
    5. Arktos network controller: whenever a tenant is created, the default network object should be created automatically, along with its kubernetes and kube-dns service entries; for flat-type networks, it should also take care of the kube-dns deployment.
  2. kubelet: when initializing a pod sandbox, it should provision /etc/resolv.conf with the proper kube-dns_{network} service IP (see the sketch after this list)
  3. Make kube-proxy multi-tenancy aware
  4. Simple on/off feature gate(s)
  5. Containerize network controller
  6. Data entry in Prometheus (and solve previous 404 issue)
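
A minimal sketch of the per-network DNS idea from item 2 above, assuming each network gets its own kube-dns_{network} service with a distinct cluster IP: the kubelet would render the sandbox's /etc/resolv.conf against that IP. The type and function names below are illustrative only, not the actual Arktos kubelet code.

```go
// Hypothetical sketch: look up the DNS settings of the tenant's network and
// render an /etc/resolv.conf that points at its per-network kube-dns service.
package main

import (
	"fmt"
	"strings"
)

// networkDNSConfig holds the DNS settings resolved for one Arktos network.
type networkDNSConfig struct {
	Network    string   // name of the tenant's network object
	ServiceIP  string   // cluster IP of the kube-dns_{network} service
	SearchPath []string // DNS search domains for the pod's namespace
}

// renderResolvConf builds the /etc/resolv.conf contents the sandbox would mount.
func renderResolvConf(cfg networkDNSConfig) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# generated for network %q\n", cfg.Network)
	fmt.Fprintf(&b, "nameserver %s\n", cfg.ServiceIP)
	if len(cfg.SearchPath) > 0 {
		fmt.Fprintf(&b, "search %s\n", strings.Join(cfg.SearchPath, " "))
	}
	fmt.Fprintln(&b, "options ndots:5")
	return b.String()
}

func main() {
	cfg := networkDNSConfig{
		Network:   "default",
		ServiceIP: "10.0.0.10", // assumed cluster IP of the default network's kube-dns
		SearchPath: []string{
			"demo-ns.svc.cluster.local", "svc.cluster.local", "cluster.local",
		},
	}
	fmt.Print(renderResolvConf(cfg))
}
```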

3. Allow new nodes to join an RP - manually set up a full arktos scale-out cluster - Carl

  1. Add a new node into a scale-up cluster, update the manual - DONE
  2. Add a new node into a scale-out cluster, update the manual - TBD

4. Security alert (TBD)

  1. Issue 1126 - github dependabot alerts
    1. github.com/gorilla/websocket to v1.4.1: code/PR ready: https://github.com/CentaurusInfra/arktos/pull/1127 - (DONE - Sonya)
    2. containerd to v1.4.8: As dependency, k8s.io/utils upgrade is necessary; tracked by issue https://github.com/CentaurusInfra/arktos/issues/924
    3. runc to v1.0.0-rc95: As dependency, k8s.io/utils upgrade is necessary; tracked by issue https://github.com/CentaurusInfra/arktos/issues/924

5. Scalability improvement thoughts

  1. Reduce perf test duration
    1. Increase QPS for latency pod creation - perhaps in cluster loader (saturation 20->100, latency 5->25?)
  2. Fine tuning
    1. Evaluate all list requests from clients that go to etcd directly (on demand); see the sketch below
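
A rough illustration of the list-path distinction behind item 2.1, using upstream client-go (the Arktos fork's client signatures may differ): a list with ResourceVersion "0" can be served from the apiserver watch cache, while an empty ResourceVersion forces a quorum read against etcd. Namespace and kubeconfig handling are simplified for the sketch.

```go
// Compare a cache-served list with an etcd quorum-read list.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// ResourceVersion "0": served from the apiserver watch cache (cheap, possibly slightly stale).
	cached, err := client.CoreV1().Pods("default").List(context.TODO(),
		metav1.ListOptions{ResourceVersion: "0"})
	if err != nil {
		panic(err)
	}

	// Empty ResourceVersion: the apiserver performs a quorum read from etcd.
	fresh, err := client.CoreV1().Pods("default").List(context.TODO(),
		metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	fmt.Printf("cached list: %d pods, quorum list: %d pods\n",
		len(cached.Items), len(fresh.Items))
}
```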

Completed Tasks

  1. Burst scheduling support
    1. 1TP/1RP 10K cluster 100 QPS - 6/21, pod start up latency <= 3s (p99)
    2. 1TP/1RP 15K cluster 100 QPS - 7/26, pod start up latency p50 1.8s, p90 2.6s, p99 4.2s
    3. 1TP/1RP 20K cluster 100 QPS - 7/29, pod start up latency p50 1.8s, p90 2.7s, p99 5.5s
    4. 1TP/1RP 25K cluster 100 QPS - 8/26, pod start up latency p50 1.8s, p90 2.7s, p99 5.0s
    5. 1.18.5 15K cluster 100 QPS (1 API server, 1 etcd) - 7/1, pod start up latency p50 1.41s, p90 2.17s, p99 5.25s
      1. Diff between pod_start_up and run_to_watch in whole seconds: 0s (9192), 1s (4961), 2s (449), 3s (211), 4s, 5s (34)
    6. 1.21 15K cluster 100 QPS - 7/15, pod start up latency p50 1.54s, p90 2.61s, p99 5.65s
    7. 1.21 20K cluster 100 QPS - 7/16. Scheduler restarted multiple times due to lost leader election
    8. 1.21 20K cluster 100 QPS - 7/22. p50 1.7s, p90 3.3s, p99 9.3s, saturation latency bad (p50 926s)
    9. Set up Prometheus for k8s 1.18.5 & 1.21 (7/12)
    10. Identify 1.18 perf improvement changes
      1. Node controller had expensive pod list calls; switched to watch - PR 1129, PR 1151 (Issue 77733)
      2. Reduce cachesize for event in apiserver (https://github.com/kubernetes/kubernetes/pull/96117)
    11. Arktos perf change
      1. Reduce kubelet getting node PR 835
      2. Increased watch timeout from 5min mean to 30 min mean (reverted - watch cannot be longer than 10 min)
      3. Reduce list pods from perf test PR 1163
      4. Add indexer to perf test PR 1169 (see the indexer sketch after this list)
      5. Increase pod cache size - YingH PR 1175
      6. Send fake bookmark event to client to reduce size of initEvents - YingH PR 1179
  2. Bug fix
    1. Fix user agent of event client - PR 1120 https://github.com/CentaurusInfra/arktos/pull/1120
  3. Minimal management cost for 50K cluster
    1. Start TP in parallel, start RP in parallel - Done 7/8 PR 1113
  4. Daemonset handling in RP
    1. Design - Hongwei (Done 7/6)
  5. Reduce perf test duration
    1. Skip garbage collection step - (Done 8/23)
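
As referenced in item 1.11.4 above, a rough sketch of indexing pods by a label using a client-go shared informer with a custom indexer, so a perf test can fetch one group's pods without listing everything. The index name and label key are invented for the example; the actual change in PR 1169 may differ.

```go
// Index pods by a label value and query them through the informer's indexer.
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

const groupLabel = "perf-test-group" // hypothetical label set on latency pods

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()

	// Register an index keyed by the value of the perf-test-group label.
	err = podInformer.AddIndexers(cache.Indexers{
		"byGroup": func(obj interface{}) ([]string, error) {
			pod := obj.(*corev1.Pod)
			if v, ok := pod.Labels[groupLabel]; ok {
				return []string{v}, nil
			}
			return nil, nil
		},
	})
	if err != nil {
		panic(err)
	}

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Constant-time lookup of one measurement group's pods instead of a full list.
	objs, err := podInformer.GetIndexer().ByIndex("byGroup", "density-run-1")
	if err != nil {
		panic(err)
	}
	fmt.Printf("pods in group density-run-1: %d\n", len(objs))
}
```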

Parking Tasks

  1. Prometheus support for k8s perf test
    1. Automatically preserve historical Prometheus data
    2. Automatically pull profiling data on a periodic basis
  2. API server performance: log, code analysis
    1. Kubelet container died - Yunwen
    2. Pod creation event diff in 1.18 & arktos audit logs - YingH (Parking)
      1. 1.18.5 behaves the same as arktos in local cluster up
      2. 1.18.5 uses v1 for events in kube up, no audit log
      3. Arktos uses v1beta1 for events in kube up (same as local cluster up)
  3. Scan all K8s performance improvements - Carl - still necessary?
    1. Current focus on watch improvement
  4. Daemonset handling
    1. Implementation - TBD
  5. Scalability improvement thoughts
    1. Utilization of audit log (post 930)
      1. Enable apiserver audit in local dev env
      2. Automatically scan the audit log and summarize request types, resources, durations, etc. (look for existing tools)
    2. Enable api server request latency (post 930)
      1. Migrate to scalability metrics framework PR 980
    3. Start cluster in parallel
      1. Start TP/RP in parallel - needs a lot of work and does not yield significant improvement