Release v0.9
Pre-release
This release focuses on doubling the throughput of large Arktos scale-out clusters, minimizing management cost, and enabling service support.
Some highlights include:
- Arktos now supports 50,000 nodes in a cluster with only two tenant partitions (TP) and two resource partitions (RP), at similar pod startup latency. This significantly reduces the management cost of a 50,000-node Arktos cluster.
- This release also doubles Arktos system throughput, thanks to many optimizations in the API Server, Controller Manager, and Kubelet. (Taking the 60% management cost reduction, from five TPs down to two, into consideration, each TP now handles 5 times the system throughput and 2.5 times as many nodes.)
- Services are now supported in both the Arktos scale-out and scale-up architectures. Customers can now create and deploy services and associate pods with them.
- Pod startup latency and system throughput:

| Release | v0.8 (June 2021) | v0.9 (September 2021) | v0.9 (September 2021) | v0.9 (September 2021) |
|---|---|---|---|---|
| System Scalability (Cluster Size, nodes in a cluster) | 50K | 50K | 50K | 25K |
| System Architecture Partition (Cost), Tenant Partitions (TP) x Resource Partitions (RP) | 5x5 | 5x5 | 2x2 | 1x1 |
| System Throughput (Combined QPS, Server/Client) | 100/25 | 200/50 | 200/50 | 100/25 |
| Pod Startup Latency P50 (seconds) | 1.8278 | 1.7879 | 1.8307 | 1.7987 |
| Pod Startup Latency P90 (seconds) | 2.7846 | 2.5756 | 2.7759 | 2.6265 |
| Pod Startup Latency P99 (seconds) | 5.7178 | 3.7062 | 7.3256 | 4.9631 |
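
The per-TP figures quoted in the highlights (60% cost reduction, 5 times throughput, 2.5 times cluster size) can be rechecked from the numbers above. The sketch below is only an illustration: the `Config` class and its field names are hypothetical, and it assumes that management cost scales with the number of partitions and that nodes and server-side QPS are spread evenly across partitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One test configuration (hypothetical helper, not part of Arktos)."""
    release: str
    nodes: int        # total nodes in the cluster
    partitions: int   # number of TPs (the tests use the same number of RPs)
    server_qps: int   # combined server-side QPS

    @property
    def nodes_per_partition(self) -> float:
        # Assumes nodes are spread evenly across partitions.
        return self.nodes / self.partitions

    @property
    def qps_per_tp(self) -> float:
        # Assumes throughput is spread evenly across tenant partitions.
        return self.server_qps / self.partitions


# Values copied from the v0.8 (5x5) and v0.9 (2x2) configurations.
v08 = Config("v0.8", nodes=50_000, partitions=5, server_qps=100)
v09 = Config("v0.9", nodes=50_000, partitions=2, server_qps=200)

# Assumption: management cost is proportional to the partition count.
partition_reduction = 1 - v09.partitions / v08.partitions
throughput_gain = v09.qps_per_tp / v08.qps_per_tp
size_gain = v09.nodes_per_partition / v08.nodes_per_partition

print(f"partition reduction: {partition_reduction:.0%}")
print(f"per-TP throughput gain: {throughput_gain:.1f}x")
print(f"per-partition cluster size gain: {size_gain:.1f}x")
```

Under these assumptions the script reports a 60% reduction in partition count, a 5.0x per-TP throughput gain, and a 2.5x per-partition cluster size gain, matching the figures in the highlights.
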
Features/Improvements/Bug fixes:
Service support:
- Scale-out clusters can now use the Flannel CNI
- Service support is enabled in the local dev cluster by default
Scalability and performance tuning changes:
- Avoid GET node for each node PATCH in kubelet (PR 835)
- Refresh resource version with idle watchers upon watch session renewal (PR 1183)
- Reduce pod list requests in perf test (PR 1187)
- Cherry pick performance related community changes:
- Cherry pick perf test changes:
  - Add channel for events to PodStartupLatency (PR 1187)
Perf test tool changes:
- Decouple proxy operation in kube-up and kubemark (PR 1105)
- Fix Prometheus config to include HAProxy metrics (PR 1103)
- Kubemark cluster starts partition servers in parallel (PR 1113)
- Support skipping pod deletion phase in perf test (PR 1159)
- Perf test config for large cluster (PR 1187)
Security fixes:
- Bump gorilla/websocket to v1.4.2 (PR 1127)
Bug fixes:
- Fix a bug where the event client was created with the wrong user agent (PR 1120)
- Set user agent for clients when talking to API server in another partition (PR 1125, 1186)
Others:
- Update ETCD to 3.4.3 (PR 1188)