Azure CNI: timed out locking store #2817

Closed
behzad-mir opened this issue Jun 26, 2024 · 3 comments
Assignees
Labels
cni Related to CNI. stale Stale due to inactivity.

Comments

@behzad-mir
Contributor

When a large number of pods (>150) are created in parallel, Azure CNI fails with this error:
Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to setup network for sandbox "c99e647e8519b46b4c492d3076ea2da13e861a39ccb982aca89eb5ec5cafae55": plugin type="azure-vnet" failed (add): Failed to initialize key-value store of network plugin: error Acquiring store lock: timed out locking store" pod="default/k8-parallel-100-joblh8pf-gvn6s"

@behzad-mir
Contributor Author

behzad-mir commented Jun 26, 2024

The issue is caused by Azure CNI's serialized approach to pod creation. Each CNI process acquires a lock at the start of its run and releases it at the end, so when a large number of CNI ADD calls run in parallel, some of them time out while waiting for the lock. The issue is seen more often on Windows.
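To make the serialization described above concrete, here is a minimal back-of-envelope model in Python. It is only a sketch, not Azure CNI code: the function name, the per-call hold time, and the lock timeout are all hypothetical values chosen for illustration, and it assumes every ADD call arrives at the same instant and queues in order.

```python
def simulate_serialized_adds(num_pods, hold_ms, lock_timeout_ms):
    """Model num_pods ADD calls that all start at t=0 and share one lock.

    Hypothetical model: each successful call holds the lock for hold_ms,
    and a call gives up if it has waited longer than lock_timeout_ms.
    Returns a (succeeded, timed_out) tuple of counts.
    """
    succeeded = 0
    timed_out = 0
    lock_free_at_ms = 0  # time at which the lock next becomes available

    for _ in range(num_pods):
        # Every call arrived at t=0, so its wait equals the time at which
        # the lock is next free.
        wait_ms = lock_free_at_ms
        if wait_ms <= lock_timeout_ms:
            succeeded += 1
            lock_free_at_ms += hold_ms  # this call now holds the lock
        else:
            # The call never gets the lock, so it does not delay the others.
            timed_out += 1

    return succeeded, timed_out


# Assumed numbers: 200 parallel pods, 100 ms per call, 10 s lock timeout.
print(simulate_serialized_adds(200, 100, 10_000))  # (101, 99)
```

With these assumed numbers, only the calls that reach the front of the queue within the timeout succeed. Raising the pod count or the per-call hold time increases the number of failures, which matches the pattern reported here for large parallel pod creation.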

To address the issue, a new CNI version called Stateless CNI has been designed and implemented; it enables parallel pod creation and removes the process locks.
#2276

The first target is the Windows AKS Swift scenario, and rollout has started for K8s 1.30.

@behzad-mir behzad-mir self-assigned this Jun 26, 2024
@behzad-mir behzad-mir reopened this Jun 26, 2024
@behzad-mir behzad-mir added the cni Related to CNI. label Jun 26, 2024
@behzad-mir behzad-mir changed the title from "Azure CNN: timed out locking store" to "Azure CNI: timed out locking store" Jun 26, 2024

This issue is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

@github-actions github-actions bot added the stale Stale due to inactivity. label Jul 13, 2024

Issue closed due to inactivity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 21, 2024