Race when creating azure-vnet lock directory with permissions #2818
Comments
```yaml
securityContext:
  capabilities:
    drop:
      - ALL
    add:
      - NET_ADMIN # only necessary for delegated IPAM/Cilium
      - NET_RAW   # only necessary for delegated IPAM/Cilium
```
...and make sure that we have a test that reboots the Node and verifies CNS functionality afterwards.
https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods sounds like it would let Kubernetes chmod the directory when the CNS Pod mounts it as a Volume?
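A sketch of what the linked docs describe, for reference: `fsGroupChangePolicy` controls when Kubernetes recursively changes volume ownership/permissions to match the Pod's `fsGroup`. Note the caveat that this mechanism only applies to volume types that support fsGroup-based ownership management; `hostPath` volumes (like `/var/run/azure-vnet`) do not, so it may not help here. All names below (`cns-example`, the image) are placeholders, not the actual azure-cns manifest:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cns-example        # placeholder name
spec:
  securityContext:
    fsGroup: 1000
    # "OnRootMismatch" skips the recursive chown/chmod when the volume
    # root already has the expected ownership, speeding up mounts.
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
  - name: cns
    image: example/azure-cns   # placeholder image
    volumeMounts:
    - name: azure-vnet
      mountPath: /var/run/azure-vnet
  volumes:
  - name: azure-vnet
    hostPath:
      path: /var/run/azure-vnet
      # DirectoryOrCreate creates the directory if missing, but with a
      # fixed 0755 mode only when kubelet itself creates it.
      type: DirectoryOrCreate
```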
This issue is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days.
Node Rebooting Test added in: #2901
Backport PR can be created for
@QxBytes looks like this was fully backported, is it good to resolve?
#2901 did not get backported to `azure-container-networking/.pipelines/singletenancy/aks/e2e-step-template.yaml` (lines 62 to 75 in 73a0919).
Issue closed due to inactivity.
What happened:
On the first boot, no CNI binary is on the node, so Kubernetes creates the /var/run/azure-vnet directory with 0755 permissions automatically because it is mounted as part of the azure-cns daemonset. Then the CNI is deployed.
The /var/run directory is not preserved between reboots.
Then, when the VM reboots, the CNI binary may run before Kubernetes creates the /var/run/azure-vnet directory. When the CNI binary runs first, it creates the directory with 0644 permissions, which causes permission denied errors for CNS. Even if Kubernetes creates/mounts the /var/run/azure-vnet directory later, it sees that the directory already exists and will not recreate it with 0755 permissions.
What you expected to happen:
The CNI binary should create the directory with 0755 permissions.
How to reproduce it:
Reboot the VM with the CNS security context dropping all capabilities (so it does not bypass permission checks). There is a chance that the azure-cns pod will get stuck in CrashLoopBackOff.
Orchestrator and Version (e.g. Kubernetes, Docker):
Operating System (Linux/Windows):
Kernel (e.g. `uname -a` for Linux or `$(Get-ItemProperty -Path "C:\windows\system32\hal.dll").VersionInfo.FileVersion` for Windows):
Anything else we need to know?:
[Miscellaneous information that will assist in solving the issue.]