Race when creating azure-vnet lock directory with permissions #2818

Open
QxBytes opened this issue Jun 26, 2024 · 3 comments · Fixed by #2819
Assignees
Labels
bug cni Related to CNI. needs-backport Change needs to be backported to previous release trains

Comments

@QxBytes
Contributor

QxBytes commented Jun 26, 2024

What happened:
On first boot, no CNI binary is on the node, so k8s automatically creates the /var/run/azure-vnet directory with 0755 permissions because it is mounted as part of the azure-cns daemonset. Then the CNI is deployed.
The /var/run directory is not preserved between reboots.
Then, when the VM reboots, the CNI binary may run before k8s creates the /var/run/azure-vnet directory. When the CNI binary runs first, it creates the directory with 0644 permissions. This causes permission denied errors for CNS. Even if k8s creates/mounts the /var/run/azure-vnet directory later, it sees that the directory already exists and won't recreate it with 0755 permissions.

What you expected to happen:
The CNI binary should create the directory with 0755 permissions.

How to reproduce it:
Reboot the VM with the CNS security context dropping all capabilities (so that CNS doesn't bypass permission checks). There is a chance that the azure-cns pod will get stuck in CrashLoopBackOff.

Orchestrator and Version (e.g. Kubernetes, Docker):

Operating System (Linux/Windows):

Kernel (e.g. uname -a for Linux or $(Get-ItemProperty -Path "C:\windows\system32\hal.dll").VersionInfo.FileVersion for Windows):

Anything else we need to know?:
[Miscellaneous information that will assist in solving the issue.]

@QxBytes QxBytes added bug cni Related to CNI. labels Jun 26, 2024
@QxBytes QxBytes self-assigned this Jun 26, 2024
@QxBytes QxBytes added the needs-backport Change needs to be backported to previous release trains label Jun 26, 2024
@rbtr
Contributor

rbtr commented Jun 27, 2024

  • @jpayne3506 can you update the CNS specs that we use in CI with
securityContext:
  capabilities:
    drop:
      - ALL
    add:
      - NET_ADMIN # only necessary for delegated IPAM/Cilium
      - NET_RAW # only necessary for delegated IPAM/Cilium

and make sure that we have a test that's rebooting the Node and verifying CNS functionality afterwards

@rbtr
Contributor

rbtr commented Jun 27, 2024

https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#configure-volume-permission-and-ownership-change-policy-for-pods sounds like it would let k8s chmod the directory when the CNS Pod mounts it as a Volume?


This issue is stale because it has been open for 2 weeks with no activity. Remove stale label or comment or this will be closed in 7 days

@github-actions github-actions bot added the stale Stale due to inactivity. label Jul 17, 2024
@QxBytes QxBytes removed the stale Stale due to inactivity. label Jul 17, 2024