What happened:
We are deploying applications to AKS clusters from artifacts stored in ghcr.io.
In Azure, we run a private AKS cluster whose internet access is restricted by a firewall. To retrieve images, we deployed an ACR (Azure Container Registry) and configured an ACR cache rule to cache images from ghcr.io. In addition, we use RBAC controls, and the agentpool managed identity has the ACRPull and Reader roles on the ACR.
For instance, our image is ghcr.io/company/images/controller:1, our ACR is acr1.azurecr.io, and we use image: acr1.azurecr.io/company/images/controller:1.
The ACR cache rule is company/images/* -> ghcr.io/company/images/*
This works as expected: the image is pulled from ghcr.io into the ACR when requested, and our pod starts.
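To make the mapping concrete, here is a minimal sketch of how the wildcard cache rule relates an upstream reference to the ACR reference used in the deployment. The function name `to_acr_reference` and its parameters are hypothetical, and the prefix replacement is a simplification of what the ACR cache rule does, not its actual implementation.

```python
def to_acr_reference(image: str, source_prefix: str, acr_host: str, target_prefix: str) -> str:
    """Map an upstream image reference to the ACR reference covered by a wildcard cache rule.

    Hypothetical helper: it only models the prefix replacement implied by
    the rule company/images/* -> ghcr.io/company/images/*.
    """
    if not image.startswith(source_prefix):
        raise ValueError(f"{image} is not covered by the cache rule for {source_prefix}*")
    # Keep everything matched by the trailing wildcard (repository name and tag).
    remainder = image[len(source_prefix):]
    return f"{acr_host}/{target_prefix}{remainder}"


acr_image = to_acr_reference(
    image="ghcr.io/company/images/controller:1",
    source_prefix="ghcr.io/company/images/",
    acr_host="acr1.azurecr.io",
    target_prefix="company/images/",
)
print(acr_image)  # acr1.azurecr.io/company/images/controller:1
```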
To further simplify deployment, we want to keep the image reference in the deployments as ghcr.io/company/images/controller:1 instead of the local ACR name. To do that, we configured containerd with a registry mirror as described in this issue's comment.
The hosts.toml for the ghcr.io domain is configured as follows:
server = "https://ghcr.io"
[host."https://acr1.azurecr.io"]
capabilities = ["pull", "resolve"]
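As a rough illustration of what this mirror configuration asks containerd to do, the sketch below builds the manifest URL that would be requested from the mirror for a ghcr.io reference. It is a simplification under stated assumptions: the helper name `mirror_manifest_url` is hypothetical, the reference is assumed to carry a tag (no digest, no registry port), and the `ns` query parameter reflects my understanding that containerd passes the original registry name to a mirror host this way.

```python
from urllib.parse import urlencode


def mirror_manifest_url(image: str, mirror: str) -> str:
    """Build the manifest URL a mirror host would receive for a tagged image reference.

    Hypothetical helper: assumes "registry/repository:tag" with no digest and no port.
    """
    registry, _, rest = image.partition("/")
    repository, _, tag = rest.rpartition(":")
    # Assumption: the original registry is passed to the mirror as the ns parameter.
    query = urlencode({"ns": registry})
    return f"{mirror}/v2/{repository}/manifests/{tag}?{query}"


url = mirror_manifest_url("ghcr.io/company/images/controller:1", "https://acr1.azurecr.io")
print(url)  # https://acr1.azurecr.io/v2/company/images/controller/manifests/1?ns=ghcr.io
```

The repository path is unchanged, which is why the cache rule target (company/images/*) has to line up with the path under ghcr.io.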
However, with this configuration the image pull fails. The node does try to pull the image from acr1, but apparently anonymously, without using the managed identity of the agentpool. The pod description shows the following error:
Warning Failed 3s kubelet Failed to pull image "ghcr.io/hqy01/jeep/images/busybox:latest": failed to pull and unpack image "ghcr.io/hqy01/jeep/images/busybox:latest": failed to resolve reference "ghcr.io/hqy01/jeep/images/busybox:latest": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://pue1dev7831pplt08acr0001.azurecr.io/oauth2/token?scope=repository%3Ahqy01%2Fjeep%2Fimages%2Fbusybox%3Apull&service=pue1dev7831pplt08acr0001.azurecr.io: 401 Unauthorized
(I pasted the error verbatim, without simplifying the names as I did in the explanation above.)
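To read the error more easily, the token URL it contains can be decoded with the Python standard library. This is only a reading aid: the URL is copied from the error (reassembled as a single string), and the helper name `describe_token_request` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Token URL taken from the error message, written as one string.
TOKEN_URL = (
    "https://pue1dev7831pplt08acr0001.azurecr.io/oauth2/token"
    "?scope=repository%3Ahqy01%2Fjeep%2Fimages%2Fbusybox%3Apull"
    "&service=pue1dev7831pplt08acr0001.azurecr.io"
)


def describe_token_request(url: str) -> dict:
    """Split a registry token URL into the host, the requested scope, and the service."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)  # parse_qs also percent-decodes the values
    return {
        "host": parts.netloc,
        "scope": params["scope"][0],
        "service": params["service"][0],
    }


info = describe_token_request(TOKEN_URL)
print(info["host"])   # pue1dev7831pplt08acr0001.azurecr.io
print(info["scope"])  # repository:hqy01/jeep/images/busybox:pull
```

The decoded values show that the token request goes to the ACR (not ghcr.io) and asks for pull access on the expected repository, so the mirror redirection itself is taking effect; it is the anonymous token fetch that is rejected.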
What you expected to happen:
Since pulling the image directly from acr1.azurecr.io works, and since containerd is configured to redirect ghcr.io image pulls to the ACR, we expect the same behavior: the node should be able to pull the image from acr1.
How to reproduce it (as minimally and precisely as possible):
create an AKS cluster, with managed identity enabled, and no access to ghcr
create an ACR, and a cache rule to pull images from ghcr
grant reader and ACRPull to the AKS agentpool on the ACR
deploy the containerd registry mirror
create a pod that uses the ghcr image reference
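For the last step, a minimal pod manifest can be generated as JSON (which kubectl accepts alongside YAML). This is a sketch only: the pod name `mirror-test`, the container name, and the `sleep` command are hypothetical; the image is the one from the error message.

```python
import json


def build_test_pod(image: str, name: str = "mirror-test") -> dict:
    """Return a minimal Pod manifest that references the image by its ghcr.io name."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},  # hypothetical pod name
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "test",
                    "image": image,
                    # Keep the container alive long enough to inspect the pod.
                    "command": ["sleep", "3600"],
                }
            ],
        },
    }


manifest = build_test_pod("ghcr.io/hqy01/jeep/images/busybox:latest")
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and applying it with kubectl apply -f, then checking kubectl describe pod, should show the pull event for the ghcr.io reference.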
Anything else we need to know?:
I am trying to reproduce the issue in an environment with fewer dependencies and less complexity. I have not managed to so far, and will try again when time permits.
Environment:
Kubernetes version: v1.29.8
Cloud provider or hardware configuration: AKS cluster in Azure
OS (e.g: cat /etc/os-release): AKSUbuntu/images/2204gen2containerd/versions/202409.23.0
Network plugin and version (if this is a network-related bug): Azure CNI Pod Subnet