
Unable to provision cluster #332

Closed
Tracked by #112
piotrmiskiewicz opened this issue Aug 8, 2024 · 16 comments
@piotrmiskiewicz (Member):

Description

When the Runtime CR is created, its status is set to an error. Given the following Runtime:

apiVersion: infrastructuremanager.kyma-project.io/v1
kind: Runtime
metadata:
  generation: 1
  labels:
    kyma-project.io/broker-plan-id: 4deee563-e5ec-4731-b9b1-53b42d855f0c
    kyma-project.io/broker-plan-name: azure
    kyma-project.io/global-account-id: e449f875-b5b2-4485-b7c0-98725c0571bf
    kyma-project.io/instance-id: 1ec864c2-b67b-41b6-ba56-eac2974f5c62
    kyma-project.io/region: westeurope
    kyma-project.io/runtime-id: xxx7310b-3a47-4ed5-9009-7925f0b4d166
    kyma-project.io/shoot-name: d7c1cc5
    kyma-project.io/subaccount-id: github-actions-keb-integration
    operator.kyma-project.io/kyma-name: 3a67310b-3a47-4ed5-9009-7925f0b4d166
    kyma-project.io/controlled-by-provisioner: "false"
  name: maper-test-04
  namespace: kcp-system
spec:
  security:
    administrators:
    - test@test.com
    networking:
      filter:
        egress:
          enabled: false
        ingress:
          enabled: false
  shoot:
    controlPlane:
      highAvailability:
        failureTolerance:
          type: node
    kubernetes:
      kubeAPIServer:
        oidcConfig: {}
    name: mapertest03
    networking:
      nodes: 10.250.0.0/22
      pods: 10.96.0.0/13
      services: 10.104.0.0/13
    platformRegion: cf-eu10
    provider:
      type: azure
      workers:
      - machine:
          image:
            name: gardenlinux
            version: 1443.9.0
          type: Standard_D2s_v5
        maxSurge: 3
        maxUnavailable: 0
        maximum: 20
        minimum: 3
        name: ""
        zones:
        - "3"
        - "1"
        - "2"
    purpose: production
    region: westeurope
    secretBindingName: sap-skr-dev-cust-00002-kyma-integration

the status is:

status:
  conditions:
  - lastTransitionTime: "2024-08-08T06:30:31Z"
    message: Gardener API create error
    reason: GardenerErr
    status: "False"
    type: Provisioned
  state: Failed

Expected result

The cluster is created and the status is set to "Ready".

Actual result

Provisioning fails.

Steps to reproduce

Create the given Runtime using kubectl.
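The reproduction can be sketched as follows (assuming the manifest above is saved as runtime.yaml; resource and field names are taken from this thread, and cluster access to the kcp-system namespace is assumed):

```shell
# Apply the Runtime CR from the description
kubectl apply -f runtime.yaml

# Inspect the resulting state and the Provisioned condition message;
# in the failing case this prints "Failed" and "Gardener API create error"
kubectl get runtime maper-test-04 -n kcp-system \
  -o jsonpath='{.status.state}{"\n"}{.status.conditions[?(@.type=="Provisioned")].message}{"\n"}'
```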

@piotrmiskiewicz (Member, Author):

It looks like the worker name is required.
Please enhance the error messages for easier debugging.

@akgalwas (Contributor) commented Aug 9, 2024:

We will include Gardener's error message.

@piotrmiskiewicz (Member, Author) commented Aug 13, 2024:

There is still a problem with provisioning: the following Runtime, applied with kubectl, soon enters the "Failed" state (the Gardener shoot itself is not failed):

apiVersion: infrastructuremanager.kyma-project.io/v1
kind: Runtime
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infrastructuremanager.kyma-project.io/v1","kind":"Runtime","metadata":{"annotations":{},"creationTimestamp":"2024-08-09T12:44:35Z","finalizers":["runtime-controller.infrastructure-manager.kyma-project.io/deletion-hook"],"generation":1,"labels":{"kyma-project.io/broker-plan-id":"4deee563-e5ec-4731-b9b1-53b42d855f0c","kyma-project.io/broker-plan-name":"azure","kyma-project.io/controlled-by-provisioner":"false","kyma-project.io/global-account-id":"e449f875-b5b2-4485-b7c0-98725c0571bf","kyma-project.io/instance-id":"d01b8790-03e6-497a-87a2-6a62ebc59f1f","kyma-project.io/region":"westeurope","kyma-project.io/runtime-id":"00f6c848-60e4-49be-a2a5-d94d923d3e0e","kyma-project.io/shoot-name":"c-741831f","kyma-project.io/subaccount-id":"github-actions-keb-integration","operator.kyma-project.io/kyma-name":"00f6c848-60e4-49be-a2a5-d94d923d3e0e"},"name":"maper-test-kim3","namespace":"kcp-system","resourceVersion":"4033488203","uid":"48995d2a-8d20-48f6-9adf-c1e277249183"},"spec":{"security":{"administrators":["test@test.com"],"networking":{"filter":{"egress":{"enabled":false},"ingress":{"enabled":false}}}},"shoot":{"controlPlane":{"highAvailability":{"failureTolerance":{"type":"node"}}},"kubernetes":{"kubeAPIServer":{"oidcConfig":{"clientID":"9bd05ed7-a930-44e6-8c79-e6defeb7dec9","groupsClaim":"groups","issuerURL":"https://kymatest.accounts400.ondemand.com","signingAlgs":["RS256"],"usernameClaim":"sub","usernamePrefix":"-"}},"version":"1.29"},"name":"c-bbbaaa","networking":{"nodes":"10.250.0.0/22","pods":"10.96.0.0/13","services":"10.104.0.0/13","type":"calico"},"platformRegion":"cf-eu10","provider":{"type":"azure","workers":[{"machine":{"image":{"name":"gardenlinux","version":"1443.9.0"},"type":"Standard_D2s_v5"},"maxSurge":3,"maxUnavailable":0,"maximum":20,"minimum":3,"name":"w1","volume":{"size":"50Gi","type":"Standard_LRS"},"zones":["1","3","2"]}]},"purpose":"production","region":"westeurope","secretBindingName":"sap-skr-dev-cust-00002-kyma-integration"}}}
  creationTimestamp: "2024-08-13T10:49:59Z"
  finalizers:
  - runtime-controller.infrastructure-manager.kyma-project.io/deletion-hook
  generation: 1
  labels:
    kyma-project.io/broker-plan-id: 4deee563-e5ec-4731-b9b1-53b42d855f0c
    kyma-project.io/broker-plan-name: azure
    kyma-project.io/controlled-by-provisioner: "false"
    kyma-project.io/global-account-id: e449f875-b5b2-4485-b7c0-98725c0571bf
    kyma-project.io/instance-id: d01b8790-03e6-497a-87a2-6a62ebc59f1f
    kyma-project.io/region: westeurope
    kyma-project.io/runtime-id: 00f6c848-60e4-49be-a2a5-d94d923d3e0e
    kyma-project.io/shoot-name: c-741831f
    kyma-project.io/subaccount-id: github-actions-keb-integration
    operator.kyma-project.io/kyma-name: 00f6c848-60e4-49be-a2a5-d94d923d3e0e
  name: maper-test-kim3
  namespace: kcp-system
  resourceVersion: "4041029997"
  uid: 67e6542a-2b5f-4ecb-8d71-a14df3ff1a60
spec:
  security:
    administrators:
    - test@test.com
    networking:
      filter:
        egress:
          enabled: false
        ingress:
          enabled: false
  shoot:
    controlPlane:
      highAvailability:
        failureTolerance:
          type: node
    kubernetes:
      kubeAPIServer:
        oidcConfig:
          clientID: 9bd05ed7-a930-44e6-8c79-e6defeb7dec9
          groupsClaim: groups
          issuerURL: https://kymatest.accounts400.ondemand.com
          signingAlgs:
          - RS256
          usernameClaim: sub
          usernamePrefix: '-'
      version: "1.29"
    name: c-bbbaaa
    networking:
      nodes: 10.250.0.0/22
      pods: 10.96.0.0/13
      services: 10.104.0.0/13
      type: calico
    platformRegion: cf-eu10
    provider:
      type: azure
      workers:
      - machine:
          image:
            name: gardenlinux
            version: 1443.9.0
          type: Standard_D2s_v5
        maxSurge: 3
        maxUnavailable: 0
        maximum: 20
        minimum: 3
        name: w1
        volume:
          size: 50Gi
          type: Standard_LRS
        zones:
        - "1"
        - "3"
        - "2"
    purpose: production
    region: westeurope
    secretBindingName: sap-skr-dev-cust-00002-kyma-integration
status:
  conditions:
  - lastTransitionTime: "2024-08-13T10:50:15Z"
    message: Gardener API create error
    reason: GardenerErr
    status: "False"
    type: Provisioned
  state: Failed

but the shoot was created successfully!

@piotrmiskiewicz (Member, Author):

Another example, from today:

apiVersion: infrastructuremanager.kyma-project.io/v1
kind: Runtime
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"infrastructuremanager.kyma-project.io/v1","kind":"Runtime","metadata":{"annotations":{},"creationTimestamp":"2024-08-14T12:46:01Z","finalizers":["runtime-controller.infrastructure-manager.kyma-project.io/deletion-hook"],"generation":1,"labels":{"kyma-project.io/broker-plan-id":"5cb3d976-b85c-42ea-a636-79cadda109a9","kyma-project.io/broker-plan-name":"preview","kyma-project.io/controlled-by-provisioner":"false","kyma-project.io/global-account-id":"e449f875-b5b2-4485-b7c0-98725c0571bf","kyma-project.io/instance-id":"2fe02d64-1067-4c7c-9cf9-8ade43056d15","kyma-project.io/region":"eu-central-1","kyma-project.io/runtime-id":"eb67e5b3-da61-474c-8c5f-4b77f7d57290","kyma-project.io/shoot-name":"c-maper10","kyma-project.io/subaccount-id":"github-actions-keb-integration","operator.kyma-project.io/kyma-name":"eb67e5b3-da61-474c-8c5f-4b77f7d57290"},"name":"maper010","namespace":"kcp-system"},"spec":{"security":{"administrators":["test@test.com"],"networking":{"filter":{"egress":{"enabled":false},"ingress":{"enabled":false}}}},"shoot":{"controlPlane":{"highAvailability":{"failureTolerance":{"type":"node"}}},"kubernetes":{"kubeAPIServer":{"oidcConfig":{"clientID":"9bd05ed7-a930-44e6-8c79-e6defeb7dec9","groupsClaim":"groups","issuerURL":"https://kymatest.accounts400.ondemand.com","signingAlgs":["RS256"],"usernameClaim":"sub","usernamePrefix":"-"}},"version":"1.29"},"name":"c-maper10","networking":{"nodes":"10.250.0.0/22","pods":"10.96.0.0/13","services":"10.104.0.0/13"},"platformRegion":"cf-eu10","provider":{"type":"aws","workers":[{"machine":{"image":{"name":"gardenlinux","version":"1443.9.0"},"type":"m6i.large"},"maxSurge":3,"maxUnavailable":0,"maximum":20,"minimum":3,"name":"cpu-worker-0","volume":{"size":"50","type":"gp2"},"zones":["eu-central-1c","eu-central-1b","eu-central-1a"]}]},"purpose":"production","region":"eu-central-1","secretBindingName":"sap-aws-skr-dev-cust-00002-kyma-integration"}}}
  creationTimestamp: "2024-08-14T13:33:55Z"
  finalizers:
  - runtime-controller.infrastructure-manager.kyma-project.io/deletion-hook
  generation: 1
  labels:
    kyma-project.io/broker-plan-id: 5cb3d976-b85c-42ea-a636-79cadda109a9
    kyma-project.io/broker-plan-name: preview
    kyma-project.io/controlled-by-provisioner: "false"
    kyma-project.io/global-account-id: e449f875-b5b2-4485-b7c0-98725c0571bf
    kyma-project.io/instance-id: 2fe02d64-1067-4c7c-9cf9-8ade43056d15
    kyma-project.io/region: eu-central-1
    kyma-project.io/runtime-id: eb67e5b3-da61-474c-8c5f-4b77f7d57290
    kyma-project.io/shoot-name: c-maper10
    kyma-project.io/subaccount-id: github-actions-keb-integration
    operator.kyma-project.io/kyma-name: eb67e5b3-da61-474c-8c5f-4b77f7d57290
  name: maper010
  namespace: kcp-system
  resourceVersion: "4043203808"
  uid: 391e2f94-94cd-4450-a849-56f6748e9283
spec:
  security:
    administrators:
    - test@test.com
    networking:
      filter:
        egress:
          enabled: false
        ingress:
          enabled: false
  shoot:
    controlPlane:
      highAvailability:
        failureTolerance:
          type: node
    kubernetes:
      kubeAPIServer:
        oidcConfig:
          clientID: 9bd05ed7-a930-44e6-8c79-e6defeb7dec9
          groupsClaim: groups
          issuerURL: https://kymatest.accounts400.ondemand.com
          signingAlgs:
          - RS256
          usernameClaim: sub
          usernamePrefix: '-'
      version: "1.29"
    name: c-maper10
    networking:
      nodes: 10.250.0.0/22
      pods: 10.96.0.0/13
      services: 10.104.0.0/13
    platformRegion: cf-eu10
    provider:
      type: aws
      workers:
      - machine:
          image:
            name: gardenlinux
            version: 1443.9.0
          type: m6i.large
        maxSurge: 3
        maxUnavailable: 0
        maximum: 20
        minimum: 3
        name: cpu-worker-0
        volume:
          size: "50"
          type: gp2
        zones:
        - eu-central-1c
        - eu-central-1b
        - eu-central-1a
    purpose: production
    region: eu-central-1
    secretBindingName: sap-aws-skr-dev-cust-00002-kyma-integration
status:
  conditions:
  - lastTransitionTime: "2024-08-14T13:33:55Z"
    message: Gardener API create error
    reason: GardenerErr
    status: "False"
    type: Provisioned
  state: Failed

@akgalwas (Contributor) commented Aug 16, 2024:

The issues described above were caused by the following:

  1. shoot.provider.workers.volume.size has an incorrect value ("50" instead of "50Gi").
  2. The shoot.provider.workers.name property is missing.
  3. The shoot.provider.networking.type property is missing.
  4. The shoot.provider.workers.machine.image object must be provided.

Problem 1 is caused by the lack of validation; ticket #229 tracks adding it and will be addressed at some point.

Problems 2, 3, and 4 must be addressed in KIM. The shoot.provider.networking.type default value should be set in the configuration. For the time being, you can refer to this PR to see what Runtime CR should currently be passed.

Problems 2, 3, and 4 will be fixed in #343.
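For reference, a worker and networking fragment that avoids all four causes could look like the following. This is a sketch assembled from the values already posted in this thread, not an authoritative template:

```yaml
shoot:
  networking:
    nodes: 10.250.0.0/22
    pods: 10.96.0.0/13
    services: 10.104.0.0/13
    type: calico            # cause 3: networking type must be set (or defaulted by KIM)
  provider:
    type: azure
    workers:
    - name: w1              # cause 2: worker name is required
      machine:
        image:              # cause 4: image object must be provided
          name: gardenlinux
          version: 1443.9.0
        type: Standard_D2s_v5
      volume:
        size: 50Gi          # cause 1: include the unit, not a bare "50"
        type: Standard_LRS
      maxSurge: 3
      maxUnavailable: 0
      maximum: 20
      minimum: 3
      zones: ["1", "2", "3"]
```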

@akgalwas (Contributor):

Quoting the Aug 13 comment above:

"There is still a problem with provisioning, the following Runtime applied by kubectl gets "failed" soon (gardener shoot is not failed) [...] but the shoot was created with a success!"
We will investigate this case.

@jaroslaw-pieszka:

Quoting the Aug 16 list of causes above (incorrect volume size unit, missing worker name, missing networking type, missing machine image):

Problem 1 is resolved in a KEB PR; we now always use the same unit (Gi), and it is appended to the volume size.
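The fix described here can be illustrated with a small sketch (a hypothetical helper, not KEB's actual code): treat the broker's numeric size as Gi and append the unit only when it is missing, so already-correct values pass through unchanged:

```python
def normalize_volume_size(size: str, unit: str = "Gi") -> str:
    """Return a Gardener-style quantity, appending the unit if absent."""
    size = size.strip()
    return size if size.endswith(unit) else f"{size}{unit}"


# A bare number becomes a valid quantity; a correct value is unchanged.
print(normalize_volume_size("50"))    # -> 50Gi
print(normalize_volume_size("50Gi"))  # -> 50Gi
```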

@jaroslaw-pieszka commented Aug 19, 2024:

After manually changing the shoot name again, the state was Pending for a few seconds, then we got:

  conditions:
  - lastTransitionTime: "2024-08-19T10:38:33Z"
    message: 'Gardener API create error: shoots.core.gardener.cloud "badcafe00" already
      exists'
    reason: GardenerErr
    status: "False"
    type: Provisioned

@jaroslaw-pieszka:

Test for the preview plan on DEV yields a cryptic error message:

status:
  conditions:
  - lastTransitionTime: "2024-08-20T02:10:48Z"
    message: Gardener API create error
    reason: GardenerErr
    status: "False"
    type: Provisioned
  state: Failed

Entire Runtime CR:

apiVersion: infrastructuremanager.kyma-project.io/v1
kind: Runtime
metadata:
  creationTimestamp: "2024-08-20T02:10:32Z"
  finalizers:
  - runtime-controller.infrastructure-manager.kyma-project.io/deletion-hook
  generation: 1
  labels:
    kyma-project.io/broker-plan-id: 5cb3d976-b85c-42ea-a636-79cadda109a9
    kyma-project.io/broker-plan-name: preview
    kyma-project.io/controlled-by-provisioner: "false"
    kyma-project.io/global-account-id: e449f875-b5b2-4485-b7c0-98725c0571bf
    kyma-project.io/instance-id: 8eaf1089-cabb-491d-914e-47e19aff1354
    kyma-project.io/region: eu-central-1
    kyma-project.io/runtime-id: aa2a9662-c997-4513-8e37-ab01fa729717
    kyma-project.io/shoot-name: f675a9d
    kyma-project.io/subaccount-id: github-actions-keb-integration
    operator.kyma-project.io/kyma-name: aa2a9662-c997-4513-8e37-ab01fa729717
  name: aa2a9662-c997-4513-8e37-ab01fa729717
  namespace: kcp-system
  resourceVersion: "4054093395"
  uid: dcb95cee-8436-4d8d-b46d-7c80ea10114b
spec:
  security:
    administrators:
    - test@test.com
    networking:
      filter:
        egress:
          enabled: true
        ingress:
          enabled: false
  shoot:
    controlPlane:
      highAvailability:
        failureTolerance:
          type: node
    kubernetes:
      kubeAPIServer:
        oidcConfig:
          clientID: 9bd05ed7-a930-44e6-8c79-e6defeb7dec9
          groupsClaim: groups
          issuerURL: https://kymatest.accounts400.ondemand.com
          signingAlgs:
          - RS256
          usernameClaim: sub
          usernamePrefix: '-'
      version: "1.29"
    name: f675a9d
    networking:
      nodes: 10.250.0.0/22
      pods: 10.96.0.0/13
      services: 10.104.0.0/13
      type: calico
    platformRegion: cf-eu10
    provider:
      type: aws
      workers:
      - machine:
          image:
            name: gardenlinux
            version: 1443.9.0
          type: m6i.large
        maxSurge: 3
        maxUnavailable: 0
        maximum: 20
        minimum: 3
        name: cpu-worker-0
        volume:
          size: 50Gi
          type: gp2
        zones:
        - eu-central-1c
        - eu-central-1b
        - eu-central-1a
    purpose: production
    region: eu-central-1
    secretBindingName: sap-aws-skr-dev-cust-00002-kyma-integration
status:
  conditions:
  - lastTransitionTime: "2024-08-20T02:10:48Z"
    message: Gardener API create error
    reason: GardenerErr
    status: "False"
    type: Provisioned
  state: Failed

@akgalwas (Contributor):

Quoting the 'shoots.core.gardener.cloud "badcafe00" already exists' error reported on Aug 19 above:
It should be fixed by PR #350.
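A common way to handle this failure mode is to treat an AlreadyExists response from the create call as non-fatal and fall back to fetching the existing shoot, instead of marking the Runtime as Failed. A sketch of that pattern with hypothetical names (the actual fix lives in PR #350):

```python
class AlreadyExistsError(Exception):
    """Stand-in for the API server's 409 AlreadyExists response."""


def ensure_shoot(create, get, name):
    """Create the shoot, tolerating the case where it already exists."""
    try:
        return create(name)
    except AlreadyExistsError:
        # The shoot already exists (e.g. from a previous reconcile);
        # adopt it rather than failing the Runtime.
        return get(name)
```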

@jaroslaw-pieszka:

Provisioning of the preview plan fails again.

status:
  conditions:
  - lastTransitionTime: "2024-08-22T02:11:10Z"
    message: Gardener API create error
    reason: GardenerErr
    status: "False"
    type: Provisioned
  state: Failed

@Disper (Member) commented Aug 23, 2024:

I managed to reproduce the error using the Runtime CR from @jaroslaw-pieszka's last failed attempt. I haven't found the reason for the failure yet. Sharing the status of the broken shoot:

status:
  conditions:
    - type: APIServerAvailable
      status: 'True'
      lastTransitionTime: '2024-08-23T14:29:18Z'
      lastUpdateTime: '2024-08-23T14:29:18Z'
      reason: HealthzRequestSucceeded
      message: API server /healthz endpoint responded with success status code.
    - type: ControlPlaneHealthy
      status: Progressing
      lastTransitionTime: '2024-08-23T15:11:28Z'
      lastUpdateTime: '2024-08-23T14:54:24Z'
      reason: ControlPlaneUnhealthyReport
      message: >-
        ControlPlane extension (shoot--kyma-dev--md-kim03/md-kim03) reports
        failing health check: deployment "aws-custom-route-controller" in
        namespace "shoot--kyma-dev--md-kim03" is unhealthy: condition
        "Available" has invalid status False (expected True) due to
        MinimumReplicasUnavailable: Deployment does not have minimum
        availability.
    - type: ObservabilityComponentsHealthy
      status: Progressing
      lastTransitionTime: '2024-08-23T15:08:27Z'
      lastUpdateTime: '2024-08-23T14:23:17Z'
      reason: DeploymentMissing
      message: 'Missing required deployments: [kube-state-metrics]'
    - type: EveryNodeReady
      status: Progressing
      lastTransitionTime: '2024-08-23T15:09:28Z'
      lastUpdateTime: '2024-08-23T15:10:28Z'
      reason: WorkerUnhealthyReport
      message: >-
        Worker extension (shoot--kyma-dev--md-kim03/md-kim03) reports failing
        health check: machine
        "shoot--kyma-dev--md-kim03-cpu-worker-0-z1-6f484-8qp6j" failed: Machine
        shoot--kyma-dev--md-kim03-cpu-worker-0-z1-6f484-8qp6j failed to join the
        cluster in 20m0s minutes.
    - type: SystemComponentsHealthy
      status: Progressing
      lastTransitionTime: '2024-08-23T15:11:28Z'
      lastUpdateTime: '2024-08-23T15:12:28Z'
      reason: DeploymentUnhealthy
      message: >-
        Deployment "kube-system/blackbox-exporter" is unhealthy: condition
        "Available" has invalid status False (expected True) due to
        MinimumReplicasUnavailable: Deployment does not have minimum
        availability.
  constraints:
    - type: HibernationPossible
      status: 'True'
      lastTransitionTime: '2024-08-23T14:29:18Z'
      lastUpdateTime: '2024-08-23T14:29:18Z'
      reason: NoProblematicWebhooks
      message: All webhooks are properly configured.
    - type: MaintenancePreconditionsSatisfied
      status: 'True'
      lastTransitionTime: '2024-08-23T14:29:18Z'
      lastUpdateTime: '2024-08-23T14:29:18Z'
      reason: NoProblematicWebhooks
      message: All webhooks are properly configured.
  gardener:
    id: prjl5geENMygoz0C5Il9bTglCe0SvgTqsSaTW36IstsjzDBMltENb3m8EFhhN1et
    name: gardenlet-6c8d99b9d9-zn5ph
    version: v1.101.2
  hibernated: false
  lastOperation:
    description: >-
      Flow "Shoot cluster reconciliation" encountered task errors: [task
      "Waiting until shoot worker nodes have been reconciled" failed: Error
      while waiting for Worker shoot--kyma-dev--md-kim03/md-kim03 to become
      ready: error during reconciliation: Error reconciling Worker: failed while
      waiting for all machine deployments to be ready: machine(s) failed: 1
      error occurred: "shoot--kyma-dev--md-kim03-cpu-worker-0-z1-6f484-c6ktl":
      Machine shoot--kyma-dev--md-kim03-cpu-worker-0-z1-6f484-c6ktl failed to
      join the cluster in 20m0s minutes.] Operation will be retried.
    lastUpdateTime: '2024-08-23T15:09:03Z'
    progress: 89
    state: Error
    type: Create
  lastErrors:
    - description: >-
        task "Waiting until shoot worker nodes have been reconciled" failed:
        Error while waiting for Worker shoot--kyma-dev--md-kim03/md-kim03 to
        become ready: error during reconciliation: Error reconciling Worker:
        failed while waiting for all machine deployments to be ready: machine(s)
        failed: 1 error occurred:
        "shoot--kyma-dev--md-kim03-cpu-worker-0-z1-6f484-c6ktl": Machine
        shoot--kyma-dev--md-kim03-cpu-worker-0-z1-6f484-c6ktl failed to join the
        cluster in 20m0s minutes.
      taskID: Waiting until shoot worker nodes have been reconciled
      lastUpdateTime: '2024-08-23T15:09:03Z'

@Disper (Member) commented Aug 26, 2024:

@jaroslaw-pieszka could you double-check whether it can be related to the regions & zones configuration? I've been creating clusters with small variations and got this working by changing the region to eu-central-1 and by setting the zones to:

    zones:
    - eu-central-1a
    - eu-central-1b
    - eu-central-1c

Attaching the Runtime CR that worked for me and the failing one from before the modification mentioned above.
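On AWS, availability zone names are the region name plus a letter suffix, so a quick consistency check between region and zones is possible. This is an illustrative helper only (Azure uses bare numeric zones such as "1", to which this check does not apply):

```python
def zones_match_region(region: str, zones: list[str]) -> bool:
    """True if every AWS-style zone name is the region plus a letter suffix."""
    return all(
        z.startswith(region) and z != region and z[len(region):].isalpha()
        for z in zones
    )


# Zones from another region (or typos) are flagged before provisioning.
print(zones_match_region("eu-central-1",
                         ["eu-central-1a", "eu-central-1b", "eu-central-1c"]))
```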

@Disper (Member) commented Aug 27, 2024:

@Disper (Member) commented Aug 30, 2024:

The issue with eu-west-2 was fixed.
I've now tried to reproduce the issue that @jaroslaw-pieszka had with this Runtime CR and the one @piotrmiskiewicz had with this Runtime CR, and both clusters were created successfully. However, neither of those examples is related to the recently broken eu-west-2 region.

@piotrmiskiewicz, @jaroslaw-pieszka - could you try again?

@piotrmiskiewicz (Member, Author):

Currently KEB is able to create a Runtime which becomes Ready (in a few minutes).
