Runtime is Failed even if Gardener Shoot is "in progress" #346

Closed
Tracked by #112
piotrmiskiewicz opened this issue Aug 16, 2024 · 8 comments

@piotrmiskiewicz
Member

piotrmiskiewicz commented Aug 16, 2024

Description

The Runtime is in the Failed state:

status:
  conditions:
  - lastTransitionTime: "2024-08-16T13:06:19Z"
    message: 'Gardener API create error: shoots.core.gardener.cloud "keb-x-01" already
      exists'
    reason: GardenerErr
    status: "False"
    type: Provisioned
  state: Failed

However, this shoot name had not been used before.

Expected result

Runtime with state "Ready"

Actual result

The Runtime is in the Failed state even though the Gardener Shoot operation is still in progress.
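
To make the gap between the expected and the actual result concrete, here is a minimal sketch of how a Runtime state could be derived from the state of the Shoot's last operation. It assumes the state strings that Gardener reports in the Shoot's status.lastOperation.state; the function name runtimeStateFor and the mapping itself are hypothetical and are not taken from KIM's code.

```go
package main

import "fmt"

// runtimeStateFor maps the state of a Shoot's last operation to a Runtime
// state. The mapping is an assumption for illustration, not KIM's logic.
func runtimeStateFor(lastOperationState string) string {
	switch lastOperationState {
	case "Succeeded":
		// The Shoot operation finished, so the Runtime can become Ready.
		return "Ready"
	case "Failed":
		// Gardener gave up on the operation.
		return "Failed"
	default:
		// "Pending", "Processing", "Error" (Gardener retries it) and any
		// other value are treated as still in progress.
		return "Pending"
	}
}

func main() {
	for _, s := range []string{"Processing", "Succeeded", "Failed"} {
		fmt.Printf("%s -> %s\n", s, runtimeStateFor(s))
	}
}
```

With a mapping like this, a Shoot that is still being created would keep the Runtime in Pending instead of Failed.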

Steps to reproduce

Troubleshooting

@piotrmiskiewicz
Member Author

the Spec:

  security:
    administrators:
    - admin1@test.com
    - admin2@test.com
    networking:
      filter:
        egress:
          enabled: false
        ingress:
          enabled: false
  shoot:
    controlPlane:
      highAvailability:
        failureTolerance:
          type: node
    kubernetes:
      kubeAPIServer:
        oidcConfig:
          clientID: 9bd05ed7-a930-44e6-8c79-e6defeb7dec9
          groupsClaim: groups
          issuerURL: https://kymatest.accounts400.ondemand.com
          signingAlgs:
          - RS256
          usernameClaim: sub
          usernamePrefix: '-'
      version: "1.29"
    name: keb-x-01
    networking:
      nodes: 10.250.0.0/22
      pods: 10.96.0.0/13
      services: 10.104.0.0/13
      type: calico
    platformRegion: platform-region
    provider:
      type: aws
      workers:
      - machine:
          image:
            name: gardenlinux
            version: 1312.3.0
          type: m6i.large
        maxSurge: 1
        maxUnavailable: 0
        maximum: 5
        minimum: 4
        name: cpu-worker-0
        volume:
          size: 50Gi
          type: gp2
        zones:
        - eu-west-2b
    purpose: production
    region: eu-west-2
    secretBindingName: sap-aws-skr-dev-cust-00002-kyma-integration

@Disper
Member

Disper commented Aug 19, 2024

There is currently a shoot with that name on DEV: https://dashboard.garden.canary.k8s.ondemand.com/namespace/garden-kyma-dev/shoots/keb-x-01. @piotrmiskiewicz, did you check whether a shoot with that name was listed in https://dashboard.garden.canary.k8s.ondemand.com/namespace/garden-kyma-dev/shoots/ when the error occurred?

More examples of the "shoot name already exists" error: #332 (comment)

@jaroslaw-pieszka

Further attempts fail with these messages:

    message: 'Gardener API create error: shoots.core.gardener.cloud "bd0e0e5" already
      exists'
    message: 'Gardener API create error: shoots.core.gardener.cloud "ad0e0e5" already
      exists'

The first attempt used a shoot name generated during E2E tests.
The second changed the shoot name in a previously created Runtime CR, so it is extremely likely that the shoot name had not been used before.

@Disper
Member

Disper commented Aug 19, 2024

Logs from DEV related to shoot bd0e0e5:

Z    ERROR   reqID 53158     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:27:17Z    DEBUG   events  Gardener API create error: Shoot.core.gardener.cloud "bd0e0e5" is invalid: spec.networking.type: Required value: networking type must be provided: kcp-system/72a29ca1-65c7-4a5a-8191-a260adbc2de2  {"type": "Warning", "object": {"kind":"Runtime","namespace":"kcp-system","name":"72a29ca1-65c7-4a5a-8191-a260adbc2de2","uid":"7295b4b4-ab35-473c-8ae2-0c20d2e932a6","apiVersion":"infrastructuremanager.kyma-project.io/v1","resourceVersion":"4052677301"}, "reason": "GardenerErr"}
2024-08-19T09:27:36Z    ERROR   reqID 53162     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:27:55Z    ERROR   reqID 53166     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:28:12Z    ERROR   reqID 53170     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:28:32Z    ERROR   reqID 53174     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:28:55Z    ERROR   reqID 53178     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:29:17Z    ERROR   reqID 53182     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:29:36Z    ERROR   reqID 53186     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:29:59Z    ERROR   reqID 53190     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:30:21Z    ERROR   reqID 53194     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:30:41Z    ERROR   reqID 53198     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:31:03Z    ERROR   reqID 53202     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:31:21Z    ERROR   reqID 53206     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:31:41Z    ERROR   reqID 53210     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:32:01Z    ERROR   reqID 53214     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:32:21Z    ERROR   reqID 53218     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:32:41Z    ERROR   reqID 53222     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:33:02Z    ERROR   reqID 53226     Failed to create new gardener Shoot     {"error": "Shoot.core.gardener.cloud \"bd0e0e5\" is invalid: spec.networking.type: Required value: networking type must be provided"}
2024-08-19T09:33:13Z    INFO    reqID 53227     Gardener shoot for runtime initialised successfully     {"Name": "bd0e0e5", "Namespace": "garden-kyma-dev"}
2024-08-19T09:33:35Z    ERROR   reqID 53231     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:33:35Z    DEBUG   events  Gardener API create error: shoots.core.gardener.cloud "bd0e0e5" already exists: kcp-system/72a29ca1-65c7-4a5a-8191-a260adbc2de2     {"type": "Warning", "object": {"kind":"Runtime","namespace":"kcp-system","name":"72a29ca1-65c7-4a5a-8191-a260adbc2de2","uid":"7295b4b4-ab35-473c-8ae2-0c20d2e932a6","apiVersion":"infrastructuremanager.kyma-project.io/v1","resourceVersion":"4052686206"}, "reason": "GardenerErr"}
2024-08-19T09:34:25Z    ERROR   reqID 53239     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:34:25Z    DEBUG   events  Gardener API create error: shoots.core.gardener.cloud "bd0e0e5" already exists: kcp-system/72a29ca1-65c7-4a5a-8191-a260adbc2de2     {"type": "Warning", "object": {"kind":"Runtime","namespace":"kcp-system","name":"72a29ca1-65c7-4a5a-8191-a260adbc2de2","uid":"7295b4b4-ab35-473c-8ae2-0c20d2e932a6","apiVersion":"infrastructuremanager.kyma-project.io/v1","resourceVersion":"4052687325"}, "reason": "GardenerErr"}
2024-08-19T09:34:52Z    ERROR   reqID 53243     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:35:13Z    ERROR   reqID 53247     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:35:34Z    ERROR   reqID 53251     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:35:57Z    ERROR   reqID 53255     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:36:23Z    ERROR   reqID 53259     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:36:49Z    ERROR   reqID 53263     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:37:10Z    ERROR   reqID 53267     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:37:29Z    ERROR   reqID 53271     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:37:53Z    ERROR   reqID 53275     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:38:13Z    ERROR   reqID 53279     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:38:34Z    ERROR   reqID 53283     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:38:52Z    ERROR   reqID 53287     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:39:09Z    ERROR   reqID 53291     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:39:28Z    ERROR   reqID 53295     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:39:46Z    ERROR   reqID 53299     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:40:08Z    ERROR   reqID 53303     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:40:27Z    ERROR   reqID 53307     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:40:44Z    ERROR   reqID 53311     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:41:02Z    ERROR   reqID 53315     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:41:19Z    ERROR   reqID 53319     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:41:38Z    ERROR   reqID 53323     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
2024-08-19T09:41:56Z    ERROR   reqID 53327     Failed to create new gardener Shoot     {"error": "shoots.core.gardener.cloud \"bd0e0e5\" already exists"}
  1. It looks like cluster creation initially fails because Gardener reports the networking type as missing ("spec.networking.type: Required value").
  2. I suspect that this might be a temporary hiccup, as the cluster is eventually created and in a healthy state: https://dashboard.garden.canary.k8s.ondemand.com/namespace/garden-kyma-dev/shoots/bd0e0e5
  3. I have not yet analysed the state machine, but it could be that KIM reconciles the initial error and tries to create the cluster again, even though it has already been created. That results in a valid response from Gardener ("already exists") and in invalid KIM behavior: setting the state to error.

@jaroslaw-pieszka

jaroslaw-pieszka commented Aug 19, 2024

After manually changing the shoot name again, the state was Pending for a few seconds, and then we got:

  conditions:
  - lastTransitionTime: "2024-08-19T10:38:33Z"
    message: 'Gardener API create error: shoots.core.gardener.cloud "badcafe00" already
      exists'
    reason: GardenerErr
    status: "False"
    type: Provisioned

@akgalwas
Contributor

There is definitely a regression. To sum up, the "already exists" error occurs in the following cases:

  1. A new Runtime CR is created and spec.shoot.name points to a shoot that already exists in Gardener
  2. spec.shoot.name is changed on an existing Runtime CR

The problem is a blocker for migration, as KIM will not be able to take over existing runtimes created by the Provisioner.

@Disper
Member

Disper commented Sep 6, 2024

@piotrmiskiewicz, @jaroslaw-pieszka, have you encountered this issue recently?

@Disper
Member

Disper commented Sep 6, 2024

I've heard from @piotrmiskiewicz that recent runtime creations were successful, so I'm closing the ticket. Please re-open it if you encounter the issue again.

@Disper Disper closed this as completed Sep 6, 2024