[WIP] Kim FSM operations timeout #325

Closed

@koala7659 koala7659 commented Aug 4, 2024

Description

The operations processed by the KIM service in the Runtime CR controller reconciliation loop are interrupted after the timeout period has elapsed. KIM is configured with the following parameters:

  • provisioning-timeout - timeout for shoot creation
  • update-timeout - timeout for shoot update
  • deprovisioning-timeout - timeout for shoot deletion

Implementation
Whenever the Runtime controller starts a new operation, it adds an annotation with the current timestamp to the Runtime CR. Example:

kyma-project.io/runtime-operation-started: "2024-08-09T05:06:02Z"

During each reconciliation cycle, the current time is checked against the saved timestamp and the timeout configured for the operation.
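
The check described above can be sketched roughly as follows. This is only an illustration in Python, not the code from this PR: the helper names `parse_started` and `has_timed_out` are hypothetical, and the timestamp format is assumed to match the annotation example (UTC with a trailing `Z`).

```python
from datetime import datetime, timedelta, timezone

# Annotation key taken from the PR description.
ANNOTATION = "kyma-project.io/runtime-operation-started"


def parse_started(value: str) -> datetime:
    """Parse a timestamp such as "2024-08-09T05:06:02Z" (assumed format)."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def has_timed_out(annotations: dict, timeout: timedelta, now: datetime) -> bool:
    """Return True when the operation started longer ago than `timeout`.

    Hypothetical helper: a missing annotation means no operation is being
    tracked, so nothing can have timed out.
    """
    started = annotations.get(ANNOTATION)
    if started is None:
        return False
    return now - parse_started(started) > timeout
```

A caller would pass the timeout that matches the running operation, for example the value of `provisioning-timeout` while a shoot is being created.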

When the operation succeeds on the Gardener side before the timeout, the Runtime CR status is set to the Ready state and the timestamp annotation is removed from the Runtime CR.

When the operation does not complete on the Gardener side before the timeout, the Runtime CR status is set to the Failed state and the timestamp annotation remains on the Runtime CR to prevent any further reconciliation attempts.
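
The two outcomes can be modelled with a small sketch. It is an illustration under assumptions, not the controller code: the `Runtime` class, the initial `Pending` state, and the functions `finish_operation` and `should_reconcile` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field

# Annotation key taken from the PR description.
ANNOTATION = "kyma-project.io/runtime-operation-started"


@dataclass
class Runtime:
    """Hypothetical, simplified stand-in for a Runtime CR."""
    annotations: dict = field(default_factory=dict)
    state: str = "Pending"  # assumed initial state, for illustration only


def finish_operation(runtime: Runtime, succeeded: bool) -> Runtime:
    """Apply the outcome of an operation to the Runtime."""
    if succeeded:
        # Success before the timeout: Ready, and the timestamp is removed.
        runtime.state = "Ready"
        runtime.annotations.pop(ANNOTATION, None)
    else:
        # Timeout: Failed, and the timestamp stays in place.
        runtime.state = "Failed"
    return runtime


def should_reconcile(runtime: Runtime) -> bool:
    """A Failed Runtime that still carries the annotation is left alone."""
    return not (runtime.state == "Failed" and ANNOTATION in runtime.annotations)
```

Keeping the annotation on failure is what lets the reconciler tell a timed-out Runtime apart from one that simply has not started an operation yet.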

Recovery
If a timeout occurs during an update, the user (KEB) can fix the broken configuration that caused the timeout and then manually delete the following annotation from the Runtime instance:
kyma-project.io/runtime-operation-started:
The Runtime reconciler then patches the Gardener shoot with the fixed configuration, starts the next upgrade cycle, and waits for the operation to complete on the Gardener side.
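
One way to delete the annotation programmatically is a JSON Merge Patch (RFC 7396), where a `null` value removes a key. The sketch below only builds such a patch body and applies it to a plain dictionary to show the effect; the function names are hypothetical, and sending the patch to the cluster is left out.

```python
import json

# Annotation key taken from the PR description.
ANNOTATION = "kyma-project.io/runtime-operation-started"


def annotation_removal_patch(key: str) -> str:
    """Build a JSON Merge Patch body that removes one annotation."""
    return json.dumps({"metadata": {"annotations": {key: None}}})


def merge_patch(target, patch):
    """Apply a JSON Merge Patch to `target` and return the result.

    Minimal local implementation, used here only to demonstrate
    what the patch body does to an object.
    """
    if not isinstance(patch, dict):
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "delete this key"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result
```

Other annotations and metadata fields are untouched by the patch, so only the operation timestamp disappears and the reconciler can pick the Runtime up again.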

Output

Provisioning timeout:

 status:
    conditions:
    - lastTransitionTime: "2024-08-08T11:17:23Z"
      message: Shoot creation timeout
      reason: ShootCreationTimeout
      status: "False"
      type: Provisioned
    state: Failed

Deprovisioning timeout:

 status:
    conditions:
    - lastTransitionTime: "2024-08-08T11:17:23Z"
      message: Runtime deprovisioning timeout
      reason: ShootDeletionTimeout
      status: "False"
      type: Deprovisioned
    state: Failed

Upgrade timeout:

 status:
    conditions:
    - lastTransitionTime: "2024-08-08T11:17:23Z"
      message: Shoot reconcile timeout
      reason: ShootProcessingTimeout
      status: "False"
      type: Provisioned
    state: Failed
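
The three outputs above differ only in the condition type, reason, and message. A sketch of how they could be produced from one lookup table follows; the operation keys (`provisioning`, `deprovisioning`, `update`) and the function `timeout_status` are hypothetical, while the condition values are copied from the outputs.

```python
# Operation name -> (condition type, reason, message), as shown in the outputs.
TIMEOUT_CONDITIONS = {
    "provisioning": ("Provisioned", "ShootCreationTimeout", "Shoot creation timeout"),
    "deprovisioning": ("Deprovisioned", "ShootDeletionTimeout", "Runtime deprovisioning timeout"),
    "update": ("Provisioned", "ShootProcessingTimeout", "Shoot reconcile timeout"),
}


def timeout_status(operation: str, transition_time: str) -> dict:
    """Build the status block reported when `operation` times out."""
    condition_type, reason, message = TIMEOUT_CONDITIONS[operation]
    return {
        "conditions": [
            {
                "lastTransitionTime": transition_time,
                "message": message,
                "reason": reason,
                "status": "False",
                "type": condition_type,
            }
        ],
        "state": "Failed",
    }
```

For example, `timeout_status("deprovisioning", "2024-08-08T11:17:23Z")` yields the same fields as the deprovisioning output.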

Related issue(s)
#324

@koala7659 koala7659 requested a review from a team as a code owner August 4, 2024 10:33
@kyma-bot kyma-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cla: yes Indicates the PR's author has signed the CLA. labels Aug 4, 2024
@koala7659 koala7659 marked this pull request as draft August 4, 2024 10:33
@kyma-bot kyma-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 4, 2024
@koala7659 koala7659 force-pushed the kim-fsm-operations-timeout branch from 202c144 to 2059974 Compare August 4, 2024 10:39
@koala7659 koala7659 force-pushed the kim-fsm-operations-timeout branch from 2059974 to a1e9c9c Compare August 5, 2024 06:50
@kyma-bot kyma-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 5, 2024
@kyma-bot kyma-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 6, 2024
@kyma-bot kyma-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Aug 6, 2024
@kyma-bot kyma-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 6, 2024
@koala7659
Contributor Author

Closing, as we decided this functionality will not be part of KIM and we will use the Gardener shoot state to determine the Runtime CR status. Any timeouts that occur will be propagated.

@koala7659 koala7659 closed this Aug 12, 2024
Labels
cla: yes Indicates the PR's author has signed the CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.