Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KIM:add mechanism to prevent incompatible migration of Shoot-Spec #400

Closed
1 task
tobiscr opened this issue Sep 26, 2024 · 8 comments
Closed
1 task

KIM:add mechanism to prevent incompatible migration of Shoot-Spec #400

tobiscr opened this issue Sep 26, 2024 · 8 comments
Assignees
Labels
area/control-plane Related to all activities around Kyma Control Plane kind/feature Categorizes issue or PR as related to a new feature.

Comments

@tobiscr
Copy link
Contributor

tobiscr commented Sep 26, 2024

Description

To block critical changes in already published Shoot-CRs, KIM was to be extended to detect critical changes it would apply in the Shoot-CR during the migration.

KIM is only allowed to migrate an existing SKR Shoot-CR if its generated Shoot-CR looks not different compared to the existing Shoot-Spec.

Incompatible changes will be detected and reported via either Metrics to the operational team or/and written to KIM logs.

Screenshot 2024-09-26 at 11 22 47

AC:

  • KIM is able to detect incompatible changes in the Shoot-Spec before it's updating the Shoot-Spec on Gardener's Seed. cluster.

Reasons

Prevent incompatible Shoot-Spec changes to prevent service interruptions by KIM.

Attachments

@tobiscr tobiscr added kind/feature Categorizes issue or PR as related to a new feature. area/control-plane Related to all activities around Kyma Control Plane labels Sep 26, 2024
@Disper Disper removed their assignment Sep 30, 2024
@m00g3n m00g3n assigned VOID404 and unassigned m00g3n Oct 16, 2024
@akgalwas
Copy link
Contributor

First run for the migration tool (AWS) with the matchers included:

spec/extensions: Expected
    <[]v1beta1.Extension | len:3, cap:4>: [
        {
            Type: "shoot-dns-service",
            ProviderConfig: {
                Raw: "{\"apiVersion\":\"service.dns.extensions.gardener.cloud/v1alpha1\",\"dnsProviderReplication\":{\"enabled\":true},\"kind\":\"DNSConfig\"}",
                Object: nil,
            },
            Disabled: nil,
        },
        {
            Type: "shoot-networking-filter",
            ProviderConfig: nil,
            Disabled: true,
        },
        {
            Type: "shoot-cert-service",
            ProviderConfig: {
                Raw: "{\"apiVersion\":\"service.cert.extensions.gardener.cloud/v1alpha1\",\"shootIssuers\":{\"enabled\":true},\"kind\":\"CertConfig\"}",
                Object: nil,
            },
            Disabled: nil,
        },
    ]
to match elements: [[shoot-dns-service].ProviderConfig:
	Expected object to be comparable, diff:   &runtime.RawExtension{
	  	Raw: bytes.Join({
	  		`{"apiVersion":"service.dns.extensions.gardener.cloud/v1alpha1","`,
	  		`dnsProviderReplication":{"enabled":true},"kind":"DNSConfig"`,
	+ 		`,"providers":[{"domains":{"include":["c-76a80a6.dev.kyma.ondeman`,
	+ 		`d.com"]},"secretName":"shoot-dns-service-aws-route53-secret-dev"`,
	+ 		`,"type":"aws-route53"}],"syncProvidersFromShootSpecDNS":true`,
	  		"}",
	  	}, ""),
	  	Object: nil,
	  }
	, [shoot-networking-filter].Disabled:
	Expected object to be comparable, diff:   &bool(
	- 	true,
	+ 	false,
	  )
	]
spec/kubernetes: Expected
    <string>: Kubernetes
to match fields: {
.KubeAPIServer:
	Expected
	    <string>: KubeAPIServerConfig
	to match fields: {
	.StructuredAuthentication:
		unexpected field StructuredAuthentication: {KubernetesConfig:{FeatureGates:map[]} AdmissionPlugins:[] APIAudiences:[] AuditConfig:nil OIDCConfig:&OIDCConfig{CABundle:nil,ClientAuthentication:nil,ClientID:*9bd05ed7-a930-44e6-8c79-e6defeb7dec9,GroupsClaim:*groups,GroupsPrefix:nil,IssuerURL:*https://kymatest.accounts400.ondemand.com,RequiredClaims:map[string]string{},SigningAlgs:[RS256],UsernameClaim:*sub,UsernamePrefix:*-,} RuntimeConfig:map[] ServiceAccountConfig:nil WatchCacheSizes:nil Requests:nil EnableAnonymousAuthentication:<nil> EventTTL:nil Logging:nil DefaultNotReadyTolerationSeconds:<nil> DefaultUnreachableTolerationSeconds:<nil> EncryptionConfig:nil StructuredAuthentication:nil}
	}

}

spec/cloudProfile: Expected object to be comparable, diff:   (*v1beta1.CloudProfileReference)(
- 	nil,
+ 	s"&CloudProfileReference{Kind:CloudProfile,Name:aws,}",
  )

spec/provider: spec/provider/controlPlaneConfig: expected should equal actual
metadata/labels: Expected
    <map[string]string | len:2>: {
        "account": "8cd57dc2-edb2-45e0-af8b-7d881006e516",
        "subaccount": "1b23b45b-7d90-401f-850a-42f873736160",
    }
to have {key: value}
    <map[interface {}]interface {} | len:1>: {
        <string>"extensions.extensions.gardener.cloud/shoot-oidc-service": <string>"true",
    }
metadata/annotations: Expected
    <map[string]string | len:5>: {
        "kcp.provisioner.kyma-project.io/operation-id": "40de55bb-8ec9-4e9c-92f7-c8fd5460d0c3",
        "kcp.provisioner.kyma-project.io/runtime-id": "096bdb3d-1f30-4d7c-b41f-c04cfa04871a",
        "compass.provisioner.kyma-project.io/operation-id": "40de55bb-8ec9-4e9c-92f7-c8fd5460d0c3",
        "compass.provisioner.kyma-project.io/runtime-id": "096bdb3d-1f30-4d7c-b41f-c04cfa04871a",
        "gardener.cloud/created-by": "system:serviceaccount:garden-kyma-dev:compass-provisioner",
    }
to have {key: value}
    <map[interface {}]interface {} | len:1>: {
        <string>"infrastructuremanager.kyma-project.io/runtime-id": <string>"096bdb3d-1f30-4d7c-b41f-c04cfa04871a",
    }%

@akgalwas
Copy link
Contributor

akgalwas commented Oct 23, 2024

The following problems were identified that are blockers for the migration:

  • Matchers:
    • DNS matcher detects difference on secretName field
    • We expect that labels generated by the converter will contain some labels which are not here ( e.g. extensions.extensions.gardener.cloud/shoot-dns-service) (PR)
    • unexpected field StructuredAuthentication (PR)
    • We expect that shoot from gardener contains infrastructuremanager.kyma-project.io/runtime-generation which is not true (PR)
    • cloudProfile field should not be compared (PR)
    • Raw extension matcher modifies compared objects (PR)
    • CloudProfileName was removed from comparison (PR)
  • KIM
    • shoot-networking-filter is set to disabled when it should be enabled (PR)
    • spec.provider.controlPlane config differs in AWS migration scenario. Probably we need to allow passing the provider configs in the Runtime CR.
  • Migrator :
    • Runtime CR is not saved (PR)
    • Storing shoot yamls fails (PR)
    • Generating runtime for Openstack fails
    • Remaining fixes PR
      • Make output shorter if possible
      • Fix issue with providing incorrect path to the runtime CR when failed to find RuntimeID
      • Handle do-not-delete label
      • Describe changes in a README

Problems to investigate:

  • DNS extender generates extension yaml that is different from the one on Gardener so it can cause unwanted shoot reconciliation . We suspect that implementation from this (PR) caused regression, however, it this point we cannot reproduce that.

@akgalwas
Copy link
Contributor

akgalwas commented Oct 24, 2024

Remaining differences in AWS (changes in KIM required):

spec/extensions: Expected
    <[]v1beta1.Extension | len:3, cap:4>: [
        {
            Type: "shoot-dns-service",
            ProviderConfig: {
                Raw: "{\"apiVersion\":\"service.dns.extensions.gardener.cloud/v1alpha1\",\"kind\":\"DNSConfig\",\"dnsProviderReplication\":{\"enabled\":true},\"providers\":[{\"domains\":{\"include\":[\"c081b56.dev.kyma.ondemand.com\"]},\"secretName\":\"aws-route53-secret-dev\",\"type\":\"aws-route53\"}],\"syncProvidersFromShootSpecDNS\":true}",
                Object: nil,
            },
            Disabled: nil,
        },
        {
            Type: "shoot-networking-filter",
            ProviderConfig: nil,
            Disabled: false,
        },
        {
            Type: "shoot-cert-service",
            ProviderConfig: {
                Raw: "{\"apiVersion\":\"service.cert.extensions.gardener.cloud/v1alpha1\",\"shootIssuers\":{\"enabled\":true},\"kind\":\"CertConfig\"}",
                Object: nil,
            },
            Disabled: nil,
        },
    ]
to match elements: [shoot-dns-service].ProviderConfig:
	expected should equal actual
spec/provider: spec/provider/controlPlaneConfig: expected should equal actual

@akgalwas
Copy link
Contributor

akgalwas commented Oct 24, 2024

Remaining differences in GCP (changes in KIM required)::

spec/extensions: Expected
    <[]v1beta1.Extension | len:3, cap:4>: [
        {
            Type: "shoot-dns-service",
            ProviderConfig: {
                Raw: "{\"apiVersion\":\"service.dns.extensions.gardener.cloud/v1alpha1\",\"kind\":\"DNSConfig\",\"dnsProviderReplication\":{\"enabled\":true},\"providers\":[{\"domains\":{\"include\":[\"d0c89ca.dev.kyma.ondemand.com\"]},\"secretName\":\"aws-route53-secret-dev\",\"type\":\"aws-route53\"}],\"syncProvidersFromShootSpecDNS\":true}",
                Object: nil,
            },
            Disabled: nil,
        },
        {
            Type: "shoot-networking-filter",
            ProviderConfig: nil,
            Disabled: false,
        },
        {
            Type: "shoot-cert-service",
            ProviderConfig: {
                Raw: "{\"apiVersion\":\"service.cert.extensions.gardener.cloud/v1alpha1\",\"shootIssuers\":{\"enabled\":true},\"kind\":\"CertConfig\"}",
                Object: nil,
            },
            Disabled: nil,
        },
    ]
to match elements: [shoot-dns-service].ProviderConfig:
	expected should equal actua

@akgalwas
Copy link
Contributor

Remaining differences in Azure (changes in KIM required):

spec/extensions: Expected
    <[]v1beta1.Extension | len:3, cap:4>: [
        {
[
            Type: "shoot-dns-service",
            ProviderConfig: {
                Raw: "{\"apiVersion\":\"service.dns.extensions.gardener.cloud/v1alpha1\",\"kind\":\"DNSConfig\",\"dnsProviderReplication\":{\"enabled\":true},\"providers\":[{\"domains\":{\"include\":[\"a50de45.dev.kyma.ondemand.com\"]},\"secretName\":\"aws-route53-secret-dev\",\"type\":\"aws-route53\"}],\"syncProvidersFromShootSpecDNS\":true}",
                Object: nil,
            },
            Disabled: nil,
        },
        {
            Type: "shoot-networking-filter",
            ProviderConfig: nil,
            Disabled: false,
        },
        {
            Type: "shoot-cert-service",
            ProviderConfig: {
                Raw: "{\"apiVersion\":\"service.cert.extensions.gardener.cloud/v1alpha1\",\"shootIssuers\":{\"enabled\":true},\"kind\":\"CertConfig\"}",
                Object: nil,
            },
            Disabled: nil,
        },
    ]
to match elements: [shoot-dns-service].ProviderConfig:
	expected should equal actual

@akgalwas
Copy link
Contributor

akgalwas commented Oct 25, 2024

Observation number 1: Migrator app needs some adjustments to be more user friendly.

Problems:

  • Error handling: On Error Resume Next approach like in the good old Visual Basic makes a lot of confusion. The current implementation doesn't stop processing a runtime when error occurs, it continues even it doesn't make sense to do so.
  • Going through all the runtimes on Gardener. It is very confusing: I want to migrate one runtime but see dozens of those on output.
  • Migrator handles only create scenario. When Runtime CR already exists it will fail. We need to consider that.

@akgalwas
Copy link
Contributor

akgalwas commented Oct 25, 2024

Remaining differences in Openstack (changes in KIM required):

spec/extensions: Expected
    <[]v1beta1.Extension | len:3, cap:4>: [
        {
            Type: "shoot-dns-service",
            ProviderConfig: {
                Raw: "{\"apiVersion\":\"service.dns.extensions.gardener.cloud/v1alpha1\",\"kind\":\"DNSConfig\",\"dnsProviderReplication\":{\"enabled\":true},\"providers\":[{\"domains\":{\"include\":[\"ca6a4dd.dev.kyma.ondemand.com\"]},\"secretName\":\"aws-route53-secret-dev\",\"type\":\"aws-route53\"}],\"syncProvidersFromShootSpecDNS\":true}",
                Object: nil,
            },
            Disabled: nil,
        },
        {
            Type: "shoot-networking-filter",
            ProviderConfig: nil,
            Disabled: false,
        },
        {
            Type: "shoot-cert-service",
            ProviderConfig: {
                Raw: "{\"apiVersion\":\"service.cert.extensions.gardener.cloud/v1alpha1\",\"shootIssuers\":{\"enabled\":true},\"kind\":\"CertConfig\"}",
                Object: nil,
            },
            Disabled: nil,
        },
    ]
to match elements: [shoot-dns-service].ProviderConfig:
	expected should equal actual

@akgalwas
Copy link
Contributor

Observation number 3

Runtime migrated by the migrator contains exactly labels that are defined in the (PR). GardenerCluster CR has a bit more labels:

  • operator.kyma-project.io/managed-by
  • operator.kyma-project.io/internal
  • kyma-project.io/provider
  • kyma-project.io/platform-region

The set of labels on GardenerCluster is the same as on Kyma resource. It seems the difference is insignificant due to the following:

  • it could be handy to search by provider but broker-plan-name will be sufficient.
  • platformRegion is a property on the CR
  • managed-by has no real value (all CRs are managed by KIM)
  • internal has no real value

@tobiscr tobiscr closed this as completed Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/control-plane Related to all activities around Kyma Control Plane kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

No branches or pull requests

6 participants