update common to fix eso #88

day0hero · 2023-07-31T18:39:14Z

Updating the common subtree to fix an issue with the ESO seccomp profile.

@claudiol

* Updated namespaces template to include labels and annotations functionality
* Added schema validation to support additional format for labels and annotations
* Updated the values-example.yaml to include new format for namespaces
* Updated Changes.md to include new namespaces functionality.
* Updating CI tests
* Fixed Markdown errors
* Add an experimental letsencrypt chart
  This change adds an experimental letsencrypt chart that allows a pattern user/developer to have all routes and the API endpoint use signed certificates by letsencrypt. At this stage only AWS is supported. The full documentation is contained in the chart's README.md file
* Do not run kubeconform on the certificate stuff just yet
* Fix up kustomize example
  In the same vein as Industrial Edge 57f41dc135f72011d3796fe42d9cbf05d2b82052 we call kustomize build. Newer gitops versions dropped the openshift-clients rpm by default which contained kubectl. Let's just invoke "kustomize" directly as the binary is present in both old and new gitops versions
  Since "kubectl kustomize" builds the set of resources by default, we need to switch to "kubectl build" by default
  We also use the same naming conventions used in Industrial Edge while we're at it.
* Upgrade vault-helm to v0.24.0
  Tested on MCG with hub and spoke
* Add a hello-world ansible playbook example
  Just a simple example that reads a helm value and puts it in a configmap
* Inject ANSIBLE_CONFIG in make ansible-lint
* Use new ansible-lint action
* Fix some ansible-lint warnings
* Fix up python versions
* Skip cannot find role error
  Avoid checking those two playbooks, as the action seems to be too limited to understand where the ansible.cfg is
* Added health check for pvc resource in argocd.yaml
  This allows argo to continue rolling out the rest of the applications. Without the health check the application is stuck in a progressing state and will not continue, thus preventing any downstream application from deploying.
* adding tests
* Update super-linter image to latest
* Update super-linter image to latest
* Update CI workflows
* updated template with why implemented comment
* Add dependabot settings for github actions
* adding tests
* - Added functionality to support the following format for labels and annotations:
      labels:
        openshift.io/node-selector: ""
      annotations:
        openshift.io/cluster-monitoring: "true"
* Fixed CI Issues
* Applying @claudiol recommendation
* make test
* Avoid exited containers proliferation
  When running the `pattern.sh` script multiple times, a lot of exited podman containers are left on the machine. Adding the `--rm` parameter to `podman run` makes podman automatically delete the exited containers, leaving the machine cleaner.
* Handling of pre-release builds is too complex for a helm chart
  Generating the ICSP and allowing insecure registries is best done prior to helm upgrade, and requires VPN access to registry-proxy.engineering.redhat.com
* Fixing issues with operator groups
* Adding CI test
* Updated operator group template
* Updating CI issues
* Removed duplicate code for operatorgroup by using multiple conditions
* Allow overriding the pattern's name
  This is especially useful when multiple people are working on a pattern and have been using different names:
      $ make help |grep Pattern:
      Pattern: multicloud-gitops
      $ make NAME=foobar help |grep Pattern:
      Pattern: foobar
* Add precise instruction to upgrade the vault subchart
* Upgrade vault-helm to v0.24.1
* Add an item to README.md
* Fix up common/ tests
* Fix super linter
* Set gitOpsSpec.operatorSource
  After merging validatedpatterns/patterns-operator@235b303 it is now effectively possible to pick a different catalogSource for the gitops operator. This is needed in order to allow CI to install the gitops operator from an IIB
* Introduce EXTRA_HELM_OPTS
  This variable can be set in order to pass additional helm arguments from the command line. I.e. we can set things without having to tweak values files
  So it is now possible to run something like the following:
      ./pattern.sh make install \
          EXTRA_HELM_OPTS="--set main.gitops.operatorSource=iib-49232"
* Disable var-naming[no-role-prefix] in ansible lint
* Add new ansible role to deal with IIBs
* Simplify load-iib target
* Add templates folder
* Fix a couple of linting warnings
* Fix some super-linter complaints
* Skip the iib-ci playbook
* Drop var-naming[no-role-prefix] linter
* Allow for multiple images when calling load-iib
* Add help for load-iib
* Output index_image in make
* Output index_image in make (2)
* Set facts later in the playbook not in defaults/
* Fix how we export vars in make load-iib
* Fix how we export vars in make load-iib (2)
* Use machineCount to register the number of nodes that need to be ready
* Add helpful debug messages
* Add | on shell now that we call pipefail
* Test dropping nevercontact source
* Skip insecure tls when logging in
* Also allow gchr.io
* Revert "Test dropping nevercontact source"
  This reverts commit d8746a37fce2663018f52203c892f00b825e32a7.
* Fix typo
* Clarify instructions in the README file
* Automate the channel example
* Find out KUBEADMINAPI programmatically
* Use command instead of shell
* Do not grep for operator bundle unless it is the gitops operator
* Also whitelist ghcr.io
* Fetch the operator bundle itself in a more robust way
  It seems that the operator bundle image itself is nowhere to be found inside any OCP cluster object (it's not in packagemanifests nor catalogsource). Resorting to parsing the IIB via opm alpha commands to fetch the exact image.
* Add more mirrors
* Some more work to support MCE
* Cleanup spacing
* Fix super-linter
* Move task in right folder
* Drop last mention of operator instead of item
* Improve the grepping for the operator bundle
  Without also grepping for the default_channel we can end up getting multiple results, which breaks everything. Tested this and it fixed the issue I was seeing with the openshift-gitops-operator this morning
* Drop display_skipped_hosts
  display_skipped_hosts=False has a horrible side-effect: when a task takes a long time, the slow task is always the *next* one and not the one printed on the screen/log. That is because ansible has to wait for the task to finish before printing it, as it does not know beforehand if the host will be skipped and hence whether the task should be displayed at all
* Be more specific about the steps in the README
* Upgrade ESO to v0.8.2
* Update README.md
* Update tests after eso 0.8.2 upgrade
* Move to new spec format for dex/sso
  Via https://issues.redhat.com/browse/GITOPS-2761 we are told that the dex configuration has a new format.
  Old format:
      spec:
        dex:
          openShiftOAuth: true
          resources:
          ...
  New format:
      spec:
        sso:
          provider: dex
          dex:
            openShiftOAuth: true
            resources:
            ...
  This format is only supported starting with gitops-1.8.0, so we should merge this only when we are absolutely sure that no pattern in any situation needs an older gitops version.
  Tested on MCG with gitops-1.8.2
  Note: with this change gitops < 1.8 is not supported. Starting with gitops-1.9 the old format will be unsupported.
* Disable ArgoCD from kubeconform
  The reason is that most of the tools we used to generate the json schema seem to be unmaintained, so it is getting hard to update our schemas in our GH org. We'll need to revisit this in the future.
* Add a short line about username/token for the iib role on OCP <= 4.12
* Drop https:// from podman login
  Seems we hit https://www.github.com/containers/podman/issues/13691 at least with older podman versions. If this turns out to break podman 4.5.0 I will special case it later
* Set the mce-subscription-spec annotation
  We set it by default to "redhat-operators" and if defined to .Values.clusterGroup.subscriptions.acm.source
  The reason we do this is the following:
  1. In a default deployment scenario MCE has to be deployed as normal from the redhat-operators catalogSource just as ACM is
  2. When we deploy gitops-operator from an IIB instead, MCE would be installed trying to get it from the IIB because https://www.github.com/stolostron/multiclusterhub-operator/pull/975 made it so that it picks the latest version looking at all catalog sources. But since we only mirrored the gitops operator in the cluster, this breaks as the images for MCE from the IIB are not there. By setting the default to "redhat-operators" we fix this case
  3. Now in the case where we want to install ACM from an IIB we need to be able to override this and we will pick whatever value is set in .Values.clusterGroup.subscriptions.acm.source, which will need to be defined for this to work when testing ACM+MCE from an IIB
  Note: Currently point 3. works only if you set it in a values file. Setting .Values.clusterGroup.subscriptions.acm.source via extraParams won't be passed down from the clusterGroup app to the applications. It's a bug that we need to fix.
  Note(2): We surround this with an 'if kindIs "map" .Values.clusterGroup.subscriptions' because we do not want to break things if subscription is a list and not a map. If we ever manage to drop subscriptions as list, then we can remove that if
* Fix typo in README for iib
* Simplify the README a bit
* Add support for extraParams being passed down to all applications
  Via validatedpatterns/patterns-operator#74 we add the extraParams in an extraParametersNested dictionary that holds the extraParams key/value pairs. If they exist, let's add them as parameters. This allows them to end up in the applications.
* Add a lookup playbook to figure out IIB numbers
* Allow overriding channel and source when installing the patterns-operator
  This will allow us to test the patterns-operator using a different catalogsource (potentially installed via an IIB). So we can run:
      make EXTRA_HELM_OPTS="\
          --set main.extraParameters[0].name=main.patternsOperator.channel --set main.extraParameters[0].value=slow \
          --set main.extraParameters[1].name=main.patternsOperator.source --set main.extraParameters[1].value=patten-index" install
* Fix small typo in iib instructions
* Drop a redirect and up retries when pushing the IIB to the internal registry
* Update ESO to v0.8.3
* WIP add presync for eso that waits for vault to be up
* Add tests
* Fix image and comment
* Adding rbac to support the vault sa checking on the vault-0 pod status.
* Make Test
* Revert "Make Test"
  This reverts commit 64e9dc7.
* Revert "Adding rbac to support the vault sa checking on the vault-0 pod status."
  This reverts commit 598bc74.
* Revert "Fix image and comment"
  This reverts commit d4d3fe1.
* Revert "Add tests"
  This reverts commit ab5532a.
* Revert "WIP add presync for eso that waits for vault to be up"
  This reverts commit 2797699.
* Increase the default retry limit when syncing
  ArgoCD will retry 5 times by default to sync an application in case of errors and then will give up. So if an application contains a reference to a CRD that has not been installed yet (say because it will be installed by another application), it will error out and retry later. This happens by default for a maximum of 5 times [1]. After those 5 times the application will give up and will stay in Degraded mode and eventually move to Failed. In this case a manual sync will usually fix the application just fine (i.e. as long as the missing CRD has been installed in the meantime).
  Now to solve this issue we can add complex preSync Jobs that wait for the needed resources, but this fundamentally breaks the simplicity of things and introduces unneeded dependencies. In this change we just increase the default retry limit to something larger (20) that should cover most cases. The retry limit functionality is rather undocumented currently in the docs but is defined at [2] and also shown at [3].
  In our patterns' case the concrete issue happened as follows:
  1. ESO ClusterSecrets were often not synced/degraded
  2. We introduced a Job in a preSync hook for the ESO chart that would wait on vault to be ready before applying the rest of ESO
  3. MCG started failing because the config-demo app had already tried to sync 5 times and failed every time because the ESO CRDs were not installed yet (due to ESO waiting on vault)
  So instead of adding yet another job, let's just try a lot more often. We picked 20 as a sane default because that should have argo try for about 60 minutes (3min is the default maximum backoff limit)
  Tested with two MCG installations (with the ESO Job hook included) and both worked out of the box. Whereas before I managed to get three failures out of three installs.
  [1] https://github.com/argoproj/argo-cd/blob/master/controller/appcontroller.go#L1680
  [2] https://github.com/argoproj/argo-cd/blob/master/manifests/crds/application-crd.yaml#L1476
  [3] https://github.com/argoproj/argo-cd/blob/master/docs/operator-manual/application.yaml#L202C18-L202C100
* Add Changes.md entry
* Fix up tests after common rebase

---------

Co-authored-by: Lester Claudio <claudiol@redhat.com>
Co-authored-by: day0hero <jonny@redhat.com>
Co-authored-by: Lorenzo Dalrio <ldalrio@redhat.com>
Co-authored-by: Andrew Beekhof <andrew@beekhof.net>
Co-authored-by: Martin Jackson <mhjacks@redhat.com>
Co-authored-by: jonny <65790298+day0hero@users.noreply.github.com>
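
For reference, the "about 60 minutes" estimate in the retry-limit commit can be checked with a little arithmetic. The sketch below is only an approximation of capped exponential backoff, not Argo CD's actual code: the 3 minute cap and the limits of 5 and 20 come from the commit message, while the 5 second starting delay and the factor of 2 are assumptions, and `backoff_delays` and `total_minutes` are hypothetical helper names.

```python
def backoff_delays(limit, base_seconds=5, factor=2, max_seconds=180):
    """Return the wait (in seconds) before each of `limit` retries.

    base_seconds and factor are assumed values; max_seconds mirrors the
    3 minute maximum backoff mentioned in the commit message.
    """
    delays = []
    delay = base_seconds
    for _ in range(limit):
        delays.append(min(delay, max_seconds))
        # Grow the delay exponentially, but never beyond the cap.
        delay = min(delay * factor, max_seconds)
    return delays


def total_minutes(limit, **kwargs):
    """Total time spent waiting across all retries, in minutes."""
    return sum(backoff_delays(limit, **kwargs)) / 60


if __name__ == "__main__":
    for limit in (5, 20):
        print(f"limit={limit}: ~{total_minutes(limit):.1f} min of waiting")
    # Upper bound if every retry waited the full cap: 20 * 3 min.
    print("upper bound for limit=20:", 20 * 180 / 60, "min")
```

Under these assumptions a limit of 5 gives up after under three minutes of waiting, while a limit of 20 keeps retrying for roughly 47 minutes, with 60 minutes as the upper bound if every retry waited the full cap. That is consistent with the "about 60 minutes" figure above.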

@claudiol

* Updated namespaces template to include labels and annotations functionality
* Added schema validation to support additional format for labels and annotations
* Updated the values-example.yaml to include new format for namespaces
* Updated Changes.md to include new namespaces functionality.
* Updating CI tests
* Fixed Markdown errors
* Add an experimental letsencrypt chart
  This change adds an experimental letsencrypt chart that allows a pattern user/developer to have all routes and the API endpoint use signed certificates by letsencrypt. At this stage only AWS is supported. The full documentation is contained in the chart's README.md file
* Do not run kubeconform on the certificate stuff just yet
* Fix up kustomize example
  In the same vein as Industrial Edge 57f41dc135f72011d3796fe42d9cbf05d2b82052 we call kustomize build. Newer gitops versions dropped the openshift-clients rpm by default which contained kubectl. Let's just invoke "kustomize" directly as the binary is present in both old and new gitops versions
  Since "kubectl kustomize" builds the set of resources by default, we need to switch to "kubectl build" by default
  We also use the same naming conventions used in Industrial Edge while we're at it.
* Upgrade vault-helm to v0.24.0
  Tested on MCG with hub and spoke
* Add a hello-world ansible playbook example
  Just a simple example that reads a helm value and puts it in a configmap
* Inject ANSIBLE_CONFIG in make ansible-lint
* Use new ansible-lint action
* Fix some ansible-lint warnings
* Fix up python versions
* Skip cannot find role error
  Avoid checking those two playbooks, as the action seems to be too limited to understand where the ansible.cfg is
* Added health check for pvc resource in argocd.yaml
  This allows argo to continue rolling out the rest of the applications. Without the health check the application is stuck in a progressing state and will not continue, thus preventing any downstream application from deploying.
* adding tests
* Update super-linter image to latest
* Update super-linter image to latest
* Update CI workflows
* updated template with why implemented comment
* Add dependabot settings for github actions
* adding tests
* - Added functionality to support the following format for labels and annotations:
      labels:
        openshift.io/node-selector: ""
      annotations:
        openshift.io/cluster-monitoring: "true"
* Fixed CI Issues
* Applying @claudiol recommendation
* make test
* Avoid exited containers proliferation
  When running the `pattern.sh` script multiple times, a lot of exited podman containers are left on the machine. Adding the `--rm` parameter to `podman run` makes podman automatically delete the exited containers, leaving the machine cleaner.
* Handling of pre-release builds is too complex for a helm chart
  Generating the ICSP and allowing insecure registries is best done prior to helm upgrade, and requires VPN access to registry-proxy.engineering.redhat.com
* Fixing issues with operator groups
* Adding CI test
* Updated operator group template
* Updating CI issues
* Removed duplicate code for operatorgroup by using multiple conditions
* Allow overriding the pattern's name
  This is especially useful when multiple people are working on a pattern and have been using different names:
      $ make help |grep Pattern:
      Pattern: multicloud-gitops
      $ make NAME=foobar help |grep Pattern:
      Pattern: foobar
* Add precise instruction to upgrade the vault subchart
* Upgrade vault-helm to v0.24.1
* Add an item to README.md
* Fix up common/ tests
* Fix super linter
* Set gitOpsSpec.operatorSource
  After merging validatedpatterns/patterns-operator@235b303 it is now effectively possible to pick a different catalogSource for the gitops operator. This is needed in order to allow CI to install the gitops operator from an IIB
* Introduce EXTRA_HELM_OPTS
  This variable can be set in order to pass additional helm arguments from the command line. I.e. we can set things without having to tweak values files
  So it is now possible to run something like the following:
      ./pattern.sh make install \
          EXTRA_HELM_OPTS="--set main.gitops.operatorSource=iib-49232"
* Disable var-naming[no-role-prefix] in ansible lint
* Add new ansible role to deal with IIBs
* Simplify load-iib target
* Add templates folder
* Fix a couple of linting warnings
* Fix some super-linter complaints
* Skip the iib-ci playbook
* Drop var-naming[no-role-prefix] linter
* Allow for multiple images when calling load-iib
* Add help for load-iib
* Output index_image in make
* Output index_image in make (2)
* Set facts later in the playbook not in defaults/
* Fix how we export vars in make load-iib
* Fix how we export vars in make load-iib (2)
* Use machineCount to register the number of nodes that need to be ready
* Add helpful debug messages
* Add | on shell now that we call pipefail
* Test dropping nevercontact source
* Skip insecure tls when logging in
* Also allow gchr.io
* Revert "Test dropping nevercontact source"
  This reverts commit d8746a37fce2663018f52203c892f00b825e32a7.
* Fix typo
* Clarify instructions in the README file
* Automate the channel example
* Find out KUBEADMINAPI programmatically
* Use command instead of shell
* Do not grep for operator bundle unless it is the gitops operator
* Also whitelist ghcr.io
* Fetch the operator bundle itself in a more robust way
  It seems that the operator bundle image itself is nowhere to be found inside any OCP cluster object (it's not in packagemanifests nor catalogsource). Resorting to parsing the IIB via opm alpha commands to fetch the exact image.
* Add more mirrors
* Some more work to support MCE
* Cleanup spacing
* Fix super-linter
* Move task in right folder
* Drop last mention of operator instead of item
* Improve the grepping for the operator bundle
  Without also grepping for the default_channel we can end up getting multiple results, which breaks everything. Tested this and it fixed the issue I was seeing with the openshift-gitops-operator this morning
* Drop display_skipped_hosts
  display_skipped_hosts=False has a horrible side-effect: when a task takes a long time, the slow task is always the *next* one and not the one printed on the screen/log. That is because ansible has to wait for the task to finish before printing it, as it does not know beforehand if the host will be skipped and hence whether the task should be displayed at all
* Be more specific about the steps in the README
* Upgrade ESO to v0.8.2
* Update README.md
* Update tests after eso 0.8.2 upgrade
* Move to new spec format for dex/sso
  Via https://issues.redhat.com/browse/GITOPS-2761 we are told that the dex configuration has a new format.
  Old format:
      spec:
        dex:
          openShiftOAuth: true
          resources:
          ...
  New format:
      spec:
        sso:
          provider: dex
          dex:
            openShiftOAuth: true
            resources:
            ...
  This format is only supported starting with gitops-1.8.0, so we should merge this only when we are absolutely sure that no pattern in any situation needs an older gitops version.
  Tested on MCG with gitops-1.8.2
  Note: with this change gitops < 1.8 is not supported. Starting with gitops-1.9 the old format will be unsupported.
* Disable ArgoCD from kubeconform
  The reason is that most of the tools we used to generate the json schema seem to be unmaintained, so it is getting hard to update our schemas in our GH org. We'll need to revisit this in the future.
* Add a short line about username/token for the iib role on OCP <= 4.12
* Drop https:// from podman login
  Seems we hit https://www.github.com/containers/podman/issues/13691 at least with older podman versions. If this turns out to break podman 4.5.0 I will special case it later
* Set the mce-subscription-spec annotation
  We set it by default to "redhat-operators" and if defined to .Values.clusterGroup.subscriptions.acm.source
  The reason we do this is the following:
  1. In a default deployment scenario MCE has to be deployed as normal from the redhat-operators catalogSource just as ACM is
  2. When we deploy gitops-operator from an IIB instead, MCE would be installed trying to get it from the IIB because https://www.github.com/stolostron/multiclusterhub-operator/pull/975 made it so that it picks the latest version looking at all catalog sources. But since we only mirrored the gitops operator in the cluster, this breaks as the images for MCE from the IIB are not there. By setting the default to "redhat-operators" we fix this case
  3. Now in the case where we want to install ACM from an IIB we need to be able to override this and we will pick whatever value is set in .Values.clusterGroup.subscriptions.acm.source, which will need to be defined for this to work when testing ACM+MCE from an IIB
  Note: Currently point 3. works only if you set it in a values file. Setting .Values.clusterGroup.subscriptions.acm.source via extraParams won't be passed down from the clusterGroup app to the applications. It's a bug that we need to fix.
  Note(2): We surround this with an 'if kindIs "map" .Values.clusterGroup.subscriptions' because we do not want to break things if subscription is a list and not a map. If we ever manage to drop subscriptions as list, then we can remove that if
* Fix typo in README for iib
* Simplify the README a bit
* Add support for extraParams being passed down to all applications
  Via validatedpatterns/patterns-operator#74 we add the extraParams in an extraParametersNested dictionary that holds the extraParams key/value pairs. If they exist, let's add them as parameters. This allows them to end up in the applications.
* Add a lookup playbook to figure out IIB numbers
* Allow overriding channel and source when installing the patterns-operator
  This will allow us to test the patterns-operator using a different catalogsource (potentially installed via an IIB). So we can run:
      make EXTRA_HELM_OPTS="\
          --set main.extraParameters[0].name=main.patternsOperator.channel --set main.extraParameters[0].value=slow \
          --set main.extraParameters[1].name=main.patternsOperator.source --set main.extraParameters[1].value=patten-index" install
* Fix small typo in iib instructions
* Drop a redirect and up retries when pushing the IIB to the internal registry
* Update ESO to v0.8.3
* WIP add presync for eso that waits for vault to be up
* Add tests
* Fix image and comment
* Adding rbac to support the vault sa checking on the vault-0 pod status.
* Make Test
* Removed previous version of common to convert to subtree from https://github.com/hybrid-cloud-patterns/common.git main
* updated script to check for new status
* make test
* make test and remove presync checks for eso
* make test
* make test

---------

Co-authored-by: Lester Claudio <claudiol@redhat.com>
Co-authored-by: Michele Baldessari <michele@acksyn.org>
Co-authored-by: Lorenzo Dalrio <ldalrio@redhat.com>
Co-authored-by: Andrew Beekhof <andrew@beekhof.net>
Co-authored-by: Martin Jackson <mhjacks@redhat.com>
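
To make the dex/sso format change described in this commit concrete, here is a minimal sketch of the same restructuring applied to a spec held as a Python dictionary. It is an illustration only and not part of the chart: `migrate_dex_spec` is a hypothetical helper, and it assumes the spec has already been loaded into a plain dict with a top-level `dex` key.

```python
import copy


def migrate_dex_spec(spec):
    """Return a copy of `spec` with a top-level `dex` block moved under `sso`.

    Hypothetical helper: mirrors the old -> new layout quoted in the commit
    message. Specs that already have an `sso` block are returned unchanged.
    """
    new_spec = copy.deepcopy(spec)
    if "dex" in new_spec and "sso" not in new_spec:
        dex_settings = new_spec.pop("dex")
        new_spec["sso"] = {"provider": "dex", "dex": dex_settings}
    return new_spec


if __name__ == "__main__":
    old_spec = {"dex": {"openShiftOAuth": True}}
    print(migrate_dex_spec(old_spec))
```

The function copies its input rather than mutating it, so the old layout stays available for comparison while testing against different gitops versions.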

@claudiol

* Updated namespaces template to include labels and annotations functionality * Added schema validation to support additional formal for labels and annotations * Updated the values-example.yaml to include new format for namespaces * Updated Changes.md to include new namespaces functionality. * Updating CI tests * Fixed Markdown errors * Add an experimental letsencypt chart This change adds an experimental letsencrypt chart that allows a pattern user/developer to have all routes and the API endpoint use signed certificates by letsencrypt. At this stage only AWS is supported. The full documentation is contained in the chart's README.md file * Do not run kubeconform on the certificate stuff just yet * Fix up kustomize example In the same vein as Industrial Edge 57f41dc135f72011d3796fe42d9cbf05d2b82052 we call kustomize build. Newer gitops versions dropped the openshift-clients rpm by default which contained kubectl. Let's just invoke "kustomize" directly as the binary is present in both old and new gitops versions Since "kubectl kustomize" builds the set of resources by default, we need to switch to "kubectl build" by default We also use the same naming conventions used in Industrial Edge while we're at it. * Upgrade vault-helm to v0.24.0 Tested on MCG with hub and spoke * Add a hello-world ansible playbook example Just a simple example that reads a helm value and puts it in a configmap * Inject ANSIBLE_CONFIG in make ansible-lint * Use new ansible-lint action * Fix some ansible-lint warnings * Fix up python versions * Skip cannot find role error Avoid checking those two playbooks the action seems to be too limited to understand where the ansible.cfg is * Added health check for pvc resource in argocd.yaml This allows argo to continue rolling out the rest of the applications. Without the health check the application is stuck in a progressing state and will not continue thus preventing any downstream application from deploying. 
* adding tests * Update super-linter image to latest * Update super-linter image to latest * Update CI workflows * updated template with why implemented comment * Add dependabot settings for github actions * adding tests * - Added functionality to support the following format for labels and annotations: labels: openshift.io/node-selector: "" annotations: openshift.io/cluster-monitoring: "true" * Fixed CI Issues * Applying @claudiol recommendation * make test * Avoid exited containers proliferation When running the `pattern.sh` script multiple times, a lot of podman exited containers will be left on the machine, adding `--rm` parameter to `podman run` makes podman automatically delete the exited containers leaving the machine cleaner. * Handling of pre-release builds is too complex for a helm chart Generating the ICSP and allowing insecure registries is best done prior to helm upgrade, and requires VPN access to registry-proxy.engineering.redhat.com * Fixing issues with operator groups * Adding CI test * Updated operator group template * Updating CI issues * Removed duplicate code for operatorgroup by using multiple conditions * Allow overriding the pattern's name This is especially useful when multiple people are working on a pattern an have been using different names: $ make help |grep Pattern: Pattern: multicloud-gitops $ make NAME=foobar help |grep Pattern: Pattern: foobar * Add precise instruction to upgrade the vault subchart * Upgrade vault-helm to v0.24.1 * Add an item to README.md * Fix up common/ tests * Fix super linter * Set gitOpsSpec.operatorSource After merging validatedpatterns/patterns-operator@235b303 it is now effectively possible to pick a different catalogSource for the gitops operator. This is needed in order to allow CI to install the gitops operator from an IIB * Introduce EXTRA_HELM_OPTS This variable can be set in order to pass additional helm arguments from the the command line. I.e. 
we can set things without having to tweak values files So it is now possible to run something like the following: ./pattern.sh make install \ EXTRA_HELM_OPTS="--set main.gitops.operatorSource=iib-49232" * Disable var-naming[no-role-prefix] in ansible lint * Add new ansible role to deal with IIBs * Simplify load-iib target * Add templates folder * Fix a couple of linting warnings * Fix some super-linter complaints * Skip the iib-ci playbook * Drop var-naming[no-role-prefix] linter * Allow for multiple images when calling load-iib * Add help for load-iib * Output index_image in make * Output index_image in make (2) * Set facts later in the playbook not in defaults/ * Fix how we export vars in make load-iib * Fix how we export vars in make load-iib (2) * Use machineCount to register the number of nodes that need to be ready * Add helpful debug messages * Add | on shell now that we call pipefail * Test dropping nevercontact source * Skip insecure tls when logging in * Also allow gchr.io * Revert "Test dropping nevercontact source" This reverts commit d8746a37fce2663018f52203c892f00b825e32a7. * Fix typo * Clarify instructions in the README file * Automate the channel example * Find out KUBEADMINAPI programmatically * Use command instead of shell * Do not grep for operator bundle unless it is the gitops operator * Also whitelist ghcr.io * Fetch the operator bundle itself in a more robust way It seems that the operator bundle image itself is nowhere to be found inside any OCP cluster object (it's not in packagemanifests nor catalogsource). Resorting to parsing the IIB via opm alpha commands to fetch the exact image. * Add more mirrors * Some more work to support MCE * Cleanup spacing * Fix super-linter * Move task in right folder * Drop last mention of operator instead of item * Improve the grepping for the operator bundle Without also grepping for the default_channel we can end up getting multiple results, which breaks everything. 
Tested this and it fixed the issue I was seeing with the openshift-gitops-operator this morning * Drop display_skipped_hosts display_skipped_hosts=False has a horrible side-effect: When a task takes a long time, it is always the *next* task and not the one printed on the screen/log. That is because ansible has to wait for the task to finish before printing it as it does not know before hand if the host will be skipped and hence the task should not be displayed at all * Be more specific about the steps in the README * Upgrade ESO to v0.8.2 * Update README.md * Update tests after eso 0.8.2 upgrade * Move to new spec format for dex/sso Via https://issues.redhat.com/browse/GITOPS-2761 we are told that the dex configuration has a new format. Old format: spec: dex: openShiftOAuth: true resources: ... New format: spec: sso: provider: dex dex: openShiftOAuth: true resources: ... This format is only supported starting with gitops-1.8.0, so we should merge this only when we are absolutely sure that no pattern in no situation needs an older gitops version. Tested on MCG with gitops-1.8.2 Note: with this change gitops < 1.8 is not supported. Starting with gitops-1.9 the old format will be unsupported. * Disable ArgoCD from kubeconform The reason is that most of the tools we used to generate the json schema, seem to be unmaintained, so it is getting hard to update our schemas in our GH org. We'll need to revisit this in the future. * Add a short line about username/token for the iib role on OCP <= 4.12 * Drop https:// from podman login Seems we hit https://www.github.com/containers/podman/issues/13691 at least with older podman versions. If this turns out to break podman 4.5.0 I will special case it later * Set the mce-subscription-spec annotation We set it by default to "redhat-operators" and if defined to .Values.clusterGroup.subscriptions.acm.source The reason we do this is the following: 1. 
In a default deployment scenario MCE has to be deployed as normal from the redhat-operators catalogSource just as ACM is 2. When we deploy gitops-operator from an IIB instead, MCE would be installed trying to get it from the IIB because https://www.github.com/stolostron/multiclusterhub-operator/pull/975 made it so that it picks the latest version looking at all catalog sources. But since we only mirrored the gitops operator in the cluster, this breaks as the images for MCE from the IIB are not there By setting the default to "redhat-operators" we fix this case 3. Now in the case where we want to install ACM from an IIB we need to be able to override this and we will pick whatever value is set in .Values.clusterGroup.subscriptions.acm.source, which will need to be defined for this to work when testing ACM+MCE from an IIB Note: Currently point 3. works only if you set it in a values file. Setting .Values.clusterGroup.subscriptions.acm.source via extraParams won't be passed down from the clusterGroup app to the applications. It's a bug that we need to fix. Note(2): We surround this with an 'if kindIs "map" .Values.clusterGroup.subscriptions' because we do not want to break things if subscription is a list and not a map. If we ever manage to drop subscriptions as list, then we can remove that if * Fix typo in README for iib * Simplify the README a bit * Add support for extraParams being passed down to all applications Via validatedpatterns/patterns-operator#74 we add the extraParams in an extraParametersNested dictionary that holds the extraParams key/value pairs. If they exist, let's add them as parameters. This allows them to end up in the applications. * Add a lookup playbook to figure out IIB numbers * Allow overriding channel and source when installing the patterns-operator This will allow us to test the patterns-operator using a different catalogsource (potentially installed via an IIB). 
So we can run:

    make EXTRA_HELM_OPTS="\
      --set main.extraParameters[0].name=main.patternsOperator.channel --set main.extraParameters[0].value=slow \
      --set main.extraParameters[1].name=main.patternsOperator.source --set main.extraParameters[1].value=patten-index" install

* Fix small typo in iib instructions
* Drop a redirect and up retries when pushing the IIB to the internal registry
* Update ESO to v0.8.3
* WIP add presync for eso that waits for vault to be up
* Add tests
* Fix image and comment
* Adding rbac to support the vault sa checking on the vault-0 pod status.
* Make Test
* Revert "Make Test"

This reverts commit 64e9dc7.

* Revert "Adding rbac to support the vault sa checking on the vault-0 pod status."

This reverts commit 598bc74.

* Revert "Fix image and comment"

This reverts commit d4d3fe1.

* Revert "Add tests"

This reverts commit ab5532a.

* Revert "WIP add presync for eso that waits for vault to be up"

This reverts commit 2797699.

* Increase the default retry limit when syncing

ArgoCD will retry 5 times by default to sync an application in case of errors and then will give up. So if an application contains a reference to a CRD that has not been installed yet (say because it will be installed by another application), it will error out and retry later. This happens by default for a maximum of 5 times [1]. After those 5 times the application will give up and will stay in Degraded mode and eventually move to Failed. In this case a manual sync will usually fix the application just fine (i.e. as long as the missing CRD has been installed in the meantime).

Now to solve this issue we can add complex preSync Jobs that wait for the needed resources, but this fundamentally breaks the simplicity of things and introduces unneeded dependencies.

In this change we just increase the default retry limit to something larger (20) that should cover most cases. The retry limit functionality is rather undocumented currently in the docs but is defined at [2] and also shown at [3].

In our patterns' case the concrete issue happened as follows:

1. ESO ClusterSecrets were often not synced/degraded
2. We introduced a Job in a preSync hook for the ESO chart that would wait on vault to be ready before applying the rest of ESO
3. MCG started failing because the config-demo app had already tried to sync 5 times and failed every time because the ESO CRDs were not installed yet (due to ESO waiting on vault)

So instead of adding yet another job, let's just try a lot more often. We picked 20 as a sane default because that should have argo try for about 60 minutes (3min is the default maximum backoff limit).

Tested with two MCG installations (with the ESO Job hook included) and both worked out of the box, whereas before I managed to get three failures out of three installs.

[1] https://github.com/argoproj/argo-cd/blob/master/controller/appcontroller.go#L1680
[2] https://github.com/argoproj/argo-cd/blob/master/manifests/crds/application-crd.yaml#L1476
[3] https://github.com/argoproj/argo-cd/blob/master/docs/operator-manual/application.yaml#L202C18-L202C100

* Add Changes.md entry
* Split off global helm variables to a helper definition

We can only split out bits of yaml that reference $.* variables. This is because these snippets in _helpers.tpl are passed a single context, either $ or ., and cannot use both like the top-level domain.

* Switch ApplicationSets to use the newly-introduced helpers

I only remove the variables that are defined identically in ApplicationSet and in the helper. Leaving the other ones as is as their presence is not entirely clear to me and I do not want to risk breaking things.
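
As a rough check on the "about 60 minutes" figure from the retry limit change above, here is a minimal sketch of the arithmetic. Only the limit of 20 and the 3 minute maximum backoff come from the commit message; the 5 second initial wait and the doubling factor are assumptions made for illustration, and the function names are hypothetical.

```python
def total_retry_wait(limit: int, base: float = 5.0, factor: float = 2.0,
                     cap: float = 180.0) -> float:
    """Total seconds spent waiting across `limit` retries.

    Each wait grows exponentially (base * factor**attempt) and is capped
    at `cap`. `base` and `factor` are assumed values; `cap` is the 3 minute
    maximum backoff mentioned in the commit message.
    """
    return sum(min(base * factor ** attempt, cap) for attempt in range(limit))


def upper_bound_wait(limit: int, cap: float = 180.0) -> float:
    """Worst case: every single wait already sits at the cap."""
    return limit * cap


if __name__ == "__main__":
    for limit in (5, 20):
        waited = total_retry_wait(limit) / 60
        bound = upper_bound_wait(limit) / 60
        print(f"limit={limit}: ~{waited:.1f} min waited, upper bound {bound:.0f} min")
```

Under these assumptions the upper bound for 20 retries is 20 x 3 min = 60 min, which matches the estimate in the commit message; the capped exponential total is somewhat lower because the first few waits are short.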
* Split off valueFiles to _helpers.tpl
* Switch applicationsets to use the new helper
* Drop some older comments
* Tweak the load secret debug message to be clearer

When HOME is set we replace it with '~' in this debug message because when run from inside the container the HOME is /pattern-home, which is confusing for users. Printing out '~' when at the start of the string is less confusing.

Before:

    ok: [localhost] => {
        "msg": "/home/michele/.config/hybrid-cloud-patterns/values-secret-multicloud-gitops.yaml"
    }

After:

    ok: [localhost] => {
        "msg": "~/.config/hybrid-cloud-patterns/values-secret-multicloud-gitops.yaml"
    }

* Check if the KUBECONFIG file is pointing outside of the HOME folder

If it is somewhere under /tmp or out of the HOME folder, bail out explaining why. This has caused a few silly situations where the user would save the KUBECONFIG file under /tmp. Since bind-mounting /tmp seems like a wrong thing to do in general, we at least bail out with a clear error message. To do this we rely on a bash functionality, so let's just switch the script to use bash.

Tested as follows:

    export KUBECONFIG=/tmp/kubeconfig
    ./scripts/pattern-util.sh make help
    /tmp/kubeconfig is pointing outside of the HOME folder, this will make it unavailable from the container.
    Please move it somewhere inside your /home/michele folder, as that is what gets bind-mounted inside the container

    export KUBECONFIG=~/kubeconfig
    ./scripts/pattern-util.sh make help
    Pattern: common

    Usage:
      make <target>
    ...

* Include an example SNO cluster pool in the tests
* Enforce lowercase names for cluster claims
* Avoid mixing yaml and json in the OCP install-config
* Update provisioning tests
* Sanely handle cluster pools with no clusters (yet)
* Clustergroup Chart.yaml name change

We currently have a small inconsistency where we use common/clustergroup in order to point Argo CD to this chart, but the name inside the chart is 'pattern-clustergroup'. This inconsistency is currently irrelevant, but in the future when migrating to helm charts inside proper helm repos, this becomes problematic. So let's fix the name to be the same as the folder.

Tested on MCG successfully.

* Fix the clusterPoolName in clusterClaims

Currently with the following values snippet:

    managedClusterGroups:
      exampleRegion:
        name: group-one
        acmlabels:
          - name: clusterGroup
            value: group-one
        helmOverrides:
          - name: clusterGroup.isHubCluster
            value: false
        clusterPools:
          exampleAWSPool:
            size: 1
            name: aws-ap-bandini
            openshiftVersion: 4.12.24
            baseDomain: blueprints.rhecoeng.com
            controlPlane:
              count: 1
              platform:
                aws:
                  type: m5.2xlarge
            workers:
              count: 0
            platform:
              aws:
                region: ap-southeast-2
            clusters:
              - One

You will get a clusterClaim that is pointing to the wrong Pool:

    NAMESPACE                 NAME            POOL
    open-cluster-management   one-group-one   aws-ap-bandini

This is wrong because the clusterPool name will be generated using the pool name + "-" + group name:

    {{- $pool := . }}
    {{- $poolName := print .name "-" $group.name }}

But the clusterPoolName inside the clusterClaim is only using "$pool.name", which will make the clusterClaim ineffective as the pool does not exist. Switch to using the same poolName that is being used when creating the clusterPool.

* Add some comments to make if/else and loops clearer

Let's improve readability by adding some comments to point out which flow constructs are being ended.

* Add some more comments in applications.yaml
* Add a default for options applicationRetryLimit
* Split out values files to a helper for the acm chart

Just like we did for the clustergroup chart, let's split the values file list into a dedicated helper. This time since there are no global variables we include it with the current context and not with the '$' context.

Tested with MCG: hub and spoke. Correctly observed all the applications running on the spoke.
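
To make the clusterPoolName mismatch described above easier to see, here is a small Python sketch that mimics the naming logic outside of Helm. It is not the chart's code: the function names and the `fixed` flag are hypothetical, and the claim name format (lowercased cluster name + "-" + group name) is an assumption inferred from the `one-group-one` sample output.

```python
# Trimmed copy of the values snippet from the commit message.
EXAMPLE_GROUP = {
    "name": "group-one",
    "clusterPools": {
        "exampleAWSPool": {"name": "aws-ap-bandini", "clusters": ["One"]},
    },
}


def pool_name(pool: dict, group: dict) -> str:
    """Mirror of: print .name "-" $group.name"""
    return f"{pool['name']}-{group['name']}"


def cluster_claims(group: dict, fixed: bool = True) -> list:
    """Build one claim per cluster listed in each pool of the group.

    fixed=False reproduces the old behaviour (claim points at pool['name']);
    fixed=True points the claim at the generated pool name instead.
    """
    claims = []
    for pool in group["clusterPools"].values():
        target = pool_name(pool, group) if fixed else pool["name"]
        # A pool may have no clusters yet, so fall back to an empty list.
        for cluster in pool.get("clusters") or []:
            claims.append({
                # Assumed naming, inferred from the sample output above.
                "name": f"{cluster.lower()}-{group['name']}",
                "clusterPoolName": target,
            })
    return claims


if __name__ == "__main__":
    print("pool:  ", pool_name(EXAMPLE_GROUP["clusterPools"]["exampleAWSPool"], EXAMPLE_GROUP))
    print("before:", cluster_claims(EXAMPLE_GROUP, fixed=False))
    print("after: ", cluster_claims(EXAMPLE_GROUP, fixed=True))
```

With `fixed=False` the claim references `aws-ap-bandini`, a pool that is never created; with `fixed=True` it references `aws-ap-bandini-group-one`, the same name the pool itself receives.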
* Fix up tests

They changed because we made the list indentation more correct (two extra spaces to the left)

* Fix sa/namespace mixup in vault_spokes_init
* Update local patch

Also set seccompProfile to null to make things work on OCP 4.10

* Update ESO to 0.8.5
* Tweak ESO UBI images

Tested the ESO upgrade on MCG on both 4.10 and 4.13

* Removed previous version of common to convert to subtree from https://github.com/hybrid-cloud-patterns/common.git main
* make test

---------

Co-authored-by: Lester Claudio <claudiol@redhat.com>
Co-authored-by: Michele Baldessari <michele@acksyn.org>
Co-authored-by: Lorenzo Dalrio <ldalrio@redhat.com>
Co-authored-by: Andrew Beekhof <andrew@beekhof.net>
Co-authored-by: Martin Jackson <mhjacks@redhat.com>
Co-authored-by: Tom Stockwell <2060486+stocky37@users.noreply.github.com>

mbaldessari and others added 3 commits July 8, 2023 18:42

day0hero closed this Jul 31, 2023
