common automatic update #31

Merged
mbaldessari merged 148 commits into validatedpatterns:main on Dec 18, 2023

Conversation

mbaldessari
Contributor

  • Introduce an argo-healthcheck make target
  • Adding label validatedpatterns.io/pattern to all applications.
  • CI test updates
  • Rework installation target
  • Simplify the loop
  • Introduce a validate-cluster target in the install target
  • Increase the wait for the internal registry
  • Add a note about SNOs and internal registries
  • Move from resourceCustomization to resourceHealthcheck
  • Fix up common/ tests
  • Upgrade to ESO 0.9.5
  • Release 0.0.3 golang-external-secrets
  • Release 0.0.3 clustergroup
  • Allow custom templating in .extraValueFiles
  • Support pattern-wide templated value files
  • Update tests
  • Pass in platform and ocp version as charts would expect
  • Add test for value file template expansion
  • Drop the Template.{Name,BasePath} hack due to problems with the imperative configmap
  • Fix up tests after last PR
  • Release clustergroup v0.0.4
  • Add --pull=newer when running the container
  • Allow imperative to be nil
  • Adding key to exclude target namespace in operatorgroup
  • Added target namespace logic to namespace map case
  • Changed description in schema
  • Added example to operatorgroupExcludeTargetNS
  • Fixing CI tests
  • Push localClusterName to remote clusters too
  • Preview a chart based on the current k8s cluster
  • Handle explicit value files
  • Update CRD for the operator
  • Add a README containing the CRD update instructions
  • Add ability to read overrides
  • Clean up tests after 7cda9c4
  • Add preview-all and remove some spurious stdout output
  • Add prototype preview-all and silence some output
    • Removed new key operatorgroupExcludeTargetNS; added key excludeOperatorGroupTargetNS to the namespace map entry.
  • Updates to CI
  • Update CRD from the operator
  • Make .plugin handling consistent
  • Preseed the patterns-operator-config configmap
  • Small IIB cleanups
  • Add small curl example for IIB
  • Adding option to include/exclude targetNamespaces in OperatorGroup
  • Updated CI tests
    • Fix: bug in the task TASK [iib_ci : Mirror all the needed images]
    • Updated the mirrordest_tag to use the sha256 of the image instead of the IIB number.
  • Restored mirror template to original implementation
    • Updated the structure supporting OperatorGroup, per the suggestion of decoupling operatorGroup and targetNamespaces. Example: exclude-targetns: { operatorGroup: true, targetNamespaces: [] }. Continues to support operatorgroupExcludes. Updated CI tests.
  • Update logic to fix multiple targetNamespaces
  • Fix ci issues
  • Upgraded ESO to v0.9.8
  • Upgrade vault-helm to v0.26.1
  • Parametrize ESO caProvider fields
  • Simplify target namespace logic
  • Avoid nonhubCluster + hubCluster naming for ESO
  • Update for new configmanagement plugin feature
  • Remove obsolete comment and update tests
  • Update schema
  • Require plugin.yaml
  • Add tmpdir to sidecar mounts
  • True up to test code
  • Use nindent as appropriate
  • Remove stray files
  • Plugin config is plugin.yaml
  • Remove now-obsolete kustomize-renderer example
  • Allow pluginArgs to be set and add schema
  • Remove redundancy
  • Revert "Remove now-obsolete kustomize-renderer example"
  • Remove legacy configManagementPlugins support
  • Add configManagementPlugins to tests for industrial edge
  • Clustergroup 0.0.5
  • Small whitespace test
  • Stop referencing remote actions via @main. Use a specific commit
  • Updated ESO to v0.9.9
  • Updated vault-helm to v0.27.0
  • Prevent ArgoCD from writing ESO CRs to clusters that need full support
  • Fix whitespaces
  • Release clustergroup v0.8.0
  • Document preview limitations
  • Add support for private repos
  • Amend tests
  • Check for rc attribute to exist
  • Upgrade default imperative image
  • Release clustergroup v0.8.1
  • Update pattern operator CRD
  • Update CRD from the operator
  • Bump actions/setup-python from 4 to 5
  • Release clustergroup v0.8.2
  • Update CRD from the operator
  • Small clarification in IIB
  • Switch imageDigestMirrors to AllowContactingSource
  • Upgrade ESO to v0.9.10
  • Add initial support for deploying private repos via CLI directly
  • Update tests after common rebase

mbaldessari and others added 30 commits August 23, 2023 16:53
This is a simple quick check to see if all argo applications in all
namespaces are synced and error out if they are not.

Synced example:

    $ make argo-healthcheck
    make -f common/Makefile argo-healthcheck
    make[1]: Entering directory '/home/michele/Engineering/cloud-patterns/multicloud-gitops'
    Checking argo applications
    mcg-private-hub acm -> Sync: Synced - Health: Healthy
    mcg-private-hub config-demo -> Sync: Synced - Health: Healthy
    mcg-private-hub golang-external-secrets -> Sync: Synced - Health: Healthy
    mcg-private-hub hello-world -> Sync: Synced - Health: Healthy
    mcg-private-hub vault -> Sync: Synced - Health: Healthy
    openshift-gitops mcg-private-hub -> Sync: Synced - Health: Healthy
    make[1]: Leaving directory '/home/michele/Engineering/cloud-patterns/multicloud-gitops'

Not synced example:

    $ make argo-healthcheck
    make -f common/Makefile argo-healthcheck
    make[1]: Entering directory '/home/michele/Engineering/cloud-patterns/multicloud-gitops'
    Checking argo applications
    mcg-private-hub acm -> Sync: Synced - Health: Healthy
    mcg-private-hub config-demo -> Sync: Synced - Health: Degraded
    mcg-private-hub golang-external-secrets -> Sync: Synced - Health: Healthy
    mcg-private-hub hello-world -> Sync: Synced - Health: Healthy
    mcg-private-hub vault -> Sync: Synced - Health: Progressing
    openshift-gitops mcg-private-hub -> Sync: Synced - Health: Healthy
    Some applications are not synced or are unhealthy
    make[1]: *** [common/Makefile:115: argo-healthcheck] Error 1
    make[1]: Leaving directory '/home/michele/Engineering/cloud-patterns/multicloud-gitops'
    make: *** [Makefile:12: argo-healthcheck] Error 2
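
A minimal sketch of how such a target can be scripted with oc (the jsonpath
query and variable names here are illustrative, not the verbatim Makefile
recipe):

    #!/bin/bash
    # Print every Argo application with its sync and health status and
    # exit non-zero if any of them is not Synced/Healthy.
    rc=0
    while read -r ns name sync health; do
        echo "$ns $name -> Sync: $sync - Health: $health"
        [ "$sync" = "Synced" ] && [ "$health" = "Healthy" ] || rc=1
    done < <(oc get applications.argoproj.io -A \
        -o jsonpath='{range .items[*]}{.metadata.namespace} {.metadata.name} {.status.sync.status} {.status.health.status}{"\n"}{end}')
    if [ $rc -ne 0 ]; then
        echo "Some applications are not synced or are unhealthy"
        exit 1
    fi
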
Adding label validatedpatterns.io/pattern to all applications.
Drop the old logic and just install the CRD via OC and use helm template
for the rest.

The rationale is that helm install is very picky whenever it encounters
things that already exist. We have three potential scenarios at work
here:
A) User installs operator+pattern via CLI and updates the pattern via
   CLI (this worked before this change as well)
B) User installs operator+pattern via UI but runs updates (changing
   branch for example) via CLI (this worked before this change as well)
C) User installs only the operator via UI. Installs and updates the
   pattern via CLI. This was broken before this change. The error you'd
   get was:
   ```
   ./pattern.sh make install
   ...
   https://github.com/mbaldessari/multicloud-gitops.git - branch main: Running inside a container: Skipping git ssh checks
   + oc get crds patterns.gitops.hybrid-cloud-patterns.io
   + echo 'Reapplying helm chart:'
   Reapplying helm chart:
   + helm template --name-template multicloud-gitops common/operator-install/ -f values-global.yaml --set main.git.repoURL=https://github.com/mbaldessari/multicloud-gitops.git --set main.git.revision=main
   + oc apply set-last-applied --create-annotation -f-
   WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/michele/sno1-kubeconfig
   Error from server (NotFound): patterns.gitops.hybrid-cloud-patterns.io "multicloud-gitops" not found
   ```

With this change we simplify the process: we forcefully apply/install
the CRD for patterns via the oc command, then template out the
operator-install chart and oc apply it. We retry a few times because
the CRD might not yet be fully registered in the cluster.

Tested successfully on scenarios A), B) and C).
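
In outline, the reworked flow looks something like this (a sketch, not the
verbatim Makefile; the CRD path and shell variables are illustrative):

    # Forcefully install/update the Pattern CRD first, regardless of how
    # (or whether) the operator was installed. The CRD path is illustrative.
    oc apply -f common/operator-install/crds/patterns.crd.yaml

    # Template the operator-install chart and apply it, retrying a few
    # times while the CRD registers with the API server.
    for i in 1 2 3 4 5; do
        helm template multicloud-gitops common/operator-install/ \
            -f values-global.yaml \
            --set main.git.repoURL="$TARGET_REPO" \
            --set main.git.revision="$TARGET_BRANCH" | oc apply -f - && break
        sleep 5
    done
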
Install the CRD inside the loop to simplify the code a bit
The validate-cluster target will be in charge of running some sanity checks
on the cluster. Initially we just check the connection to the cluster
and that at least one storageclass is available on the cluster.

Tested as follows:
1) Cluster with a storageclass (LVM in my case)

    $ make validate-cluster
    Checking cluster:
      cluster-info: OK
      storageclass: OK

2) Cluster without a storageclass:

    $ make validate-cluster
    Checking cluster:
      cluster-info: OK
      storageclass: None Found
    make: *** [Makefile:99: validate-cluster] Error 1
Introduce a validate-cluster target in the install target
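
A minimal sketch of the two checks, assuming oc is already logged in to the
target cluster:

    #!/bin/bash
    echo "Checking cluster:"
    # Check 1: can we reach the cluster at all?
    if oc cluster-info >/dev/null 2>&1; then
        echo "  cluster-info: OK"
    else
        echo "  cluster-info: FAIL"
        exit 1
    fi
    # Check 2: is at least one storageclass defined?
    if [ "$(oc get storageclass -o name 2>/dev/null | wc -l)" -gt 0 ]; then
        echo "  storageclass: OK"
    else
        echo "  storageclass: None Found"
        exit 1
    fi
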
Our resourceCustomization is currently giving the following warning:

    Warning  DeprecationNotice  27m         ResourceCustomizations is
    deprecated, please use the new formats `ResourceHealthChecks`,
    `ResourceIgnoreDifferences`, and `Resource Actions` instead.

This actually becomes a problem with gitops-1.10 because it dropped
support for v1alpha versions of argoCD (it upgrades them automatically
to v1beta). So the cluster-wide argo app which is in charge of creating
the namespaced argoCD instance will always be OutOfSync as it will never
be able to set the `resourceCustomization` field.

Move to resourceHealthChecks, which is the new supported format. This is
also backwards compatible with gitops-1.8.

Tested as follows:
1. Deployed 4.13 with gitops-1.10 and observed the multicloud-gitops-hub
   being OutOfSync
2. Applied this patch and observed it going to green and sync correctly
3. Tested this on gitops-1.8.5 on 4.13 and deployed MCG correctly with
   all apps becoming green everywhere.

Fixes: validatedpatterns/common#367
Move from resourceCustomization to resourceHealthcheck
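
For reference, a hedged sketch of what the replacement field looks like on a
namespaced ArgoCD CR (v1beta1 API; the CR name, namespace and Lua check body
here are illustrative, not the pattern's actual values):

    cat <<'EOF' | oc apply -f -
    apiVersion: argoproj.io/v1beta1
    kind: ArgoCD
    metadata:
      name: example-gitops        # illustrative name
      namespace: example-ns       # illustrative namespace
    spec:
      resourceHealthChecks:
        - group: operators.coreos.com
          kind: Subscription
          check: |
            hs = {}
            hs.status = "Healthy"
            hs.message = "illustrative check that always reports Healthy"
            return hs
    EOF
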
Introduce an argo-healthcheck make target
mbaldessari and others added 29 commits November 29, 2023 15:29
This needs the corresponding PR from the operator
(https://www.github.com/validatedpatterns/patterns-operator/pull/139).
The way it works is that if "global.privateRepo" is set to true, we
add an ACM policy on the hub only that reads the secret from the
openshift-gitops namespace and copies it to open-cluster-management.
Then we use another policy that pushes the secret just copied to
open-cluster-management to the openshift-gitops + pattern-name-group-one
namespaces so that the two argo instances can consume the private
repositories.

Tested end to end with both https and ssh private repositories.
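
As a rough manual equivalent of what the first policy automates (the secret
name is illustrative; in the pattern this copy is done by an ACM policy,
not by hand):

    # Read the repo secret from openshift-gitops and recreate it in
    # open-cluster-management, dropping server-generated metadata.
    oc -n openshift-gitops get secret private-repo -o json \
        | jq '.metadata = {name: .metadata.name, namespace: "open-cluster-management"}' \
        | oc apply -f -
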
Sometimes CI would error out with:

    [localhost]: FAILED! => {"msg": "The conditional check
    'vault_role_cmd.rc == 0' failed. The error was: error while evaluating
    conditional (vault_role_cmd.rc == 0): 'dict object' has no attribute
    'rc'. 'dict object' has no attribute 'rc'"}

This can happen when a call returns error 500 for whatever reason.
Let's make sure we catch this situation and keep retrying instead of
giving up due to this spurious error.
Tested in MCG and everything deployed correctly (imperative ansible jobs
ran without issues)
Check for rc attribute to exist
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 4 to 5.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v4...v5)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
…ons/actions/setup-python-5

Bump actions/setup-python from 4 to 5
Currently when we load a preview operator via the IIB mechanism we
redirect all images making up the operator bundle to the
cluster-internal registry. This is all fine and well, except these
redirects (done via an ImageDigestMirrorSet) are based on image names
without any specific hashes. (This is because OCP won't allow
you to specify hashes).

The problem arises when there is a prerelease operator which includes an
image that is used by the other non-prerelease operators. So if AAP
prerelease uses the image "registry.redhat.io/public/redis-6" we
redirect all these redis 6 images towards the internal registry.

But if another operator needs the redis-6 image with a hash that is not
the exact same that is used by AAP prerelease, it will be unable to find
it on the internal registry because we never uploaded it.

This is an example error:

    2023-12-13 07:18:06,216 INFO Warning Failed 64m (x6 over 66m) kubelet Error: ImagePullBackOff
    2023-12-13 07:18:06,216 INFO Normal BackOff 83s (x286 over 66m) kubelet Back-off pulling image "registry.redhat.io/rhel8/redis-6@sha256:edbd40185ed8c20ee61ebdf9f2e1e1d7594598fceff963b4dee3201472d6deda"

And this is the relevant /etc/containers/registries.conf:

    [[registry]]
    prefix = ""
    location = "registry.redhat.io/rhel8/redis-6"
    blocked = true

    [[registry.mirror]]
    location = "default-route-openshift-image-registry.apps.mcg-hub.blueprints.rhecoeng.com/openshift-marketplace/redis-6"
    insecure = true
    pull-from-mirror = "digest-only"

If we change the `mirrorSourcePolicy` from `NeverContactSource` to
`AllowContactingSource` we actually avoid this problem entirely.
OCP will try to pull the images from both the internal registry and the
original source and use the one it was able to find.

Tested both on AAP and Gitops prerelease and both deployed correctly
which was not the case before.
Switch imageDigestMirrors to AllowContactingSource
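
A hedged sketch of the resulting mirror configuration (the route hostname is
illustrative):

    cat <<'EOF' | oc apply -f -
    apiVersion: config.openshift.io/v1
    kind: ImageDigestMirrorSet
    metadata:
      name: iib-mirror-example
    spec:
      imageDigestMirrors:
        - source: registry.redhat.io/rhel8/redis-6
          mirrors:
            - default-route-openshift-image-registry.apps.example.com/openshift-marketplace/redis-6
          # AllowContactingSource lets OCP fall back to the original
          # registry when the mirror lacks a given digest.
          mirrorSourcePolicy: AllowContactingSource
    EOF
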
Tested with:

    export EXTRA_HELM_OPTS="--set main.tokenSecret=private-repo --set main.tokenSecretNamespace=openshift-operators"
    ./pattern.sh make install

Note that this currently only works with https URLs, because we have
logic in the Makefile to rewrite ssh-based git URLs into https ones.
Add initial support for deploying private repos via CLI directly
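
The rewrite amounts to something like this (a sketch; the Makefile's actual
sed expression may differ):

    # Turn an ssh-style git URL into its https equivalent.
    url="git@github.com:validatedpatterns/multicloud-gitops.git"
    echo "$url" | sed -e 's|^git@\([^:]*\):|https://\1/|'
    # -> https://github.com/validatedpatterns/multicloud-gitops.git
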
mbaldessari merged commit 228dc54 into validatedpatterns:main on Dec 18, 2023
3 checks passed