K8SPSMDB-1170: set error state on failed reconcile of replset #1651

Merged
merged 3 commits into from
Sep 30, 2024

Contributor

@pooknull pooknull commented Sep 13, 2024

https://perconadev.atlassian.net/browse/K8SPSMDB-1170

CHANGE DESCRIPTION

Problem:
When spec.tls.allowInvalidCertificates is set to false, the operator gets stuck initializing replsets. Errors appear in the log, but the cluster's .status.state is not set to error.

Solution:
Set the error state when the reconcileCluster method fails.

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported MongoDB version?
  • Does the change support oldest and newest supported Kubernetes version?

@pull-request-size pull-request-size bot added the size/XS 0-9 lines label Sep 13, 2024
@pooknull pooknull marked this pull request as ready for review September 13, 2024 10:38
@egegunes
Contributor

This doesn't fix the real problem. Yes, now we're setting the cluster state to error, but the operator still can't recover if you set tls.allowInvalidCertificates=false. The operator is now stuck in the error state rather than the initializing state.

Contributor

@egegunes egegunes left a comment

I'm OK to merge it since error state makes more sense but we shouldn't close K8SPSMDB-1156 after merging this. This PR doesn't fix K8SPSMDB-1156.

@pooknull
Contributor Author

@egegunes I agree that this PR doesn't fix K8SPSMDB-1156. But at least setting the cluster to the error state will allow the cluster to be deleted if finalizers were specified.

I will try to fix this issue in another PR.
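
The point about finalizers can be illustrated with a small sketch. It assumes that finalizers are only processed once the cluster has reached certain states, which is inferred from the comment above rather than taken from the operator's source. The function `canRunFinalizers`, the state constants, and the finalizer name in `main` are all hypothetical.

```go
package main

import "fmt"

// AppState stands in for .status.state; the constant names are assumptions.
type AppState string

const (
	StateInit  AppState = "initializing"
	StateReady AppState = "ready"
	StateError AppState = "error"
)

// canRunFinalizers reports whether deletion finalizers may proceed.
// Assumption for this sketch: finalizers are held back while the cluster
// is still initializing and are allowed in the ready or error state.
func canRunFinalizers(state AppState, finalizers []string) bool {
	if len(finalizers) == 0 {
		return false // nothing to run
	}
	return state == StateReady || state == StateError
}

func main() {
	// The finalizer name below is a placeholder, not a real operator finalizer.
	finalizers := []string{"example.com/cleanup"}
	for _, s := range []AppState{StateInit, StateReady, StateError} {
		fmt.Printf("%s: %v\n", s, canRunFinalizers(s, finalizers))
	}
}
```

Under that assumption, a cluster stuck in the initializing state would never get past the gate, while a cluster that has been moved to the error state would, so deletion could complete.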

@hors hors changed the title from "K8SPSMDB-1156: set error state on failed reconcile of replset" to "K8SPSMDB-1170: set error state on failed reconcile of replset" Sep 26, 2024
@hors hors requested a review from egegunes September 26, 2024 14:14
@JNKPercona
Collaborator

| Test name | Status |
|---|---|
| arbiter | passed |
| balancer | passed |
| custom-replset-name | passed |
| custom-tls | passed |
| custom-users-roles | passed |
| custom-users-roles-sharded | passed |
| cross-site-sharded | passed |
| data-at-rest-encryption | passed |
| data-sharded | passed |
| demand-backup | passed |
| demand-backup-eks-credentials | passed |
| demand-backup-physical | passed |
| demand-backup-physical-sharded | passed |
| demand-backup-sharded | passed |
| expose-sharded | passed |
| ignore-labels-annotations | passed |
| init-deploy | passed |
| finalizer | passed |
| ldap | passed |
| ldap-tls | passed |
| limits | passed |
| liveness | passed |
| mongod-major-upgrade | passed |
| mongod-major-upgrade-sharded | passed |
| monitoring-2-0 | failure |
| multi-cluster-service | passed |
| non-voting | passed |
| one-pod | passed |
| operator-self-healing-chaos | passed |
| pitr | passed |
| pitr-sharded | passed |
| pitr-physical | passed |
| pvc-resize | passed |
| recover-no-primary | passed |
| replset-overrides | passed |
| rs-shard-migration | passed |
| scaling | passed |
| scheduled-backup | passed |
| security-context | passed |
| self-healing-chaos | passed |
| service-per-pod | passed |
| serviceless-external-nodes | passed |
| smart-update | passed |
| split-horizon | passed |
| storage | passed |
| tls-issue-cert-manager | passed |
| upgrade | passed |
| upgrade-consistency | passed |
| upgrade-consistency-sharded-tls | passed |
| upgrade-sharded | passed |
| users | passed |
| version-service | passed |
We run 52 out of 52

commit: 37dd5b4
image: perconalab/percona-server-mongodb-operator:PR-1651-37dd5b4a

@hors hors merged commit 8abe495 into main Sep 30, 2024
16 of 17 checks passed
@hors hors deleted the dev/K8SPSMDB-1156 branch September 30, 2024 12:04