-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Backport of Fix AP upgrade version issue into release/1.15.x #27363
Backport of Fix AP upgrade version issue into release/1.15.x #27363
Conversation
Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement Learn more about why HashiCorp requires a CLA and what the CLA includes 2 out of 3 committers have signed the CLA.
temp seems not to be a GitHub user. Have you signed the CLA already but the status is still pending? Recheck it. |
Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement Learn more about why HashiCorp requires a CLA and what the CLA includes temp seems not to be a GitHub user. Have you signed the CLA already but the status is still pending? Recheck it. |
CI Results: |
* Fix AP upgrade version issue * add heartbeat logging at trace level * add log to show when heartbeats resume * Test the plumbing * Revert "Test the plumbing" This reverts commit e25fcd8. * Add CHANGELOG * Add plumbing test * Update misleading comment --------- Co-authored-by: Josh Black <raskchanky@gmail.com>
Note that I specifically did not backport the test because the test relied on some plumbing that apparently was not backported to 1.15.x and I didn't want this backport to get completely out of hand. |
Build Results: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Josh
…ideally-choice-flamingo
I used the wrong process for these backports. |
Backport
This PR is auto-generated from #27277 to be assessed for backporting due to the inclusion of the label backport/1.15.x.
🚨
The person who merged in the original PR is:
@banks
This person should manually cherry-pick the original PR into a new backport PR,
and close this one when the manual backport PR is merged in.
The below text is copied from the body of the original PR.
This is a subtle issue introduced by #26449 while fixing a different issue. It only impacts Vault Enterprise but is generally simpler to reason about.
That PR was a correct fix, however the
effectiveSDKVersion
was only ever being set on active nodes not on perf standbys in Enterprise.The big change here is:
SetupCluster
which is necessary on all types of serversRaftBackend.effectiveSDKVersion
back to the binary version so even if we fail to plumb this from core in some edge case in the future it's still the right value in almost all cases (other than testing environments).I've manually tested this locally. To reproduce the issue it fixes you need to:
autopilot_upgrade_version
in config on the new servers (doing so bypasses the bug).In this case you'll see the new servers join but remain "target version" because their Echo's are sending an empty version string. You also see some logs about empty
upgrade_versions
which are legit in this case but are also confusing because you can see them even after the fix due to timing (if AP runs before an Echo has come in but after node joined raft on the leader you'll see that log even on a non-broken cluster and then you don't see a corresponding "OK" log after wards).TODO
Overview of commits