-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix race condition during ACME order finalization #4562
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I am not sure this is the correct approach. Stand by for additional detail. |
From https://datatracker.ietf.org/doc/html/rfc8555#section-7.1.6 (see diagram reproduced below) it is clear that an Order must move to status
From https://datatracker.ietf.org/doc/html/rfc8555#section-7.4:
This patch changes the behaviour such that an order would be successfully finalized Unfortunatley, the description of expected client behaviour is vague:
In particular, what does believes it has fulfilled the server's requirements mean? How to proceed? First I have a question: was this PR prompted by an observed issue in real world use or in tests? If so, can you provide the details, including what was the client implementation? |
Thinking more about the premise of the change (from PR description):
I don't think this can happen (based on the existing code, not this PR). |
@fmarco76 @ckelleyRH @frasertweedale Thanks for looking! These race condition issues were observed when I was testing ACME with |
Here's a patch that demonstrates the problem: edewata@ee43b23
As mentioned earlier,
https://github.com/edewata/pki/actions/runs/6162315047/job/16723890467#step:28:145
https://github.com/edewata/pki/actions/runs/6162315047/job/16723890467#step:28:154
Then
So you're right, the order didn't end up with a wrong status in the LDAP record, but from client's perspective the order didn't have the expected status. This is because |
I tried fixing the exception as described in the RFC: edewata@49103ac
https://github.com/edewata/pki/actions/runs/6163037711/job/16726539917#step:28:160
https://github.com/edewata/pki/actions/runs/6163037711/job/16726539917#step:28:169
|
I've created a new patch similar to this PR but probably more RFC-compliant: edewata@0ba6c70 Please let me know if this is OK, then I can update the PR. I'll also try to add compatibility tests using other clients in separate PRs. Thanks! |
I do think we ought to merge edewata@49103ac (perhaps in modified form to account for whatever other changes we want to make). Because it make us respond more properly in the "order status != ready" case. I am still reviewing the other changes. |
I do think certbot's behaviour is non-conformant here. Per RFC 8555:
Evidently certbot just bails out upon receiving the 403, which is wrong. But... we should still try and handle this better on our end. edit: I filed a certbot issue: certbot/certbot#9766 |
I reveiewed edewata@0ba6c70. I object to this approach. The RFC is clear that the order must transition to Let me suggest a different approach. In
This flips the order in which the state transitions occur. The Order transitions to For example (just a rough diff; I haven't built it, let alone tested): diff --git a/base/acme/src/main/java/org/dogtagpki/acme/server/ACMEChallengeProcessor.java b/base/acme/src/main/jav>
index 15514bb39..c5dc1eb68 100644
--- a/base/acme/src/main/java/org/dogtagpki/acme/server/ACMEChallengeProcessor.java
+++ b/base/acme/src/main/java/org/dogtagpki/acme/server/ACMEChallengeProcessor.java
@@ -118,8 +118,6 @@ public class ACMEChallengeProcessor implements Runnable {
Date expirationTime = engine.getPolicy().getValidAuthorizationExpirationTime(currentTime);
authorization.setExpirationTime(expirationTime);
- engine.updateAuthorization(account, authorization);
-
logger.info("Updating pending orders");
Collection<ACMEOrder> orders =
@@ -129,6 +127,10 @@ public class ACMEChallengeProcessor implements Runnable {
boolean allAuthorizationsValid = true;
for (String orderAuthzID : order.getAuthzIDs()) {
+ if (orderAuthzID.equals(authzID)) {
+ // We're about to set it to valid, so treat it as such
+ continue;
+ }
ACMEAuthorization authz = engine.getAuthorization(account, orderAuthzID);
if (authz.getStatus().equals("valid")) continue;
@@ -147,6 +149,13 @@ public class ACMEChallengeProcessor implements Runnable {
engine.updateOrder(account, order);
}
+
+ // We defer the LDAP update until AFTER any order transitions
+ // occur. This avoids a race condition where clients that eagerly
+ // proceed to finalization when all Authorizations are "valid"
+ // experience finalization failure, because the __Order__ has not yet
+ // transition to "ready".
+ engine.updateAuthorization(account, authorization);
}
public void finalizeInvalidAuthorization(ACMEError err) throws Exception { |
Thanks for the suggestion. I tried your patch as is with PR #4561 and the new exception, and it seems to be working: edewata@3a060d7 If you prefer, feel free to create a new PR for your patch, or I can just update this PR with your patch (and make you the author). |
@edewata great! I'll let you take it from here. Cheers! |
Previously when the authorization for an order completed the ACMEChallengeProcessor would update the authorization status and the order status after that, then the client would call the ACMEFinalizeOrderService to finalize the order. However, since these updates are not atomic there is a risk that the client will detect the new authorization status then immediately call the ACMEFinalizeOrderService before the order status can be updated by ACMEChallengeProcessor. Since both both ACMEFinalizeOrderService and ACMEChallengeProcessor will read and write the same order at the same time, there could be a race condition and the finalization could fail. To avoid the problem the ACMEChallengeProcessor has been modified to update the order status first before updating the authorization status.
Kudos, SonarCloud Quality Gate passed! |
@fmarco76 @ckelleyRH @frasertweedale Thanks! I've updated the PR. |
Previously when the authorization for an order completed the
ACMEChallengeProcessor
would update the authorization status and the order status as well, then the client would call theACMEFinalizeOrderService
to finalize the order.However, since these updates are not atomic there is a risk that the client will detect the new authorization status then immediately call the
ACMEFinalizeOrderService
before the order status can be updated byACMEChallengeProcessor
. Since both bothACMEFinalizeOrderService
andACMEChallengeProcessor
are trying to update the same order at the same time, the order can end up with a wrong status.To avoid the problem the code that updates the order status has been moved into
ACMEFinalizeOrderService
, so there is only one thread that can update the order at the same time.