Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node/Gov: Don't enqueue already signed VAAs #3284

Closed
wants to merge 2 commits into from

Conversation

bruce-riley
Copy link
Contributor

@bruce-riley bruce-riley commented Aug 11, 2023

There is currently a problem where old Oasis transactions are being reobserved and they are flooding the governor. This is not really necessary because the transactions have most likely previously reached quorum.

This PR changes the governor to check to see if a transfer has been previously been approved, and if so, allows it to be published immediately, without impacting the notional value of the governor.

The following checks are done to determine if the transfer can be approved immediately:

  • The VAA is already in the signed VAA database.
  • The current digest matches what is in the database.
  • The VAA in the database was signed by this guardian.

If the above conditions are met, the VAA can be approved immediately. Additionally, this check is made when pending transfers are reloaded on start up, which should immediately clean up the backlog when a guardian upgrades.

@bruce-riley bruce-riley marked this pull request as ready for review August 11, 2023 19:14
@bruce-riley bruce-riley force-pushed the node/gov_dont_enqueue_already_signed_VAAs branch from 828a789 to 96476b3 Compare August 11, 2023 20:35
panoel
panoel previously approved these changes Aug 14, 2023
Copy link
Contributor

@tbjump tbjump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Threat Surface

I agree with the issue that this change is addressing, but I would solve it differently. With the current approach we are adding

  1. another code path that bypasses the governor
  2. another place in the code where the guardian private key is being used

That's added threat surface in the two most security critical paths of the codebase, and I would like to avoid that.

I'd prefer if the processor would check if the Guardian had made this observation in the past and if so broadcast the existing signature. That way, the guardian private key doesn't get involved in this new code path at all.

False Negatives

The proposed change checking the stored VAA, which typically only has 13 signatures and even if Guardian g has signed a particular message in the past, g's signature may not be in the locally stored VAA for that message, leading to false negatives in this check. To address this issue, all observations would need to be stored, probably separately from the VAAs.

Copy link
Contributor

@evan-gray evan-gray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try without the additional private key usage.

}

// Generate our signature for this VAA so we can see if we are listed as a signer.
sig, err := crypto.Sign(v.SigningDigest().Bytes(), gov.gk)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right here I was expecting that we look through v.Signatures to find our index, then recover the public key from that signature (given that we just verified the hashes match above). If the public key matches our guardian's public key, we should be good to proceed. That would avoid needing the private key here, which would be best avoided.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach much better! I have changed it to use Ecrecover with the public key rather than the private key.

@evan-gray
Copy link
Contributor

Threat Surface

I agree with the issue that this change is addressing, but I would solve it differently. With the current approach we are adding

  1. another code path that bypasses the governor
  2. another place in the code where the guardian private key is being used

That's added threat surface in the two most security critical paths of the codebase, and I would like to avoid that.

I'd prefer if the processor would check if the Guardian had made this observation in the past and if so broadcast the existing signature. That way, the guardian private key doesn't get involved in this new code path at all.

False Negatives

The proposed change checking the stored VAA, which typically only has 13 signatures and even if Guardian g has signed a particular message in the past, g's signature may not be in the locally stored VAA for that message, leading to false negatives in this check. To address this issue, all observations would need to be stored, probably separately from the VAAs.

On the threat surface front, see my comment and perhaps that mitigates the concerns. Ultimately it would be good to bypass the re-observation route that could trigger this issue (and the need for this fix) altogether. That said, I think it important to mitigate the issue affecting production today and leave observation collecting (and re-observation handling) to a new, breakout service in the future.

@bruce-riley
Copy link
Contributor Author

I agree with @evan-gray. I have changed the code to use Ecrecover with the public key rather than the private key. Also, we do not currently have a source of all observations that this guardian has signed, so the best we can do is compare against the signed VAAs. Yes, that could lead to some misses, but those misses are no worse than what we have now.

@tbjump
Copy link
Contributor

tbjump commented Aug 15, 2023

I want to get more clarity on the problem this PR is addressing.
So right now if a Guardian g0 would re-sync their Oasis node it would lead to that Guardian re-observing many messages, which would exceed the governor limits, leading to many transactions getting enqueued in the Governor. But that shouldn't affect any other Guardians. If g0 has VAAs for those messages in store, he will not send any observation requests and the messages will just time-out. If on the other hand g0 does not have any VAA for this message stored, this PR will not change any behavior surrounding that message.

So this issue should only affect a the Guardians who are re-syncing their Oasis nodes and the issue goes away by itself ~48h after the re-sync is done, correct?

@bruce-riley bruce-riley marked this pull request as draft September 26, 2023 17:18
@bruce-riley
Copy link
Contributor Author

This problem was solved differently, so this PR is no longer necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants