Fix flake in detecting reorg by nodes #15289
Conversation
Should we adjust this to half of the nodes just to make the tests pass? I think it would help if we first investigate why the rest of the nodes are not able to detect it. Also, why half? If we verify that f+1 nodes are able to detect the reorg, I think that would make more sense.
@AnieeG My understanding is that even if one node detects a violation, the system should be halted. So I was trying to make sure at least half the nodes detect it. If that's not the case and we need f+1 nodes to detect it, then it would make sense to update. I couldn't reproduce this failure locally, and the node logs are not available in the CI run. Since every node is set up identically in this test, I'm wondering why only some nodes behave differently, or whether this might be due to timing.
@mateusz-sekara is this the right assumption?
No, enough nodes need to detect it and stop participating in OCR. Commit requires 2f+1 observations, so f+1 nodes need to detect it in order to force the entire DON to stop processing messages. Half the committee roughly matches that: for f = 1, N = 4, and half the committee means 2 nodes seeing finality violations, which is exactly f+1. For f = 2, N = 7 and ceil(N/2) = 4, which is f+2.
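To make the arithmetic above concrete, here is a minimal sketch (assuming the usual OCR committee sizing N = 3f + 1; this is illustrative, not Chainlink code) comparing the f+1 quorum-breaking threshold with ceil(N/2):

```go
package main

import "fmt"

func main() {
	// For OCR committees sized N = 3f + 1, compare the two candidate
	// thresholds for "enough nodes detected the finality violation":
	// f+1 (the minimum needed to break the 2f+1 observation quorum)
	// vs ceil(N/2) (what the test asserts after this change).
	for f := 1; f <= 3; f++ {
		n := 3*f + 1
		halfCeil := (n + 1) / 2 // integer ceil(N/2)
		fmt.Printf("f=%d N=%d f+1=%d ceil(N/2)=%d\n", f, n, f+1, halfCeil)
	}
	// Output:
	// f=1 N=4 f+1=2 ceil(N/2)=2
	// f=2 N=7 f+1=3 ceil(N/2)=4
	// f=3 N=10 f+1=4 ceil(N/2)=5
	// ceil(N/2) >= f+1 for all f >= 1, so the half-committee check is
	// at least as strict as the f+1 quorum-breaking threshold.
}
```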
For some reason, the health check call on the nodes doesn't report the finality-violated error consistently across all the nodes, which causes the test TestSmokeCCIPReorgAboveFinalityAtDestination/Above_finality_reorg_in_destination_chain to flake often in CI.
This change adjusts the expectation so that at least half of the nodes must detect the violation, instead of every node.
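A minimal sketch of the relaxed expectation, assuming hypothetical stand-ins (`Node`, `nodeReportsFinalityViolation`) for the real test harness; this is not the actual test code:

```go
package smoke

import (
	"testing"

	"github.com/stretchr/testify/require"
)

// Node and nodeReportsFinalityViolation are hypothetical stand-ins for the
// real test harness; the real test queries each node's health endpoint.
type Node struct{ ID string }

func nodeReportsFinalityViolation(n Node) bool {
	// Placeholder: in the real test this would inspect the node's health
	// report for the finality-violated error.
	return false
}

func assertFinalityViolationDetected(t *testing.T, nodes []Node) {
	detected := 0
	for _, n := range nodes {
		if nodeReportsFinalityViolation(n) {
			detected++
		}
	}
	// The old expectation was detected == len(nodes); the relaxed one
	// requires at least ceil(N/2) nodes, which is >= f+1 for N = 3f+1.
	required := (len(nodes) + 1) / 2
	require.GreaterOrEqual(t, detected, required,
		"expected at least half the nodes to detect the finality violation")
}
```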
CI Failure examples: https://github.com/smartcontractkit/chainlink/actions/runs/11894867899/job/33143516553
https://github.com/smartcontractkit/chainlink/actions/runs/11894232425/job/33141414438?pr=15282