-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable a sequencer to re-sequence batches #894
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving some early feedback.
Hey @cffls a few thoughts on the changes here:
To my knowledge the process for marking a batch as bad would be along the lines of:
This last step in my head would start the code we already have for L1 recovery but some new L1 tracking code would be scanning the L1 for this event around a bad batch and storing it in the database. Once we know the batch is bad we can immediately roll back the datastream to the last good batch along with the database using the normal unwind feature, from there the normal L1 recovery code would start but as part of our "bad batch" checks during recovery we'd have one more check for this scenario, when we encounter it we simply mark the batch as bad as do already and move on to normal recovery. This way blocks and batches would be an exact replicate of the L1 data apart from skipping the bad batch. Once L1 recovery is completed to the previous tip you can restart the sequencer as normal and the network continues. I think this approach works out simpler and we don't need to do anything fancy around splitting blocks or handling timestamps, we just recreate the L1 exactly as it was before. This has the advantage of also handling things like parent hashes automatically without fuss. What do you think? |
I think maybe there has been something missed here with regards to the banana fork requirements of the batch that will be removed being something that happens on the L1 to affect the chains history. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @cffls - after the discussion yesterday and looking over the changes in a new light it's looking good and making a lot more sense approach wise to me now.
There are a couple of things that would help here I think:
- Making this process really explicit by using a flag, if it isn't set then the sequencer won't perform the checks to see if it is behind and just behave as normal.
- Rather than using a datastream client pointing back to itself, we have the
data_stream_server.go
file that should let you work with the stream without needing the client and it will work directly with the files on disk. That approach seems cleaner to me.
If we can make those changes I think we'll be in a safe place to test this and get it merged.
Thanks @hexoscott . I've addressed your feedbacks accordingly. |
Great 😄 could you give me a nudge once the latest zkevm branch stuff is merged in so I don't miss it 🙏 |
Thanks @hexoscott . I've merged from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice updates 😄 just one little thing around the use of the flag in the sequencing stage
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good, just a few comments from the last time I had a peek at it.
Feature PR for #667
Overall workflow:
Stop sequencer
Unwind to target batch (Using command under https://github.com/0xPolygonHermez/cdk-erigon/blob/zkevm/cmd/integration/commands/stage_stages_zkevm.go). e.g.
CDK_ERIGON_SEQUENCER=1 go run ./cmd/integration state_stages_zkevm --config=/path/to/config.yaml --unwind-batch-no=8 --chain dynamic-kurtosis --datadir /path/to/chain/data
Backup data stream files in case the files are corrupted during re-sequencing
Restart sequencer
The sequencer will detect rollback and start re-sequencing automatically
On re-sequencing, L2 blocks will either be the same as before, split into multiple, but they can never be merged. Some properties of rollback behavior:
Tested the following rollback scenarios in kurtosis:
The original chain (before resequenced) contains both empty blocks and non-empty blocks with uniswap transactions.
ResequenceStrict=true
. No L1InfoUpdate on L1. No counter limit change.Result: Resequenced L2 blocks have exactly the same block hashes as previously created.
ResequenceStrict=true
. No L1InfoUpdate on L1. Lowered counter limit.Result: An error was raised because a previous batch had to be split into two batches.
ResequenceStrict=false
. No L1InfoUpdate on L1. Lowered counter limit.Result: Resequence completed with more number of batches than before.
ResequenceStrict=true
. Multiple L1InfoUpdates on L1. No counter limit change.Result: Same number of blocks and batches were resequenced. L2 block hash were the same.
ResequenceStrict=false
. Multiple L1InfoUpdates on L1. No counter limit change.Result: Same number of blocks and batches were resequenced. L2 block hash were different because different L1InfoUpdate index were used.