Node stuck in "Generating state snapshot" #337
Please add more information: a description of the component upgrade process, the full log after op-geth startup, and any other warning logs from op-node (p2p issues are usually not critical) |
Description of component upgrade process:
Node is now slowly recovering and showing logs like:
Op-geth logs:
Additionally, I tried to disable snapshot ( |
I think the snapshot here is not the same concept.
It may mean that it has returned to normal |
BTW, snapshots need to be enabled when we use snap-sync: Line 1812 in 5e9cb81
|
Sorry, I don't know much about this. Here may be some relevant information:
The crux of the matter is why Generating state snapshot has to happen and is so time consuming, right? |
The issue happened again on mainnet...
Description
"Generating state snapshot" is affecting the node, which gets out of sync.
System information
Expected behaviour
Node should not get out of sync.
Actual behaviour
Node gets out of sync by more than 6000 blocks.
Logs
These are the latest logs, where the node is also processing blocks ("Imported new potential chain segment" / "Chain head was updated"). However, for some time the node is only generating state snapshots and getting further behind the public network.
op-geth
|
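For anyone who wants to put a number on "out of sync by more than 6000 blocks", here is a minimal sketch that compares the local head with a reference head using the standard `eth_blockNumber` JSON-RPC method. The two URLs in `main()` are assumptions (a local HTTP RPC endpoint and a placeholder for any public endpoint you trust), and the helper names are hypothetical; adjust them to your own setup.

```python
import json
import urllib.request


def parse_block_number(hex_quantity: str) -> int:
    """Convert a JSON-RPC hex quantity such as '0x1b4' to an int."""
    return int(hex_quantity, 16)


def fetch_block_number(rpc_url: str, timeout: float = 10.0) -> int:
    """Ask an Ethereum-style JSON-RPC endpoint for its current head block."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []}
    ).encode()
    request = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return parse_block_number(body["result"])


def blocks_behind(local_head: int, reference_head: int) -> int:
    """Number of blocks the local node trails the reference (never negative)."""
    return max(reference_head - local_head, 0)


def main() -> None:
    # Both URLs are assumptions for illustration, not values from this thread.
    local_url = "http://localhost:8545"
    reference_url = "https://example.invalid/public-rpc"
    lag = blocks_behind(
        fetch_block_number(local_url), fetch_block_number(reference_url)
    )
    print(f"local node is {lag} blocks behind")
```

Calling `main()` periodically (for example from cron) would show whether the gap shrinks or grows while the snapshot is being generated.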
When I restart the node, it starts "Generating state snapshot" again without processing new blocks. Logs:
|
@opfocus, is this an issue that needs to be fixed? Can the "generating snapshot" process cause the node to fall out of sync? Should we use "--snapshot=false" to avoid this issue? |
Can you investigate the performance of your server? The op-node log above shows "context deadline exceeded" messages, which indicate that things are taking too long (hence the deadline being exceeded). I am not sure |
I have increased the resource allocation (CPU and memory), and the disk is an io2 EBS volume with 20k IOPS, but I still see "context deadline exceeded" in the op-node logs. Op-geth finished "generating snapshot" but the node is still 2000 blocks behind. op-geth:
op-node:
|
Every time I restart op-geth, it starts to generate snapshots:
Is this related to snap sync? |
@opfocus, can you ask a developer to have a look at this issue and the logs? IMO, this is not related to server resources... We are currently using a VM with 16 CPU cores and 64 GB of memory (m6a.4xlarge), and we are still getting these warning messages:
and the node is still running behind and unable to catch up... |
I will share it on Discord, as it seems I cannot tag others here. There should be three issues so far:
Perhaps you should clarify the first issue: Will it always lag for some reason, making it impossible to sync to the latest block? |
Now it is unable to catch up... the recent chain head update age is
and it is still generating snapshots:
Every time I see this lag, the node is generating the state snapshot... If I restart the node, it seems to start generating the state snapshot from the beginning... I have increased all the resource allocations (CPU, memory and storage settings) but it does not help... :/ |
can you also run |
|
IO seems OK. It looks like generating the state snapshot is required for snap sync. I recommend trying archive sync, so that you don't need to
because you already
https://docs.optimism.io/builders/node-operators/management/snap-sync We can also wait for a developer to check this thread, because I haven't tested this before. |
Can I change the node syncmode from Yes, we need a developer to look into this issue... this shouldn't be the normal behavior... |
It is better to delete the datadir and start again |
It is important to add that we don't see this problem with Base mainnet... and both networks have the same configuration. I'm not sure if generating the state snapshot is the issue... but the Optimism mainnet node is unable to catch up... it is now 4h9m3s behind... :/
|
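If you want to alert on an age value like the "4h9m3s" quoted above, a small sketch for turning it into seconds could look like this. It assumes the age only uses the units d, h, m and s, in that compact form; other units are rejected rather than guessed. The function names and the 600-second threshold are hypothetical.

```python
import re

# Seconds per unit; this unit set is an assumption, not taken from the logs.
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([dhms])")


def parse_age(text: str) -> int:
    """Convert a compact age string such as '4h9m3s' to whole seconds."""
    position = 0
    total = 0
    for match in _TOKEN.finditer(text):
        if match.start() != position:
            raise ValueError(f"unexpected text in age: {text!r}")
        total += int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    # Reject empty input and trailing characters the pattern did not consume.
    if position == 0 or position != len(text):
        raise ValueError(f"cannot parse age: {text!r}")
    return total


def is_lagging(age: str, threshold_seconds: int = 600) -> bool:
    """True when the reported age exceeds the (hypothetical) threshold."""
    return parse_age(age) > threshold_seconds
```

For example, `parse_age("4h9m3s")` gives 14943 seconds, so `is_lagging("4h9m3s")` is true with the default threshold.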
Try resyncing again but delete
I don't think snap sync works with archive mode. With archive mode, you must sync from a snapshot. Otherwise, it will take forever (months or even years). |
We are now observing the same issue with an Optimism Sepolia node:
|
soyboy (from the dev team) also thinks snap sync isn't for archive nodes
|
I will keep monitoring our optimism and base archive nodes using snap sync, and let you know if I see more issues. BTW, Base nodes have been running without issues. |
I have the same question, |
We're having the same issue in our Base mainnet node:
Generating snapshots takes a looong time and the node has been falling behind since it started.
And these on
|
Summary
Node stuck in "Generating state snapshot" and not processing new blocks after upgrade to the latest software releases of op-geth (v1.101315.0 -> v1.101315.2) and op-node (v1.7.5 -> v1.7.7).
System information
Geth version:
geth version
Op-node:
OS & Version: Linux Ubuntu
Expected behaviour
Actual behaviour
The "Generating state snapshot" process is blocking op-geth from processing new blocks
Logs
op-geth logs:
op-node logs:
CLI
op-geth:
op-node: