Beacon chain lags behind frequently when running the latest tree-states branch #4521
Comments
looks like
I'm not able to reproduce this when running locally. I've tried the following 3 versions:
I haven't seen any significant difference in performance or lag, but maybe I didn't keep them up long enough (I ran each version for about 15 mins). I'll try a fresh deployment to the same node to see if the problem persists.
Might be worth trying this version as well to narrow down the possible changes that broke it:
Our tree-states node seems to be back to normal now, 8 hours after rolling back to the alpha release version 🤔 The slowness seems to be correlated with the frequency of
Re-deployed the latest tree-states branch again and it looks fine. I think there's still something fishy with the new version, though: the old version seems able to recover from the slowness, but performance with the new version just keeps degrading over time. I'll leave it for a bit and keep monitoring (and will try to reproduce with the attestation rewards API call later if it all looks good).
The node running the latest
I guess the state diffs being applied are for historic states. It's possible this is a manifestation of the HTTP death spiral issue, combined with a slight slowdown in the new version of tree-states compared with the previous alpha. I think the fact that we recently pointed the checkpointz server at this node might be part of the reason we've seen a change. We could try bumping up the hdiff buffer cache size to see if that takes some of the time off the calls. Grepping the logs for "diff" should show how long each diff is taking to apply and whether the buffer cache is being hit. We probably also want to tweak the cache algorithm so that it prefers to keep "deeper" (more general) diffs rather than ones for, e.g., single epochs. It will also be interesting to test the HTTP API fix once that's applied in unstable.
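The cache tweak suggested above (prefer keeping "deeper", more general diffs over single-epoch ones) could look roughly like the following Python sketch. It is not Lighthouse's hdiff buffer cache: the DiffKey and DepthPreferringDiffCache names, and the use of the number of epochs a diff spans as the measure of "depth", are hypothetical. When the cache overflows, it evicts the diff with the smallest span, breaking ties by least-recent use.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DiffKey:
    """Hypothetical key: the slot a diff applies at and how many epochs it spans."""
    slot: int
    span_epochs: int  # larger span = "deeper", more general diff


class DepthPreferringDiffCache:
    """Bounded cache that evicts the shallowest diff first, LRU among equals."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Iteration order is least recently used first.
        self._entries: "OrderedDict[DiffKey, bytes]" = OrderedDict()

    def get(self, key: DiffKey) -> Optional[bytes]:
        value = self._entries.get(key)
        if value is not None:
            # A hit marks the entry as most recently used.
            self._entries.move_to_end(key)
        return value

    def put(self, key: DiffKey, value: bytes) -> None:
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        while len(self._entries) > self.capacity:
            self._evict_one()

    def _evict_one(self) -> None:
        # min() returns the first minimal element in iteration order, so this
        # picks the shallowest diff and, among equally shallow ones, the LRU one.
        victim = min(self._entries, key=lambda k: k.span_epochs)
        del self._entries[victim]
```

One consequence of this policy is that a newly inserted single-epoch diff can be evicted immediately if the cache is already full of deeper diffs, which is the intended bias toward keeping general diffs that many historic state reconstructions can share.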
Something else it could be is the database migration frequency. The alpha has it set to 4, but the latest version has it set to 1. This was an attempt to reduce cache misses (by making it more likely that the new finalized state is in the cache), but it may have backfired.
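To make the trade-off concrete, here is a hypothetical model of how migration frequency changes the amount of migration work. It assumes a hot-to-cold migration runs once the finalized epoch has advanced by at least N epochs since the last migration; the function and parameter names are illustrative, not Lighthouse's actual code or configuration.

```python
from typing import Iterable, List


def migration_epochs(
    finalized_epochs: Iterable[int],
    epochs_per_migration: int,
    last_migrated: int = 0,
) -> List[int]:
    """Return the finalized epochs at which a migration would run (hypothetical model)."""
    if epochs_per_migration < 1:
        raise ValueError("epochs_per_migration must be >= 1")
    runs: List[int] = []
    for epoch in finalized_epochs:
        # Migrate only once finalization has advanced far enough since the last run.
        if epoch - last_migrated >= epochs_per_migration:
            runs.append(epoch)
            last_migrated = epoch
    return runs
```

With finalization advancing one epoch at a time, migration_epochs(range(1, 13), 4) returns [4, 8, 12], while a frequency of 1 migrates on all 12 epochs: the newly finalized state is more likely to still be cached, at the cost of four times as many migrations.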
Thanks, it would be interesting to try changing the database migration frequency and see how it performs. It's worth noting that the
I think I found part of the problem in #4573. I'll deploy that now and see if it shows much improvement. I haven't seen the crazy 20GB+ memory usage running the latest
This has been fixed in #4576, closing. 🎉
Description
After merging the latest unstable into tree-states in PR #4514, the BN running the tree-states version seems to lag behind quite frequently for some reason.