
[Fix] Introduce GC logging on failure, don't break the loop. #3876

Merged
merged 8 commits into main from ru/fix/gc on Jul 22, 2024

Conversation

@ruseinov ruseinov commented Jan 15, 2024

Summary of changes

Changes introduced in this pull request:

  • GC error log.
  • No more breaking the GC loop on error (see the sketch below).
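
A minimal sketch of the pattern described above, assuming a tokio-based loop; the names (gc_loop, run_gc_cycle) are illustrative and not the actual Forest code:

    use std::time::Duration;

    // Sketch: log a failed GC cycle and keep looping, instead of propagating
    // the error and exiting the loop as before.
    async fn gc_loop(interval: Duration) {
        loop {
            if let Err(e) = run_gc_cycle().await {
                tracing::error!("GC cycle failed: {}", e);
            }
            tokio::time::sleep(interval).await;
        }
    }

    // Placeholder for one mark-and-sweep cycle.
    async fn run_gc_cycle() -> anyhow::Result<()> {
        // ... mark, filter, sweep ...
        Ok(())
    }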

Reference issue to close (if applicable)

Work on #3863

Other information and links

Change checklist

  • I have performed a self-review of my own code,
  • I have made corresponding changes to the documentation,
  • I have added tests that prove my fix is effective or that my feature works (if possible),
  • I have made sure the CHANGELOG is up-to-date. All user-facing changes should be reflected in this document.

@ruseinov ruseinov requested a review from a team as a code owner January 15, 2024 13:44
@ruseinov ruseinov requested review from LesnyRumcajs and aatifsyed and removed request for a team January 15, 2024 13:44
@ruseinov ruseinov marked this pull request as draft January 15, 2024 14:28
@@ -163,7 +163,10 @@ impl<DB: Blockstore + GarbageCollectable + Sync + Send + 'static> MarkAndSweep<D
/// using CAR-backed storage with a snapshot, for implementation simplicity.

Contributor

Is this note still valid? If not, please update it.

/// NOTE: This currently does not take into account the fact that we might be starting the node
/// using CAR-backed storage with a snapshot, for implementation simplicity.

Contributor Author

@ruseinov ruseinov Jan 15, 2024

This note will always be valid; it states that the GC does not care about CAR-backed storage blocks. Perhaps I should rephrase it to make it clearer.

Basically, it works as follows: the database the GC has access to is either ParityDB or MemoryDB, depending on the implementation. Since the GC is designed to wait before it starts the mark step, we don't really need to do anything special. And even if the database does not have many records by that time, it will just do a smaller cleanup in the first cycle.

Contributor

@hanabi1224 hanabi1224 Jan 16, 2024

In this case, I don't quite understand how fn filter works when it takes the parity-db only. Shouldn't fn unordered_stream_graph take the entire DB (parity-db + car-db)?

Contributor Author

It should work just fine. We only care about cleaning up unreachable nodes. Since all the data stored in ParityDB lies in the future of any data in the car-db, we don't really care about it, unless I'm missing some weird edge case. I'm assuming that the snapshots we export are finalized, in which case there should not be any issues.

The one enhancement I think we should make, based on your comments and my revisiting of the code, is specifying the first epoch that's stored in the database as opposed to the CAR file. Otherwise

        if depth > current_epoch {
            time::sleep(interval).await;
            return anyhow::Ok(());
        }

is basically useless. Or we can get rid of it altogether and just rely on the initial sleep that we do later, for simplicity.
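
A hedged sketch of that suggestion, i.e. gating the run on the first epoch actually persisted in the database rather than on the raw depth; first_db_epoch and the other names are hypothetical, not the actual Forest fields:

    use std::time::Duration;

    // Hypothetical: first_db_epoch is the first epoch persisted in ParityDB,
    // as opposed to the CAR-backed snapshot.
    fn enough_db_history(current_epoch: i64, first_db_epoch: i64, depth: i64) -> bool {
        current_epoch - first_db_epoch >= depth
    }

    async fn maybe_run_gc(
        current_epoch: i64,
        first_db_epoch: i64,
        depth: i64,
        interval: Duration,
    ) -> anyhow::Result<()> {
        if !enough_db_history(current_epoch, first_db_epoch, depth) {
            // Not enough database-backed history yet; skip this cycle.
            tokio::time::sleep(interval).await;
            return Ok(());
        }
        // ... mark, filter, sweep ...
        Ok(())
    }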

Contributor

@hanabi1224 hanabi1224 Jan 17, 2024

I'm assuming that the snapshots we export are finalized

I'm not positive about this; as I understand it, this can only be true when we export the snapshot from latest - finality. @lemmih, could you weigh in on this?

just rely on initial sleep

I suggest storing the last GC epoch in the parity-db metadata column and calculating the wait time instead of waiting for 10h on every start. In an edge case, if we have a node that restarts every 9h (maybe to pick up updates), the GC would never get a chance to run.
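
A minimal sketch of that idea, assuming the last GC epoch can be read back from a metadata column on startup; the names and the helper are illustrative only:

    use std::time::Duration;

    // Filecoin block time: one epoch every 30 seconds.
    const EPOCH_DURATION_SECS: u64 = 30;

    // Sleep only for the remaining part of the interval, based on the epoch of
    // the last completed GC run, rather than the full 10h on every start.
    fn remaining_wait(last_gc_epoch: i64, current_epoch: i64, interval: Duration) -> Duration {
        let elapsed_epochs = (current_epoch - last_gc_epoch).max(0) as u64;
        let elapsed = Duration::from_secs(elapsed_epochs * EPOCH_DURATION_SECS);
        interval.saturating_sub(elapsed)
    }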

Contributor Author

@lemmih what do you think about the above? Seems reasonable to me.
We could also initialize the last GC epoch as the heaviest tipset we see in car-backed storage when the database is empty.

Contributor

I'm not too concerned about this.

@@ -185,6 +188,9 @@ impl<DB: Blockstore + GarbageCollectable + Sync + Send + 'static> MarkAndSweep<D
// Make sure we don't run the GC too often.
time::sleep(interval).await;

Contributor

Is there a way to not sleep for 10h here and make it testable with the CI calibnet checks?

Contributor Author

We can make it configurable. This sleep was introduced at David's request to avoid running the GC too often.

Contributor Author

Actually, I don't think it's necessary. The reason being: we still need to wait for finality to do filter and sweep, which makes this scenario pretty much untestable using calibnet checks, unless we want to wait 7.5+ hours.

Contributor

Why is it necessary to wait finality epochs, as opposed to keeping 2*finality state roots like the snapshot export does?

Contributor

@hanabi1224 hanabi1224 Jan 17, 2024

which makes this scenario pretty much untestable using calibnet checks

Could you elaborate on why it becomes impossible to proactively trigger the GC via a CLI event like before? That was useful for e2e tests.

Contributor

We can make it artificially testable by introducing more abstractions in order to be able to mock certain parts of the algorithm.

If I'm not wrong, this still does not allow effective manual e2e testing or benchmarking on calibnet or mainnet, which sounds like a deal breaker to me.

Note that the old GC solved this exact problem (running in the background without downtime) without waiting.

Contributor Author

It's not ideal. Whether or not the lack of manual testability is a deal breaker, I'm not really sure. We can always benchmark each of the three steps on its own; there's no need to benchmark all three at the same time.

@lemmih, could you please chip in with your thoughts on the matter? I kind of agree that this is hard to test and impossible to test manually, but it does offer a much simpler codebase and algorithm.

Contributor

Waiting for chain finality is required for correctness. Due to possible forks, we may not delete data unless (1) it is unreachable from the current HEAD, and (2) it was written to the database more than chain_finality epochs ago. This does make end-to-end testing more difficult: How do we quickly test the GC when it is not allowed to garbage collect anything in the first 7.5 hours?

Lowering (or removing) the wait time is not an option. The wait is required for correctness and cannot be any lower than 7.5 hours on mainnet or calibnet.

I posit that end-to-end testing is not important. We can verify with quickcheck that the GC behaves as expected (i.e. deletes what it should delete, doesn't delete what it shouldn't, etc.). And we can statically guarantee that the GC loop will never exit (except for critical failures, which should terminate Forest). Furthermore, the nodes we run in production will show whether there are any GC leaks over time.

In summary: Keep the 7.5 hour wait (this is not optional). Keep the quickcheck tests. Don't do end-to-end tests in the CI (as the GC doesn't do anything interesting in this timespan). Prove to the reader that the GC loop will run as long as Forest itself.
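
A minimal sketch of the two conditions above (illustrative names, not the Forest API); 900 epochs at 30 seconds each is the 7.5-hour wait mentioned:

    // Chain finality in epochs: 900 * 30s = 7.5 hours on mainnet and calibnet.
    const CHAIN_FINALITY: i64 = 900;

    // A record may only be swept if it is unreachable from the current HEAD
    // and was written more than CHAIN_FINALITY epochs ago.
    fn may_delete(reachable_from_head: bool, written_at_epoch: i64, current_epoch: i64) -> bool {
        !reachable_from_head && current_epoch - written_at_epoch > CHAIN_FINALITY
    }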

Member

As noted in the issue comment, one way out of this is providing GC alarms in our long-running node. This is not ideal, given our current release process, but it's better than pure faith. 🙏

Contributor Author

I'm currently figuring out why the GC gets stuck. It could be due to the fact that we bootstrap the node with a snapshot (car-backed storage) and the parallel graph walk somehow gets stuck because it doesn't have access to data all the way back to genesis. Once that is debugged and resolved, we'll be able to monitor the GC via checks and metrics.

@LesnyRumcajs
Member

@ruseinov Can this be closed? Or should it be merged?

@ruseinov ruseinov closed this Jul 19, 2024
@ruseinov ruseinov deleted the ru/fix/gc branch July 19, 2024 08:44
@ruseinov
Contributor Author

@ruseinov Can this be closed? Or should it be merged?

yep, about time.

@ruseinov ruseinov restored the ru/fix/gc branch July 19, 2024 09:19
@ruseinov ruseinov reopened this Jul 19, 2024
@ruseinov ruseinov marked this pull request as ready for review July 19, 2024 09:20
@ruseinov ruseinov added this pull request to the merge queue Jul 22, 2024
Merged via the queue into main with commit 81d0f55 Jul 22, 2024
28 checks passed
@ruseinov ruseinov deleted the ru/fix/gc branch July 22, 2024 11:22