
[Enhancement] GC - revisit the way records are scheduled for deletion. #4457

Open
ruseinov opened this issue Jun 29, 2024 · 0 comments

Overview

Currently, garbage collection identifies the garbage and stores all of the dereference (i.e. remove) operations in a Vec until every column has been iterated. That takes more memory than necessary: we could instead keep the lighter-weight CidHashSet representation and schedule removals batch by batch.

Here's the code in question:

forest/src/db/parity_db.rs (lines 340 to 365 at 43aeca5):

fn remove_keys(&self, keys: CidHashSet) -> anyhow::Result<()> {
    let mut iter = self.db.iter(DbColumn::GraphFull as u8)?;
    // It's easier to store cid's scheduled for removal directly as an `Op` to avoid costly
    // conversion with allocation.
    let mut deref_vec = Vec::new();
    while let Some((key, _)) = iter.next()? {
        let cid = Cid::try_from(key)?;
        if keys.contains(&cid) {
            deref_vec.push(Self::dereference_operation(&cid));
        }
    }
    self.db
        .iter_column_while(DbColumn::GraphDagCborBlake2b256 as u8, |val| {
            let hash = Blake2b256.digest(&val.value);
            let cid = Cid::new_v1(DAG_CBOR, hash);
            if keys.contains(&cid) {
                deref_vec.push(Self::dereference_operation(&cid));
            }
            true
        })?;
    self.db.commit_changes(deref_vec).context("error remove")
}
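For comparison, here is a rough sketch of the batch-by-batch variant, reusing the dereference_operation helper and the same ParityDB wrapper as above. BATCH_SIZE is a hypothetical constant (its right value would need the profiling discussed below), and only the GraphFull column is shown; this is a sketch of the idea, not a tested implementation:

// Hypothetical cap on pending ops; the right value would need profiling.
const BATCH_SIZE: usize = 10_000;

fn remove_keys_batched(&self, keys: CidHashSet) -> anyhow::Result<()> {
    let mut batch = Vec::with_capacity(BATCH_SIZE);
    let mut iter = self.db.iter(DbColumn::GraphFull as u8)?;
    while let Some((key, _)) = iter.next()? {
        let cid = Cid::try_from(key)?;
        if keys.contains(&cid) {
            batch.push(Self::dereference_operation(&cid));
            // Commit as soon as the cap is hit, so the pending ops never
            // grow past BATCH_SIZE no matter how much garbage there is.
            if batch.len() >= BATCH_SIZE {
                self.db
                    .commit_changes(std::mem::take(&mut batch))
                    .context("error removing batch")?;
            }
        }
    }
    // ... the DbColumn::GraphDagCborBlake2b256 column would get the same
    // treatment via iter_column_while ...
    // Flush whatever is left over.
    if !batch.is_empty() {
        self.db
            .commit_changes(batch)
            .context("error removing batch")?;
    }
    Ok(())
}

The trade-off is more commit calls (and whatever per-commit work ParityDB does internally) in exchange for a bounded pending-op buffer.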

I'm unsure this really needs to be done, as current use-cases won't produce enough garbage for it to matter; perhaps it can safely be ignored. Deciding would also require some profiling to see exactly how much memory can actually be saved here, since what ParityDB does internally has to be taken into account too.
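As a very rough back-of-envelope, a toy calculation along these lines shows why a cap could matter; every constant here is an assumption, not a measurement:

fn main() {
    // All numbers are illustrative assumptions, not measurements.
    const KEY_BYTES: usize = 38; // CIDv1 + blake2b-256 multihash, roughly
    const OP_OVERHEAD: usize = 32; // guessed enum tag + Vec header per op
    const BATCH_SIZE: usize = 10_000; // same hypothetical cap as above

    let garbage_records: usize = 10_000_000;
    let one_big_batch = garbage_records * (KEY_BYTES + OP_OVERHEAD);
    let capped = BATCH_SIZE * (KEY_BYTES + OP_OVERHEAD);
    // Prints roughly 667 MiB vs 683 KiB of pending ops.
    println!("one big batch : ~{} MiB pending", one_big_batch / (1 << 20));
    println!("capped batches: ~{} KiB pending", capped / (1 << 10));
}

Whether the pending buffer is actually the dominant cost depends on ParityDB's own commit-time buffering, hence the need for profiling.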

Summary

It boils down to scheduling one big batch for removal all at once versus a lighter-weight in-memory representation with smaller removal batches, the goal being to reduce the memory footprint.
