
[Enhancement] GC - revisit the way records are scheduled for deletion. #4457

Open
ruseinov opened this issue Jun 29, 2024 · 0 comments

Overview

Currently, garbage collection identifies the garbage and stores all of the dereference (i.e. remove) operations in a Vec until every column has been iterated. That takes more memory than necessary: we could instead keep the lighter-weight CidHashSet representation and schedule removals batch by batch.

Here's the code in question:

forest/src/db/parity_db.rs (lines 340 to 365 at 43aeca5):

fn remove_keys(&self, keys: CidHashSet) -> anyhow::Result<()> {
    let mut iter = self.db.iter(DbColumn::GraphFull as u8)?;
    // It's easier to store cid's scheduled for removal directly as an `Op` to avoid costly
    // conversion with allocation.
    let mut deref_vec = Vec::new();
    while let Some((key, _)) = iter.next()? {
        let cid = Cid::try_from(key)?;
        if keys.contains(&cid) {
            deref_vec.push(Self::dereference_operation(&cid));
        }
    }
    self.db
        .iter_column_while(DbColumn::GraphDagCborBlake2b256 as u8, |val| {
            let hash = Blake2b256.digest(&val.value);
            let cid = Cid::new_v1(DAG_CBOR, hash);
            if keys.contains(&cid) {
                deref_vec.push(Self::dereference_operation(&cid));
            }
            true
        })?;
    self.db.commit_changes(deref_vec).context("error remove")
}
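For comparison, here is a rough sketch of the batch-by-batch variant, reusing the dereference_operation helper and the same ParityDB wrapper as above. BATCH_SIZE is a hypothetical constant (its right value would need the profiling discussed below), and only the GraphFull column is shown; this is a sketch of the idea, not a tested implementation:

// Hypothetical cap on pending ops; the right value would need profiling.
const BATCH_SIZE: usize = 10_000;

fn remove_keys_batched(&self, keys: CidHashSet) -> anyhow::Result<()> {
    let mut batch = Vec::with_capacity(BATCH_SIZE);
    let mut iter = self.db.iter(DbColumn::GraphFull as u8)?;
    while let Some((key, _)) = iter.next()? {
        let cid = Cid::try_from(key)?;
        if keys.contains(&cid) {
            batch.push(Self::dereference_operation(&cid));
            // Commit as soon as the cap is hit, so the pending ops never
            // grow past BATCH_SIZE no matter how much garbage there is.
            if batch.len() >= BATCH_SIZE {
                self.db
                    .commit_changes(std::mem::take(&mut batch))
                    .context("error removing batch")?;
            }
        }
    }
    // ... the DbColumn::GraphDagCborBlake2b256 column would get the same
    // treatment via iter_column_while ...
    // Flush whatever is left over.
    if !batch.is_empty() {
        self.db
            .commit_changes(batch)
            .context("error removing batch")?;
    }
    Ok(())
}

The trade-off is more commit calls (and whatever per-commit work ParityDB does internally) in exchange for a bounded pending-op buffer.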

I'm unsure this really needs to be done, as current use-cases won't produce enough garbage for it to matter; perhaps it can safely be ignored. Deciding would also require some profiling to see exactly how much memory can actually be saved here, since what ParityDB does internally has to be taken into account too.
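As a very rough back-of-envelope, a toy calculation along these lines shows why a cap could matter; every constant here is an assumption, not a measurement:

fn main() {
    // All numbers are illustrative assumptions, not measurements.
    const KEY_BYTES: usize = 38; // CIDv1 + blake2b-256 multihash, roughly
    const OP_OVERHEAD: usize = 32; // guessed enum tag + Vec header per op
    const BATCH_SIZE: usize = 10_000; // same hypothetical cap as above

    let garbage_records: usize = 10_000_000;
    let one_big_batch = garbage_records * (KEY_BYTES + OP_OVERHEAD);
    let capped = BATCH_SIZE * (KEY_BYTES + OP_OVERHEAD);
    // Prints roughly 667 MiB vs 683 KiB of pending ops.
    println!("one big batch : ~{} MiB pending", one_big_batch / (1 << 20));
    println!("capped batches: ~{} KiB pending", capped / (1 << 10));
}

Whether the pending buffer is actually the dominant cost depends on ParityDB's own commit-time buffering, hence the need for profiling.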

Summary

It boils down to scheduling one big batch for removal all at once versus a lighter-weight in-memory representation with smaller removal batches, the goal being to reduce the memory footprint.
