-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up remants of eth #249
Conversation
During DB initialization, we end out using an unwritten CompactSpaceHeader. This change writes one to disk when the database is created, causing allocations of new objects to start at offset 1024 instead of at offset 0. Writing objects at offset zero was working for a while because we have an unused Node object there. That will get removed on the next commit.
firewood/src/db.rs
Outdated
std::slice::from_raw_parts( | ||
&header as *const DbParams as *const u8, | ||
size_of::<DbParams>(), | ||
) | ||
.to_vec() | ||
}; | ||
|
||
// write out the DbHeader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain (maybe in the PR description) why this is not needed before the acc_root
removal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First comment should be addressed even if only in the form of a reply
/// The root node of the generic key-value store. | ||
kv_root: DiskAddress, | ||
} | ||
|
||
impl DbHeader { | ||
pub const MSIZE: u64 = 16; | ||
pub const MSIZE: u64 = std::mem::size_of::<Self>() as u64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The size of Self
actually isn't guaranteed to be constant unless we use #[repr(C)]
... In practice, I don't think it'll ever change, but I think we should probably just add #[repr(C)]
to the type.
We could/should also have a test to make sure headers can be deserialized properly from disk when restoring a database. Not for this PR, but something to think about. I can create an issue if you don't want to implement for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could/should also have a test to make sure headers can be deserialized properly from disk when restoring a database. Not for this PR, but something to think about. I can create an issue if you don't want to implement for this PR.
I think we have this problem generally across the whole code base. I think it's probably isolated to things that implement Storable and we can probably generalize this. I considered adding a method to anything Storable with a better signature, but that's for another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The size of
Self
actually isn't guaranteed to be constant unless we use#[repr(C)]
... In practice, I don't think it'll ever change, but I think we should probably just add#[repr(C)]
to the type.
Agreed, but this is a step up from the constant 16 that was there. There was also no guarantee that this created 16 bytes.
I thought about adding a #[repr(_)]
but there weren't any great choices:
#[repr(C)]
is consistent but there might still be some padding#[repr(packed)]
might create performance problems#[repr(align(4))]
might be a good choice but currently the way the code works it isn't good for the serialization methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree on both counts, just want to leave evidence that we are aware of these issues
Easier and probably faster than serializing via hydrate() Not overly concerned about the extra allocations here, probably could do better by not making everything a vector, but this code only happens once.
Need to fix the "required checks" repo setting still |
#[derive(Debug)] | ||
/// mutable DB-wide metadata, it keeps track of the root of the top-level trie. | ||
#[repr(C)] | ||
#[derive(Copy, Clone, Debug, bytemuck::NoUninit)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[derive(Copy, Clone, Debug, bytemuck::NoUninit)] | |
#[derive(Copy, Clone, Debug, bytemuck::Pod, bytemuck::Zeroable)] |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these added auto traits are required, and might generate more code that we don't really need.
.chain({ | ||
// compute the DbHeader as bytes | ||
let hdr = DbHeader::new_empty(); | ||
// clippy thinks to_vec isn't necessary, but it actually is :( | ||
#[allow(clippy::unnecessary_to_owned)] | ||
bytemuck::bytes_of(&hdr).to_vec().into_iter() | ||
}) | ||
.chain({ | ||
// write out the CompactSpaceHeader | ||
let csh = CompactSpaceHeader::new( | ||
NonZeroUsize::new(SPACE_RESERVED as usize).unwrap(), | ||
NonZeroUsize::new(SPACE_RESERVED as usize).unwrap(), | ||
); | ||
#[allow(clippy::unnecessary_to_owned)] | ||
bytemuck::bytes_of(&csh).to_vec().into_iter() | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now this is a scope issue. If you define hdr
and csh
at the same scope as .chain
, you can use bytemuck::bytes_of(&hdr).iter().copied()
. No allocations needed, no clippy-allow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely can still do a little better but there are diminishing returns since this only happens at db create time, and there are SOOO many more allocations in the main code path, it's fine like this for now.
wal_block_nbit: cfg.wal.block_nbit, | ||
root_hash_file_nbit: cfg.root_hash_file_nbit, | ||
}; | ||
let bytes = bytemuck::bytes_of(¶ms).to_vec(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left the same comment below. If you hang on to the params
at the same scope as header_bytes
, you'll only need to do one allocation for all the header_bytes
.
@@ -11,6 +11,7 @@ license = "MIT" | |||
hex = "0.4.3" | |||
lru = "0.11.0" | |||
thiserror = "1.0.38" | |||
bytemuck = { version = "1.13.1", features = ["derive"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -127,7 +127,8 @@ impl Storable for CompactDescriptor { | |||
} | |||
} | |||
|
|||
#[derive(Debug)] | |||
#[repr(C)] | |||
#[derive(Copy, Clone, Debug, bytemuck::NoUninit)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[derive(Copy, Clone, Debug, bytemuck::NoUninit)] | |
#[derive(Copy, Clone, Debug, bytemuck::Zeroable, bytemuck::Pod)] |
@@ -2,10 +2,13 @@ use std::hash::Hash; | |||
use std::num::NonZeroUsize; | |||
use std::ops::{Deref, DerefMut}; | |||
|
|||
use bytemuck::NoUninit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use bytemuck::NoUninit; | |
use bytemuck::{Pod, Zeroable}; |
use crate::{CachedStore, ShaleError, Storable}; | ||
|
||
/// The virtual disk address of an object | ||
#[derive(Debug, Copy, Clone, Eq, Hash, Ord, PartialOrd, PartialEq)] | ||
#[repr(C)] | ||
#[derive(Debug, Copy, Clone, Eq, Hash, Ord, PartialOrd, PartialEq, NoUninit)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[derive(Debug, Copy, Clone, Eq, Hash, Ord, PartialOrd, PartialEq, NoUninit)] | |
#[derive(Debug, Copy, Clone, Eq, Hash, Ord, PartialOrd, PartialEq, Pod, Zeroable)] |
This change shrinks the DbHeader to remove the old
acc_root
. It also fixes an initialization bug that shows up if you attempt to do this.