Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up remants of eth #249

Merged
merged 6 commits into from
Sep 6, 2023
Merged

Clean up remants of eth #249

merged 6 commits into from
Sep 6, 2023

Conversation

rkuris
Copy link
Collaborator

@rkuris rkuris commented Sep 5, 2023

This change shrinks the DbHeader to remove the old acc_root. It also fixes an initialization bug that shows up if you attempt to do this.

During DB initialization, we end out using an unwritten
CompactSpaceHeader. This change writes one to disk when the database is
created, causing allocations of new objects to start at offset 1024
instead of at offset 0.

Writing objects at offset zero was working for a while because we have
an unused Node object there. That will get removed on the next commit.
std::slice::from_raw_parts(
&header as *const DbParams as *const u8,
size_of::<DbParams>(),
)
.to_vec()
};

// write out the DbHeader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain (maybe in the PR description) why this is not needed before the acc_root removal?

Copy link
Contributor

@richardpringle richardpringle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First comment should be addressed even if only in the form of a reply

/// The root node of the generic key-value store.
kv_root: DiskAddress,
}

impl DbHeader {
pub const MSIZE: u64 = 16;
pub const MSIZE: u64 = std::mem::size_of::<Self>() as u64;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size of Self actually isn't guaranteed to be constant unless we use #[repr(C)]... In practice, I don't think it'll ever change, but I think we should probably just add #[repr(C)] to the type.

We could/should also have a test to make sure headers can be deserialized properly from disk when restoring a database. Not for this PR, but something to think about. I can create an issue if you don't want to implement for this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could/should also have a test to make sure headers can be deserialized properly from disk when restoring a database. Not for this PR, but something to think about. I can create an issue if you don't want to implement for this PR.

I think we have this problem generally across the whole code base. I think it's probably isolated to things that implement Storable and we can probably generalize this. I considered adding a method to anything Storable with a better signature, but that's for another PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size of Self actually isn't guaranteed to be constant unless we use #[repr(C)]... In practice, I don't think it'll ever change, but I think we should probably just add #[repr(C)] to the type.

Agreed, but this is a step up from the constant 16 that was there. There was also no guarantee that this created 16 bytes.

I thought about adding a #[repr(_)] but there weren't any great choices:

  • #[repr(C)] is consistent but there might still be some padding
  • #[repr(packed)] might create performance problems
  • #[repr(align(4))] might be a good choice but currently the way the code works it isn't good for the serialization methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree on both counts, just want to leave evidence that we are aware of these issues

firewood/src/db.rs Show resolved Hide resolved
firewood/src/db.rs Show resolved Hide resolved
firewood/src/db.rs Outdated Show resolved Hide resolved
Easier and probably faster than serializing via hydrate()
Not overly concerned about the extra allocations here, probably
could do better by not making everything a vector, but this code
only happens once.
@richardpringle
Copy link
Contributor

Need to fix the "required checks" repo setting still

#[derive(Debug)]
/// mutable DB-wide metadata, it keeps track of the root of the top-level trie.
#[repr(C)]
#[derive(Copy, Clone, Debug, bytemuck::NoUninit)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[derive(Copy, Clone, Debug, bytemuck::NoUninit)]
#[derive(Copy, Clone, Debug, bytemuck::Pod, bytemuck::Zeroable)]

?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these added auto traits are required, and might generate more code that we don't really need.

Comment on lines +419 to +434
.chain({
// compute the DbHeader as bytes
let hdr = DbHeader::new_empty();
// clippy thinks to_vec isn't necessary, but it actually is :(
#[allow(clippy::unnecessary_to_owned)]
bytemuck::bytes_of(&hdr).to_vec().into_iter()
})
.chain({
// write out the CompactSpaceHeader
let csh = CompactSpaceHeader::new(
NonZeroUsize::new(SPACE_RESERVED as usize).unwrap(),
NonZeroUsize::new(SPACE_RESERVED as usize).unwrap(),
);
#[allow(clippy::unnecessary_to_owned)]
bytemuck::bytes_of(&csh).to_vec().into_iter()
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this is a scope issue. If you define hdr and csh at the same scope as .chain, you can use bytemuck::bytes_of(&hdr).iter().copied(). No allocations needed, no clippy-allow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely can still do a little better but there are diminishing returns since this only happens at db create time, and there are SOOO many more allocations in the main code path, it's fine like this for now.

wal_block_nbit: cfg.wal.block_nbit,
root_hash_file_nbit: cfg.root_hash_file_nbit,
};
let bytes = bytemuck::bytes_of(&params).to_vec();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left the same comment below. If you hang on to the params at the same scope as header_bytes, you'll only need to do one allocation for all the header_bytes.

@@ -11,6 +11,7 @@ license = "MIT"
hex = "0.4.3"
lru = "0.11.0"
thiserror = "1.0.38"
bytemuck = { version = "1.13.1", features = ["derive"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -127,7 +127,8 @@ impl Storable for CompactDescriptor {
}
}

#[derive(Debug)]
#[repr(C)]
#[derive(Copy, Clone, Debug, bytemuck::NoUninit)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[derive(Copy, Clone, Debug, bytemuck::NoUninit)]
#[derive(Copy, Clone, Debug, bytemuck::Zeroable, bytemuck::Pod)]

@@ -2,10 +2,13 @@ use std::hash::Hash;
use std::num::NonZeroUsize;
use std::ops::{Deref, DerefMut};

use bytemuck::NoUninit;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
use bytemuck::NoUninit;
use bytemuck::{Pod, Zeroable};

use crate::{CachedStore, ShaleError, Storable};

/// The virtual disk address of an object
#[derive(Debug, Copy, Clone, Eq, Hash, Ord, PartialOrd, PartialEq)]
#[repr(C)]
#[derive(Debug, Copy, Clone, Eq, Hash, Ord, PartialOrd, PartialEq, NoUninit)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[derive(Debug, Copy, Clone, Eq, Hash, Ord, PartialOrd, PartialEq, NoUninit)]
#[derive(Debug, Copy, Clone, Eq, Hash, Ord, PartialOrd, PartialEq, Pod, Zeroable)]

@rkuris rkuris merged commit 32bcdae into main Sep 6, 2023
5 checks passed
@rkuris rkuris deleted the rkuris/get-ahead-of-db-header branch September 6, 2023 16:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants