Clean up remnants of eth #249

rkuris · 2023-09-05T20:34:56Z

This change shrinks the DbHeader to remove the old acc_root. It also fixes an initialization bug that shows up if you attempt to do this.

During DB initialization, we end up using an unwritten CompactSpaceHeader. This change writes one to disk when the database is created, causing allocations of new objects to start at offset 1024 instead of at offset 0. Writing objects at offset zero was working for a while because we have an unused Node object there. That will get removed in the next commit.
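To illustrate the offset change described above, here is a minimal toy model, not the actual shale code. `SpaceHeader`, `alloc_addr`, and `RESERVED` are hypothetical names, and the sketch assumes that a header which was never written reads back as zeros.

```rust
/// Bytes reserved at the start of the space for headers. The value 1024 comes
/// from the PR description; the constant name is an assumption of this sketch.
pub const RESERVED: u64 = 1024;

/// Toy stand-in for a space header that records where the next object goes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SpaceHeader {
    pub alloc_addr: u64,
}

impl SpaceHeader {
    /// A header that was explicitly written at creation time:
    /// allocations begin after the reserved region.
    pub fn new(reserved: u64) -> Self {
        SpaceHeader { alloc_addr: reserved }
    }

    /// What a never-written header would look like if it reads back as zeros
    /// (an assumption of this sketch).
    pub fn unwritten() -> Self {
        SpaceHeader { alloc_addr: 0 }
    }

    /// Bump-allocate `len` bytes and return the offset of the new object.
    pub fn alloc(&mut self, len: u64) -> u64 {
        let at = self.alloc_addr;
        self.alloc_addr += len;
        at
    }
}
```

In this model, the first allocation from `SpaceHeader::new(RESERVED)` lands at offset 1024, while the first allocation from `SpaceHeader::unwritten()` lands at offset 0, on top of whatever is stored at the start of the space.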

xinifinity · 2023-09-05T20:59:13Z

firewood/src/db.rs

                std::slice::from_raw_parts(
                    &header as *const DbParams as *const u8,
                    size_of::<DbParams>(),
                )
                .to_vec()
            };

+            // write out the DbHeader


Can you explain (maybe in the PR description) why this is not needed before the acc_root removal?

richardpringle

First comment should be addressed even if only in the form of a reply

richardpringle · 2023-09-05T20:55:05Z

firewood/src/db.rs

    /// The root node of the generic key-value store.
    kv_root: DiskAddress,
 }

 impl DbHeader {
-    pub const MSIZE: u64 = 16;
+    pub const MSIZE: u64 = std::mem::size_of::<Self>() as u64;


The size of Self actually isn't guaranteed to be constant unless we use #[repr(C)]... In practice, I don't think it'll ever change, but I think we should probably just add #[repr(C)] to the type.

We could/should also have a test to make sure headers can be deserialized properly from disk when restoring a database. Not for this PR, but something to think about. I can create an issue if you don't want to implement for this PR.

> We could/should also have a test to make sure headers can be deserialized properly from disk when restoring a database. Not for this PR, but something to think about. I can create an issue if you don't want to implement for this PR.

I think we have this problem generally across the whole code base. I think it's probably isolated to things that implement Storable and we can probably generalize this. I considered adding a method to anything Storable with a better signature, but that's for another PR.

> The size of Self actually isn't guaranteed to be constant unless we use #[repr(C)]... In practice, I don't think it'll ever change, but I think we should probably just add #[repr(C)] to the type.

Agreed, but this is a step up from the constant 16 that was there. There was also no guarantee that this created 16 bytes.

I thought about adding a #[repr(_)] but there weren't any great choices:

- #[repr(C)] is consistent but there might still be some padding
- #[repr(packed)] might create performance problems
- #[repr(align(4))] might be a good choice but currently the way the code works it isn't good for the serialization methods.

Agree on both counts, just want to leave evidence that we are aware of these issues
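For the record, a small sketch of the layout points raised in this thread, using only std. `Addr`, `Header`, and `Padded` are hypothetical stand-ins, not the real `DiskAddress` or `DbHeader`, and the explicit `to_bytes`/`from_bytes` pair is just one assumed way to write the round-trip test mentioned above.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a disk address: a single 8-byte field.
#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct Addr(pub u64);

/// Hypothetical header with one field, laid out in declaration order.
#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct Header {
    pub kv_root: Addr,
}

impl Header {
    /// Size derived from the type instead of a hand-maintained constant.
    pub const MSIZE: u64 = size_of::<Self>() as u64;

    /// Explicit little-endian encoding, independent of in-memory layout.
    pub fn to_bytes(self) -> [u8; 8] {
        self.kv_root.0.to_le_bytes()
    }

    /// Inverse of `to_bytes`.
    pub fn from_bytes(bytes: [u8; 8]) -> Self {
        Header { kv_root: Addr(u64::from_le_bytes(bytes)) }
    }
}

/// With #[repr(C)] the field order is fixed, but padding can still appear:
/// `value` must be aligned, so padding bytes follow the 1-byte `tag`.
#[repr(C)]
pub struct Padded {
    pub tag: u8,
    pub value: u64,
}

// Compile-time guard: the build fails if the header size ever changes.
const _: () = assert!(size_of::<Header>() == 8);
```

The compile-time assertion is one cheap way to leave evidence of the expected on-disk size, and the padding in `Padded` shows why #[repr(C)] alone does not make a type safe to dump as raw bytes.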

richardpringle · 2023-09-06T15:38:57Z

Need to fix the "required checks" repo setting still

richardpringle · 2023-09-06T15:51:30Z

firewood/src/db.rs

-#[derive(Debug)]
+/// mutable DB-wide metadata, it keeps track of the root of the top-level trie.
+#[repr(C)]
+#[derive(Copy, Clone, Debug, bytemuck::NoUninit)]


Suggested change

-#[derive(Copy, Clone, Debug, bytemuck::NoUninit)]
+#[derive(Copy, Clone, Debug, bytemuck::Pod, bytemuck::Zeroable)]

?

I don't think these added auto traits are required, and might generate more code that we don't really need.

richardpringle · 2023-09-06T15:55:09Z

firewood/src/db.rs

+            .chain({
+                // compute the DbHeader as bytes
+                let hdr = DbHeader::new_empty();
+                // clippy thinks to_vec isn't necessary, but it actually is :(
+                #[allow(clippy::unnecessary_to_owned)]
+                bytemuck::bytes_of(&hdr).to_vec().into_iter()
+            })
+            .chain({
+                // write out the CompactSpaceHeader
+                let csh = CompactSpaceHeader::new(
+                    NonZeroUsize::new(SPACE_RESERVED as usize).unwrap(),
+                    NonZeroUsize::new(SPACE_RESERVED as usize).unwrap(),
+                );
+                #[allow(clippy::unnecessary_to_owned)]
+                bytemuck::bytes_of(&csh).to_vec().into_iter()
+            })


Now this is a scope issue. If you define hdr and csh at the same scope as .chain, you can use bytemuck::bytes_of(&hdr).iter().copied(). No allocations needed, no clippy-allow.

Definitely can still do a little better but there are diminishing returns since this only happens at db create time, and there are SOOO many more allocations in the main code path, it's fine like this for now.
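A minimal sketch of the scoping suggestion, with std only. Plain `u64` values and `to_le_bytes()` stand in for the real header types and `bytemuck::bytes_of`, and `header_bytes` is a hypothetical function name.

```rust
/// Concatenate the bytes of three fixed-size values without intermediate
/// `Vec`s. The `u64` parameters are hypothetical stand-ins for the headers.
pub fn header_bytes(params: u64, hdr: u64, csh: u64) -> Vec<u8> {
    // Bind the byte arrays at the same scope as the iterator chain, so the
    // borrows taken by `.iter()` stay valid until `collect()` has run.
    // If `hdr` were created inside a `.chain({ ... })` block instead, it
    // would be dropped at the end of that block and the borrow would not
    // compile, which is what forces the `.to_vec().into_iter()` workaround.
    let params = params.to_le_bytes();
    let hdr = hdr.to_le_bytes();
    let csh = csh.to_le_bytes();

    params
        .iter()
        .copied()
        .chain(hdr.iter().copied())
        .chain(csh.iter().copied())
        .collect()
}
```

Here the only allocation is the final `Vec` produced by `collect()`, and no clippy allow is needed.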

richardpringle · 2023-09-06T15:57:58Z

firewood/src/db.rs

+                    wal_block_nbit: cfg.wal.block_nbit,
+                    root_hash_file_nbit: cfg.root_hash_file_nbit,
+                };
+                let bytes = bytemuck::bytes_of(&params).to_vec();


Left the same comment below. If you hang on to the params at the same scope as header_bytes, you'll only need to do one allocation for all the header_bytes.
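One possible shape for the single-allocation version, as a std-only sketch. `concat_headers` is a hypothetical helper, and the byte slices would come from values that outlive the call (for example, `params` kept at the same scope as `header_bytes`).

```rust
/// Copy several byte slices into one buffer that is sized up front,
/// so only a single allocation is made.
pub fn concat_headers(parts: &[&[u8]]) -> Vec<u8> {
    // Compute the total length first so the Vec never has to grow.
    let total: usize = parts.iter().map(|p| p.len()).sum();
    let mut out = Vec::with_capacity(total);
    for part in parts {
        out.extend_from_slice(part);
    }
    out
}
```

The caller keeps each header value alive and passes borrowed bytes, so no per-header `to_vec()` is required.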

richardpringle · 2023-09-06T15:58:21Z

shale/Cargo.toml

@@ -11,6 +11,7 @@ license = "MIT"
 hex = "0.4.3"
 lru = "0.11.0"
 thiserror = "1.0.38"
+bytemuck = { version = "1.13.1", features = ["derive"] }


richardpringle · 2023-09-06T15:58:39Z

shale/src/compact.rs

@@ -127,7 +127,8 @@ impl Storable for CompactDescriptor {
    }
 }

-#[derive(Debug)]
+#[repr(C)]
+#[derive(Copy, Clone, Debug, bytemuck::NoUninit)]


Suggested change

-#[derive(Copy, Clone, Debug, bytemuck::NoUninit)]
+#[derive(Copy, Clone, Debug, bytemuck::Zeroable, bytemuck::Pod)]

richardpringle · 2023-09-06T15:59:10Z

shale/src/disk_address.rs

@@ -2,10 +2,13 @@ use std::hash::Hash;
 use std::num::NonZeroUsize;
 use std::ops::{Deref, DerefMut};

+use bytemuck::NoUninit;


Suggested change

-use bytemuck::NoUninit;
+use bytemuck::{Pod, Zeroable};

richardpringle · 2023-09-06T15:59:22Z

shale/src/disk_address.rs

 use crate::{CachedStore, ShaleError, Storable};

 /// The virtual disk address of an object
-#[derive(Debug, Copy, Clone, Eq, Hash, Ord, PartialOrd, PartialEq)]
+#[repr(C)]
+#[derive(Debug, Copy, Clone, Eq, Hash, Ord, PartialOrd, PartialEq, NoUninit)]


Suggested change

-#[derive(Debug, Copy, Clone, Eq, Hash, Ord, PartialOrd, PartialEq, NoUninit)]
+#[derive(Debug, Copy, Clone, Eq, Hash, Ord, PartialOrd, PartialEq, Pod, Zeroable)]

rkuris added 2 commits September 5, 2023 20:27

Don't initialize acc_root

b3e34b3

rkuris requested review from xinifinity and richardpringle as code owners September 5, 2023 20:34

Cargo fmt

a4b6f13

xinifinity reviewed Sep 5, 2023

richardpringle reviewed Sep 5, 2023

richardpringle approved these changes Sep 5, 2023

xinifinity approved these changes Sep 5, 2023

rkuris added 3 commits September 5, 2023 23:31

Merge remote-tracking branch 'origin/main' into rkuris/get-ahead-of-d…

7fdd6aa

…b-header

cargo fmt

7d0348b

Switch to bytemuck

76fb768

Easier and probably faster than serializing via hydrate(). Not overly concerned about the extra allocations here; we could probably do better by not making everything a vector, but this code only runs once.

rkuris requested review from richardpringle and xinifinity September 5, 2023 23:45

xinifinity approved these changes Sep 6, 2023

richardpringle reviewed Sep 6, 2023

rkuris merged commit 32bcdae into main Sep 6, 2023
5 checks passed

rkuris deleted the rkuris/get-ahead-of-db-header branch September 6, 2023 16:25
