Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rkuris/streaming iterator from start #346

Merged
merged 34 commits into from
Nov 15, 2023
Merged

Conversation

rkuris
Copy link
Collaborator

@rkuris rkuris commented Nov 11, 2023

This walks down the leftmost branch every time, recording the path down, then figuring out what the key and value is at the bottom, then resumes.

Also refactors get_key_from_parents, as this is now done 3 times in the code, sometimes when we add a leaf key value, and other times when not.

There are several TODOs here, but the code works for the two test cases
I created. Both of these avoid making extension nodes, which are going
away, and this code may need some tweaks to support values inside branch
nodes.
Per review comments
IteratorState is any of:
 - Start at the beginning of the trie
 - Start at a specific key value of the trie
 - Continue after a saved node and parents

See the inline comments for more details.

TODO:
 - Implement StartAtBegininng (and tests)
 - If a key is provided that doesn't exist, start iterating at the next
   one (if we really need this)
 - If an error is returned, we should next return None, then start from
   the beginning. This makes the resumption after an error a little
   cleaner. We'd need another IteratorState for this.
 - Remove extension nodes (depends on other diffs)
This was an artifact from an earlier change. Reverted back to original.
Not sure how this got in there. Reverting it.
Seems like we can just reference self and it works!
Removed unnecessary clone
Added additional comments
Added TODO for a few things:
 - possible to return a reference to the value instead of a copy
 - should handle case of branch nodes with values while traversing

Adding a randomized key-value test after extension nodes are removed is
also recommended.
@rkuris rkuris self-assigned this Nov 11, 2023
@rkuris
Copy link
Collaborator Author

rkuris commented Nov 11, 2023

Depends on streaming iterator spike PR

@rkuris rkuris marked this pull request as ready for review November 11, 2023 02:13
This walks down the leftmost branch every time, recording the path down,
then figuring out what the key and value is at the bottom, then resumes.

Also refactors get_key_from_parents, as this is now done 3 times in the
code, sometimes when we add a leaf key value, and other times when not.
@rkuris rkuris force-pushed the rkuris/streaming_iterator_from_start branch from 059f9dd to 8b3648f Compare November 11, 2023 03:44
firewood/src/merkle.rs Outdated Show resolved Hide resolved
firewood/src/merkle.rs Outdated Show resolved Hide resolved
firewood/src/merkle.rs Outdated Show resolved Hide resolved
firewood/src/merkle.rs Outdated Show resolved Hide resolved
firewood/src/merkle.rs Outdated Show resolved Hide resolved
firewood/src/merkle.rs Outdated Show resolved Hide resolved
firewood/src/merkle.rs Show resolved Hide resolved
firewood/src/merkle.rs Show resolved Hide resolved
firewood/src/merkle.rs Show resolved Hide resolved
firewood/src/merkle.rs Outdated Show resolved Hide resolved
Moved the comments about the behavior at the end of the stream to doc
comments on the iterator.

Per review comments.
Not 100% sure this is what the reviewer asked for, but this is okay too.
Copy link
Contributor

@richardpringle richardpringle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple of stylistic suggestions, I guess those can all be considered nits.

I do, however, think that if it's possible to write code that doesn't panic, we should absolutely be writing code that cannot panic, even in the event of code changes. It's always better to have the compiler uphold your invariants.

As for the tests, splitting case coverage into separate unit tests also makes refactoring and debugging much easier. Unit tests should cover a single case (within reason) such that it's obvious what failure is occurring. Right now, if we wanted to change the behaviour of the iterator starting over from the beginning, we would have to modify a test that covers multiple other cases. We could introduce a bug into our test coverage of a case that we did not want to change and cause a regression.

firewood/src/merkle.rs Outdated Show resolved Hide resolved
firewood/src/merkle.rs Outdated Show resolved Hide resolved
firewood/src/merkle.rs Outdated Show resolved Hide resolved
firewood/src/merkle.rs Outdated Show resolved Hide resolved
Comment on lines 1366 to 1378
let found_offset = children[child_position as usize..]
.iter()
.position(|&addr| addr.is_some());

if let Some(found_offset) = found_offset {
child_position += found_offset as u8;
} else {
next = parents.pop();
continue;
}

let addr = children[child_position as usize].unwrap();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let found_offset = children[child_position as usize..]
.iter()
.position(|&addr| addr.is_some());
if let Some(found_offset) = found_offset {
child_position += found_offset as u8;
} else {
next = parents.pop();
continue;
}
let addr = children[child_position as usize].unwrap();
let found_offset = children
.iter()
.skip(child_position as usize)
.enumerate()
.filter_map(|(i, addr)| addr.map(|addr| (i as u8, addr)))
.next();
let Some((found_offset, addr)) = found_offset else {
next = parents.pop();
continue;
};
child_position += found_offset;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another case of removing the Option being easier. Can we just land #341 ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed per the suggestion


(found_key, value)
}
_ => unreachable!(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a foot gun if anyone changes the code. It would be better to have

let (found_key, value) = match std::mem::take(&mut self.key_state) {

at the top of the poll_next method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to get this to work. It's a bit tricky because of what's borrowed and moved. I'll create a task for fixing this in a followup.

firewood/src/merkle.rs Outdated Show resolved Hide resolved
@rkuris rkuris mentioned this pull request Nov 14, 2023
You have to add one only if it wasn't None
@rkuris rkuris linked an issue Nov 14, 2023 that may be closed by this pull request
Added cases of starting in the middle and at the last key
Copy link
Contributor

@richardpringle richardpringle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly have preferred no panicking code because I don't think that #341 should be merged. I will leave a comment in that PR as to why.

@rkuris
Copy link
Collaborator Author

rkuris commented Nov 14, 2023

I would strongly have preferred no panicking code because I don't think that #341 should be merged. I will leave a comment in that PR as to why.

Fair point; lets make these changes before merging then.

Per review comments
The loop to calculate the next found position and address can be a
little simpler using enumerate, skip, and filter_map.
Also removed incorrect comment and comment about DiskAddress::is_null
Copy link
Contributor

@richardpringle richardpringle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🥇

Thanks for the patience

@rkuris rkuris merged commit 1989a96 into main Nov 15, 2023
5 checks passed
@rkuris rkuris deleted the rkuris/streaming_iterator_from_start branch November 15, 2023 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Support iterators
3 participants