-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Sumtree]: Initialization and Insertion for Sumtree #56
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job – looks like happy path insertions are working generally correctly! This implementation is still likely vulnerable to being pushed to a linear-time worst case, but we should discuss how to address this offline and do a follow up PR.
Did a first pass with comments on readability (especially on the core insertion function & tests) that we should address before merging. Happy to discuss these and move them to separate issues to not block this as well
/// If the node is a leaf it will be inserted by the following priority: | ||
/// 1. New node fits in either left or right range, insert accordingly | ||
/// 2. Left is empty, insert left | ||
/// 3. Incompatible left, Right is empty, insert right |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we clarify what incompatible refers to here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let maybe_right = self.get_right(storage)?; | ||
|
||
// Check if left node exists | ||
if let Some(mut left_node) = maybe_left { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on splitting these branches up into higher level conditionals that are only 1 branch deep and have variable names for the conditions (e.g. leftLeaf && rightInternalInRange
)? I'm finding these multi layer if/elses somewhat difficult to keep track of mentally while reviewing, and I suspect future reviewers will feel similarly.
This would also help us better understand where there is shared logic or overlap that can help lower the number of cases we need to separately handle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// New node is being placed below current, update ranges and accumulator as tree is traversed | ||
self.add_value(new_node.get_value())?; | ||
if self.get_min_range() > new_node.get_min_range() { | ||
self.set_min_range(new_node.get_min_range())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a risk here that the range of the left child of root gets expanded to overlap with the right child of root? If this is the case, I suspect this would get in the way of prefix sum calculations.
Unfortunately we probably can't completely avoid mutating ranges for internal nodes, since creating a new internal node/layer for every out of range insertion can potentially be spammed.
I guess the specific case we're dealing with here is: what do we do when the gap between left and right child ranges is less than the size of an order at the high end of the left range? My intuition is that this shouldn't be possible since individual orders can't have overlapping ranges, but we should validate & document this well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should include an edge case test for this as part of rebalancing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Created an issue here: #62
|
||
use crate::sumtree::node::{generate_node_id, NodeType, TreeNode, NODES}; | ||
|
||
pub fn print_tree(storage: &dyn Storage, node: &TreeNode, depth: u8, top: bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on having the trees print vertically so we can compare better against hand-solved examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how well this algorithm will scale with tree size, it quickly becomes very unreadable. It may even have issues with node string length when printing the nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work with the fixes and readability for tests. I created issues for non-blocking comments to not hold up this PR since other ones depend on it, but I did have one blocking question about a potential broken invariant for case 3 (and I suspect others as well if we consider deletion)
// New node is being placed below current, update ranges and accumulator as tree is traversed | ||
self.add_value(new_node.get_value())?; | ||
if self.get_min_range() > new_node.get_min_range() { | ||
self.set_min_range(new_node.get_min_range())?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Created an issue here: #62
s | ||
} | ||
|
||
pub fn print_tree_row(row: Vec<(Option<TreeNode>, Option<TreeNode>)>, top: bool, height: u32) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I wonder if it would make sense to move these helpers to the end of the file to not distract from the core tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, easier to look straight at the test cases. Printing algorithm is less important.
#[test] | ||
fn test_node_insert_valid() { | ||
let book_id = 1; | ||
let tick_id = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we expect to have independent trees on each tick for each orderbook, we should probably have another layer of tests that does not hardcode these and builds multiple trees in the same test case.
Moved this into an issue here #64 since this makes sense to implement after rebalancing logic is complete to avoid needing to recompute too many test cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we test this independently? Or as part of order cancellation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good to have separate sanity check/unit tests for cancellation & higher level flows
// ┌──────── | ||
// 2: 1 10 | ||
TestNodeInsertCase { | ||
name: "Case 2: Left Empty Right Empty", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to say Left Leaf, Right Empty
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case is the precondition, this tree is completely empty with only one insert. Might be a better descriptor.
// ┌────────────────┐ | ||
// 2: 1 10 3: 12 10 | ||
TestNodeInsertCase { | ||
name: "Case 3: Left Leaf, Right Empty", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this one Left Leaf, Right Leaf
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this make the cases clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I see. Probably worth having a distinction between the precondition and the action being tested (e.g. [Left Internal, Right Internal] -> Insert right side
). Non blocking though
} | ||
|
||
// Case 3 | ||
if !is_in_left_range && maybe_right.is_none() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err does this make it possible to insert a node with an ETAS that is below the range of the left child into the right child?
e.g. assume you create two orders and cancel (insert) the second. You now have a left leaf (right empty). When you cancel (insert) the first, it extends the minimum range of the root node but places the order as the right child?
For the prefix sum to work, all the leaves need to be stored in increasing ETAS from left to right, so unless I'm missing something here we might need to add some more cases :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added two more cases:
- As describe, node on left is higher than new node, right empty, reorder
- Node on right is smaller than new node, left empty, reorder
I'm wondering if min/max range is a sufficient check? Could we consider checking the median value of the node and comparing which is closer when the node is between left and right values? Could improve balance. |
Oh hmm that's an interesting idea – moving this to an offline discussion to dive a bit deeper |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM as a happy-path implementation. There were several loose threads/bugs that I either created issues for or committed directly to not block this PR further. Happy to discuss offline & make additional changes in follow up PRs.
Also as a heads up I primarily reviewed the sumtree logic in this PR and only looked at the tick wiring parts at a very cursory level.
|
||
// Case 3: reordering | ||
let is_lower_than_left_leaf = maybe_left.clone().map_or(false, |l| { | ||
!l.is_internal() && new_node.get_max_range() <= l.get_min_range() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We seem to be handling range bounds inconsistently/potentially incorrectly (i.e. this assumes they're fully exclusive while splitting logic assumes they're fully inclusive, and the correct approach is probably neither). Created an issue to track this #68
(self.key, new_node.key) | ||
} else { | ||
(new_node.key, self.key) | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems technically correct but a somewhat risky implementation. We should probably be much more explicit in the check here (the min of one node being greater than the min of the other feels excessively loose + we should probably have explicit cases & error in else
)
Created issue here #70
new_node.get_max_range().max(self.get_max_range()), | ||
); | ||
|
||
let mut new_parent = TreeNode::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also be setting the new node's parent here right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Committed this change to not block the PR further – if this was intentional/I made a mistake we can fix in a follow up PR
// Case 5: Reordering | ||
// TODO: Add edge case test for this | ||
let is_higher_than_right_leaf = maybe_right.clone().map_or(false, |r| { | ||
!r.is_internal() && new_node.get_min_range() >= r.get_min_range() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be new_node.get_min_range() >= r.get_max_range()
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Committed the fix directly for this one as well
} | ||
|
||
// Case 5: Reordering | ||
// TODO: Add edge case test for this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created an issue to track this so we don't forget #69
Closes: #54 #55
What is the purpose of the change
These changes introduce the concept of a sumtree in to our orderbook implementation. The implementation is separate from the standard linear orderbook as a completely new contract. The implementation of the sumtree revolves around the state mentioned in #54, the only difference being the naming of the
Node
struct asTreeNode
.Several utility methods were added to the
TreeNode
struct to ease readability of insertion logic. Primarily getters and setters for all values.The
min
andmax
range defined in these methods can be defined as follows:Internal
node this is simply therange
values.Leaf
node themin
is the Effective Total Amount Sold (ETAS) of the node and themax
is the value of the node added to theETAS
The
get_value
method returns the following:Internal
node it returns theaccumulator
(sum of all value in ancestral leafs)Leaf
node it returns thevalue
of the leaf (how much was canceled with the order)Insertion logic is handled using insert which is a recursive function that traverses the tree and inserts when specified by the conditions outlined in #55 . Each condition is handled on a case by case basis, invoking priority based on code execution order.
Internal node values are updated as the tree is traversed by the insertion logic, with the assumption that if a node is visited then the new node will be an ancestor of that node. Each individual case is labeled via comments and handled in the
test_node_insert_valid
test intest_node.rs
.There is a potential upgrade in insertion logic (currently insertion leans left) by judging traversal based on closest range. There is also an issue where the range of each initial leaf is judged by the first two nodes added on each side.
Testing and Verifying
Tests can be run using: