Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Sumtree]: Initialization and Insertion for Sumtree #56

Merged
merged 23 commits into from
Mar 15, 2024

Conversation

crnbarr93
Copy link
Collaborator

Closes: #54 #55

What is the purpose of the change

These changes introduce the concept of a sumtree in to our orderbook implementation. The implementation is separate from the standard linear orderbook as a completely new contract. The implementation of the sumtree revolves around the state mentioned in #54, the only difference being the naming of the Node struct as TreeNode.

Several utility methods were added to the TreeNode struct to ease readability of insertion logic. Primarily getters and setters for all values.

The min and max range defined in these methods can be defined as follows:

  1. For an Internal node this is simply the range values.
  2. For a Leaf node the min is the Effective Total Amount Sold (ETAS) of the node and the max is the value of the node added to the ETAS

The get_value method returns the following:

  1. For an Internal node it returns the accumulator (sum of all value in ancestral leafs)
  2. For a Leaf node it returns the value of the leaf (how much was canceled with the order)

Insertion logic is handled using insert which is a recursive function that traverses the tree and inserts when specified by the conditions outlined in #55 . Each condition is handled on a case by case basis, invoking priority based on code execution order.

Internal node values are updated as the tree is traversed by the insertion logic, with the assumption that if a node is visited then the new node will be an ancestor of that node. Each individual case is labeled via comments and handled in the test_node_insert_valid test in test_node.rs.

There is a potential upgrade in insertion logic (currently insertion leans left) by judging traversal based on closest range. There is also an issue where the range of each initial leaf is judged by the first two nodes added on each side.

Testing and Verifying

Tests can be run using:

cargo unit-test test_node
  • Added a test for all 6 possible node insertion cases

@crnbarr93 crnbarr93 linked an issue Mar 8, 2024 that may be closed by this pull request
Copy link
Collaborator

@AlpinYukseloglu AlpinYukseloglu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job – looks like happy path insertions are working generally correctly! This implementation is still likely vulnerable to being pushed to a linear-time worst case, but we should discuss how to address this offline and do a follow up PR.

Did a first pass with comments on readability (especially on the core insertion function & tests) that we should address before merging. Happy to discuss these and move them to separate issues to not block this as well

contracts/sumtree-orderbook/NOTICE Outdated Show resolved Hide resolved
/// If the node is a leaf it will be inserted by the following priority:
/// 1. New node fits in either left or right range, insert accordingly
/// 2. Left is empty, insert left
/// 3. Incompatible left, Right is empty, insert right
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we clarify what incompatible refers to here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let maybe_right = self.get_right(storage)?;

// Check if left node exists
if let Some(mut left_node) = maybe_left {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on splitting these branches up into higher level conditionals that are only 1 branch deep and have variable names for the conditions (e.g. leftLeaf && rightInternalInRange)? I'm finding these multi layer if/elses somewhat difficult to keep track of mentally while reviewing, and I suspect future reviewers will feel similarly.

This would also help us better understand where there is shared logic or overlap that can help lower the number of cases we need to separately handle

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// New node is being placed below current, update ranges and accumulator as tree is traversed
self.add_value(new_node.get_value())?;
if self.get_min_range() > new_node.get_min_range() {
self.set_min_range(new_node.get_min_range())?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a risk here that the range of the left child of root gets expanded to overlap with the right child of root? If this is the case, I suspect this would get in the way of prefix sum calculations.

Unfortunately we probably can't completely avoid mutating ranges for internal nodes, since creating a new internal node/layer for every out of range insertion can potentially be spammed.

I guess the specific case we're dealing with here is: what do we do when the gap between left and right child ranges is less than the size of an order at the high end of the left range? My intuition is that this shouldn't be possible since individual orders can't have overlapping ranges, but we should validate & document this well

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should include an edge case test for this as part of rebalancing

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Created an issue here: #62


use crate::sumtree::node::{generate_node_id, NodeType, TreeNode, NODES};

pub fn print_tree(storage: &dyn Storage, node: &TreeNode, depth: u8, top: bool) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on having the trees print vertically so we can compare better against hand-solved examples?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5e34b08

I'm not sure how well this algorithm will scale with tree size, it quickly becomes very unreadable. It may even have issues with node string length when printing the nodes.

contracts/sumtree-orderbook/src/sumtree/test/test_node.rs Outdated Show resolved Hide resolved
@AlpinYukseloglu AlpinYukseloglu mentioned this pull request Mar 11, 2024
73 tasks
Copy link
Collaborator

@AlpinYukseloglu AlpinYukseloglu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work with the fixes and readability for tests. I created issues for non-blocking comments to not hold up this PR since other ones depend on it, but I did have one blocking question about a potential broken invariant for case 3 (and I suspect others as well if we consider deletion)

// New node is being placed below current, update ranges and accumulator as tree is traversed
self.add_value(new_node.get_value())?;
if self.get_min_range() > new_node.get_min_range() {
self.set_min_range(new_node.get_min_range())?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Created an issue here: #62

s
}

pub fn print_tree_row(row: Vec<(Option<TreeNode>, Option<TreeNode>)>, top: bool, height: u32) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I wonder if it would make sense to move these helpers to the end of the file to not distract from the core tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, easier to look straight at the test cases. Printing algorithm is less important.

#[test]
fn test_node_insert_valid() {
let book_id = 1;
let tick_id = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we expect to have independent trees on each tick for each orderbook, we should probably have another layer of tests that does not hardcode these and builds multiple trees in the same test case.

Moved this into an issue here #64 since this makes sense to implement after rebalancing logic is complete to avoid needing to recompute too many test cases

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we test this independently? Or as part of order cancellation?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's good to have separate sanity check/unit tests for cancellation & higher level flows

// ┌────────
// 2: 1 10
TestNodeInsertCase {
name: "Case 2: Left Empty Right Empty",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to say Left Leaf, Right Empty?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The case is the precondition, this tree is completely empty with only one insert. Might be a better descriptor.

// ┌────────────────┐
// 2: 1 10 3: 12 10
TestNodeInsertCase {
name: "Case 3: Left Leaf, Right Empty",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this one Left Leaf, Right Leaf?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5f1151e

Does this make the cases clearer?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I see. Probably worth having a distinction between the precondition and the action being tested (e.g. [Left Internal, Right Internal] -> Insert right side). Non blocking though

}

// Case 3
if !is_in_left_range && maybe_right.is_none() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err does this make it possible to insert a node with an ETAS that is below the range of the left child into the right child?

e.g. assume you create two orders and cancel (insert) the second. You now have a left leaf (right empty). When you cancel (insert) the first, it extends the minimum range of the root node but places the order as the right child?

For the prefix sum to work, all the leaves need to be stored in increasing ETAS from left to right, so unless I'm missing something here we might need to add some more cases :/

Copy link
Collaborator Author

@crnbarr93 crnbarr93 Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

78403e4

Added two more cases:

  1. As describe, node on left is higher than new node, right empty, reorder
  2. Node on right is smaller than new node, left empty, reorder

@crnbarr93
Copy link
Collaborator Author

I'm wondering if min/max range is a sufficient check? Could we consider checking the median value of the node and comparing which is closer when the node is between left and right values?

Could improve balance.

@AlpinYukseloglu
Copy link
Collaborator

I'm wondering if min/max range is a sufficient check? Could we consider checking the median value of the node and comparing which is closer when the node is between left and right values?

Oh hmm that's an interesting idea – moving this to an offline discussion to dive a bit deeper

Copy link
Collaborator

@AlpinYukseloglu AlpinYukseloglu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM as a happy-path implementation. There were several loose threads/bugs that I either created issues for or committed directly to not block this PR further. Happy to discuss offline & make additional changes in follow up PRs.

Also as a heads up I primarily reviewed the sumtree logic in this PR and only looked at the tick wiring parts at a very cursory level.


// Case 3: reordering
let is_lower_than_left_leaf = maybe_left.clone().map_or(false, |l| {
!l.is_internal() && new_node.get_max_range() <= l.get_min_range()
Copy link
Collaborator

@AlpinYukseloglu AlpinYukseloglu Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be handling range bounds inconsistently/potentially incorrectly (i.e. this assumes they're fully exclusive while splitting logic assumes they're fully inclusive, and the correct approach is probably neither). Created an issue to track this #68

(self.key, new_node.key)
} else {
(new_node.key, self.key)
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems technically correct but a somewhat risky implementation. We should probably be much more explicit in the check here (the min of one node being greater than the min of the other feels excessively loose + we should probably have explicit cases & error in else)

Created issue here #70

new_node.get_max_range().max(self.get_max_range()),
);

let mut new_parent = TreeNode::new(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also be setting the new node's parent here right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Committed this change to not block the PR further – if this was intentional/I made a mistake we can fix in a follow up PR

// Case 5: Reordering
// TODO: Add edge case test for this
let is_higher_than_right_leaf = maybe_right.clone().map_or(false, |r| {
!r.is_internal() && new_node.get_min_range() >= r.get_min_range()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be new_node.get_min_range() >= r.get_max_range() right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Committed the fix directly for this one as well

}

// Case 5: Reordering
// TODO: Add edge case test for this
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created an issue to track this so we don't forget #69

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Sumtree]: Sumtree Insertion [Sumtree]: Implement Tree Structure and State
2 participants