Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First crack at serialization with backrefs. #119

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

First crack at serialization with backrefs. #119

wants to merge 13 commits into from

Conversation

richardkiss
Copy link
Contributor

No description provided.

@richardkiss richardkiss marked this pull request as draft May 19, 2022 21:29
@richardkiss richardkiss requested review from arvidn and ChiaMineJP May 19, 2022 21:30
clvm/SExp.py Outdated Show resolved Hide resolved
clvm/serialize.py Outdated Show resolved Hide resolved
clvm/serialize.py Outdated Show resolved Hide resolved
clvm/serialize.py Outdated Show resolved Hide resolved
clvm/object_cache.py Show resolved Hide resolved
clvm/read_cache_lookup.py Show resolved Hide resolved
clvm/serialize.py Outdated Show resolved Hide resolved
clvm/read_cache_lookup.py Outdated Show resolved Hide resolved
clvm/read_cache_lookup.py Outdated Show resolved Hide resolved
tests/serialize_test.py Show resolved Hide resolved
tests/serialize_test.py Show resolved Hide resolved
clvm/serialize.py Show resolved Hide resolved
assert len(b) == 19124

def test_deserialize_bomb(self):
def make_bomb(depth):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a more challenging bomb is where each leaf is a unique (but large) subset of the atom. But I suppose the python implementation might not deduplicate substr.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deduplicating substr is a much bigger challenge, since the number of substrings in a string of length n is $O(n^2)$. We'd have to hash all of them, very tricky. Not sure how zlib does it; maybe it just builds a reverse table of prefixes. It's not in the scope of this change for sure.

@richardkiss richardkiss force-pushed the ser_v2 branch 3 times, most recently from d241d3b to 1b437ae Compare June 30, 2022 19:54
@richardkiss richardkiss marked this pull request as ready for review June 30, 2022 19:57
clvm/object_cache.py Outdated Show resolved Hide resolved

def path_to_bytes(path: List[int]) -> bytes:
"""
Convert a list of 0/1 values to a path expected by clvm.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small suggetion.
I think adding information showing that left=0 and right=1 is helpful.
Like

"""
Convert a list of 0(left)/1(right) values to a path expected by clvm.
...
"""

clvm/serialize.py Show resolved Hide resolved
@@ -0,0 +1,41 @@
import unittest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, test_object_cache.py should be renamed to object_cache_test.py.

Plus, how about adding read_cache_lookup_test.py?
Following lines to understand how ReadCacheLookup works was a real challenge for me without a testing.
For example, in my understanding root_hash of ReadCacheLookup is a tree-hash representation of read_stack but adding test by original author will greatly help understanding the intention of those code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, this is difficult to understand and some tests may help elucidate (and test!).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some tests for the ReadCacheLookup class.

tests/serialize_test.py Show resolved Hide resolved
# find BOTH minimal paths to `foo`
self.assertEqual(
rcl.find_paths(foo_hash, serialized_length=20),
set([bytes([4]), bytes([6])]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyCharm warns that set([a, b, c]) can be replaced with {a, b, c}
image


class ObjectCacheTest(unittest.TestCase):
def check(self, obj_text, expected_hash, expected_length):
obj = assemble(obj_text)
Copy link
Contributor

@ChiaMineJP ChiaMineJP Jul 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think circular dependency of clvm_tools should be avoided.
How about removing assemble and replacing with to_sexp_f here?

class ObjectCacheTest(unittest.TestCase):
    def check(self, obj, expected_hash, expected_length):
        th = ObjectCache(treehash)
        self.assertEqual(th.get(obj).hex(), expected_hash)
        sl = ObjectCache(serialized_length)
        self.assertEqual(sl.get(obj), expected_length)

    def test_various(self):
        self.check(
            # "0x00"
            to_sexp_f(b'\x00'),
            "47dc540c94ceb704a23875c11273e16bb0b8a87aed84de911f2133568115f254",
            1,
        )
        self.check(
            # "0"
            to_sexp_f(None),
            "4bf5122f344554c53bde2ebb8cd2b7e3d1600ad631c385a5d7cce23c7785459a", 1
        )
        self.check(
            # "foo"
            to_sexp_f("foo"),
            "0080b50a51ecd0ccfaaa4d49dba866fe58724f18445d30202bafb03e21eef6cb", 4
        )
        self.check(
            # "(foo . bar)",
            to_sexp_f(("foo", "bar")),
            "c518e45ae6a7b4146017b7a1d81639051b132f1f5572ce3088a3898a9ed1280b",
            9,
        )
        self.check(
            # "(this is a longer test of a deeper tree)"
            to_sexp_f([b"this", b"is", b"\x02", b"longer", b"test", b"of", b"\x02", b"deeper", b"tree"]),
            "0a072d7d860d77d8e290ced0fdb29a271198ca3db54d701c45d831e3aae6422c",
            47,
        )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests already depend upon clvm_tools as they use brun internally. I agree though, you're right that I should use to_sexp_f.

tests/read_cache_lookup_test.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants