Skip to content
This repository has been archived by the owner on Sep 16, 2024. It is now read-only.

Generic Resource Identifiers with ResourceIdTrait #90

Merged
merged 7 commits into from
Mar 22, 2024

Conversation

tareknaser
Copy link
Collaborator

Description

This PR aims to make ResourceIndex type generic over the hash function used to generate resource identifiers

Resolves #75

Changes

  • A new trait is implemented named ResourceIdTrait. This trait defines a set of functions and requirements that all specific ResourceId implementations must implement
  • Within ResourceIdTrait, a type alias named HashType is defined. This allows different ResourceId implementations to use different types for their hash values, but we ensure that all HashType types implement specific traits for consistent behavior
  • Modify ResourceIdCrc32 type to implement the ResourceIdTrait

Future work

  • src/index.rs file heavily relies on the assumption of only using ResourceIdCrc32. It will require adjustments, potentially including abstracting some tests, to function with various ResourceId types.
  • We can add conditional compilation easily. for example we can use ResourceIdBlake3 that implements ResourceIdTrait
  • Later, when Refactor Arklib into Separate Crates ark-core#5 is merged, we can move src/resource directory into a separate crate for better organization.

Copy link

Benchmark for a882b55

Click to view benchmark
Test Base PR %
compute_bytes_large/compute_bytes 141.6±5.35µs 141.3±3.38µs -0.21%
compute_bytes_medium/compute_bytes 27.5±0.52µs 27.6±0.42µs +0.36%
compute_bytes_small/compute_bytes 128.9±1.00ns 132.7±23.09ns +2.95%
index_build/index_build/tests/ 160.9±2.48µs 164.1±0.64µs +1.99%
tests/lena.jpg/compute_bytes 13.3±0.04µs 13.4±0.06µs +0.75%
tests/test.pdf/compute_bytes 130.1±3.14µs 114.1±4.70µs -12.30%

@tareknaser
Copy link
Collaborator Author

Take a look at https://github.com/ARK-Builders/arklib/tree/blake3_ResourceIdTrait. It shows how easy it is to include a new ResourceId type that implements ResourceIdTrait, featuring Blake3 as the hash function in this instance.
Also notice that ResourceIndex still has collision tracking although blake3 is a cryptographic hash function and there should be no collisions.

src/resource/crc32.rs Outdated Show resolved Hide resolved
Comment on lines 60 to 62
fn get_hash(&self) -> Self::HashType {
self.hash
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure that we need to expose this. Probably ids should be "black boxes", i.e. users should only use them (via Eq, PartialEq, Ord, Display traits) without going into the details of how the ids are constructed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, ResourceId has data_size and hash as public fields and I see them used in many places all over the codebase (especially in unit tests).
I thought if the hash is so frequently used, might be a good idea to add a method to get it.

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's consider this by use-cases:

  1. Unit tests should be able to access internal fields, but they should not be exposed to users/apps.
  2. Users/apps could use data_size but if they need it, they would need it with other hash functions, too. It's better to provide such a field in some other way, generically for any hash function. Since we just pass this data to crc32 generation function, we could save it somewhere around. I don't think that users really need to do something with hash value, only with the id as a whole.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests should be able to access internal fields

In the current setup, implementers of ResourceIdTrait have the freedom to decide whether to expose data_size and hash or not. It may be best not to enforce methods like get_hash in the trait itself.

Same goes for any methods related to data_size, considering that this field might not be applicable in all implementations of ResourceIdTrait, especially in cryptographic hash function implementations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't include data_size in the draft for BLAKE3 implementation. Here #73 (comment)

Copy link

github-actions bot commented Mar 7, 2024

Benchmark for 1e81c2b

Click to view benchmark
Test Base PR %
compute_bytes_large/compute_bytes 143.4±0.68µs 467.0±2.02µs +225.66%
compute_bytes_medium/compute_bytes 28.4±0.20µs 26.8±0.39µs -5.63%
compute_bytes_small/compute_bytes 134.3±1.00ns 127.6±0.51ns -4.99%
index_build/index_build/tests/ 166.4±11.56µs 158.2±1.45µs -4.93%
tests/lena.jpg/compute_bytes 14.0±0.10µs 13.4±0.16µs -4.29%
tests/test.pdf/compute_bytes 113.3±1.89µs 110.7±0.66µs -2.29%

Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Signed-off-by: Tarek <tareknaser360@gmail.com>
Copy link

Benchmark for d95e0a5

Click to view benchmark
Test Base PR %
compute_bytes_large/compute_bytes 487.3±1.14µs 138.4±2.41µs -71.60%
compute_bytes_medium/compute_bytes 29.1±0.47µs 28.2±0.93µs -3.09%
compute_bytes_small/compute_bytes 135.0±2.20ns 127.4±3.10ns -5.63%
index_build/index_build/tests/ 175.9±6.09µs 174.9±1.20µs -0.57%
tests/lena.jpg/compute_bytes 13.9±0.09µs 13.3±0.12µs -4.32%
tests/test.pdf/compute_bytes 113.5±0.31µs 110.1±0.77µs -3.00%

Copy link
Member

@kirillt kirillt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@tareknaser tareknaser merged commit 6263e61 into main Mar 22, 2024
2 checks passed
@tareknaser tareknaser deleted the genericResourceId branch March 22, 2024 04:25
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Index generic over hash function
2 participants