v0 FSST implementation #2

Merged
merged 7 commits into from
Aug 12, 2024

@a10y a10y commented Aug 12, 2024

Initial implementation of FSST.

We implement symbol table building, compression and decompression routines.

  • Building the symbol table is still slow; I have not yet done anything to optimize build or compress performance.
  • I'm able to replicate the paper's results for decoding speed! The decompress-single benchmark decompresses at ~1-2 cycles/byte. To get there, I had to implement the unaligned-write pointer arithmetic that you see in the decompress method.
  • Added some benchmarks to compare with LZ4, the baseline from the paper, using an off-the-shelf crate. LZ4 is considerably faster at both compression and decompression, but doesn't appear to compress my test string well.
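
For readers unfamiliar with the unaligned-write trick mentioned above, here is a minimal sketch of the idea. It is not the code in this PR: `SymbolTable`, `insert`, the `ESCAPE` value of 255, and the packed little-endian `u64` layout are assumptions made for illustration. Each code is expanded with a single unconditional 8-byte store, and the output cursor then advances only by the symbol's real length, so the next store overwrites the unused tail.

```rust
/// Assumed escape code: the byte that follows it is copied through as a literal.
const ESCAPE: u8 = 255;

/// Hypothetical symbol table: each symbol holds 1 to 8 bytes, packed
/// little-endian into a u64 so it can be emitted with one store.
struct SymbolTable {
    symbols: [u64; 256],
    lengths: [u8; 256],
}

impl SymbolTable {
    fn new() -> Self {
        SymbolTable { symbols: [0; 256], lengths: [0; 256] }
    }

    /// Register `bytes` (1 to 8 bytes) as the expansion of `code`.
    fn insert(&mut self, code: u8, bytes: &[u8]) {
        assert!(code != ESCAPE && !bytes.is_empty() && bytes.len() <= 8);
        let mut packed = [0u8; 8];
        packed[..bytes.len()].copy_from_slice(bytes);
        self.symbols[code as usize] = u64::from_le_bytes(packed);
        self.lengths[code as usize] = bytes.len() as u8;
    }

    fn decompress(&self, codes: &[u8]) -> Vec<u8> {
        // Every code expands to at most 8 bytes, so this capacity guarantees
        // that each 8-byte store below stays inside the allocation.
        let mut out: Vec<u8> = Vec::with_capacity(codes.len() * 8 + 8);
        let base = out.as_mut_ptr();
        let mut out_pos = 0usize;
        let mut in_pos = 0usize;

        while in_pos < codes.len() {
            let code = codes[in_pos];
            in_pos += 1;
            if code == ESCAPE {
                // Escaped literal: copy the next input byte unchanged.
                let Some(&literal) = codes.get(in_pos) else { break };
                in_pos += 1;
                // SAFETY: out_pos < capacity (see the capacity comment above).
                unsafe { base.add(out_pos).write(literal) };
                out_pos += 1;
            } else {
                let symbol = self.symbols[code as usize];
                // SAFETY: out_pos + 8 <= capacity. The destination is not
                // 8-byte aligned in general, hence write_unaligned.
                unsafe { base.add(out_pos).cast::<u64>().write_unaligned(symbol.to_le()) };
                // Advance by the real length only; the next store overwrites
                // whatever padding this one wrote past it.
                out_pos += self.lengths[code as usize] as usize;
            }
        }

        // SAFETY: every byte in 0..out_pos was written by the loop above.
        unsafe { out.set_len(out_pos) };
        out
    }
}
```

With code 0 mapped to `hello` and code 1 mapped to ` wor`, the input `[0, 1, 255, b'l', 255, b'd']` decodes to `hello world`. The point of the design is that the hot loop has no per-symbol length branch or byte-wise copy; the cost is a little `unsafe` and some slack capacity in the output buffer.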

counts1: Vec<usize>,

/// Frequency count for each code-pair.
counts2: Vec<Vec<usize>>,

I wanted to make this a [[usize; 512]; 512], but that exceeds Rust's default stack size, and I thought it'd be annoying to force every consumer to raise their stack limit when they use this crate.
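
For context on the trade-off: on a 64-bit target a `[[usize; 512]; 512]` is 512 × 512 × 8 bytes = 2 MiB, which is the default stack size for threads spawned through `std::thread`. One heap-backed alternative to the nested `Vec<Vec<usize>>` is a single flat allocation indexed as `first * 512 + second`. The sketch below is only an illustration of that option, not what this PR does; `PairCounters`, `CODE_SPACE`, and the method names are hypothetical.

```rust
/// Size of the code space, matching the 512 in the comment above.
const CODE_SPACE: usize = 512;

/// Hypothetical counter storage: both tables live on the heap, and the
/// pair counts use one contiguous allocation instead of 512 inner Vecs.
struct PairCounters {
    /// Frequency count for each single code.
    counts1: Vec<usize>,
    /// Frequency count for each code pair, flattened row-major.
    counts2: Vec<usize>,
}

impl PairCounters {
    fn new() -> Self {
        PairCounters {
            counts1: vec![0; CODE_SPACE],
            counts2: vec![0; CODE_SPACE * CODE_SPACE],
        }
    }

    /// Map a (first, second) pair to its slot in the flat table.
    fn pair_index(first: usize, second: usize) -> usize {
        // Checked explicitly: an out-of-range `second` would otherwise
        // silently alias a slot that belongs to another pair.
        assert!(first < CODE_SPACE && second < CODE_SPACE);
        first * CODE_SPACE + second
    }

    fn record1(&mut self, code: usize) {
        self.counts1[code] += 1;
    }

    fn record2(&mut self, first: usize, second: usize) {
        self.counts2[Self::pair_index(first, second)] += 1;
    }

    fn count1(&self, code: usize) -> usize {
        self.counts1[code]
    }

    fn count2(&self, first: usize, second: usize) -> usize {
        self.counts2[Self::pair_index(first, second)]
    }
}
```

The struct itself is only two `Vec` headers, so it is cheap to keep on the stack or move around, and consumers do not need to change any stack limits. Whether the flat layout is measurably faster than `Vec<Vec<usize>>` for symbol table building is something the benchmarks would have to show.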

a10y commented Aug 12, 2024

image

Benchmark results. Decompression speed is great, but compression, which should be 2-4x slower than decompression, is currently 1000x slower. Will need to implement the vectorization stuff.

@@ -0,0 +1 @@
How these papers have been placed in sequence will be made manifest in the reading of them. All needless matters have been eliminated, so that a history almost at variance with the possibilities of later-day belief may stand forth as simple fact. There is throughout no statement of past things wherein memory may err, for all the records chosen are exactly contemporary, given from the standpoints and within the range of knowledge of those who made them. We left in pretty good time, and came after nightfall to Klausenburgh. Here I stopped for the night at the Hotel Royale. I had for dinner, or rather supper, a chicken done up some way with red pepper, which was very good but thirsty. (Mem., get recipe for Mina.) I asked the waiter, and he said it was called “paprika hendl,” and that, as it was a national dish, I should be able to get it anywhere along the Carpathians. I found my smattering of German very useful here; indeed, I don’t know how I should be able to get on without it.

@a10y a10y merged commit e2a4c50 into develop Aug 12, 2024
1 check passed
@a10y a10y deleted the aduffy/initial branch August 12, 2024 22:15