Optimize z_stream internal buffer allocations #1

Open
wants to merge 4 commits into main
Conversation

@zargovv zargovv commented Jan 13, 2024

This pull request gets rid of the "Probably not enough output buffer was allocated" error and of redundant allocations of zlib's internal buffers.

The idea is to minimize the number of redundant (de)allocations of zlib's internal buffers. Although madler's zlib claims to be thread-safe as long as the allocator passed as zalloc is, this commit keeps a thread-local static z_stream. This also requires us to end the stream manually, so a deflate_end function is introduced.
Reference: zlib manual

In fact, this optimization can be pushed a bit further: currently the deflate stream is reinitialized on every klass_predictor_predict call, but it could be initialized once per thread (see the sketch below).
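For reference, here is a minimal sketch of the thread-local reuse pattern described above, assuming a one-shot deflate of an in-memory buffer; the helper name deflate_buf is hypothetical, and only deflate_end corresponds to a function actually introduced by this PR:

```c
#include <zlib.h>
#include <string.h>

// Thread-local deflate stream: initialized lazily on first use and
// reused via deflateReset(), so zlib's internal buffers are allocated
// only once per thread instead of once per prediction.
static _Thread_local z_stream deflate_stream;
static _Thread_local int deflate_stream_ready = 0;

// Compress `in` into `out` (capacity `out_cap`) in one shot.
// Returns the compressed size, or 0 on error.
size_t deflate_buf(const void *in, size_t in_len, void *out, size_t out_cap)
{
    if (!deflate_stream_ready) {
        memset(&deflate_stream, 0, sizeof(deflate_stream));
        if (deflateInit(&deflate_stream, Z_DEFAULT_COMPRESSION) != Z_OK) return 0;
        deflate_stream_ready = 1;
    } else {
        // Reuse the already-allocated internal buffers.
        if (deflateReset(&deflate_stream) != Z_OK) return 0;
    }

    deflate_stream.next_in   = (Bytef *)in;
    deflate_stream.avail_in  = (uInt)in_len;
    deflate_stream.next_out  = (Bytef *)out;
    deflate_stream.avail_out = (uInt)out_cap;

    if (deflate(&deflate_stream, Z_FINISH) != Z_STREAM_END) return 0;
    return deflate_stream.total_out;
}

// Counterpart to the lazy init: releases zlib's internal buffers.
void deflate_end(void)
{
    if (deflate_stream_ready) {
        deflateEnd(&deflate_stream);
        deflate_stream_ready = 0;
    }
}
```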

Additionally, I wanted to check some SIMD-accelerated zlib implementations: zlib-chromium, zlib-ng, zlib-cloudflare, and zlib-intel.

The mentioned repositories are all under the zlib license.

It would also be nice to check out libdeflate. It emits no gzip headers or checksums, and some benchmarks claim it is ~33% faster than zlib-cloudflare (I have not verified this myself); a sketch of its API follows.
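A minimal sketch of what a libdeflate-based helper could look like; the function name deflate_size is hypothetical, while the libdeflate_* calls are from the library's public API:

```c
#include <libdeflate.h>
#include <stdlib.h>

/* Compress `in` with raw DEFLATE (no headers, no checksum) and return
 * the compressed size, or 0 on failure. The compressor is created by
 * the caller and can be reused across calls. */
size_t deflate_size(struct libdeflate_compressor *c,
                    const void *in, size_t in_len)
{
    size_t out_cap = libdeflate_deflate_compress_bound(c, in_len);
    void *out = malloc(out_cap);
    if (out == NULL) return 0;
    size_t n = libdeflate_deflate_compress(c, in, in_len, out, out_cap);
    free(out);
    return n;
}

/* Usage:
 *   struct libdeflate_compressor *c = libdeflate_alloc_compressor(9);
 *   size_t n = deflate_size(c, buf, len);
 *   libdeflate_free_compressor(c);
 */
```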

All benchmarks below were intentionally run on a single thread.
Every library from the list has been built from its default branch.
Benchmarks use the ag-news dataset: ./nob; ./build/knn ./data/ag-news/classes.csv ./data/ag-news/test.csv
CPU: i7-12700F, 2100 MHz
RAM: 32GB DDR5-5200 MT/s XMP 3.0
HDD: Seq 64 127 MB/s

Commit 52e1d9e:

Success rate: 79/100 (0.790000)
Total Elapsed: 271.484s
Avg Elapsed: 2.71484s

z_stream optimization (from now on referred to as "zs opt"):

Success rate: 79/100 (0.790000)
Total Elapsed: 264.248s
Avg Elapsed: 2.64248s

zs opt + zlib-ng:

Success rate: 79/100 (0.790000)
Total Elapsed: 282.062s
Avg Elapsed: 2.82062s

zs opt + zlib-chromium:

Success rate: 79/100 (0.790000)
Total Elapsed: 263.481s
Avg Elapsed: 2.63481s

zs opt + zlib-intel:

Success rate: 79/100 (0.790000)
Total Elapsed: 261.152s
Avg Elapsed: 2.61152s

zs opt + zlib-cloudflare:

Success rate: 78/100 (0.780000)
Total Elapsed: 224.895s
Avg Elapsed: 2.24895s


Open for discussion.

P.S.
After some research, libdeflate seems to work well: the success rate increased to 82% on the same dataset, and the average elapsed time dropped to 1.96734s (tested with K = 2 and the compression level set to 9). In the end, that is a ~28% speedup compared to the original deflate_sv.
The increase in the success rate makes sense: the zlib and gzip formats wrap the raw DEFLATE stream in format-specific headers and trailers (roughly 6 extra bytes for zlib and 18 for gzip), and that constant overhead can add inaccuracy to compression-based classification.
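For reference, plain zlib can also produce a header-less raw DEFLATE stream by passing negative windowBits to deflateInit2; a minimal sketch, with a hypothetical helper name:

```c
#include <zlib.h>
#include <string.h>

// Initialize a raw DEFLATE stream: negative windowBits means no zlib
// header and no Adler-32 trailer, comparable to what libdeflate emits.
int raw_deflate_init(z_stream *zs, int level)
{
    memset(zs, 0, sizeof(*zs));
    return deflateInit2(zs, level, Z_DEFLATED,
                        -15,               /* raw deflate, 32K window */
                        8,                 /* default memLevel */
                        Z_DEFAULT_STRATEGY);
}
```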
