Optimize z_stream internal buffer allocations #1
This pull request removes the "Probably not enough output buffer was allocated" error and eliminates redundant allocations of zlib's internal buffers.
The idea is to minimize redundant allocation and deallocation of zlib's internal buffers. Although madler's zlib claims to be thread-safe as long as the allocator passed as `zalloc` is, this commit keeps a thread-local static `z_stream`. That also requires ending the stream manually, so a `deflate_end` function is introduced.
Reference: zlib manual
In fact, this optimization can be pushed a bit further. Currently, the deflate stream is reinitialized on every `klass_predictor_predict` call, but it could be initialized once per thread.
Additionally, I wanted to check some SIMD-accelerated implementations of zlib, including zlib-chromium, zlib-ng, zlib-cloudflare and [!]zlib-intel.
Mentioned repos are under the zlib license.
It would also be nice to check out libdeflate. Its raw DEFLATE mode emits neither gzip headers nor checksums, and it is claimed to be ~33% faster than zlib-cloudflare by some benchmarks (I haven't tested this myself, though).
All benchmarks below were intentionally run on a single thread.
Every library in the list was built from its default branch.
Benchmarks using ag-news dataset: ./nob; ./build/knn ./data/ag-news/classes.csv ./data/ag-news/test.csv
CPU: i7-12700F, 2100 MHz
RAM: 32GB DDR5-5200 MT/s XMP 3.0
HDD: Seq 64 127 MB/s
Commit 52e1d9e:
z_stream optimization (referred to below as "zs opt"):
zs opt + zlib-ng:
zs opt + zlib-chromium:
zs opt + zlib-intel:
zs opt + zlib-cloudflare:
Open for discussions
P.S.
After some research, libdeflate seems to work well: the success rate increased to 82% on the same dataset, and the average elapsed time dropped to 1.96734 s (tested with K = 2 and compression level 9). Overall, speed improved by ~28% compared to the original `deflate_sv`.
The increase in the success rate makes sense: zlib and gzip wrap the compressed data in format-specific headers, which might add inaccuracy to the classification.