[RELEASE] kvikio v23.12 #322
Merged
Forward-merge branch-23.10 to branch-23.12
Merge branch-23.10 into branch-23.12 and fix devcontainer CI workflow.
This PR builds conda packages using CUDA 12 on ARM. Closes #281. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Ray Douglass (https://github.com/raydouglass) URL: #282
This PR contains a set of performance-related improvements for the batch nvCOMP codec. The short summary of the changes:

* Replaced multiple calls to CUDA `memcpyAsync` with a single call to a CUDA kernel.
* Removed redundant memory allocations and copies (some of them are the result of the previous change).
* Vectorized loops, removed redundant loops.

As a result, decompression throughput on ERA5 data increased from **4** GB/s to **31** GB/s for the LZ4 algorithm. For highly compressible data from the [nvCOMP benchmark](https://github.com/NVIDIA/nvcomp/blob/main/doc/Benchmarks.md), the increase is even higher: from **6** GB/s to about **110** GB/s. Other algorithms, such as GDeflate, show performance improvements as well. Compression throughput was also improved, though the main target was decompression (a compress-once, decompress-many scenario).

Limitations:

* These improvements are available only when directly using the codec's batch methods, such as `decode_batch`, while passing batches large enough to saturate the GPU. That means these changes will not be available (for now) to `zarr` users, as `zarr` serializes the calls into a sequence of `decode` calls.
* To get maximum performance, users should use equal-sized chunks (this is the default behavior in most cases anyway, such as `zarr`).

Authors: - Alexey Kamenev (https://github.com/Alexey-Kamenev) - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Mads R. B. Kristensen (https://github.com/madsbk) - Lawrence Mitchell (https://github.com/wence-) URL: #293
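The difference between one batch call and a serialized sequence of `decode` calls can be illustrated with a CPU-only sketch. It is not kvikio's implementation: `zlib` from the Python standard library stands in for the GPU codec, and the helpers `split_equal_chunks`, `encode_batch`, `decode_batch`, and `decode` are hypothetical names that only mimic the shape of a batch API. Both paths give the same bytes; the point is that only the batch path hands the codec all chunks at once, which is what lets a GPU codec process them together.

```python
import zlib


def split_equal_chunks(data: bytes, chunk_size: int) -> list:
    """Split data into chunks of chunk_size bytes (the last may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def encode_batch(chunks: list) -> list:
    """Compress all chunks in one call (zlib is a stand-in for the GPU codec)."""
    return [zlib.compress(chunk) for chunk in chunks]


def decode_batch(compressed: list) -> list:
    """Decompress all chunks in one call; a GPU codec could run these together."""
    return [zlib.decompress(chunk) for chunk in compressed]


def decode(compressed: bytes) -> bytes:
    """Decompress a single chunk, as a caller that serializes its calls would."""
    return zlib.decompress(compressed)


# 16 KiB of sample data split into four equal 4 KiB chunks.
data = bytes(range(256)) * 64
chunks = split_equal_chunks(data, 4096)
compressed = encode_batch(chunks)

# Batch path: one call receives every chunk.
batched = decode_batch(compressed)

# Serialized path: one call per chunk, so the codec never sees a full batch.
serialized = [decode(chunk) for chunk in compressed]

assert batched == serialized == chunks
assert b"".join(batched) == data
```

Choosing a chunk size that divides the data evenly, as above, keeps every chunk the same length, which matches the equal-sized-chunk recommendation in the limitations.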
Forward-merge branch-23.10 to branch-23.12
Forward-merge branch-23.10 to branch-23.12
This PR switches back to using `branch-23.12` for CI workflows because the CUDA 12 ARM conda migration is complete. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Jake Awe (https://github.com/AyodeAwe) URL: #304
Fixes #270 Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #294
In this PR we introduce `CudaCodec`, which is a base class for all CUDA codecs/compressors. This makes it possible to detect if a user tries to open a Zarr file using an incompatible compressor (see #297). Additionally, `kvikio.zarr.open_cupy_array()` now handles `mode="a"`. Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #298
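A shared base class makes the compatibility check a simple `isinstance` test. The sketch below shows the idea with stand-in classes only: `Codec`, `FakeGpuLZ4`, `FakeCpuZlib`, and `check_compressor` are hypothetical, the `CudaCodec` here is a local placeholder rather than kvikio's class, and accepting `None` (no compression) is an assumption made for the example.

```python
class Codec:
    """Stand-in for a generic compressor base class (hypothetical)."""

    codec_id = None


class CudaCodec(Codec):
    """Local placeholder for a base class shared by all GPU codecs."""


class FakeGpuLZ4(CudaCodec):
    """Hypothetical GPU codec; it inherits from the CUDA base class."""

    codec_id = "fake_gpu_lz4"


class FakeCpuZlib(Codec):
    """Hypothetical CPU codec; it does not inherit from CudaCodec."""

    codec_id = "fake_cpu_zlib"


def check_compressor(compressor):
    """Return the compressor if it can run on the GPU, else raise ValueError.

    None (no compression) is accepted; that is an assumption of this sketch.
    """
    if compressor is not None and not isinstance(compressor, CudaCodec):
        raise ValueError(
            f"compressor {compressor.codec_id!r} is not a CUDA codec"
        )
    return compressor


gpu_codec = FakeGpuLZ4()
assert check_compressor(gpu_codec) is gpu_codec

try:
    check_compressor(FakeCpuZlib())
except ValueError as err:
    message = str(err)
```

Because the check relies on inheritance rather than a list of known codec names, any new GPU codec that derives from the base class passes without further changes.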
Removing an old and broken thread-pool module:

```python
kvikio.thread_pool.num_threads_reset()
kvikio.thread_pool.get_num_threads()
```

Use the `kvikio.defaults` module instead:

```python
kvikio.defaults.num_threads_reset()
kvikio.defaults.get_num_threads()
```

Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #308
... also added some more examples. Authors: - Mads R. B. Kristensen (https://github.com/madsbk) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #312
Update the nvCOMP version used for compression/decompression to 3.0.4. See also: rapidsai/cudf#13815 rapidsai/rapids-cmake#451 Authors: - Vukasin Milovanovic (https://github.com/vuule) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) - Mads R. B. Kristensen (https://github.com/madsbk) - Ray Douglass (https://github.com/raydouglass) URL: #314
Accidentally didn't commit this change in #314.
Update to use non-deprecated signatures for `rapids_export` functions. Authors: - Robert Maynard (https://github.com/robertmaynard) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: #301
❄️ Code freeze for `branch-23.12` and v23.12 release

What does this mean?
Only critical/hotfix level issues should be merged into `branch-23.12` until release (merging of this PR).

What is the purpose of this PR?
`branch-23.12` into `main` for the release