Releases: CHIP-SPV/chipStar
chipStar release 1.2
Release Notes
This release brings significant stability and performance improvements, enhanced CUDA support, and new HIP/ROCm library ports and integrations for hipBLAS, hipFFT, and hipRAND/rocRAND. It also includes initial testing of HIP/CUDA applications on RISC-V.
Tested Platforms
- Intel and AMD CPUs via Intel Compute Runtime
- Intel GPUs via Neo i915 driver
- ARM Mali GPUs (Quartz64 SBC)
- RISC-V (StarFive VisionFive 2 SBC, Debian; experimental)
- AMD GPUs via rusticl (exploratory work)
Notable Changes
- Introduced cucc, a drop-in replacement for nvcc:
  - Added cucc, enabling direct compilation of CUDA sources.
  - Added an nvcc softlink, allowing CUDA sources to be compiled without any changes.
  - Adjusted CUDA headers to improve compatibility with CUDA sources, including a dummy cublas_v2.h header that prevents conflicts with system headers.
- Enhanced OpenCL backend:
  - Support for the cl_ext_buffer_device_address extension:
    - Added support for devices featuring the cl_ext_buffer_device_address extension, improving memory management capabilities.
  - Optimized queue profiling:
    - The OpenCL backend now uses non-profiling queues by default and switches to profiling queues only when needed, improving performance.
  - Various other performance optimizations.
- Fixed Level Zero backend issues:
  - Addressed out-of-memory (OOM) errors:
    - Fixed memory leaks and improved resource management to prevent OOM errors during heavy workloads.
  - Improved thread safety:
    - Implemented mutexes and synchronization mechanisms to improve thread safety within the Level Zero backend.
- Rebased to HIP 6.x and updated hip-tests:
  - Updated the codebase to be compatible with HIP 6.x.
Library Support Changes
- Expanded HIP library support:
  - hipBLAS integration:
    - Introduced the CHIP_BUILD_HIPBLAS option to enable building hipBLAS.
  - hipFFT integration:
    - Introduced the CHIP_BUILD_HIPFFT option to enable building hipFFT.
  - rocRAND port
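As a minimal sketch, assuming CHIP_BUILD_HIPBLAS and CHIP_BUILD_HIPFFT are ordinary CMake cache options set at configure time from the top of a chipStar source checkout, enabling both library builds might look like the following. The build directory name build is an arbitrary choice, and other settings a real build needs (such as which LLVM installation to use) are omitted.

```shell
# Configure chipStar with the hipBLAS and hipFFT builds switched on.
# Assumption: both switches are CMake cache options, as their names suggest.
cmake -S . -B build -DCHIP_BUILD_HIPBLAS=ON -DCHIP_BUILD_HIPFFT=ON

# Build everything that was configured, including the enabled libraries.
cmake --build build
```

Leaving either option unset simply skips that library, which keeps the default build lighter for users who do not need it.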
v1.2-RC1
What's Changed
- Refactor WaitForThreadExit by @pvelesko in #752
- Fix #757: skip texture tests with iGPU+OpenCL when USM=ON by @franz in #758
- adjust modules due to NFS going down by @pvelesko in #766
- Fix tests were unintentionally skipped by @linehill in #764
- remove Unit_hipMemsetFunctional_ZeroSize_hipMemsetD32 from exclusion list by @pvelesko in #767
- update ROCm-Device-Libs by @pvelesko in #765
- page lock runner test by @pvelesko in #773
- Various improvements by @pvelesko in #774
- Dynamic event pools by @pvelesko in #771
- Update cpp-linter-action version by @pvelesko in #777
- Map device built-ins to compiler built-ins by @linehill in #763
- Level-zero-premature-exit by @pvelesko in #778
- Add sanity check for catching unexpected atomic built-ins by @linehill in #706
- Use a fence for syncing RCL by @pvelesko in #688
- default to Debug by @pvelesko in #776
- Adjustments for future LLVM-18 release by @linehill in #714
- SYCL-HIP Interop - Drop RCL/ICL Quer by @pvelesko in #781
- OpenCL Event Cleanup by @pvelesko in #788
- OpenCL: Fix indirect USM pointer related issues by @linehill in #790
- Add CHIP_LAZY_JIT environment option to control JIT timing by @linehill in #786
- Remove SPIR-V version check in the parser by @linehill in #787
- Backend handles refactor by @pvelesko in #789
- Fixup excluded tests by @pvelesko in #795
- Add CHIP_DEVICE_TYPE to documentation by @karlwessel in #800
- Fix HIP float intrinsics were mapped double built-ins by @linehill in #793
- fix name of the cuda compiler script by @karlwessel in #802
- OpenCL BE: set CHIP_USE_INTEL_USM on by default by @linehill in #791
- Sample and Test Profiling by @pvelesko in #804
- Linter Fix include complaint by @pvelesko in #805
- Changes to reduce kernel launch overheads by @linehill in #794
- Fix Event Collection by @pvelesko in #803
- OpenCL: Skip SVM pointer annotation if possible by @linehill in #785
- Remove a confusing already registered and mapped warning by @linehill in #809
- Level Zero Refactor + Bugfixes by @pvelesko in #817
- Refactor Known Failing Tests by @pvelesko in #822
- Fix called incorrect compiler built-ins by @linehill in #820
- Internalize __device__ functions by @linehill in #819
- Implement FencedCmdLists by @pvelesko in #823
- Rebase HIP 6.x + Update hip-tests by @pvelesko in #796
- HIPCC Fixes by @pvelesko in #827
- Add CHIP_BUILD_HIPBLAS option by @pvelesko in #831
- OpenCL: Use non-profiling queue, switch to profiling when needed by @linehill in #814
- Fixes scripts/configure_llvm.sh by @linehill in #835
- Use loginfo for printing device info by @pvelesko in #839
- OpenCL: Fix memory leak / OoM and stack overflow by @linehill in #837
- Fix bunch texture cases by @linehill in #842
- Ubuntu Fixes by @pvelesko in #825
- Small Fixes by @pvelesko in #844
- Various small optimizations by @linehill in #816
- Level Zero - Fix OOM & Improve Thread Safety by @pvelesko in #845
- Add a workaround for name mangling issue with PowerVR OpenCL by @franz in #828
- Add libCEED to testing + Update hipBLAS w/sync by @pvelesko in #847
- update spirv_hip_complex.h header by @pvelesko in #856
- rtdevlib: fix function signature mismatches by @linehill in #851
- OpenCL: Support devices with cl_ext_buffer_device_address by @linehill in #830
- Add SKIP_TESTS_WITH_DOUBLES Option by @pvelesko in #826
- [HipBLAS] Fix hipblas.h and hipsolver header conflicts by @pvelesko in #852
- Small Fixes by @pvelesko in #862
- New CUDA compiler by @pvelesko in #858
- LLVM Configure script changes by @pvelesko in #864
- known_failures.yaml add hostname key by @pvelesko in #867
- spirv-extractor link fix by @pvelesko in #871
- update configure_llvm for IPO by @pvelesko in #870
- Submodules track branches by @pvelesko in #872
- Fix math function j1 typo in dp_math.hh by @jjennychen in #876
- fixed cudaMallocManaged function parameter type issue by @jjennychen in #878
- CUDA Compiler Refactor by @pvelesko in #875
- Docker Images + update linter github action by @pvelesko in #879
- Update DockerfileFull by @pvelesko in #881
- Implement missing host-side math functions by @pvelesko in #884
- Adding runtime error conversion for Level0 backend by @jjennychen in #886
- Fix 885 by @pvelesko in #889
- Fix 887 by @pvelesko in #888
- handle relocatable code flags cucc by @pvelesko in #892
- Add more implicit casts to dim3 by @pvelesko in #895
- update HIPCC to preserve ordering by @pvelesko in #899
- spirv_hip_fp16.h header file updates by @jjennychen in #896
- ARM CI by @pvelesko in #903
- skip kernel annotation on CPU by @pvelesko in #905
- use github.sha for docker by @pvelesko in #907
- docker ref fix by @pvelesko in #908
- docker build only on merge to main by @pvelesko in #909
- Expand the use of error maps by @pvelesko in #891
- Adjust known_failures for abort,assert by @pvelesko in #906
- Properly annotate Intel USM kernels by @pvelesko in #911
- Make adjustments for LLVM-19 by @linehill in #901
- Fix device-side functions by @pvelesko in #913
- Enable building of hipFFT by @pvelesko in #912
- OpenCL Backend Fixes by @pvelesko in #914
- hipStreamSemantics Fixes by @pvelesko in #917
- Cleanup by @pvelesko in #918
New Contributors
- @karlwessel made their first contribution in #800
- @jjennychen made their first contribution in #876
Full Changelog: v1.1...v1.2-RC1
chipStar release 1.1
This release cycle focused on stabilization and performance improvements
over the 1.0 release. Some benchmarks were measured to run up to twice as
fast as in 1.0, with an average improvement of 30% on HeCBench.
Further highlights are described below.
Release Highlights
- Added support for Clang/LLVM 17. LLVM 15 and 16 are still supported.
- Ability to use the Intel Unified Shared Memory (USM) extension with the OpenCL backend
- Optimized Atomic Operations
- Use of immediate command lists for low-latency dispatch with the Level Zero backend
- Improved portability to other platforms & devices
- Improved Asynchronous Execution
The full release notes are available in docs/release_notes/chipStar_1.1.rst.
The full sources of the release (including git submodules) are packaged in the attached file chipStar-1.1.tar.gz
(SHA256: 9258a313c503073a082ca310cebf048d84c4ab698facfc8d1d9ce1381ffb9fc5).
v1.1-RC4
The fourth release candidate for v1.1. Please test it, add your results to the test log, and report any major problems or regressions as issues against the 1.1 milestone.
v1.1-RC3
The third release candidate for v1.1. Please test it, add your results to the test log, and report any major problems or regressions as issues against the 1.1 milestone.
v1.1-RC2
The second release candidate for v1.1. Please test it, add your results to the test log, and report any major problems or regressions as issues against the 1.1 milestone.
v1.1-RC1
The first release candidate for v1.1. Please test it, add your results to the test log, and report any major problems or regressions as issues against the 1.1 milestone.
chipStar 1.0
chipStar Release 1.0
chipStar can compile HIP and CUDA applications to platforms which support SPIR-V as the device intermediate representation. It supports OpenCL and Level Zero as the low-level runtime alternatives. More info here.
The full sources of the release (including git submodules) are packaged in the attached file chipStar_1_0.tar.gz
(SHA256: aa2d46ad0ed6c1005e3466f0afcea26b594a49c5ec3725cf91842549d9b547ea).
Release Highlights
- The compilation toolchain works with Clang/LLVM 15 and 16.
- Over 950 unit tests pass, and several full real-world HPC applications have been tested and confirmed to work
- See the Features document for the current HIP/CUDA feature coverage.
- Tested with Intel Level Zero on multiple GPU devices, Intel OpenCL for CPUs and GPUs, and PoCL on Intel CPUs and GPUs (via the Level Zero driver)
- Large number of bugs fixed since 0.9
Known Issues
The 1.0 release is focused on correctness. Known bottlenecks can reduce performance by up to 10x in some cases, and many of them will be addressed in the next release. Please keep this in mind while testing chipStar 1.0, and report any correctness or stability issues by opening an issue on GitHub.
chipStar 1.0-RC3
chipStar Release 1.0-RC3
chipStar can compile HIP and CUDA applications to platforms which support SPIR-V as the device intermediate representation. It supports OpenCL and Level Zero as the low-level runtime alternatives. More info here.
The full sources for the release (including git submodules) are packaged in the attached file chipStar_1_0_RC3.tar.gz.
Release Highlights
- The compilation toolchain works with Clang/LLVM 15 and 16.
- Over 950 unit tests pass, and several full real-world HPC applications have been tested and confirmed to work
- See the Features document for the current HIP/CUDA feature coverage.
- Tested with Intel Level Zero on multiple GPU devices, Intel OpenCL for CPUs and GPUs, and PoCL on Intel CPUs and GPUs (via the Level Zero driver)
- Large number of bugs fixed since 0.9
Known Issues
The 1.0 release is focused on correctness. Known bottlenecks can reduce performance by up to 10x in some cases, and many of them will be addressed in the next release. Please keep this in mind while testing chipStar 1.0, and report any correctness or stability issues by opening an issue on GitHub.
chipStar 1.0 RC2
chipStar Release 1.0-RC2
chipStar can compile HIP and CUDA applications to platforms which support SPIR-V as the device intermediate representation. It supports OpenCL and Level Zero as the low-level runtime alternatives. More info here.
The full sources for the release (including git submodules) are packaged in the attached file chipStar_1_0_RC2.tar.gz.
Release Highlights
- The compilation toolchain works with Clang/LLVM 15 and 16.
- Over 950 unit tests pass, and several real-world applications have been tested and confirmed to work
- See the Features document for the current HIP/CUDA feature coverage.
- Tested with Intel Level Zero on multiple GPU devices.
- Tested with Intel OpenCL for CPUs and GPUs.
- Tested with PoCL on Intel CPUs.
- Large number of bugs fixed since 0.9
Known Issues
The 1.0 release is focused on correctness. Known bottlenecks can reduce performance by up to 10x in some cases, and many of them will be addressed in the next release. Please keep this in mind while testing chipStar 1.0, and report any correctness or stability issues by opening an issue on GitHub.