[BUG] oneapi/ccl.hpp: No such file or directory. #5653
Comments
Thanks @weiji14 for opening this to track.
Hello, any update on this issue?
Following the installation instructions and running `conda install -c https://software.repos.intel.com/python/conda/ -c conda-forge oneccl-devel` solved this problem.
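Once `oneccl-devel` is installed, one quick sanity check is whether `oneapi/ccl.hpp` can actually be resolved from the environment's include directories, which is what the failing `#include` needs. The sketch below is a minimal, hypothetical helper (`find_header` and `default_include_dirs` are not part of DeepSpeed or conda). It assumes the package places its headers under `$CONDA_PREFIX/include`, which is not confirmed in this thread. It also treats `CPATH` entries as extra include directories, the way GCC does.

```python
import os
import sys
from pathlib import Path
from typing import Iterable, List, Optional


def find_header(include_dirs: Iterable[str], header: str = "oneapi/ccl.hpp") -> Optional[Path]:
    """Return the first existing <include_dir>/<header>, or None.

    Mimics how a compiler resolves `#include <oneapi/ccl.hpp>`: the relative
    header path is appended to each include directory, in order.
    """
    for inc in include_dirs:
        candidate = Path(inc) / header
        if candidate.is_file():
            return candidate
    return None


def default_include_dirs() -> List[str]:
    # Assumption: conda "-devel" packages install headers into $CONDA_PREFIX/include.
    # Fall back to sys.prefix when CONDA_PREFIX is not set.
    prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
    dirs = [os.path.join(prefix, "include")]
    # GCC treats CPATH entries like extra -I directories; skip empty entries here.
    dirs += [d for d in os.environ.get("CPATH", "").split(os.pathsep) if d]
    return dirs


if __name__ == "__main__":
    found = find_header(default_include_dirs())
    if found is not None:
        print(f"found {found}")
    else:
        print("oneapi/ccl.hpp not found in the include directories checked")
```

Run inside the build environment, this prints the header's location when it is visible. If it is not found, the same `fatal error` from the compiler is likely.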
Try to fix `fatal error: oneapi/ccl.hpp: No such file or directory` on CUDA builds using suggestion at microsoft/DeepSpeed#5653 (comment)
* updated v0.14.4
* MNT: Re-rendered with conda-build 24.5.1, conda-smithy 3.36.2, and conda-forge-pinning 2024.06.21.08.07.40
* Remove ninja as runtime dependency Xref #1
* Replace pynvml with nvidia-ml-py Xref microsoft/DeepSpeed#5529. Also added note about compatibility with pydantic 2.0.
* Reset build number to 0
* Add oneccl-devel to host dependencies Try to fix `fatal error: oneapi/ccl.hpp: No such file or directory` on CUDA builds using suggestion at microsoft/DeepSpeed#5653 (comment)
* MNT: Re-rendered with conda-build 24.5.1, conda-smithy 3.38.0, and conda-forge-pinning 2024.08.11.18.23.17

---------

Co-authored-by: Wei Ji <23487320+weiji14@users.noreply.github.com>
Thanks so much @SnzFor16Min for pointing me to that! That said, I'm still unsure whether this issue should be closed, because the Intel oneAPI Toolkit should only be used for CPU builds, not CUDA (GPU) builds, no? As mentioned at conda-forge/deepspeed-feedstock#56 (comment):
I'll leave this up to @loadams and the DeepSpeed team to resolve.
I'm no expert in building DeepSpeed, but as I see …
Ah yes, the …
@weiji14, this should be fine to add to the dependencies; it should not cause any issues with the CUDA builds. Also, it should be fine to leave … I'd say let's leave this open for now. I'll check back to confirm that no issues are reported by users, and we can also confirm that the flow works with the next DeepSpeed release.
Describe the bug
The builds on conda-forge have been failing since `deepspeed=0.14.1` for CUDA 11.8 and 12.0 with an error like `fatal error: oneapi/ccl.hpp: No such file or directory`. Originally reported at conda-forge/deepspeed-feedstock#56 (comment).

To Reproduce
Steps to reproduce the behavior:
Run `python build_locally.py` locally and select the option with CUDA 11.8 and Python 3.9.

Expected behavior
CUDA builds work as expected.
ds_report output
Note: this isn't the exact `ds_report` output for the conda-forge CI device; I copied it from the CPU build logs.
Screenshots
Truncated traceback from https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=953875&view=logs&j=bb1c2637-64c6-57bd-9ea6-93823b2df951&t=350df31b-3291-5209-0bb7-031395f0baa1&l=3486:
System info
Launcher context
Are you launching your experiment with the `deepspeed` launcher, MPI, or something else? No.

Docker context
Are you using a specific Docker image that you can share?
`quay.io/condaforge/linux-anvil-cuda:11.8`
Additional context
The builds have been failing in these PRs as well: