Move GPU CI pipelines from old daint to new daint #1239
d11f8c2 to acdfcb0
Thanks a lot!
a72b2f8 to 2798c26
@@ -9,3 +9,6 @@ include:
- local: '.gitlab/includes/clang14_cuda11_pipeline.yml'
- local: '.gitlab/includes/gcc12_hip6_pipeline.yml'
- local: '.gitlab/includes/sloc.yml'
# TODO: move to on_merge before merging
To do.
Exporting
These are from https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/docker-specialized.html#constraints. They work when tested manually, but don't seem to work in CI yet.
b6105a5 to 4716cc4
All right, we're making some progress:
I may end up disabling the test steps for the latter two in this PR and re-enabling them in separate PRs.
0dea610 to bd072ec
@@ -8,7 +8,7 @@ include:
- local: '.gitlab/includes/common_pipeline.yml'
Remove pipeline?
The clang/cuda configuration with valgrind no longer complains about illegal instructions, which is good. It now reports many issues, and I don't know yet whether they're real. I'll aim to get the GCC 12/CUDA 12 pipeline running properly (apparently some tweaks are still needed on the CSCS CI side), and then I'll attempt to revive the two other CUDA configurations separately, possibly introducing another valgrind configuration on x86.
214b45e to 286b0c5
286b0c5 to 7541a28
9a982e3 to 559c234
d474e3e to 178bad0
0bbf943 to 88e86d8
Moving to aarch64 triggers too many false positives.
Default stays 300 seconds.
The first test that uses the GPU can take significantly longer to run. Subsequent tests take a normal amount of time. This seems to affect older CUDA 11.x versions: 11.8 does not have this issue, but 11.5 and 11.2 do.
There is already a GCC 12 CI configuration with CUDA running on gh200/aarch64. Remove the GCC 13 configuration since it's too similar.
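The timeout notes above (a 300-second default, with a slow first GPU test on CUDA 11.2 and 11.5 but not on 11.8) could be expressed as a small helper along these lines. This is only a hypothetical sketch, not code from this PR: the name `timeout_for`, the `slow_start_factor` parameter, and its value of 4 are assumptions made for illustration.

```python
DEFAULT_TIMEOUT_S = 300  # default test timeout mentioned in the notes above

# CUDA versions reported above as having a slow first GPU test.
# 11.8 is reported as unaffected, so it is deliberately not listed.
SLOW_FIRST_GPU_TEST_CUDA = {(11, 2), (11, 5)}


def parse_cuda_version(version):
    """Turn a string such as '11.5' into a (major, minor) tuple of ints."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def timeout_for(cuda_version, is_first_gpu_test, slow_start_factor=4):
    """Return a timeout in seconds for one test.

    Only the first GPU test on an affected CUDA version gets extra time;
    slow_start_factor is a hypothetical multiplier, not a measured value.
    """
    affected = parse_cuda_version(cuda_version) in SLOW_FIRST_GPU_TEST_CUDA
    if is_first_gpu_test and affected:
        return DEFAULT_TIMEOUT_S * slow_start_factor
    return DEFAULT_TIMEOUT_S


if __name__ == "__main__":
    for version in ("11.2", "11.5", "11.8"):
        print(version, timeout_for(version, is_first_gpu_test=True))
```

Keeping the default unchanged and extending only the first GPU test on affected versions avoids hiding genuine hangs in the remaining tests.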
88e86d8 to fe405f0
No description provided.