How to deploy a self-hosted GPU runner with CUDA enabled #67167
Replies: 2 comments
🕒 Discussion Activity Reminder 🕒
This Discussion has been labeled as dormant by an automated system for having no activity in the last 60 days. Please consider one of the following actions:
1️⃣ Close as Out of Date: If the topic is no longer relevant, close the Discussion as
2️⃣ Provide More Information: Share additional details or context, or let the community know if you've found a solution on your own.
3️⃣ Mark a Reply as Answer: If your question has been answered by a reply, mark the most helpful reply as the solution.
Note: This dormant notification will only apply to Discussions with the
Thank you for helping bring this Discussion to a resolution! 💬
@vaishnavimuley22 I'm also trying to implement the same thing. Did you have any success with this?
Select Topic Area
Question
Body
Hi, I am using actions-runner-controller to deploy CPU-based and GPU-based self-hosted runners, but I am facing issues with CUDA on the GPU runner (for example, the NVIDIA driver is unable to locate CUDA).
Details:
As mentioned, there is a Karpenter provisioner named default-gpu that allocates GPU nodes, and the runner pod gets scheduled on one of them.
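One thing that may be worth ruling out here: being scheduled on a GPU node does not by itself expose the GPU to the container. The sketch below assumes the cluster uses the NVIDIA device plugin, which advertises GPUs under the resource name nvidia.com/gpu, and checks whether each container in a pod spec requests it. The function name and the example spec are hypothetical and only for illustration; they are not taken from the actual runner deployment.

```python
from typing import Any, Dict, List

# Resource name advertised by the NVIDIA device plugin (assumed to be installed).
GPU_RESOURCE = "nvidia.com/gpu"


def containers_without_gpu_limit(pod_spec: Dict[str, Any]) -> List[str]:
    """Return the names of containers in a pod spec that do not request a GPU."""
    missing = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources") or {}
        limits = resources.get("limits") or {}
        # Quantities may appear as an int (1) or a string ("1") in YAML/JSON.
        try:
            count = int(str(limits.get(GPU_RESOURCE, 0)))
        except ValueError:
            count = 0
        if count < 1:
            missing.append(container.get("name", "<unnamed>"))
    return missing


# Hypothetical pod spec, for illustration only.
example_spec = {
    "containers": [
        {"name": "runner", "resources": {"limits": {"nvidia.com/gpu": 1}}},
        {"name": "docker", "resources": {}},
    ],
}

print(containers_without_gpu_limit(example_spec))  # ['docker']
```

If the container that actually runs the job shows up in this list, the GPU devices may never be mounted into it, regardless of which node the pod lands on.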
The GitHub workflow is as follows:
The nvcc -V command reports the CUDA version as 11.4.
I therefore also added the environment variable setup to the GitHub workflow, but it did not help: the CUDA version was still reported as N/A and torch.cuda.is_available() still returned False.
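To narrow down where the GPU goes missing, a small diagnostic step run inside the job can collect the relevant facts in one place. The following is a minimal sketch, not a definitive check: it assumes Python is available in the runner image, and the helper names (run, collect_gpu_diagnostics) are made up for this example. NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES are the variables read by the NVIDIA container runtime; whether they are set in this environment is an assumption to verify.

```python
import glob
import os
import shutil
import subprocess


def run(cmd):
    """Run a command; return its stdout, or None if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def collect_gpu_diagnostics():
    """Gather basic facts about GPU visibility inside the current container."""
    env_names = (
        "LD_LIBRARY_PATH",
        "CUDA_HOME",
        "NVIDIA_VISIBLE_DEVICES",
        "NVIDIA_DRIVER_CAPABILITIES",
    )
    report = {
        "nvidia_smi": run(["nvidia-smi"]),   # driver view (includes the CUDA version field)
        "nvcc": run(["nvcc", "-V"]),         # toolkit view
        "device_files": sorted(glob.glob("/dev/nvidia*")),  # empty if no GPU is mounted
        "env": {name: os.environ.get(name) for name in env_names},
    }
    try:
        import torch  # optional; only reported if installed
        report["torch_cuda_available"] = torch.cuda.is_available()
        report["torch_built_with_cuda"] = torch.version.cuda
    except (ImportError, OSError):
        report["torch_cuda_available"] = None
        report["torch_built_with_cuda"] = None
    return report


if __name__ == "__main__":
    for key, value in collect_gpu_diagnostics().items():
        print(f"{key}: {value}")
```

An empty device_files list alongside a working nvcc would suggest the toolkit is installed in the image but no GPU device is exposed to the container, which environment variables set in the workflow alone are unlikely to fix.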
I also tried extending the actions-runner-controller image to create a custom self-hosted runner image, as follows:
I built the image, pushed it to an AWS ECR repository, and used it in the workflow as follows:
After deploying this image and using it in the workflow, I started getting the following error message:
It would be great if someone could help me figure out what is going wrong and how I can solve it.