SCV is a GPU sniffer for distributed clusters. It works with Yoda-Scheduler to enable fine-grained GPU scheduling. It collects the following GPU metrics:
- Core Frequency
- Model
- Free Memory
- Total Memory
- Memory Frequency
- Bandwidth
- Power
- GPU Number

An example Scv object with its reported status:

    apiVersion: core.run-linux.com/v1
    kind: Scv
    metadata:
      creationTimestamp: "2020-09-01T06:45:19Z"
      generation: 4
      name: isl-super
      resourceVersion: "88823392"
      selfLink: /apis/core.run-linux.com/v1/scvs/isl-super
      uid: 0fe4de13-34ab-44fc-9454-78a50407c4ad
    spec:
      updateInterval: 1000
    status:
      cardList:
      - bandwidth: 15760
        clock: 5705
        core: 1911
        freeMemory: 12194
        health: Healthy
        id: 0
        model: TITAN Xp
        power: 250
        totalMemory: 12194
      cardNumber: 1
      freeMemorySum: 12194
      totalMemorySum: 12194
      updateTime: "2020-09-05T11:47:48Z"
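
As a rough illustration of how a consumer such as a scheduler might read this status, the sketch below filters the cards of an Scv object by health and free memory. It assumes the object has already been fetched as JSON (for example with `kubectl get scv isl-super -o json`). The helper name `pick_cards` is hypothetical and not part of SCV or Yoda-Scheduler, and the threshold is simply compared in whatever unit the status fields use.

```python
import json

# Trimmed copy of the sample Scv object above, expressed as JSON.
SCV_JSON = """
{
  "kind": "Scv",
  "metadata": {"name": "isl-super"},
  "status": {
    "cardList": [
      {"id": 0, "model": "TITAN Xp", "health": "Healthy",
       "freeMemory": 12194, "totalMemory": 12194}
    ],
    "cardNumber": 1,
    "freeMemorySum": 12194,
    "totalMemorySum": 12194
  }
}
"""


def pick_cards(scv: dict, min_free_memory: int) -> list:
    """Return healthy cards whose freeMemory is at least min_free_memory.

    Hypothetical helper: the threshold uses the same unit as the status fields.
    """
    cards = scv.get("status", {}).get("cardList") or []
    return [
        card for card in cards
        if card.get("health") == "Healthy"
        and card.get("freeMemory", 0) >= min_free_memory
    ]


scv = json.loads(SCV_JSON)
fits = pick_cards(scv, min_free_memory=8000)
print([card["id"] for card in fits])  # ids of the cards that satisfy the request
```

Filtering on `health` before memory keeps unhealthy cards out of the candidate set even when they report free memory.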
- Ensure that the NVIDIA container runtime and the NVIDIA driver are installed on each Kubernetes worker node. See nvidia-docker for more details.
  - Ubuntu

        # Add the package repositories
        $ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
        $ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
        $ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
        $ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit nvidia-container-runtime
        $ sudo systemctl restart docker
  - CentOS

        $ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
        $ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | sudo tee /etc/yum.repos.d/nvidia-docker.repo
        $ sudo yum install -y nvidia-container-toolkit nvidia-container-runtime
        $ sudo systemctl restart docker
- Enable nvidia-container-runtime as the default Docker runtime on each Kubernetes worker node. To do so, edit /etc/docker/daemon.json on each worker node so that it contains the following:

      {
        "default-runtime": "nvidia",
        "runtimes": {
          "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
          }
        },
        "exec-opts": ["native.cgroupdriver=systemd"],
        "log-driver": "json-file",
        "log-opts": {
          "max-size": "100m"
        },
        "storage-driver": "overlay2",
        "registry-mirrors": ["https://registry.docker-cn.com"]
      }
- Deploy SCV into your Kubernetes cluster:

      kubectl apply -f https://raw.githubusercontent.com/NJUPT-ISL/SCV/release-2.0/config/crd/bases/core.run-linux.com_scvs.yaml
      kubectl apply -f https://raw.githubusercontent.com/NJUPT-ISL/SCV/master/deploy/deploy.yaml
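
For the daemon.json step above, a worker node may already carry a daemon.json with site-specific settings, in which case replacing the whole file could be undesirable. The following is a minimal sketch, not part of SCV, that merges only the NVIDIA runtime keys into an existing configuration. The function name `with_nvidia_runtime` is hypothetical; the runtime name and path are the ones given in the step above.

```python
import copy
import json

# Runtime settings taken from the daemon.json content shown in the step above.
NVIDIA_RUNTIME = {
    "path": "/usr/bin/nvidia-container-runtime",
    "runtimeArgs": [],
}


def with_nvidia_runtime(config: dict) -> dict:
    """Return a copy of a daemon.json dict with nvidia as the default runtime.

    Existing keys (log options, storage driver, other runtimes) are preserved;
    the input dict is not modified.
    """
    merged = copy.deepcopy(config)
    runtimes = merged.setdefault("runtimes", {})
    runtimes["nvidia"] = copy.deepcopy(NVIDIA_RUNTIME)
    merged["default-runtime"] = "nvidia"
    return merged


# Example: an existing configuration that only sets the log driver.
existing = {"log-driver": "json-file"}
print(json.dumps(with_nvidia_runtime(existing), indent=2))
```

To apply this on a node, one would load /etc/docker/daemon.json with `json.load`, write the merged result back, and then restart Docker with `sudo systemctl restart docker` as in the installation step.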