Commit

Merge pull request #65 from Monokaix/dev
migrate vgpu to external project
volcano-sh-bot committed Apr 19, 2024
2 parents 6a108e0 + 5ee11b7 commit d9617aa
Showing 33 changed files with 62 additions and 2,374 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/main.yml
@@ -34,5 +34,3 @@ jobs:
- run: make ubuntu20.04
- run: TAG_VERSION="${BRANCH_NAME}" make push-tag
- run: make push-latest
- run: make vgpu
- run: TAG_VERSION="${BRANCH_NAME}" make push-vgpu-tag
20 changes: 3 additions & 17 deletions Makefile
@@ -1,4 +1,4 @@
# Copyright (c) 2023, NVIDIA CORPORATION. All rights reserved.
# Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -20,8 +20,7 @@

DOCKER ?= docker
REGISTRY ?= volcanosh
VERSION ?= latest
TAG_VERSION ?= 1.0.0
VERSION ?= 1.0.0

##### Public rules #####

@@ -39,24 +38,11 @@ push-latest:
$(DOCKER) tag "$(REGISTRY)/volcano-device-plugin:$(VERSION)-ubuntu20.04" "$(REGISTRY)/volcano-device-plugin:latest"
$(DOCKER) push "$(REGISTRY)/volcano-device-plugin:latest"

push-tag:
$(DOCKER) tag "$(REGISTRY)/volcano-device-plugin:$(VERSION)-ubuntu20.04" "$(REGISTRY)/volcano-device-plugin:$(TAG_VERSION)"
$(DOCKER) push "$(REGISTRY)/volcano-device-plugin:$(TAG_VERSION)"

push-vgpu-tag:
$(DOCKER) tag "$(REGISTRY)/volcano-vgpu-device-plugin:$(VERSION)-ubuntu20.04" "$(REGISTRY)/volcano-vgpu-device-plugin:$(TAG_VERSION)"
$(DOCKER) push "$(REGISTRY)/volcano-vgpu-device-plugin:$(TAG_VERSION)"

ubuntu20.04:
$(DOCKER) build --pull \
$(DOCKER) build --network=host --pull \
--tag $(REGISTRY)/volcano-device-plugin:$(VERSION)-ubuntu20.04 \
--file docker/amd64/Dockerfile.ubuntu20.04 .

vgpu:
$(DOCKER) build --pull \
--tag $(REGISTRY)/volcano-vgpu-device-plugin:$(VERSION)-ubuntu20.04 \
--file docker/amd64/Dockerfile.vgpu-ubuntu20.04 .

centos7:
$(DOCKER) build --pull \
--tag $(REGISTRY)/volcano-device-plugin:$(VERSION)-centos7 \
47 changes: 11 additions & 36 deletions README.md
@@ -73,41 +73,13 @@ We will be editing the docker daemon config file which is usually present at `/e
Once you have enabled this option on *all* the GPU nodes you wish to use,
you can then enable GPU support in your cluster by deploying the following Daemonset:

VGPU:
```
$ kubectl create -f volcano-vgpu-device-plugin.yml
```

GPU-SHARE (**Will be deprecated in volcano v1.9**):
```shell
$ kubectl create -f volcano-device-plugin.yml
```

**Note** that volcano device plugin can be configured. For example, it can specify gpu strategy by adding in the yaml file ''args: ["--gpu-strategy=number"]'' under ''image: volcanosh/volcano-device-plugin''. More configuration can be found at [volcano device plugin configuration](https://github.com/volcano-sh/devices/blob/master/doc/config.md).

### Running VGPU Jobs

VGPU can be requested by both set "volcano.sh/vgpu-number" and "volcano.sh/vgpu-memory" in resource.limit

```shell script
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
name: gpu-pod1
spec:
containers:
- name: cuda-container
image: nvidia/cuda:9.0-devel
command: ["sleep"]
args: ["100000"]
resources:
limits:
volcano.sh/vgpu-number: 2 # requesting 1 gpu cards
volcano.sh/vgpu-memory: 3000
EOF
```
### Running GPU Sharing Jobs (**Will be deprecated in volcano v1.9**)
### Running GPU Sharing Jobs (Without memory isolation)

NVIDIA GPUs can now be shared via container level resource requirements using the resource name volcano.sh/gpu-memory:

@@ -144,7 +116,7 @@ spec:
> **WARNING:** *if you don't request GPUs when using the device plugin with NVIDIA images all
> the GPUs on the machine will be exposed inside your container.*
### Running GPU Number Jobs (**Will be deprecated in volcano v1.9**)
### Running GPU Number Jobs (Without number isolation)

NVIDIA GPUs can now be requested via container level resource requirements using the resource name volcano.sh/gpu-number:

@@ -170,7 +142,7 @@ EOF

Please note that:
- the device plugin feature is beta as of Kubernetes v1.11.
- the gpu-share device plugin is alpha and is missing the following features, and will be deprecated in volcano v1.9
- the Volcano device plugin is alpha and is missing
- More comprehensive GPU health checking features
- GPU cleanup features
- GPU hard isolation
@@ -180,16 +152,19 @@ The next sections are focused on building the device plugin and running it.

### With Docker

#### Deploy as DaemonSet:
#### Build
```shell
$ make ubuntu20.04.
```

GPU-SHARE:
#### Run locally
```shell
$ kubectl create -f nvidia-device-plugin.yml
$ docker run --security-opt=no-new-privileges --cap-drop=ALL --network=none -it -v /var/lib/kubelet/device-plugins:/var/lib/kubelet/device-plugins nvidia/k8s-device-plugin:{version}
```

VGPU:
#### Deploy as DaemonSet:
```shell
$ kubectl create -f nvidia-vgpu-device-plugin.yml
$ kubectl create -f nvidia-device-plugin.yml
```

# Issues and Contributing
191 changes: 0 additions & 191 deletions cmd/vgpu/main.go

This file was deleted.

48 changes: 0 additions & 48 deletions cmd/vgpu/watchers.go

This file was deleted.

2 changes: 1 addition & 1 deletion docker/amd64/Dockerfile.ubuntu20.04
@@ -40,4 +40,4 @@ ENV NVIDIA_DRIVER_CAPABILITIES=utility

COPY --from=build /go/src/volcano.sh/devices/volcano-device-plugin /usr/bin/volcano-device-plugin

ENTRYPOINT ["volcano-device-plugin"]
ENTRYPOINT ["volcano-device-plugin"]