Added torch-gpu Dockerfile in packaging/dockers (#2531)
* Added torch-gpu Dockerfile in packaging/dockers

Signed-off-by: Manvenddra Rawat <quic_manvendd@quicinc.com>
quic-manvendd authored and quic-mprajapa committed Nov 2, 2023
1 parent 056625b commit 4d3047f
Showing 2 changed files with 211 additions and 0 deletions.
74 changes: 74 additions & 0 deletions packaging/dockers/Dockerfile.torch_gpu
@@ -0,0 +1,74 @@
FROM docker-registry.qualcomm.com/library/nvidia/cuda:11.6.2-cudnn8-devel-ubuntu20.04

ARG DEBIAN_FRONTEND=noninteractive
ARG APT_KEY_DONT_WARN_ON_DANGEROUS_USAGE=DontWarn

RUN mv /etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/cuda.list.orig && \
    apt-get update > /dev/null && \
    apt-get install -y --no-install-recommends apt-utils && \
    apt-key del --no-tty 7fa2af80 && \
    apt-key adv --no-tty --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub && \
    apt-key adv --no-tty --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub && \
    apt-get update > /dev/null && \
    rm -rf /var/lib/apt/lists/*

RUN apt-get update > /dev/null && \
    apt-get install --no-install-recommends -y \
    # Bare minimum Packages
    ca-certificates \
    git \
    ssh \
    sudo \
    wget \
    xterm \
    xauth > /dev/null && \
    rm -rf /var/lib/apt/lists/*

# Add sudo support
RUN echo "%users ALL = (ALL) NOPASSWD: ALL" >> /etc/sudoers

RUN apt-get update -y > /dev/null && \
    apt-get install --no-install-recommends -y \
    python3.8 \
    python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Register the version in alternatives
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
# Set python 3.8 as the default python
RUN update-alternatives --set python3 /usr/bin/python3.8

# Upgrade Python3 pip
RUN python3 -m pip --no-cache-dir install --upgrade pip

EXPOSE 25000
RUN apt-get update && apt-get install -y openssh-server && rm -rf /var/lib/apt/lists/*
RUN mkdir /var/run/sshd

RUN apt-get update && apt-get install -y liblapacke liblapacke-dev && rm -rf /var/lib/apt/lists/*

RUN apt-get update && apt-get install -y libjpeg8-dev && \
    rm -rf /var/lib/apt/lists/*

# Set up symlink to point to the correct python version
RUN ln -sf /usr/bin/python3.8 /usr/bin/python
RUN ln -s /usr/lib/x86_64-linux-gnu/libjpeg.so /usr/lib

RUN sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
    sed -i 's/Port 22/Port 25000/' /etc/ssh/sshd_config

# SSH login fix. Otherwise user is kicked off after login
RUN sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd

# upgrade pip
RUN python3 -m pip --no-cache-dir install --upgrade pip

# Install the AIMET package wheel files
COPY *.whl /tmp/
RUN cd /tmp && python3 -m pip install *.whl -f https://download.pytorch.org/whl/torch_stable.html && rm -f /tmp/*.whl

# Remove onnxruntime install in order to fix onnxruntime-gpu
RUN export ONNXRUNTIME_VER=$(python3 -c 'import onnxruntime; print(onnxruntime.__version__)') && \
    python3 -m pip uninstall -y onnxruntime && \
    python3 -m pip --no-cache-dir install onnxruntime-gpu==$ONNXRUNTIME_VER

137 changes: 137 additions & 0 deletions packaging/dockers/README.md
@@ -0,0 +1,137 @@
AIMET Docker creation
=====================

This page provides instructions to build a docker image with AIMET packages and start the development docker container.

Setup workspace
---------------

```console
WORKSPACE="<absolute_path_to_workspace>"
mkdir -p "$WORKSPACE" && cd "$WORKSPACE"
git clone https://github.com/quic/aimet.git
cd aimet/packaging/dockers
```

Make sure no wheel files are present in the current working directory:
```console
rm -rf *.whl
```

Set variant
------------

Set *<variant_string>* to ONE of the following, depending on the variant you want:

* For the PyTorch 1.13 GPU variant, use **torch_gpu**
* For the PyTorch 1.13 CPU variant, use **torch_cpu**
* For the PyTorch 1.9 GPU variant, use **torch_gpu_pt19**
* For the PyTorch 1.9 CPU variant, use **torch_cpu_pt19**
* For the TensorFlow GPU variant, use **tf_gpu**
* For the TensorFlow CPU variant, use **tf_cpu**
* For the ONNX GPU variant, use **onnx_gpu**
* For the ONNX CPU variant, use **onnx_cpu**

```console
export AIMET_VARIANT=<variant_string>
```
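
A typo in the variant string only shows up later, as a failed download or a missing Dockerfile. As a minimal sketch, the check below validates the value against the variants listed above. The function name `aimet_variant_is_valid` is hypothetical and not part of AIMET.

```shell
# Hypothetical helper (not part of AIMET): succeeds only for a variant
# string from the list above.
aimet_variant_is_valid() {
    case "$1" in
        torch_gpu|torch_cpu|torch_gpu_pt19|torch_cpu_pt19|tf_gpu|tf_cpu|onnx_gpu|onnx_cpu)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Example: warn early instead of failing at the download or build step.
if ! aimet_variant_is_valid "${AIMET_VARIANT:-}"; then
    echo "Unknown AIMET_VARIANT: '${AIMET_VARIANT:-}'" >&2
fi
```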

Download AIMET packages
------------------------

Go to https://github.com/quic/aimet/releases and identify the release tag of the package you want to install.


Replace <release_tag> in the steps below with the appropriate tag:

```console
export release_tag=<release_tag>
```

Set the package download URL as follows:

```console
export download_url="https://github.com/quic/aimet/releases/download/${release_tag}"
```

Set the common suffix for the package files as follows:

```console
export wheel_file_suffix="cp38-cp38-linux_x86_64.whl"
```

Download the AIMET packages in the order specified below:

```console
wget ${download_url}/AimetCommon-${AIMET_VARIANT}_${release_tag}-${wheel_file_suffix}


# Download ONE of the following depending on the variant
wget ${download_url}/AimetTorch-${AIMET_VARIANT}_${release_tag}-${wheel_file_suffix}

# OR

wget ${download_url}/AimetTensorflow-${AIMET_VARIANT}_${release_tag}-${wheel_file_suffix}

# OR

wget ${download_url}/AimetOnnx-${AIMET_VARIANT}_${release_tag}-${wheel_file_suffix}


wget ${download_url}/Aimet-${AIMET_VARIANT}_${release_tag}-${wheel_file_suffix}
```
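
The three downloads above differ only in the package name, so the file names can be generated instead of typed. The sketch below prints them in the same order. The helper name `aimet_wheel_names` is hypothetical, the mapping from variant prefix to framework package is an assumption based on the variant names, and `1.2.3` is a placeholder rather than a real release tag.

```shell
# Hypothetical helper: print the three wheel file names for a variant and
# release tag, in the download order given above. The mapping from variant
# prefix to framework package is an assumption drawn from the variant names.
aimet_wheel_names() {
    variant="$1"
    tag="$2"
    suffix="cp38-cp38-linux_x86_64.whl"
    case "$variant" in
        torch_*) framework="AimetTorch" ;;
        tf_*)    framework="AimetTensorflow" ;;
        onnx_*)  framework="AimetOnnx" ;;
        *)       echo "unknown variant: $variant" >&2; return 1 ;;
    esac
    for package in AimetCommon "$framework" Aimet; do
        echo "${package}-${variant}_${tag}-${suffix}"
    done
}

# Dry run: list the URLs without downloading anything.
# "1.2.3" is a placeholder, not a real release tag.
aimet_wheel_names torch_gpu 1.2.3 | while read -r name; do
    echo "https://github.com/quic/aimet/releases/download/1.2.3/${name}"
done
```

Once the printed URLs look right, each one can be passed to `wget` as in the commands above.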

Build docker image
------------------

Follow these instructions to build the docker image locally. If you do not need a local build, skip to the next section.

```console
docker_image_name="aimet-prod-docker-${AIMET_VARIANT}:<any_tag>"
docker_container_name="aimet-prod-${AIMET_VARIANT}-<any_name>"

docker build -t ${docker_image_name} -f Dockerfile.${AIMET_VARIANT} .
```

**NOTE:** Feel free to modify the *docker_image_name* and *docker_container_name* as needed.
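
The build depends on two things being in the current directory: a `Dockerfile.<variant>` file and the downloaded wheels, which the Dockerfile copies into the image with `COPY *.whl /tmp/`. The sketch below checks both before building. The function name `aimet_build_context_ok` is hypothetical.

```shell
# Hypothetical pre-flight check: the Dockerfile for the variant must exist
# and at least one wheel must be in the build context, because the Dockerfile
# copies *.whl into the image.
aimet_build_context_ok() {
    context_dir="$1"
    variant="$2"
    if [ ! -f "${context_dir}/Dockerfile.${variant}" ]; then
        echo "missing ${context_dir}/Dockerfile.${variant}" >&2
        return 1
    fi
    # ls fails when the glob matches nothing.
    if ! ls "${context_dir}"/*.whl >/dev/null 2>&1; then
        echo "no wheel files found in ${context_dir}" >&2
        return 1
    fi
    return 0
}

# Example: only build when the context looks complete.
# aimet_build_context_ok . "${AIMET_VARIANT}" && \
#     docker build -t "${docker_image_name}" -f "Dockerfile.${AIMET_VARIANT}" .
```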

Start docker container
-----------------------

Ensure that a docker container named *$docker_container_name* is not already running. If one exists, remove it first, and then start a new container as follows:

```console
docker ps -a | grep ${docker_container_name} && docker rm -f ${docker_container_name}

docker run --rm -it -u $(id -u ${USER}):$(id -g ${USER}) \
  -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro \
  -v ${HOME}:${HOME} -v ${WORKSPACE}:${WORKSPACE} \
  -v "/local/mnt/workspace":"/local/mnt/workspace" \
  --entrypoint /bin/bash -w ${WORKSPACE} --name ${docker_container_name} \
  --hostname ${docker_container_name} ${docker_image_name}
```

**NOTE:**
* Feel free to modify the above *docker run* command based on the environment and filesystem on your host machine.
* If nvidia-docker 2.0 is installed, then add *--gpus all* to the *docker run* commands in order to enable GPU access inside the docker container.
* If nvidia-docker 1.0 is installed, then replace *docker run* with *nvidia-docker run* in order to enable GPU access inside the docker container.
* Port forwarding is required to run the Visualization APIs from the docker container. To enable it, run the docker container as follows:

```console

port_id="<any-port-number>"

docker run -p ${port_id}:${port_id} --rm -it -u $(id -u ${USER}):$(id -g ${USER}) \
  -v /etc/passwd:/etc/passwd:ro -v /etc/group:/etc/group:ro \
  -v ${HOME}:${HOME} -v ${WORKSPACE}:${WORKSPACE} \
  -v "/local/mnt/workspace":"/local/mnt/workspace" \
  --entrypoint /bin/bash -w ${WORKSPACE} --name ${docker_container_name} \
  --hostname ${docker_container_name} ${docker_image_name}
```
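
Whether to pass *--gpus all* can be derived from the variant name rather than remembered each time. The sketch below is one way to do that. The function name `aimet_gpu_flags` is hypothetical, and it assumes a host where `docker run --gpus all` works, as described in the notes above.

```shell
# Hypothetical helper: print "--gpus all" for GPU variants, nothing otherwise.
# Assumes a host where "docker run --gpus all" works (see the notes above).
aimet_gpu_flags() {
    case "$1" in
        *_gpu|*_gpu_*) echo "--gpus all" ;;
        *)             echo "" ;;
    esac
}

# Example: the substitution is left unquoted on purpose so that the two words
# become two separate arguments; the remaining options are those shown above.
# docker run $(aimet_gpu_flags "${AIMET_VARIANT}") --rm -it \
#     --entrypoint /bin/bash ${docker_image_name}
```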

Environment setup
------------------

Set the common environment variables as follows:

```console
source /usr/local/lib/python3.8/dist-packages/aimet_common/bin/envsetup.sh
```
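
If the `source` command fails, the usual cause is that the path does not exist in the container, for example because the wheels were not installed. As a minimal sketch, the guard below reports a missing file instead of failing silently. The function name `aimet_source_envsetup` is hypothetical.

```shell
# Hypothetical guard: source envsetup.sh only if it exists under the given
# dist-packages directory; otherwise report the missing file.
aimet_source_envsetup() {
    envsetup="$1/aimet_common/bin/envsetup.sh"
    if [ -f "$envsetup" ]; then
        . "$envsetup"
    else
        echo "not found: $envsetup (is AIMET installed in this container?)" >&2
        return 1
    fi
}

# Example, using the path from the command above:
# aimet_source_envsetup /usr/local/lib/python3.8/dist-packages
```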
