diff --git a/README.md b/README.md
index 878aeff..d61ef20 100644
--- a/README.md
+++ b/README.md
@@ -144,7 +144,7 @@ rpm -qa | grep habanalabs
 ```
 2. Download and install habanalabs-dkms
 ```
- sudo apt install -y habanalabs-dkms=0.15.2-4
+ sudo apt install -y habanalabs-dkms=0.15.3-31
 ```
 3. Load the driver
 ```
@@ -189,7 +189,7 @@ rpm -qa | grep habanalabs
 ```
 2. Download and install habanalabs-dkms
 ```
- sudo apt install -y habanalabs-dkms=0.15.2-4
+ sudo apt install -y habanalabs-dkms=0.15.3-31
 ```
 3. Load the driver
 ```
@@ -255,7 +255,7 @@ sudo yum remove habanalabs*
 ```
 2. Download and install new driver:
 ```
-sudo yum install habanalabs-0.15.2-4* -y
+sudo yum install habanalabs-0.15.3-31* -y
 ```
 3. Load the driver
 ```
@@ -319,7 +319,7 @@ sudo yum remove habanalabs*
 ```
 2. Download and install new driver:
 ```
-sudo yum install habanalabs-0.15.2-4* -y
+sudo yum install habanalabs-0.15.3-31* -y
 ```
 3. Load the driver
 ```
@@ -421,18 +421,18 @@ rpm -qa | grep habana
 ### Graph compiler and run-time installation
 To install the graph compiler and run-time, use the following command:
 ```
- sudo apt install -y habanalabs-graph=0.15.2-4
+ sudo apt install -y habanalabs-graph=0.15.3-31
 ```
 ### Thunk installation
 To install the thunk library, use the following command:
 ```
- sudo apt install -y habanalabs-thunk=0.15.2-4
+ sudo apt install -y habanalabs-thunk=0.15.3-31
 ```
 ### Update FW
 To update the firmware, follow the below steps:
 1. Install the Firmware package:
 ```
- sudo apt install -y habanalabs-firmware=0.15.2-4
+ sudo apt install -y habanalabs-firmware=0.15.3-31
 ```
 2. Remove the driver:
 ```
@@ -449,13 +449,13 @@ rpm -qa | grep habana
 ### (Optional) FW tools installation
 To install the firmware tools, use the following command:
 ```
- sudo apt install -y habanalabs-firmware-tools=0.15.2-4
+ sudo apt install -y habanalabs-firmware-tools=0.15.3-31
 ```
 ### (Optional) qual installation
 To install hl_qual, use the following command:
 ```
- sudo apt install -y habanalabs-qual=0.15.2-4
+ sudo apt install -y habanalabs-qual=0.15.3-31
 ```
@@ -480,18 +480,18 @@ rpm -qa | grep habana
 ### Graph compiler and run-time installation
 To install the graph compiler and run-time, use the following command:
 ```
- sudo apt install -y habanalabs-graph=0.15.2-4
+ sudo apt install -y habanalabs-graph=0.15.3-31
 ```
 ### Thunk installation
 To install the thunk library, use the following command:
 ```
- sudo apt install -y habanalabs-thunk=0.15.2-4
+ sudo apt install -y habanalabs-thunk=0.15.3-31
 ```
 ### Update FW
 To update the firmware, follow the below steps:
 1. Install the Firmware package:
 ```
- sudo apt install -y habanalabs-firmware=0.15.2-4
+ sudo apt install -y habanalabs-firmware=0.15.3-31
 ```
 2. Remove the driver:
 ```
@@ -508,13 +508,13 @@ rpm -qa | grep habana
 ### (Optional) FW tools installation
 To install the firmware tools, use the following command:
 ```
- sudo apt install -y habanalabs-firmware-tools=0.15.2-4
+ sudo apt install -y habanalabs-firmware-tools=0.15.3-31
 ```
 ### (Optional) qual installation
 To install hl_qual, use the following command:
 ```
- sudo apt install -y habanalabs-qual=0.15.2-4
+ sudo apt install -y habanalabs-qual=0.15.3-31
 ```
@@ -553,18 +553,18 @@ This will search for and list all packages with the word Habana.
 ### Graph compiler and run-time installation
 To install the graph compiler and run-time, use the following command:
 ```
-sudo yum install habanalabs-graph-0.15.2-4* -y
+sudo yum install habanalabs-graph-0.15.3-31* -y
 ```
 ### Thunk installation
 To install the thunk library, use the following command:
 ```
-sudo yum install habanalabs-thunk-0.15.2-4* -y
+sudo yum install habanalabs-thunk-0.15.3-31* -y
 ```
 ### Update FW
 To update the firmware, follow the below steps:
 1. Install the Firmware package:
 ```
-sudo yum install habanalabs-firmware-0.15.2-4* -y
+sudo yum install habanalabs-firmware-0.15.3-31* -y
 ```
 2. Remove the driver:
 ```
@@ -581,13 +581,13 @@ sudo modprobe habanalabs
 ### (Optional) FW tools installation
 To install the firmware tools, use the following command:
 ```
-sudo yum install habanalabs-firmware-tools-0.15.2-4* -y
+sudo yum install habanalabs-firmware-tools-0.15.3-31* -y
 ```
 ### (Optional) qual installation
 To install hl_qual, use the following command:
 ```
-sudo yum install habanalabs-qual-0.15.2-4* -y
+sudo yum install habanalabs-qual-0.15.3-31* -y
 ```
@@ -624,18 +624,18 @@ This will search for and list all packages with the word Habana.
 ### Graph compiler and run-time installation
 To install the graph compiler and run-time, use the following command:
 ```
-sudo yum install habanalabs-graph-0.15.2-4* -y
+sudo yum install habanalabs-graph-0.15.3-31* -y
 ```
 ### Thunk installation
 To install the thunk library, use the following command:
 ```
-sudo yum install habanalabs-thunk-0.15.2-4* -y
+sudo yum install habanalabs-thunk-0.15.3-31* -y
 ```
 ### Update FW
 To update the firmware, follow the below steps:
 1. Install the Firmware package:
 ```
-sudo yum install habanalabs-firmware-0.15.2-4* -y
+sudo yum install habanalabs-firmware-0.15.3-31* -y
 ```
 2. Remove the driver:
 ```
@@ -652,13 +652,13 @@ sudo modprobe habanalabs
 ### (Optional) FW tools installation
 To install the firmware tools, use the following command:
 ```
-sudo yum install habanalabs-firmware-tools-0.15.2-4* -y
+sudo yum install habanalabs-firmware-tools-0.15.3-31* -y
 ```
 ### (Optional) qual installation
 To install hl_qual, use the following command:
 ```
-sudo yum install habanalabs-qual-0.15.2-4* -y
+sudo yum install habanalabs-qual-0.15.3-31* -y
 ```
@@ -794,7 +794,7 @@ python3 -m pip install habana-horovod
 ```
 For example:
 ```
-./docker_build.sh tensorflow ubuntu20.04 2.4.1
+./docker_build.sh tensorflow ubuntu20.04 2.5.0
 ```
 ### Install habana-container-runtime package
@@ -822,7 +822,7 @@ For example:
 #### Install habana-container-runtime:
 Install the `habana-container-runtime` package:
 ```
- sudo apt install -y habanalabs-container-runtime=0.15.2-4
+ sudo apt install -y habanalabs-container-runtime=0.15.3-31
 ```
 #### Docker Engine setup
@@ -883,7 +883,7 @@ For example:
 #### Install habana-container-runtime:
 Install the `habana-container-runtime` package:
 ```
- sudo apt install -y habanalabs-container-runtime=0.15.2-4
+ sudo apt install -y habanalabs-container-runtime=0.15.3-31
 ```
 #### Docker Engine setup
@@ -958,7 +958,7 @@ This will search for and list all packages with the word Habana.
 #### Install habana-container-runtime:
 Install the `habana-container-runtime` package:
 ```
-sudo yum install habanalabs-container-runtime-0.15.2-4* -y
+sudo yum install habanalabs-container-runtime-0.15.3-31* -y
 ```
 #### Docker Engine setup
@@ -1032,7 +1032,7 @@ This will search for and list all packages with the word Habana.
 #### Install habana-container-runtime:
 Install the `habana-container-runtime` package:
 ```
-sudo yum install habanalabs-container-runtime-0.15.2-4* -y
+sudo yum install habanalabs-container-runtime-0.15.3-31* -y
 ```
 ### Docker Engine setup
@@ -1084,7 +1084,7 @@ It will look similar to this:
 **NOTE:** Modify below image name path $TF_VERSION to match the TF version chosen when building [2.4.1, 2.5.0]
 ```
-docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host artifactory.habana-labs.com/docker-local/0.15.2/$OS/habanalabs/$MODE-installer-tf-cpu-$TF_VERSION:0.15.2-4
+docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host artifactory.habana-labs.com/docker-local/0.15.3/$OS/habanalabs/$MODE-installer-tf-cpu-$TF_VERSION:0.15.3-31
 ```
 **OPTIONAL:** Add the following flag to mount a local host share folder to the docker in order to be able to transfer files out of docker:
@@ -1134,7 +1134,7 @@ Setup complete, please proceed to [Run Reference Models](#Run-Reference-Models)
 #### Install habana-container-runtime:
 Install the `habana-container-runtime` package:
 ```
- sudo apt install -y habanalabs-container-runtime=0.15.2-4
+ sudo apt install -y habanalabs-container-runtime=0.15.3-31
 ```
 #### Docker Engine setup
@@ -1195,7 +1195,7 @@ Setup complete, please proceed to [Run Reference Models](#Run-Reference-Models)
 #### Install habana-container-runtime:
 Install the `habana-container-runtime` package:
 ```
- sudo apt install -y habanalabs-container-runtime=0.15.2-4
+ sudo apt install -y habanalabs-container-runtime=0.15.3-31
 ```
 #### Docker Engine setup
@@ -1270,7 +1270,7 @@ This will search for and list all packages with the word Habana.
 #### Install habana-container-runtime:
 Install the `habana-container-runtime` package:
 ```
-sudo yum install habanalabs-container-runtime-0.15.2-4* -y
+sudo yum install habanalabs-container-runtime-0.15.3-31* -y
 ```
 #### Docker Engine setup
@@ -1344,7 +1344,7 @@ This will search for and list all packages with the word Habana.
 #### Install habana-container-runtime:
 Install the `habana-container-runtime` package:
 ```
-sudo yum install habanalabs-container-runtime-0.15.2-4* -y
+sudo yum install habanalabs-container-runtime-0.15.3-31* -y
 ```
 ### Docker Engine setup
@@ -1401,11 +1401,11 @@ It will look similar to this:
 ### Pull docker
 ```
- docker pull vault.habana.ai/gaudi-docker/0.15.2/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.4.1:0.15.2-4
+ docker pull vault.habana.ai/gaudi-docker/0.15.3/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.4.1:0.15.3-31
 ```
 ### Run docker
 ```
- docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/0.15.2/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.4.1:0.15.2-4
+ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/0.15.3/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.4.1:0.15.3-31
 ```
@@ -1414,22 +1414,22 @@ It will look similar to this:
 ### Pull docker
 ```
- docker pull vault.habana.ai/gaudi-docker/0.15.2/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.5.0:0.15.2-4
+ docker pull vault.habana.ai/gaudi-docker/0.15.3/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.5.0:0.15.3-31
 ```
 ### Run docker
 ```
- docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/0.15.2/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.5.0:0.15.2-4
+ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/0.15.3/ubuntu20.04/habanalabs/tensorflow-installer-tf-cpu-2.5.0:0.15.3-31
 ```
 * Pytorch
 ### Pull docker
 ```
- docker pull vault.habana.ai/gaudi-docker/0.15.2/ubuntu20.04/habanalabs/pytorch-installer:0.15.2-4
+ docker pull vault.habana.ai/gaudi-docker/0.15.3/ubuntu20.04/habanalabs/pytorch-installer:0.15.3-31
 ```
 ### Run docker
 ```
- docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/0.15.2/ubuntu20.04/habanalabs/pytorch-installer:0.15.2-4
+ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/0.15.3/ubuntu20.04/habanalabs/pytorch-installer:0.15.3-31
 ```
@@ -1443,11 +1443,11 @@ It will look similar to this:
 ### Pull docker
 ```
- docker pull vault.habana.ai/gaudi-docker/0.15.2/ubuntu18.04/habanalabs/tensorflow-installer-tf-cpu-2.4.1:0.15.2-4
+ docker pull vault.habana.ai/gaudi-docker/0.15.3/ubuntu18.04/habanalabs/tensorflow-installer-tf-cpu-2.4.1:0.15.3-31
 ```
 ### Run docker
 ```
- docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/0.15.2/ubuntu18.04/habanalabs/tensorflow-installer-tf-cpu-2.4.1:0.15.2-4
+ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/0.15.3/ubuntu18.04/habanalabs/tensorflow-installer-tf-cpu-2.4.1:0.15.3-31
 ```
@@ -1456,11 +1456,11 @@ It will look similar to this:
 ### Pull docker
 ```
- docker pull vault.habana.ai/gaudi-docker/0.15.2/ubuntu18.04/habanalabs/tensorflow-installer-tf-cpu-2.5.0:0.15.2-4
+ docker pull vault.habana.ai/gaudi-docker/0.15.3/ubuntu18.04/habanalabs/tensorflow-installer-tf-cpu-2.5.0:0.15.3-31
 ```
 ### Run docker
 ```
- docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/0.15.2/ubuntu18.04/habanalabs/tensorflow-installer-tf-cpu-2.5.0:0.15.2-4
+ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/0.15.3/ubuntu18.04/habanalabs/tensorflow-installer-tf-cpu-2.5.0:0.15.3-31
 ```
@@ -1469,11 +1469,11 @@ It will look similar to this:
 ### Pull docker
 ```
- docker pull vault.habana.ai/gaudi-docker/0.15.2/ubuntu18.04/habanalabs/pytorch-installer:0.15.2-4
+ docker pull vault.habana.ai/gaudi-docker/0.15.3/ubuntu18.04/habanalabs/pytorch-installer:0.15.3-31
 ```
 ### Run docker
 ```
- docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/0.15.2/ubuntu18.04/habanalabs/pytorch-installer:0.15.2-4
+ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/0.15.3/ubuntu18.04/habanalabs/pytorch-installer:0.15.3-31
 ```
@@ -1487,11 +1487,11 @@ It will look similar to this:
 ### Pull docker
 ```
- docker pull vault.habana.ai/gaudi-docker/0.15.2/amzn2/habanalabs/tensorflow-installer-tf-cpu-2.4.1:0.15.2-4
+ docker pull vault.habana.ai/gaudi-docker/0.15.3/amzn2/habanalabs/tensorflow-installer-tf-cpu-2.4.1:0.15.3-31
 ```
 ### Run docker
 ```
- docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/0.15.2/amzn2/habanalabs/tensorflow-installer-tf-cpu-2.4.1:0.15.2-4
+ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/0.15.3/amzn2/habanalabs/tensorflow-installer-tf-cpu-2.4.1:0.15.3-31
 ```
@@ -1500,11 +1500,11 @@ It will look similar to this:
 ### Pull docker
 ```
- docker pull vault.habana.ai/gaudi-docker/0.15.2/amzn2/habanalabs/tensorflow-installer-tf-cpu-2.5.0:0.15.2-4
+ docker pull vault.habana.ai/gaudi-docker/0.15.3/amzn2/habanalabs/tensorflow-installer-tf-cpu-2.5.0:0.15.3-31
 ```
 ### Run docker
 ```
- docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/0.15.2/amzn2/habanalabs/tensorflow-installer-tf-cpu-2.5.0:0.15.2-4
+ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host vault.habana.ai/gaudi-docker/0.15.3/amzn2/habanalabs/tensorflow-installer-tf-cpu-2.5.0:0.15.3-31
 ```
@@ -1513,11 +1513,11 @@ It will look similar to this:
 ### Pull docker
 ```
- docker pull vault.habana.ai/gaudi-docker/0.15.2/amzn2/habanalabs/pytorch-installer:0.15.2-4
+ docker pull vault.habana.ai/gaudi-docker/0.15.3/amzn2/habanalabs/pytorch-installer:0.15.3-31
 ```
 ### Run docker
 ```
- docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/0.15.2/amzn2/habanalabs/pytorch-installer:0.15.2-4
+ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host vault.habana.ai/gaudi-docker/0.15.3/amzn2/habanalabs/pytorch-installer:0.15.3-31
 ```
@@ -1578,6 +1578,10 @@ Check that all the cards show up by running the following command:
 ```
 sudo lspci -tvvv | grep 1da3
 ```
+OR
+```
+sudo lspci -tvvv | grep Habana
+```
 You should expect to see all Gaudi cards listed.
 ### PCI Link Status
diff --git a/dockerfiles/Dockerfile_amzn2_base_installer b/dockerfiles/Dockerfile_amzn2_base_installer
index 9e5249f..dded63b 100644
--- a/dockerfiles/Dockerfile_amzn2_base_installer
+++ b/dockerfiles/Dockerfile_amzn2_base_installer
@@ -7,6 +7,7 @@ FROM amazonlinux:2.0.20210421.0
 ARG ARTIFACTORY_URL
 ARG VERSION
 ARG REVISION
+
 RUN yum update -y && yum install -y \
 ethtool-4.8-10.amzn2.x86_64 \
 python-devel \
diff --git a/dockerfiles/Dockerfile_amzn2_pytorch_installer b/dockerfiles/Dockerfile_amzn2_pytorch_installer
index 9d0de4f..286cafa 100644
--- a/dockerfiles/Dockerfile_amzn2_pytorch_installer
+++ b/dockerfiles/Dockerfile_amzn2_pytorch_installer
@@ -35,16 +35,10 @@ RUN yum install -y \
 moreutils && \
 yum clean all
-# Make Python 3.7 to be default python
-RUN rm -f /usr/bin/python || echo "Python Symlink" && \
- sed -i 1s/python/python2/ /bin/yum && \
- sed -i 1s/python/python2/ /usr/libexec/urlgrabber-ext-down && \
- ln -s /usr/bin/python3.7 /usr/bin/python && \
- curl --create-dirs -sSLo get-pip.py https://bootstrap.pypa.io/get-pip.py && \
- python get-pip.py pip==19.3.1 --no-warn-script-location && \
+RUN wget https://bootstrap.pypa.io/get-pip.py && \
+ python3 get-pip.py pip==21.0.1 --no-warn-script-location && \
 rm -rf get-pip.py
-# Install openmpi version 4.0.5
 RUN wget --no-verbose https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-"${OPENMPI_VER}".tar.gz && \
 tar -xvf openmpi-"${OPENMPI_VER}".tar.gz && \
 cd openmpi-"${OPENMPI_VER}" && \
@@ -56,23 +50,24 @@ RUN wget --no-verbose https://download.open-mpi.org/release/open-mpi/v4.0/openmp
 rm -rf openmpi-"${OPENMPI_VER}"* && \
 /sbin/ldconfig
-# Set openmpi Path
 ENV MPI_ROOT=/usr/lib/habanalabs/openmpi
 ENV PATH=$MPI_ROOT/bin:$PATH
-# Install mpi4py version 3.0.3
-RUN MPICC=/usr/lib/habanalabs/openmpi/bin/mpicc pip install mpi4py==3.0.3 --no-cache-dir
+RUN MPICC=/usr/lib/habanalabs/openmpi/bin/mpicc pip3 install mpi4py==3.0.3 --no-cache-dir
 RUN wget "https://${ARTIFACTORY_URL}"/gaudi-pt-modules/"${VERSION}"/"${REVISION}"\
 /amzn2/binary/pytorch_modules-"${VERSION}"_"${REVISION}".tgz && \
 mkdir /root/habanalabs /root/habanalabs/pytorch_temp && \
 tar -xf pytorch_modules-"${VERSION}"_"${REVISION}".tgz -C /root/habanalabs/pytorch_temp/. && \
 mv /root/habanalabs/pytorch_temp/*.so /usr/lib/habanalabs/ && \
- pip install -r /root/habanalabs/pytorch_temp/requirements-pytorch.txt --user --no-warn-script-location && \
- pip uninstall --yes torch && \
- pip install /root/habanalabs/pytorch_temp/*.whl --user && \
+ pip3 install -r /root/habanalabs/pytorch_temp/requirements-pytorch.txt --user --no-warn-script-location && \
+ pip3 uninstall --yes torch && \
+ pip3 install /root/habanalabs/pytorch_temp/*.whl --user && \
 /sbin/ldconfig && \
 echo "source /etc/profile.d/habanalabs.sh" >> ~/.bashrc && \
+ pip3 uninstall -y pillow && \
+ pip3 uninstall -y pillow-simd && \
+ pip3 install pillow-simd==7.0.0.post3 --user && \
 rm -rf /root/habanalabs/pytorch_temp/ && \
 rm -rf pytorch_modules-"${VERSION}"_"${REVISION}".tgz
diff --git a/dockerfiles/Dockerfile_amzn2_tensorflow_installer b/dockerfiles/Dockerfile_amzn2_tensorflow_installer
index 4eae3f1..85a72cb 100644
--- a/dockerfiles/Dockerfile_amzn2_tensorflow_installer
+++ b/dockerfiles/Dockerfile_amzn2_tensorflow_installer
@@ -9,8 +9,7 @@ ARG REVISION
 FROM ${BASE_NAME}:${VERSION}-${REVISION}
 ARG VERSION
 ARG REVISION
-# tensorflow-cpu version default set to 2.2.0
-ARG TF_VERSION=2.2.2
+ARG TF_VERSION=2.5.0
 ARG OPENMPI_VER=4.0.5
 ARG ARTIFACTORY_URL
@@ -27,8 +26,7 @@ RUN yum install -y unzip \
 openssh-server \
 git \
 bc \
- mesa-libGL && \
- yum remove openmpi -y
+ mesa-libGL
 # Install OpenMpi from public sources - it must be installed before requirements,
 # that has dependecy with mpi4py package
@@ -43,7 +41,6 @@ RUN wget --no-verbose https://download.open-mpi.org/release/open-mpi/v4.0/openmp
 rm -rf openmpi-"${OPENMPI_VER}"* && \
 /sbin/ldconfig
-ENV CC=/usr/lib/habanalabs/openmpi/bin/mpicc
 ENV MPICC=/usr/lib/habanalabs/openmpi/bin/mpicc
 COPY requirements-training-release.txt requirements-training-release.txt
@@ -51,14 +48,13 @@ COPY requirements-training-release.txt requirements-training-release.txt
 RUN wget https://bootstrap.pypa.io/get-pip.py && \
 python3 get-pip.py pip==21.0.1 && \
 rm -rf get-pip.py && \
+ pip3 install tensorflow-cpu==${TF_VERSION} \
+ tensorflow-model-optimization==0.5.0 && \
+ # pycocotools has to be installed in separated process otherwise it fails with 'numpy.ufunc size changed'
+ pip3 install pycocotools==2.0.1 && \
 pip3 install -r requirements-training-release.txt && \
- # pycocotools has to be installed in separated process otherwise it fails
- pip3 install pycocotools==2.0.0 \
- tensorflow-cpu==${TF_VERSION} \
- tensorflow-model-optimization==0.5.0 && \
 rm requirements-training-release.txt
-# Using pip install habana-tensorflow and habana-horovod python packages
 RUN python3 -m pip install habana-tensorflow=="${VERSION}"."${REVISION}" \
 --index-url "https://${ARTIFACTORY_URL}"/api/pypi/gaudi-python/simple && \
 python3 -m pip install habana-horovod=="${VERSION}"."${REVISION}" \
diff --git a/dockerfiles/Dockerfile_ubuntu18.04_base_installer b/dockerfiles/Dockerfile_ubuntu18.04_base_installer
index 55ba120..3541434 100644
--- a/dockerfiles/Dockerfile_ubuntu18.04_base_installer
+++ b/dockerfiles/Dockerfile_ubuntu18.04_base_installer
@@ -7,7 +7,7 @@ FROM ubuntu:bionic-20210512
 ARG ARTIFACTORY_URL
 ARG VERSION
 ARG REVISION
-ARG HABANA_PIP_VERSION="19.3.1"
+ARG HABANA_PIP_VERSION="21.1.1"
 ENV LANG=en_US.UTF-8
 ENV LANGUAGE=en_US.UTF-8
diff --git a/dockerfiles/Dockerfile_ubuntu18.04_py37_base_installer b/dockerfiles/Dockerfile_ubuntu18.04_py37_base_installer
index d836e57..befc08d 100644
--- a/dockerfiles/Dockerfile_ubuntu18.04_py37_base_installer
+++ b/dockerfiles/Dockerfile_ubuntu18.04_py37_base_installer
@@ -56,9 +56,9 @@ RUN apt-get update && \
 apt-get autoremove && apt-get clean
 RUN locale-gen en_US.UTF-8
-# Install python 3.7 as default version
+
+# Update default Python to 3.7 as Habana dropped support for python older than 3.7
 RUN apt-get update && \
- # Install python 3.7 packages
 apt-get install -y --no-install-recommends \
 python3.7 \
 python3.7-dev \
@@ -67,7 +67,6 @@ RUN apt-get update && \
 python3-distutils \
 python3-tk && \
 apt-get autoremove --yes && apt-get clean && rm -rf /var/lib/apt/lists/* && \
- # Configure alternatives
 update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 10 && \
 update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 1 && \
 update-alternatives --install /usr/bin/python python /usr/bin/python3.7 10 && \
diff --git a/dockerfiles/Dockerfile_ubuntu20.04_base_installer b/dockerfiles/Dockerfile_ubuntu20.04_base_installer
index ab04a17..650999d 100644
--- a/dockerfiles/Dockerfile_ubuntu20.04_base_installer
+++ b/dockerfiles/Dockerfile_ubuntu20.04_base_installer
@@ -58,8 +58,11 @@ RUN apt-get update && \
 apt-get autoremove && apt-get clean
 RUN locale-gen en_US.UTF-8
+
 RUN pip3 install setuptools==41.0.0 \
- google-pasta==0.2.0
+ google-pasta==0.2.0 \
+ requests==2.25.1 \
+ urllib3==1.26.5
 RUN yes '' | add-apt-repository ppa:deadsnakes/ppa && \
 echo "deb https://${ARTIFACTORY_URL}/debian focal main" | tee -a /etc/apt/sources.list && \
diff --git a/dockerfiles/Dockerfile_ubuntu_pytorch_installer b/dockerfiles/Dockerfile_ubuntu_pytorch_installer
index 452cbb9..2850fda 100644
--- a/dockerfiles/Dockerfile_ubuntu_pytorch_installer
+++ b/dockerfiles/Dockerfile_ubuntu_pytorch_installer
@@ -23,14 +23,13 @@ RUN apt-get update && apt-get install -y \
 libcurl4 \
 moreutils \
 lsof \
+ iproute2 \
 libcairo2-dev \
 libglib2.0-dev \
 libselinux1-dev \
- libpcre2-dev \
- iproute2 && \
+ libpcre2-dev && \
 apt-get clean
-# Make Python 3.x to be default python depending on OS version
 RUN bash -c "\
 if [[ $BASE_NAME == *"ubuntu18.04"* ]]; then \
 rm -rf /usr/bin/python /usr/bin/python3m; \
@@ -38,13 +37,10 @@ RUN bash -c "\
 ln -s /usr/bin/python3.7m /usr/bin/python3m; \
 else \
 ln -s /usr/bin/python3.8 /usr/bin/python; \
- apt-get update; apt install -y libcairo2-dev; \
- apt-get clean; \
 fi"
 RUN python3 -m pip install pip=="${HABANA_PIP_VERSION}"
-# Install openmpi version 4.0.5
 RUN wget --no-verbose https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-"${OPENMPI_VER}".tar.gz && \
 tar -xvf openmpi-"${OPENMPI_VER}".tar.gz && \
 cd openmpi-"${OPENMPI_VER}" && \
@@ -56,11 +52,9 @@ RUN wget --no-verbose https://download.open-mpi.org/release/open-mpi/v4.0/openmp
 rm -rf openmpi-"${OPENMPI_VER}"* && \
 /sbin/ldconfig
-# Set openmpi Path
 ENV MPI_ROOT=/usr/lib/habanalabs/openmpi
 ENV PATH=$MPI_ROOT/bin:$PATH
-# Install mpi4py version 3.0.3 for both u18 & u20
 RUN MPICC=/usr/lib/habanalabs/openmpi/bin/mpicc pip install mpi4py==3.0.3 --no-cache-dir
 RUN wget "https://${ARTIFACTORY_URL}"/gaudi-pt-modules/"${VERSION}"/"${REVISION}"\
@@ -73,6 +67,9 @@ RUN wget "https://${ARTIFACTORY_URL}"/gaudi-pt-modules/"${VERSION}"/"${REVISION}
 pip install /root/habanalabs/pytorch_temp/*.whl --user && \
 /sbin/ldconfig && \
 echo "source /etc/profile.d/habanalabs.sh" >> ~/.bashrc && \
+ pip uninstall -y pillow && \
+ pip uninstall -y pillow-simd && \
+ pip install pillow-simd==7.0.0.post3 --user && \
 rm -rf /root/habanalabs/pytorch_temp/ && \
 rm -rf pytorch_modules-"${VERSION}"_"${REVISION}".tgz
 # requirement-pytorch.txt installs a pkg called torchvision.torchvision has dependency on "torch" pkg
diff --git a/dockerfiles/Dockerfile_ubuntu_tensorflow_installer b/dockerfiles/Dockerfile_ubuntu_tensorflow_installer
index 4107055..eff7785 100644
--- a/dockerfiles/Dockerfile_ubuntu_tensorflow_installer
+++ b/dockerfiles/Dockerfile_ubuntu_tensorflow_installer
@@ -9,23 +9,17 @@ ARG REVISION
 FROM ${BASE_NAME}:${VERSION}-${REVISION}
 ARG VERSION
 ARG REVISION
-# tensorflow-cpu version default set to 2.2.2
-ARG TF_VERSION=2.2.2
+ARG TF_VERSION=2.5.0
 ARG OPENMPI_VER=4.0.5
 ARG ARTIFACTORY_URL
 ENV TF_MODULES_RELEASE_BUILD=/usr/lib/habanalabs/
-ENV PYTHONPATH=/root:/usr/lib/habanalabs/:/root
+ENV PYTHONPATH=/usr/lib/habanalabs/:/root
 ENV LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/habanalabs/openmpi/lib/
 ENV PATH=$PATH:/usr/lib/habanalabs/openmpi/bin/
 ENV OPAL_PREFIX=/usr/lib/habanalabs/openmpi/
 ENV MPI_ROOT=/usr/lib/habanalabs/openmpi/
-RUN apt-get update && \
- apt-get remove openmpi-bin -y && \
- apt-get autoremove --purge openmpi-bin -y && \
- apt-get clean
-
 # Install OpenMpi from public sources - it must be installed before requirements,
 # that has dependecy with mpi4py package
 RUN wget --no-verbose https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-"${OPENMPI_VER}".tar.gz && \
@@ -42,16 +36,13 @@ RUN wget --no-verbose https://download.open-mpi.org/release/open-mpi/v4.0/openmp
 COPY requirements-training-release.txt requirements-training-release.txt
 RUN python3 -m pip install pip==21.0.1 && \
- pip3 install -r requirements-training-release.txt && \
- pip3 uninstall --yes habana tensorflow && \
- # tensorflow-cpu and -model have to be installed in separated processes otherwise old version of tf will be imported
- pip3 install tensorflow-cpu==${TF_VERSION} && \
- pip3 install tensorflow-model-optimization==0.5.0 && \
+ pip3 install tensorflow-cpu==${TF_VERSION} \
+ tensorflow-model-optimization==0.5.0 && \
 # pycocotools has to be installed in separated process otherwise it fails with 'numpy.ufunc size changed'
 pip3 install pycocotools==2.0.1 && \
+ pip3 install -r requirements-training-release.txt && \
 rm requirements-training-release.txt
-# Using pip install habana-tensorflow and habana-horovod python packages
 RUN python3 -m pip install habana-tensorflow=="${VERSION}"."${REVISION}" \
 --index-url "https://${ARTIFACTORY_URL}"/api/pypi/gaudi-python/simple && \
 python3 -m pip install habana-horovod=="${VERSION}"."${REVISION}" \
diff --git a/dockerfiles/requirements-training-release.txt b/dockerfiles/requirements-training-release.txt
index bb66f42..2d44807 100644
--- a/dockerfiles/requirements-training-release.txt
+++ b/dockerfiles/requirements-training-release.txt
@@ -1,4 +1,4 @@
-gast==0.3.3
+gast #version not specified, as different are required for a particular version of tensorflow-cpu
 py-cpuinfo==5.0.0
 requests==2.25.1
 tensorflow_datasets==1.2.0
@@ -6,15 +6,14 @@ tensorflow-metadata==0.12.1
 tf-slim==1.1.0
 cython==0.29.15
 imgaug==0.4.0
-keras==2.3.1
+keras #version not specified, as different are required for a particular version of tensorflow-cpu
 cloudpickle==1.6.0
 numpy>=1.18.0
-scipy==1.4.1
-tensorflow-addons==0.11.1
+tensorflow-addons==0.13.0
 munch==2.5.0
 git+https://github.com/nvidia/dllogger@26a0f8f1958de2c0c460925ff6102a4d2486d6cc#egg=dllogger
 git+https://github.com/tensorpack/tensorpack@11ca8b2c34056feb331744281000f78e3c157983
-h5py==2.10.0
+h5py #version not specified, as different are required for a particular version of tensorflow-cpu
 wrapt==1.12.1
 bs4==0.0.1
 tensorflow-hub==0.11.0
@@ -45,4 +44,11 @@ mpi4py==3.0.3
 google-api-core==1.25.0
 google-api-python-client==1.12.3
 sacrebleu==1.3.6
- sacremoses==0.0.41
\ No newline at end of file
+ sacremoses==0.0.41
+# huggingface requirements
+ datasets==1.8.0
+ transformers==4.6.1
+# retinanet requirements
+ scikit-learn==0.24.2
+ seqeval==1.2.2
+ #tensorflow_text==tensorflow cpu version [SW-48404][SW-48444]
\ No newline at end of file