Habana Base Deep Learning AMI released and Framework cleanup
* Fix PYTHON path for pip installs TF framework
* --user for TF install directions
* Remove coming soon for Habana Base AMI
* Fix dockerfiles to workaround TF 2.6.0 release install bug
omrialmog committed Nov 15, 2021
1 parent 6622684 commit 0a8aad1
Showing 5 changed files with 54 additions and 38 deletions.
58 changes: 34 additions & 24 deletions README.md
@@ -17,9 +17,9 @@
- [Check Habana Package Installation for no Docker](#check-habana-package-installation-for-no-docker)
- [Install SW Stack](#install-sw-stack)
- [Check TF/Horovod Habana packages](#check-tfhorovod-habana-packages)
- [Install TF/Horovod Habana python packages](#install-tfhorovod-habana-python-packages)
- [Install TF/Horovod Habana packages](#install-tfhorovod-habana-packages)
- [Check PT Habana packages](#check-pt-habana-packages)
- [Install PT Habana python packages](#install-pt-habana-python-packages)
- [Install PT Habana packages](#install-pt-habana-packages)
- Docker
- [Do you want to use prebuilt docker or build docker yourself?](#do-you-want-to-use-prebuilt-docker-or-build-docker-yourself)
- [How to Build Docker Images from Habana Dockerfiles](#how-to-build-docker-images-from-habana-dockerfiles)
@@ -196,12 +196,6 @@ Setup complete, please proceed to [Setup Complete](#Setup-Complete)

## Habana Deep Learning AMI from AWS Marketplace

<center>

**--- Coming Soon ---**

</center>

When using the Habana Deep Learning AMI from AWS Marketplace, you can either directly use containers or install a framework and proceed from there to run directly on the AMI.
<br />

@@ -999,7 +993,7 @@ ${PYTHON} -m pip list | grep habana
<center>

### Are the required python packages installed on your system?
[Yes](#Setup-Complete)[No](#Install-TFHorovod-Habana-python-packages)
[Yes](#Setup-Complete)[No](#install-tfhorovod-habana-packages)

</center>

@@ -1009,7 +1003,7 @@ ${PYTHON} -m pip list | grep habana

<br />

## Install TF/Horovod Habana python packages
## Install TF/Horovod Habana packages
This section describes how to obtain and install the TensorFlow software package. The package consists of two main components:

Base **habana-tensorflow** Python package - Libraries and modules needed to execute TensorFlow on a **single Gaudi** device.
@@ -1021,7 +1015,9 @@ Scale-out **habana-horovod** Python package - Libraries and modules needed to ex

<br />

The following example scripts include instructions from the steps [Base Installation (Single Node)](#Base-Installation-Single-Node) and [Scale-out Installation](#Scale-out-Installation) that can be used for your reference. The scripts install TF 2.5.1.
The following example scripts include instructions from the steps [Base Installation (Single Node)](#Base-Installation-Single-Node) and [Scale-out Installation](#Scale-out-Installation) that can be used for your reference. The scripts install TF 2.5.1.
The scripts use Python 3 from ``/usr/bin/``, at the version listed in the [Support Matrix](#SynapseAi-Support-Matrix).
Make sure Python 3 is installed there; if it is not, update the bash scripts with the appropriate ``PYTHON=<path>``.
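
One way to guard against a missing interpreter is to check the candidate paths before exporting ``PYTHON``. The following is only a minimal sketch: the helper name `pick_python` and the fallback path `/usr/bin/python3` are assumptions for illustration and are not part of the Habana scripts.

```shell
# Hypothetical helper (not part of the Habana scripts): print the first
# candidate path that exists and is executable, or return non-zero.
pick_python() {
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer the path used by the example scripts; the second path is an
# assumed fallback and may not match the Support Matrix version.
if PYTHON="$(pick_python /usr/bin/python3.7 /usr/bin/python3)"; then
    export PYTHON
    echo "Using PYTHON=${PYTHON}"
else
    echo "No Python 3 interpreter found; set PYTHON=<path> manually" >&2
fi
```

If the fallback is picked, verify its version against the Support Matrix before continuing.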

Ubuntu 18.04 example script [u18_tensorflow_installation.sh](https://github.com/HabanaAI/Setup_and_Install/blob/r1.0.1/installation_scripts/u18_tensorflow_installation.sh).

@@ -1149,16 +1145,30 @@ This will search for and list all packages with the word Habana.
### Base Installation (Single Node)
The habana-tensorflow package contains all the binaries and scripts to run topologies on a single-node.

1. Before installing habana-tensorflow, install supported TensorFlow version. See [Support Matrix](#SynapseAi-Support-Matrix). If no TensorFlow package is available, PIP will automatically fetch it.
1. All the steps listed below use the ``${PYTHON}`` environment variable, which must be set to the appropriate Python version according to the [Support Matrix](#SynapseAi-Support-Matrix).
```
${PYTHON} -m pip install tensorflow-cpu==<supported_tf_version>
export PYTHON=/usr/bin/python<VER> # e.g. for Ubuntu 18.04: PYTHON=/usr/bin/python3.7
```
2. Before installing habana-tensorflow, install supported TensorFlow version. See [Support Matrix](#SynapseAi-Support-Matrix). If no TensorFlow package is available, PIP will automatically fetch it.
**NOTE:**
Since the release of TensorFlow 2.7.0, TensorFlow 2.6.0 has broken dependencies on TensorFlow Estimator, Keras and TensorBoard.
To work around this, explicitly install the matching versions of those packages before installing TensorFlow.

2. habana-tensorflow is available in the Habana Vault. To allow PIP to search for the habana-tensorflow package, `--extra-index-url` needs to be specified:
```
${PYTHON} -m pip install habana-tensorflow==1.0.1.81 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple
# Only when installing tensorflow-cpu==2.6.0
${PYTHON} -m pip install --user tensorflow-estimator==2.6.0
${PYTHON} -m pip install --user tensorboard==2.6.0
${PYTHON} -m pip install --user keras==2.6.0
```
Then install tensorflow-cpu:
```
3. Run the below command to make sure the habana-tensorflow package is properly installed:
${PYTHON} -m pip install --user tensorflow-cpu==<supported_tf_version>
```
3. habana-tensorflow is available in the Habana Vault. To allow PIP to search for the habana-tensorflow package, `--extra-index-url` needs to be specified:
```
${PYTHON} -m pip install --user habana-tensorflow==1.0.1.81 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple
```
4. Run the below command to make sure the habana-tensorflow package is properly installed:
```
${PYTHON} -c "import habana_frameworks.tensorflow as htf; print(htf.__version__)"
```
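
The version-dependent pre-install step above can be scripted. The following is a minimal dry-run sketch: the helper `tf_companion_pins` is hypothetical (it is not part of the Habana scripts), and the block only prints the pip commands instead of running them, so it is safe to try without network access.

```shell
# Hypothetical helper: print the packages to pin before installing a given
# tensorflow-cpu version. Only 2.6.0 needs pins according to the note above.
tf_companion_pins() {
    case "$1" in
        2.6.0)
            printf '%s\n' \
                "tensorflow-estimator==2.6.0" \
                "tensorboard==2.6.0" \
                "keras==2.6.0"
            ;;
        *)
            # No pre-install step is described for other versions.
            ;;
    esac
}

# TF_VERSION defaults to the version installed by the example scripts.
TF_VERSION="${TF_VERSION:-2.5.1}"

# Dry run: print the commands in the order they would be executed.
for pin in $(tf_companion_pins "$TF_VERSION"); do
    echo "${PYTHON:-python3} -m pip install --user ${pin}"
done
echo "${PYTHON:-python3} -m pip install --user tensorflow-cpu==${TF_VERSION}"
```

Running it with `TF_VERSION=2.6.0` lists the three pinned packages ahead of `tensorflow-cpu`; remove the `echo` prefixes to turn the dry run into a real install.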
@@ -1212,11 +1222,11 @@ export PATH=$MPI_ROOT/bin:$PATH
```
Install mpi4py binding
```
python3 -m pip install mpi4py==3.0.3
${PYTHON} -m pip install --user mpi4py==3.0.3
```
3. habana-horovod is also stored in the Habana Vault. To allow PIP to search for the habana-horovod package, `--extra-index-url` needs to be specified:
```
${PYTHON} -m pip install habana-horovod==1.0.1.81 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple
${PYTHON} -m pip install --user habana-horovod==1.0.1.81 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple
```

#### See also:
@@ -1274,7 +1284,7 @@ Python dependencies are gathered in [model_requirements.txt](https://github.com

Download the file and invoke:
```
python3 -m pip install -r model_requirements.txt
${PYTHON} -m pip install --user -r model_requirements.txt
```

<br />
@@ -1371,7 +1381,7 @@ Check for habana-torch and habana-torch-hcl
<center>

### Are the required python packages installed on your system?
[Yes](#Setup-Complete)[No](#Install-PT-Habana-python-packages)
[Yes](#Setup-Complete)[No](#install-pt-habana-packages)

</center>

@@ -1381,7 +1391,7 @@ Check for habana-torch and habana-torch-hcl

<br />

## Install PT Habana python packages
## Install PT Habana packages
### Install Habana Pytorch
<details>
<summary>Ubuntu distributions</summary>
@@ -2224,7 +2234,7 @@ It will look similar to this:
* <details>
<summary>TF 2.6.0</summary>

### Pull docker
### Pull docker
```
docker pull vault.habana.ai/gaudi-docker/1.0.1/ubuntu18.04/habanalabs/tensorflow-installer-tf-cpu-2.6.0:1.0.1-81
```
@@ -2275,8 +2285,8 @@ It will look similar to this:

* <details>
<summary>TF 2.6.0</summary>

### Pull docker
### Pull docker
```
docker pull vault.habana.ai/gaudi-docker/1.0.1/amzn2/habanalabs/tensorflow-installer-tf-cpu-2.6.0:1.0.1-81
```
3 changes: 3 additions & 0 deletions dockerfiles/Dockerfile_amzn2_tensorflow_installer
@@ -56,6 +56,9 @@ RUN wget https://bootstrap.pypa.io/get-pip.py && \
python3 get-pip.py pip==21.0.1 && \
rm -rf get-pip.py && \
pip3 install -r requirements-training-release.txt && \
pip3 install tensorflow-estimator==2.6.0 && \
pip3 install tensorboard==2.6.0 && \
pip3 install keras==2.6.0 && \
pip3 install tensorflow-cpu==${TF_VERSION} \
tensorflow-model-optimization==0.5.0 && \
# pycocotools has to be installed in separated process otherwise it fails with 'numpy.ufunc size changed'
3 changes: 3 additions & 0 deletions dockerfiles/Dockerfile_ubuntu_tensorflow_installer
@@ -45,6 +45,9 @@ COPY requirements-training-release.txt requirements-training-release.txt
RUN python3 -m pip install pip==21.0.1 && \
pip3 install -r requirements-training-release.txt && \
pip3 uninstall --yes habana tensorflow && \
pip3 install tensorflow-estimator==2.6.0 && \
pip3 install tensorboard==2.6.0 && \
pip3 install keras==2.6.0 && \
# tensorflow-cpu and -model have to be installed in separated processes otherwise old version of tf will be imported
pip3 install tensorflow-cpu==${TF_VERSION} && \
pip3 install tensorflow-model-optimization==0.5.0 && \
14 changes: 7 additions & 7 deletions installation_scripts/al2_tensorflow_installation.sh
@@ -29,6 +29,7 @@ export MPI_ROOT=/usr/local/openmpi
export LD_LIBRARY_PATH=$MPI_ROOT/lib:$LD_LIBRARY_PATH
export OPAL_PREFIX=$MPI_ROOT
export PATH=$MPI_ROOT/bin:$PATH
export PYTHON=/usr/bin/python3.7
echo "export MPI_ROOT=${MPI_ROOT}" | sudo tee -a /etc/profile.d/habanalabs.sh
echo "export OPAL_PREFIX=${MPI_ROOT}" | sudo tee -a /etc/profile.d/habanalabs.sh
echo 'export LD_LIBRARY_PATH=${MPI_ROOT}/lib:${LD_LIBRARY_PATH}' | sudo tee -a /etc/profile.d/habanalabs.sh
@@ -46,14 +47,13 @@ wget --no-verbose https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-"$
/sbin/ldconfig

export MPICC=${MPI_ROOT}/bin/mpicc
python3 -m pip install mpi4py==3.0.3
${PYTHON} -m pip install --user mpi4py==3.0.3

#install base tensorflow package
python3 -m pip install tensorflow-cpu==2.5.1
#instal Habana tensorflow bridge & Horovod
python3 -m pip install habana-tensorflow==1.0.1.81 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple
python3 -m pip install habana-horovod==1.0.1.81 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple
${PYTHON} -m pip install --user tensorflow-cpu==2.5.1
#install Habana tensorflow bridge & Horovod
${PYTHON} -m pip install --user habana-tensorflow==1.0.1.81 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple
${PYTHON} -m pip install --user habana-horovod==1.0.1.81 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple

source /etc/profile.d/habanalabs.sh
python3 -c 'import tensorflow as tf;import habana_frameworks.tensorflow as htf;htf.library_loader.load_habana_module();x = tf.constant(2);y = x + x;assert y.numpy() == 4, "Sanity check failed: Wrong Add output";assert "HPU" in y.device, "Sanity check failed: Operation not executed on Habana";print("Sanity check passed")'

${PYTHON} -c 'import tensorflow as tf;import habana_frameworks.tensorflow as htf;htf.load_habana_module();x = tf.constant(2);y = x + x;assert y.numpy() == 4, "Sanity check failed: Wrong Add output";assert "HPU" in y.device, "Sanity check failed: Operation not executed on Habana";print("Sanity check passed")'
14 changes: 7 additions & 7 deletions installation_scripts/u18_tensorflow_installation.sh
@@ -26,6 +26,7 @@ export MPI_ROOT=/usr/local/openmpi
export LD_LIBRARY_PATH=$MPI_ROOT/lib:$LD_LIBRARY_PATH
export OPAL_PREFIX=$MPI_ROOT
export PATH=$MPI_ROOT/bin:$PATH
export PYTHON=/usr/bin/python3.7
echo "export MPI_ROOT=${MPI_ROOT}" | sudo tee -a /etc/profile.d/habanalabs.sh
echo "export OPAL_PREFIX=${MPI_ROOT}" | sudo tee -a /etc/profile.d/habanalabs.sh
echo 'export LD_LIBRARY_PATH=${MPI_ROOT}/lib:${LD_LIBRARY_PATH}' | sudo tee -a /etc/profile.d/habanalabs.sh
@@ -46,14 +47,13 @@ wget --no-verbose https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-"$
rm -rf openmpi-"${OPENMPI_VER}"* && \
sudo /sbin/ldconfig

python3 -m pip install mpi4py==3.0.3
${PYTHON} -m pip install --user mpi4py==3.0.3

#install base tensorflow package
python3 -m pip install tensorflow-cpu==2.5.1
#instal Habana tensorflow bridge & Horovod
python3 -m pip install habana-tensorflow==1.0.1.81 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple
python3 -m pip install habana-horovod==1.0.1.81 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple
${PYTHON} -m pip install --user tensorflow-cpu==2.5.1
#install Habana tensorflow bridge & Horovod
${PYTHON} -m pip install --user habana-tensorflow==1.0.1.81 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple
${PYTHON} -m pip install --user habana-horovod==1.0.1.81 --extra-index-url https://vault.habana.ai/artifactory/api/pypi/gaudi-python/simple

source /etc/profile.d/habanalabs.sh
python3 -c 'import tensorflow as tf;import habana_frameworks.tensorflow as htf;htf.library_loader.load_habana_module();x = tf.constant(2);y = x + x;assert y.numpy() == 4, "Sanity check failed: Wrong Add output";assert "HPU" in y.device, "Sanity check failed: Operation not executed on Habana";print("Sanity check passed")'

${PYTHON} -c 'import tensorflow as tf;import habana_frameworks.tensorflow as htf;htf.load_habana_module();x = tf.constant(2);y = x + x;assert y.numpy() == 4, "Sanity check failed: Wrong Add output";assert "HPU" in y.device, "Sanity check failed: Operation not executed on Habana";print("Sanity check passed")'
