The base model used is Wav2vec 2.0. The model learns speech representations on unlabeled data as described in wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Baevski et al., 2020).
The paper shows that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.
Please follow the instructions provided in the Gaudi Installation Guide to set up the environment, including the $PYTHON environment variable. The guide will walk you through the process of setting up your system to run the model on Gaudi.
To achieve the best performance, please follow the methods outlined in the Optimizing Training Platform guide.
In the docker container, clone this repository and switch to the branch that matches your Intel Gaudi software version.
You can run the hl-smi utility to determine the Intel Gaudi software version.
git clone -b [Intel Gaudi software version] https://github.com/HabanaAI/fairseq
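For example, if hl-smi reports Intel Gaudi software version 1.13.0 (matching the supported-configuration table below), and assuming the release branch is named after that version, the command would look like:
git clone -b 1.13.0 https://github.com/HabanaAI/fairseq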
Note: If the repository is not in the PYTHONPATH, make sure to update it by running the command below.
export PYTHONPATH=/path/to/fairseq:$PYTHONPATH
In the docker container, go to the fairseq directory and install fairseq along with the required packages using pip:
cd fairseq
pip install --editable .
Follow the steps below to set up the Wav2vec dataset:
- Download the dataset from http://www.openslr.org/12.
- Create the train-960 directory comprising the untarred train-clean-100, train-clean-360, and train-other-500 sets (totaling 960 hours of speech).
- Run the following command to create the manifest file:
$PYTHON wav2vec_manifest.py /path/to/dataset/train-960/ --dest /path/to/dataset/train-960/manifest --valid-percent 0.05
The wav2vec_manifest.py file can be obtained from /path/to/fairseq/examples/wav2vec.
An example layout of the dataset looks like the following:
100/
1001/
1006/
101/
1012/
...
manifest/
Note:
- Please make sure the first line in /path/to/dataset/train-960/manifest/train.tsv and /path/to/dataset/train-960/manifest/valid.tsv points to the correct directory, e.g. /data/pytorch/wav2vec/data/LibriSpeech/train-960.
- Going forward, we assume the above Wav2vec dataset is available at /data/pytorch/wav2vec/data/LibriSpeech/train-960.
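For reference, the manifest files produced by wav2vec_manifest.py are plain tab-separated lists: the first line is the root directory of the audio files, and each subsequent line holds a file path relative to that root together with its number of samples. A hypothetical excerpt of train.tsv (file names and sample counts below are illustrative only) could look like:
/data/pytorch/wav2vec/data/LibriSpeech/train-960
1001/134708/1001-134708-0000.flac	238560
1001/134708/1001-134708-0001.flac	209120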
Run training on 1 HPU:
- Run training on 1 HPU with gradient accumulation of 64 and mixed precision (BF16):
fairseq-hydra-train task.data=/data/pytorch/wav2vec/data/LibriSpeech/train-960/manifest/ --config-dir examples/wav2vec/config/pretraining --config-name wav2vec2_base_librispeech_hpu
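The gradient accumulation factor of 64 corresponds to the update_freq value in the optimization section of the config (the 8-HPU instructions below change it to 8). If you prefer to set it explicitly on the command line rather than editing the YAML, a Hydra override of the following form can be appended; this is an illustrative variant of the command above, assuming the stock config already defines optimization.update_freq:
fairseq-hydra-train task.data=/data/pytorch/wav2vec/data/LibriSpeech/train-960/manifest/ optimization.update_freq='[64]' --config-dir examples/wav2vec/config/pretraining --config-name wav2vec2_base_librispeech_hpu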
To run the multi-card demo, the following is required:
- The host machine has 512 GB of RAM installed.
- Make sure to follow the Gaudi Setup and Installation Guide to install and set up the docker container so that it has access to all 8 cards required for the multi-card demo.
- Before executing the multi-card demo scripts, make sure all server network interfaces are up. You can change the state of each network interface managed by the habanalabs driver using the following command:
sudo ip link set <interface_name> up
To identify if a specific network interface is managed by the habanalabs driver, run:
sudo ethtool -i <interface_name>
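As a convenience, the two commands above can be combined into a small shell loop that brings up every interface reporting the habanalabs driver. This is only a sketch, not part of the official setup scripts:
# Bring up all network interfaces managed by the habanalabs driver (illustrative).
for ifc in /sys/class/net/*; do
    name=$(basename "$ifc")
    if sudo ethtool -i "$name" 2>/dev/null | grep -q habanalabs; then
        sudo ip link set "$name" up
    fi
done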
Run training on 8 HPUs:
Note: The number of cards can be configured via the distributed_world_size setting, as shown below.
- Modify the wav2vec2_base_librispeech_hpu.yaml under /path/to/fairseq/examples/wav2vec/config/pretraining/.
- Set distributed_world_size to 8:
distributed_training:
distributed_world_size: 8
- Set update_freq to 8:
optimization:
max_update: 400000
lr: [0.0005]
update_freq: [8]
- Run the following command (first-gen Gaudi):
fairseq-hydra-train task.data=/data/pytorch/wav2vec/data/LibriSpeech/train-960/manifest/ --config-dir examples/wav2vec/config/pretraining --config-name wav2vec2_base_librispeech_hpu
- Run the following command (Gaudi 2):
PT_RECIPE_CACHE_PATH="./cache_dir/" fairseq-hydra-train task.data=/data/pytorch/wav2vec/data/LibriSpeech/train-960/manifest/ common.log_interval=111 common.hpu_graphs=true --config-dir examples/wav2vec/config/pretraining --config-name wav2vec2_base_librispeech_hpu
| Device | Intel Gaudi Software Version | PyTorch Version | Mode |
|---|---|---|---|
| Gaudi | 1.13.0 | 2.1.0 | Training |
| Gaudi 2 | 1.13.0 | 2.1.0 | Training |
- Added HPU graph support to the model script. Enabled HPU graph flags for Gaudi 2 only.
- Marked the copy of inputs to the device as asynchronous.
- Added async allreduce for sample_size.
- Removed host barrier in Wav2vec.
- Replaced isnonzero with a where op to unblock the host.
- Fetch the log statistics to the CPU only when needed.
- Replaced broadcast+sum with an equal algorithm to save memory in the Quantizer module.
- Created a customized version of cos_similarity by removing the broadcast operations.
- Moved negative indices generation to HPU.
- Changed the data type of randint to int16 to reduce the memory copied from host to device when generating negative indices.
- Replaced conv1d with equivalent conv2d.
- Replaced group norm with equivalent instance norm.
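To illustrate the last two replacements, the snippet below is a minimal, self-contained PyTorch sketch (not the repository's code) showing that a Conv1d can be reproduced by an equivalent Conv2d with a singleton spatial dimension, and that GroupNorm with one group per channel matches InstanceNorm:
import torch
import torch.nn as nn

# Illustrative only: conv1d vs. an equivalent conv2d, and group norm vs. instance norm.
x = torch.randn(2, 16, 100)  # (batch, channels, time)

conv1d = nn.Conv1d(16, 32, kernel_size=3, padding=1)
conv2d = nn.Conv2d(16, 32, kernel_size=(1, 3), padding=(0, 1))
with torch.no_grad():
    conv2d.weight.copy_(conv1d.weight.unsqueeze(2))  # (32, 16, 3) -> (32, 16, 1, 3)
    conv2d.bias.copy_(conv1d.bias)

y1 = conv1d(x)
y2 = conv2d(x.unsqueeze(2)).squeeze(2)  # add/remove a singleton height dimension
print(torch.allclose(y1, y2, atol=1e-5))  # True

# GroupNorm with num_groups == num_channels normalizes each channel per sample,
# which is what InstanceNorm does (affine disabled for a direct comparison).
gn = nn.GroupNorm(num_groups=16, num_channels=16, affine=False)
inorm = nn.InstanceNorm1d(16, affine=False)
print(torch.allclose(gn(x), inorm(x), atol=1e-5))  # True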
The following are the changes made to the training scripts:
- Added support for Gaudi devices:
- Defined certain environment variables for Gaudi devices.
- Added support to run training in lazy mode.
- mark_step() is performed to trigger execution.
- Added support for bucketing, padding, and precomputed loss for HPU.
- Added support to use the HPU accelerator plugin, DDP plugin (for multi-HPU training), and mixed precision plugin.
- Added support of fairseq_hydra_train for multi-node training.
- Disabled auto dynamic shape support.
- Only the configurations mentioned above are supported and verified.
- Training on 1 HPU with the FP32 data type has an out-of-memory (OOM) issue.