
Merge branch 'main' into export_wordlist_fix
oyilmaz-nvidia authored Apr 22, 2024
2 parents 494e4a1 + a452a4f commit 37ead2b
Showing 24 changed files with 1,533 additions and 116 deletions.
101 changes: 74 additions & 27 deletions README.rst
@@ -188,12 +188,15 @@
The NeMo Framework can be installed in a variety of ways, depending on your needs.
* This is recommended for Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) domains.
  * When using an NVIDIA PyTorch container as the base, this is the recommended installation method for all domains.

* Docker Containers - Refer to the `Docker containers <#docker-containers>`_ section for installation instructions; a quick pull example follows this list.

* This is recommended for Large Language Models (LLM), Multimodal and Vision domains.
* NeMo LLM & Multimodal Container - `nvcr.io/nvidia/nemo:24.03.framework`
* NeMo Speech Container - `nvcr.io/nvidia/nemo:24.01.speech`

* LLM and Multimodal Dependencies - Refer to the `LLM and Multimodal dependencies <#llm-and-multimodal-dependencies>`_ section for installation instructions.
* It's highly recommended to start with a base NVIDIA PyTorch container: `nvcr.io/nvidia/pytorch:24.02-py3`
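
For example, to fetch the LLM & Multimodal container listed above (a minimal sketch; it assumes Docker is installed and you can reach the NGC registry):

.. code-block:: bash

    docker pull nvcr.io/nvidia/nemo:24.03.framework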

Conda
~~~~~

@@ -330,23 +333,59 @@
Note that RNNT requires numba to be installed from conda.

.. code-block:: bash

    pip uninstall numba
    conda install -c conda-forge numba
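
After reinstalling, you can confirm which numba build is active (an illustrative check, not part of the original instructions):

.. code-block:: bash

    python -c "import numba; print(numba.__version__)"
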
LLM and Multimodal Dependencies
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The LLM and Multimodal domains require three additional dependencies:
NVIDIA Apex, NVIDIA Transformer Engine, and NVIDIA Megatron Core.

When working with the `main` branch, these dependencies may require a recent commit.
The most recent working versions of these dependencies are:

.. code-block:: bash

    export apex_commit=810ffae374a2b9cb4b5c5e28eaeca7d7998fca0c
    export te_commit=bfe21c3d68b0a9951e5716fb520045db53419c5e
    export mcore_commit=fbb375d4b5e88ce52f5f7125053068caff47f93f
    export nv_pytorch_tag=24.02-py3

When using a released version of NeMo,
please refer to the `Software Component Versions <https://docs.nvidia.com/nemo-framework/user-guide/latest/softwarecomponentversions.html>`_
for the correct versions.

If starting with a base NVIDIA PyTorch container, first launch the container:

.. code-block:: bash

    docker run \
      --gpus all \
      -it \
      --rm \
      --shm-size=16g \
      --ulimit memlock=-1 \
      --ulimit stack=67108864 \
      nvcr.io/nvidia/pytorch:$nv_pytorch_tag

Then install the dependencies:

Apex
~~~~
The NeMo LLM and Multimodal domains require NVIDIA Apex to be installed.
Apex comes installed in the NVIDIA PyTorch container, but NeMo LLM and Multimodal
may need Apex to be updated to a newer version.

To install Apex, run

.. code-block:: bash

    git clone https://github.com/NVIDIA/apex.git
    cd apex
    git checkout $apex_commit
    pip install . -v --no-build-isolation --disable-pip-version-check --no-cache-dir --config-settings "--build-option=--cpp_ext --cuda_ext --fast_layer_norm --distributed_adam --deprecated_fused_adam --group_norm"

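As a quick sanity check (an illustrative snippet, not part of the official instructions), verify that Apex's compiled extensions import:

.. code-block:: bash

    python -c "from apex.normalization import FusedLayerNorm; print('Apex OK')"
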
It is highly recommended to use the NVIDIA PyTorch or NeMo container if having issues installing Apex or any other dependencies.
Installing Apex outside of the NVIDIA PyTorch container may raise an error if the CUDA version on your system
does not match the CUDA version with which torch was compiled.
This error can be avoided by commenting out the check here: https://github.com/NVIDIA/apex/blob/master/setup.py#L32
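
If you want to script that edit, a hypothetical one-liner follows; the line number tracks Apex's `setup.py` and may drift, so verify it in your checkout before running:

.. code-block:: bash

    # Hypothetical: comment out the CUDA version check at line 32 (confirm the line number first).
    sed -i '32s/^/# /' setup.py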

cuda-nvprof is needed to install Apex. The version should match the CUDA version that you are using:
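
For example, with a CUDA 12.x toolkit (an illustrative pin; substitute the version that matches your system):

.. code-block:: bash

    conda install -c nvidia cuda-nvprof=12.3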
@@ -366,35 +405,43 @@
With the latest versions of Apex, the `pyproject.toml` file in Apex may need to be deleted in order to install it locally.
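
A minimal sketch of that workaround (it assumes you are inside the cloned ``apex`` directory):

.. code-block:: bash

    rm pyproject.toml  # then re-run the pip install command shown above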

Transformer Engine
~~~~~~~~~~~~~~~~~~
The NeMo LLM and Multimodal domains require NVIDIA Transformer Engine to be installed.
Transformer Engine comes installed in the NVIDIA PyTorch container, but NeMo LLM and Multimodal
may need Transformer Engine to be updated to a newer version.

Transformer Engine enables FP8 training on NVIDIA Hopper GPUs and many performance optimizations for transformer-based model training.
Documentation for installing Transformer Engine can be found `here <https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html>`_.

It is highly recommended to use the NVIDIA PyTorch or NeMo container if having issues installing Transformer Engine or any other dependencies.

.. code-block:: bash

    git clone https://github.com/NVIDIA/TransformerEngine.git && \
      cd TransformerEngine && \
      git checkout $te_commit && \
      git submodule init && git submodule update && \
      NVTE_FRAMEWORK=pytorch NVTE_WITH_USERBUFFERS=1 MPI_HOME=/usr/local/mpi pip install .

Transformer Engine requires PyTorch to be built with at least CUDA 11.8.
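
To check which CUDA version your PyTorch build ships with (a standard PyTorch one-liner, added here for convenience):

.. code-block:: bash

    python -c "import torch; print(torch.version.cuda)"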

Megatron Core
~~~~~~~~~~~~~

The NeMo LLM and Multimodal domains require NVIDIA Megatron Core to be installed.
Megatron Core is a library for scaling large transformer-based models.
NeMo LLM and Multimodal models leverage Megatron Core for model parallelism,
transformer architectures, and optimized PyTorch datasets.

NeMo LLM and Multimodal may need Megatron Core to be updated to a recent version.

.. code-block:: bash

    git clone https://github.com/NVIDIA/Megatron-LM.git && \
      cd Megatron-LM && \
      git checkout $mcore_commit && \
      pip install . && \
      cd megatron/core/datasets && \
      make

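As a quick sanity check (an illustrative snippet, not from the NeMo docs), confirm that Megatron Core imports:

.. code-block:: bash

    python -c "from megatron.core import parallel_state; print('Megatron Core OK')"
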
NeMo Text Processing
~~~~~~~~~~~~~~~~~~~~
@@ -404,7 +451,7 @@
Docker containers
~~~~~~~~~~~~~~~~~
We release NeMo containers alongside NeMo releases. For example, NeMo ``r1.23.0`` comes with container ``nemo:24.01.speech``; you may find more details about released containers on the `releases page <https://github.com/NVIDIA/NeMo/releases>`_.

To use a pre-built container, please run

.. code-block:: bash
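
    # Representative invocation (a sketch modeled on NeMo's documentation; adjust mounts, ports, and the tag to your setup):
    docker run --gpus all -it --rm --shm-size=8g \
      -p 8888:8888 --ulimit memlock=-1 --ulimit stack=67108864 \
      nvcr.io/nvidia/nemo:24.01.speech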
2 changes: 1 addition & 1 deletion docs/source/asr/api.rst
@@ -41,7 +41,7 @@ Model Classes

.. _confidence-ensembles-api:

.. autoclass:: nemo.collections.asr.models.confidence_ensemble.ConfidenceEnsembleModel
:show-inheritance:
:members: transcribe

4 changes: 4 additions & 0 deletions docs/source/asr/datasets.rst
@@ -806,13 +806,17 @@ We recommend to pre-compute the bucket duration bins in order to accelerate the
The following script may be used:

.. code-block:: bash

    $ python scripts/speech_recognition/estimate_duration_bins.py -b 30 manifest.json
    Use the following options in your config:
            num_buckets=30
            bucket_duration_bins=[1.78,2.34,2.69,...
    <other diagnostic information about the dataset>

For multi-dataset setups, one may provide multiple manifests and even their weights:

.. code-block:: bash

    $ python scripts/speech_recognition/estimate_duration_bins.py -b 30 [[manifest.json,0.7],[other.json,0.3]]
    Use the following options in your config:
            num_buckets=30
2 changes: 1 addition & 1 deletion docs/source/asr/models.rst
@@ -314,7 +314,7 @@
For more details about this model, see the `paper <https://arxiv.org/abs/2306.15
or read our `tutorial <https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/Confidence_Ensembles.ipynb>`_.

NeMo supports Confidence-based Ensembles through the
:ref:`nemo.collections.asr.models.confidence_ensemble.ConfidenceEnsembleModel <confidence-ensembles-api>` class.

A typical workflow to create and use the ensemble is as follows:

13 changes: 13 additions & 0 deletions nemo/collections/asr/parts/submodules/ctc_decoding.py
@@ -98,6 +98,10 @@ class AbstractCTCDecoding(ConfidenceMixin):
Which aggregation type to use for collapsing per-token confidence into per-word confidence.
Valid options are `mean`, `min`, `max`, `prod`.
tdt_include_duration: Bool flag indicating that the duration confidence scores are to be calculated and
attached to the regular frame confidence,
making TDT frame confidence element a pair: (`prediction_confidence`, `duration_confidence`).
method_cfg:
A dict-like object which contains the method name and settings to compute per-frame
confidence scores.
@@ -911,10 +915,15 @@ class CTCDecoding(AbstractCTCDecoding):
exclude_blank:
Bool flag indicating that blank token confidence scores are to be excluded
from the `token_confidence`.
aggregation:
Which aggregation type to use for collapsing per-token confidence into per-word confidence.
Valid options are `mean`, `min`, `max`, `prod`.
tdt_include_duration: Bool flag indicating that the duration confidence scores are to be calculated and
attached to the regular frame confidence,
making TDT frame confidence element a pair: (`prediction_confidence`, `duration_confidence`).
method_cfg:
A dict-like object which contains the method name and settings to compute per-frame
confidence scores.
@@ -1122,6 +1131,10 @@ class CTCBPEDecoding(AbstractCTCDecoding):
Which aggregation type to use for collapsing per-token confidence into per-word confidence.
Valid options are `mean`, `min`, `max`, `prod`.
tdt_include_duration: Bool flag indicating that the duration confidence scores are to be calculated and
attached to the regular frame confidence,
making TDT frame confidence element a pair: (`prediction_confidence`, `duration_confidence`).
method_cfg:
A dict-like object which contains the method name and settings to compute per-frame
confidence scores.
