TensorRT 10.0 Release
Signed-off-by: Asfiya Baig <asfiyab@nvidia.com>
asfiyab-nvidia authored and rajeevsrao committed Apr 3, 2024
1 parent 3d97932 commit 147005f
Showing 941 changed files with 50,512 additions and 37,626 deletions.
129 changes: 129 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,134 @@
# TensorRT OSS Release Changelog

## 10.0.0 EA - 2024-04-02

Key Features and Updates:

- Samples changes
  - Added a [sample](samples/python/sample_weight_stripping) showcasing weight-stripped engines.
  - Added a [sample](samples/python/python_plugin/circ_pad_plugin_multi_tactic.py) demonstrating the use of custom tactics with IPluginV3.
  - Added a [sample](samples/sampleNonZeroPlugin) to showcase plugins with data-dependent output shapes, using IPluginV3.
- Parser changes
  - Added a new class `IParserRefitter` that can be used to refit a TensorRT engine with the weights of an ONNX model (see the sketch after this list).
  - `kNATIVE_INSTANCENORM` is now set to ON by default.
  - Added support for `IPluginV3` interfaces from TensorRT.
  - Added support for `INT4` quantization.
  - Added support for the `reduction` attribute in `ScatterElements`.
  - Added support for the `wrap` padding mode in `Pad`.
- Plugin changes
  - A [new plugin](plugin/scatterElementsPlugin) has been added in compliance with [ONNX ScatterElements](https://github.com/onnx/onnx/blob/main/docs/Operators.md#ScatterElements).
  - The TensorRT plugin library no longer has a load-time link dependency on the cuBLAS or cuDNN libraries.
  - All plugins that relied on cuBLAS/cuDNN handles passed through `IPluginV2Ext::attachToContext()` now use cuBLAS/cuDNN resources initialized by the plugin library itself, which dynamically loads the required cuBLAS/cuDNN library. Plugins that independently initialized their cuBLAS/cuDNN resources have likewise moved to dynamic loading. If the respective library is not discoverable through the library path(s), these plugins will not work.
  - bertQKVToContextPlugin: Version 2 of this plugin now supports head sizes less than or equal to 32.
  - reorgPlugin: Added a version 2 which implements `IPluginV2DynamicExt`.
  - disentangledAttentionPlugin: Fixed a kernel bug.
- Demo changes
  - HuggingFace demos have been removed. Users accelerating Large Language Model inference with TensorRT should use [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/) instead.
- Updated tooling
  - Polygraphy v0.49.9
  - ONNX-GraphSurgeon v0.5.1
  - TensorRT Engine Explorer v0.1.8
- Build containers
  - RedHat/CentOS 7.x are no longer officially supported starting with TensorRT 10.0. The corresponding container has been removed from TensorRT-OSS.
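
For reference, here is a minimal Python sketch of how the weight-stripping and `IParserRefitter` features above are intended to compose: build an engine without weights, then refit it at deployment time directly from the original ONNX model. The file names are placeholders, and the exact flag and class names should be confirmed against the TensorRT 10.0 API reference.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)

# Build a weight-stripped engine from an ONNX model.
builder = trt.Builder(logger)
network = builder.create_network()
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:  # placeholder path
    assert parser.parse(f.read())

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.STRIP_PLAN)  # omit refittable weights from the plan
stripped_plan = builder.build_serialized_network(network, config)

# At deployment time: deserialize the stripped engine, then refit it
# with weights taken from the same ONNX model.
runtime = trt.Runtime(logger)
engine = runtime.deserialize_cuda_engine(stripped_plan)
refitter = trt.Refitter(engine, logger)
parser_refitter = trt.OnnxParserRefitter(refitter, logger)
assert parser_refitter.refit_from_file("model.onnx")
assert refitter.refit_cuda_engine()
```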

## 9.3.0 GA - 2024-02-09

Key Features and Updates:

- Demo changes
  - Faster text-to-image generation using SDXL with INT8 quantization via AMMO.
- Updated tooling
  - Polygraphy v0.49.7

## 9.2.0 GA - 2023-11-27

Key Features and Updates:

- `trtexec` enhancement: Added `--weightless` flag to mark the engine as weightless.
- Parser changes
  - Added support for the `Hardmax` operator.
  - Changed a few operator importers to ensure that TensorRT preserves the precision of operations when using strongly typed mode.
- Plugin changes
  - Explicit INT8 support added to `bertQKVToContextPlugin`.
  - Various bug fixes.
- Updated the HuggingFace demo to use transformers v4.31.0 and PyTorch v2.1.0.


## 9.1.0 GA - 2023-10-18

Key Features and Updates:

- Updated the [trt_python_plugin](samples/python/python_plugin) sample.
  - The Python plugin API reference is now part of the official TensorRT Python API.
- Added samples demonstrating the usage of the progress monitor API (see the sketch after this list).
  - Check [sampleProgressMonitor](samples/sampleProgressMonitor) for the C++ sample.
  - Check [simple_progress_monitor](samples/python/simple_progress_monitor) for the Python sample.
- Removed dependencies on Python versions older than 3.8 in the Python samples, as Python < 3.8 is no longer supported there.
- Demo changes
  - Added LAMBADA dataset accuracy checks in the [HuggingFace](demo/HuggingFace) demo.
  - Enabled structured sparsity and FP8 quantized batch matrix multiplications (BMMs) in attention in the [NeMo](demo/NeMo) demo.
  - Replaced deprecated APIs in the [BERT](demo/BERT) demo.
- Updated tooling
  - Polygraphy v0.49.1
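
A minimal Python sketch of attaching a progress monitor to an engine build is shown below. It is a simplified version of what the samples above demonstrate, assuming the callback names exposed by the TensorRT Python bindings (`phase_start`, `step_complete`, `phase_finish`).

```python
import tensorrt as trt

class ConsoleProgressMonitor(trt.IProgressMonitor):
    """Prints coarse progress for each engine-build phase."""

    def __init__(self):
        trt.IProgressMonitor.__init__(self)

    def phase_start(self, phase_name, parent_phase, num_steps):
        print(f"Starting phase {phase_name} ({num_steps} steps)")

    def step_complete(self, phase_name, step):
        print(f"  {phase_name}: step {step} complete")
        return True  # returning False cancels the build

    def phase_finish(self, phase_name):
        print(f"Finished phase {phase_name}")

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
config.progress_monitor = ConsoleProgressMonitor()
# ... define a network, then call builder.build_serialized_network(network, config)
```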


## 9.0.1 GA - 2023-09-07

Key Features and Updates:

- TensorRT plugin authoring in Python is now supported.
  - See the [trt_python_plugin](samples/python/python_plugin) sample for reference.
- Updated the default CUDA version to 12.2.
- Added support for BLIP models and for the Seq2Seq and Vision2Seq abstractions in the HuggingFace demo.
- Refactored demoDiffusion and added SDXL enhancements.
- Added additional validation asserts for NV plugins.
- Updated tooling
  - TensorRT Engine Explorer v0.1.7: graph rendering for TensorRT 9.0 `kgen` kernels
  - ONNX-GraphSurgeon v0.3.29
  - PyTorch quantization toolkit v2.2.0


## 9.0.0 EA - 2023-08-06

Key Features and Updates:

- Added the NeMo demo to demonstrate the performance benefit of using E4M3 FP8 data type with the GPT models trained with the [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) and [TransformerEngine](https://github.com/NVIDIA/TransformerEngine).
- Demo Diffusion updates
  - Added SDXL 1.0 txt2img pipeline
  - Added ControlNet pipeline
- HuggingFace demo updates
  - Added Flan-T5, OPT, BLOOM, BLOOMZ, GPT-Neo, GPT-NeoX, and Cerebras-GPT support with accuracy checks
  - Refactored code and extracted common utilities into a Seq2Seq class
  - Optimized shape-changing overhead and achieved a >30% end-to-end performance gain
  - Added stable KV-cache, beam search, and FP16 support for all models
  - Added dynamic-batch-size TRT inference
  - Added uneven-length multi-batch inference with attention_mask support
  - Added a `chat` command for an interactive CLI
  - Upgraded the PyTorch and HuggingFace versions to support Hopper GPUs
  - Updated the notebooks with a much-simplified demo API

- Added two new TensorRT samples, sampleProgressMonitor (C++) and simple_progress_reporter (Python), which show how to use the progress monitor API during an engine build.
- The following plugins were deprecated:
  - ``BatchedNMS_TRT``
  - ``BatchedNMSDynamic_TRT``
  - ``BatchTilePlugin_TRT``
  - ``Clip_TRT``
  - ``CoordConvAC``
  - ``CropAndResize``
  - ``EfficientNMS_ONNX_TRT``
  - ``CustomGeluPluginDynamic``
  - ``LReLU_TRT``
  - ``NMSDynamic_TRT``
  - ``NMS_TRT``
  - ``Normalize_TRT``
  - ``Proposal``
  - ``SingleStepLSTMPlugin``
  - ``SpecialSlice_TRT``
  - ``Split``

- Ubuntu 18.04 has reached end of life and is no longer supported by TensorRT starting with 9.0, and the corresponding Dockerfile(s) have been removed.
- Support for aarch64 builds will not be available in this release, and the corresponding Dockerfiles have been removed.

## [8.6.1 GA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/#rel-8-6-1) - 2023-05-02

TensorRT OSS release corresponding to TensorRT 8.6.1.6 GA release.
60 changes: 32 additions & 28 deletions CMakeLists.txt
@@ -22,21 +22,41 @@ include(cmake/modules/find_library_create_target.cmake)
set_ifndef(TRT_LIB_DIR ${CMAKE_BINARY_DIR})
set_ifndef(TRT_OUT_DIR ${CMAKE_BINARY_DIR})

+# Converts Windows paths
+if(CMAKE_VERSION VERSION_LESS 3.20)
+    file(TO_CMAKE_PATH "${TRT_LIB_DIR}" TRT_LIB_DIR)
+    file(TO_CMAKE_PATH "${TRT_OUT_DIR}" TRT_OUT_DIR)
+else()
+    cmake_path(SET TRT_LIB_DIR ${TRT_LIB_DIR})
+    cmake_path(SET TRT_OUT_DIR ${TRT_OUT_DIR})
+endif()
+
+# Required to export symbols to build *.libs
+if(WIN32)
+    add_compile_definitions(TENSORRT_BUILD_LIB 1)
+endif()

# Set output paths
set(RUNTIME_OUTPUT_DIRECTORY ${TRT_OUT_DIR} CACHE PATH "Output directory for runtime target files")
set(LIBRARY_OUTPUT_DIRECTORY ${TRT_OUT_DIR} CACHE PATH "Output directory for library target files")
set(ARCHIVE_OUTPUT_DIRECTORY ${TRT_OUT_DIR} CACHE PATH "Output directory for archive target files")

+if(WIN32)
+    set(STATIC_LIB_EXT "lib")
+else()
+    set(STATIC_LIB_EXT "a")
+endif()

file(STRINGS "${CMAKE_CURRENT_SOURCE_DIR}/include/NvInferVersion.h" VERSION_STRINGS REGEX "#define NV_TENSORRT_.*")

foreach(TYPE MAJOR MINOR PATCH BUILD)
string(REGEX MATCH "NV_TENSORRT_${TYPE} [0-9]" TRT_TYPE_STRING ${VERSION_STRINGS})
string(REGEX MATCH "[0-9]" TRT_${TYPE} ${TRT_TYPE_STRING})
endforeach(TYPE)

foreach(TYPE MAJOR MINOR PATCH)
string(REGEX MATCH "NV_TENSORRT_SONAME_${TYPE} [0-9]" TRT_TYPE_STRING ${VERSION_STRINGS})
string(REGEX MATCH "[0-9]" TRT_SO_${TYPE} ${TRT_TYPE_STRING})
string(REGEX MATCH "NV_TENSORRT_${TYPE} [0-9]+" TRT_TYPE_STRING ${VERSION_STRINGS})
string(REGEX MATCH "[0-9]+" TRT_${TYPE} ${TRT_TYPE_STRING})
endforeach(TYPE)

set(TRT_VERSION "${TRT_MAJOR}.${TRT_MINOR}.${TRT_PATCH}" CACHE STRING "TensorRT project version")
set(ONNX2TRT_VERSION "${TRT_MAJOR}.${TRT_MINOR}.${TRT_PATCH}" CACHE STRING "ONNX2TRT project version")
set(TRT_SOVERSION "${TRT_SO_MAJOR}" CACHE STRING "TensorRT library so version")
set(TRT_SOVERSION "${TRT_MAJOR}" CACHE STRING "TensorRT library so version")
message("Building for TensorRT version: ${TRT_VERSION}, library version: ${TRT_SOVERSION}")

if(NOT DEFINED CMAKE_TOOLCHAIN_FILE)
@@ -88,8 +108,8 @@ endif()
############################################################################################
# Dependencies

-set(DEFAULT_CUDA_VERSION 12.0.1)
-set(DEFAULT_CUDNN_VERSION 8.8)
+set(DEFAULT_CUDA_VERSION 12.2.0)
+set(DEFAULT_CUDNN_VERSION 8.9)
set(DEFAULT_PROTOBUF_VERSION 3.20.1)

# Dependency Version Resolution
@@ -118,20 +138,12 @@ endif()

 include_directories(
     ${CUDA_INCLUDE_DIRS}
-    ${CUDNN_ROOT_DIR}/include
 )
-find_library(CUDNN_LIB cudnn HINTS
-    ${CUDA_TOOLKIT_ROOT_DIR} ${CUDNN_ROOT_DIR} PATH_SUFFIXES lib64 lib/x64 lib)
-find_library(CUBLAS_LIB cublas HINTS
-    ${CUDA_TOOLKIT_ROOT_DIR} PATH_SUFFIXES lib64 lib lib/x64 lib/stubs)
-find_library(CUBLASLT_LIB cublasLt HINTS
-    ${CUDA_TOOLKIT_ROOT_DIR} PATH_SUFFIXES lib64 lib lib/x64 lib/stubs)
if(BUILD_PARSERS)
configure_protobuf(${PROTOBUF_VERSION})
endif()

find_library_create_target(nvinfer nvinfer SHARED ${TRT_LIB_DIR})
-find_library_create_target(nvuffparser nvparsers SHARED ${TRT_LIB_DIR})

find_library(CUDART_LIB cudart_static HINTS ${CUDA_TOOLKIT_ROOT_DIR} PATH_SUFFIXES lib lib/x64 lib64)

@@ -149,18 +161,11 @@ if (DEFINED GPU_ARCHS)
separate_arguments(GPU_ARCHS)
else()
list(APPEND GPU_ARCHS
-        53
-        60
-        61
-        70
75
)

string(REGEX MATCH "aarch64" IS_ARM "${TRT_PLATFORM_ID}")
if (IS_ARM)
# Xavier (SM72) only supported for aarch64.
list(APPEND GPU_ARCHS 72)
endif()

if (CUDA_VERSION VERSION_GREATER_EQUAL 11.0)
# Ampere GPU (SM80) support is only available in CUDA versions > 11.0
@@ -189,10 +194,10 @@ if (${LATEST_SM} GREATER_EQUAL 70)
endif()

if(NOT MSVC)
-    set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler -Wno-deprecated-declarations")
+    set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr -Xcompiler -Wno-deprecated-declarations")
else()
set(CMAKE_CUDA_SEPARABLE_COMPILATION ON)
-    set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -Xcompiler")
+    set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} --expt-relaxed-constexpr -Xcompiler")
endif()

############################################################################################
@@ -207,7 +212,6 @@ endif()
if(BUILD_PARSERS)
add_subdirectory(parsers)
else()
-find_library_create_target(nvcaffeparser nvparsers SHARED ${TRT_OUT_DIR} ${TRT_LIB_DIR})
find_library_create_target(nvonnxparser nvonnxparser SHARED ${TRT_OUT_DIR} ${TRT_LIB_DIR})
endif()
