Merge branch 'rocm-3.9.x'
Daniel Lowell committed Oct 27, 2020
2 parents 807c08e + 600d591 commit add232c
Showing 42 changed files with 3,544 additions and 1,663 deletions.
11 changes: 8 additions & 3 deletions CMakeLists.txt
@@ -73,7 +73,7 @@ if(NOT WIN32 AND NOT APPLE)
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -s")
endif()

rocm_setup_version(VERSION 2.7.0)
rocm_setup_version(VERSION 2.8.0)

list( APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake )
include(TargetFlags)
@@ -115,12 +115,17 @@ set(MIOPEN_INSTALL_CXX_HEADERS Off CACHE BOOL "Install MIOpen's C++ header inter


# Embedded Build Configuration
set(MIOPEN_EMBED_DB "" CACHE STRING "Semi-colon separated list of of architecture CU pairs to embed on-disk DBs in the binary. Example gfx906_60:gfx900_56")
set(MIOPEN_EMBED_DB "" CACHE STRING "Semi-colon separated list of architecture CU pairs to embed on-disk DBs in the binary. Example gfx906_60;gfx900_56")
if(NOT MIOPEN_EMBED_DB STREQUAL "")
option(MIOPEN_DISABLE_SYSDB "Disable sys database access" Off)
else()
option(MIOPEN_DISABLE_SYSDB "Disable sys database access" ${MIOPEN_EMBED_BUILD})
endif()
set(MIOPEN_BINCACHE_PATH "" CACHE STRING "URL or path containing binary cache files to embed")
option(MIOPEN_EMBED_BUILD "Build with the set of embed flags." Off)
option(MIOPEN_USE_COMGR "Use comgr to build kernels instead of offline tools" ${MIOPEN_EMBED_BUILD})
option(MIOPEN_DISABLE_USERDB "Disable user database access" ${MIOPEN_EMBED_BUILD})
option(MIOPEN_DISABLE_SYSDB "Disable sys database access" ${MIOPEN_EMBED_BUILD})


# MIOPEN_USE_HIP_KERNELS is a Workaround for COMgr issues
if(MIOPEN_EMBED_BUILD)
93 changes: 93 additions & 0 deletions doc/src/embed.md
@@ -0,0 +1,93 @@

Building MIOpen for Embedded Systems
====================================



### Install dependencies
Install minimum dependencies (default location /usr/local):
```
cmake -P install_deps.cmake --minimum --prefix /some/local/dir
```

Create build directory:
```
mkdir build; cd build;
```

### Configuring for an embedded build
Minimal static build configuration line, without the embedded precompiled kernels package or Find-Db:
```
CXX=/opt/rocm/llvm/bin/clang++ cmake -DMIOPEN_BACKEND=HIP -DMIOPEN_EMBED_BUILD=On -DCMAKE_PREFIX_PATH="/some/local/dir" ..
```

### Embedding Find-Db and Performance database:
The Find-Db provides a database of known convolution inputs, which allows users to get the best tuned kernels for their network. Embedding the Find-Db requires a semicolon-separated list of architecture/CU pairs whose on-disk DBs are embedded in the binary; e.g., gfx906_60;gfx900_56.

Example:
```
CXX=/opt/rocm/llvm/bin/clang++ cmake -DMIOPEN_EMBED_BUILD=On -DMIOPEN_EMBED_DB=gfx900_56 ..
```

This will configure the build directory for embedding not just the find-db, but also the performance database.
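
When more than one pair is passed, the semicolon needs quoting, because most shells treat an unquoted `;` as a command separator. The following is a minimal sketch, assuming a POSIX shell; `join_embed_db` is a hypothetical helper for illustration and is not part of MIOpen. It only prints the configure line instead of running it.

```shell
# Hypothetical helper (not part of MIOpen): join architecture_CU pairs
# with semicolons, the separator MIOPEN_EMBED_DB expects.
join_embed_db() {
    list=""
    for pair in "$@"; do
        if [ -z "$list" ]; then
            list="$pair"
        else
            list="$list;$pair"
        fi
    done
    printf '%s\n' "$list"
}

EMBED_DB=$(join_embed_db gfx906_60 gfx900_56)

# Print the configure line rather than running it; the double quotes keep
# the shell from treating ';' as a command separator.
echo "cmake -DMIOPEN_EMBED_BUILD=On \"-DMIOPEN_EMBED_DB=$EMBED_DB\" .."
```

Quoting the whole `-D` argument is one option; quoting only the value works equally well in a POSIX shell.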

### Embedding the precompiled kernels package:
To prevent the loss of performance due to compile time overhead, a build of MIOpen can take advantage of embedding the precompiled kernels package. The precompiled kernels package contains convolution kernels of known inputs and allows the user to avoid compiling kernels during runtime.

### Embedding precompiled package

#### Using a package install
To install the precompiled kernels package use the command:
```
apt-get install miopenkernels-<arch>-<num cu>
```
Where `<arch>` is the GPU architecture (for example, gfx900 or gfx906) and `<num cu>` is the number of CUs available in the GPU (for example, 56 or 64).
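
As an illustration of the naming pattern, the package name can be composed from the two values. This is a sketch only: `kernels_package` is a hypothetical helper, and whether a package exists for a given pair depends on the repository in use.

```shell
# Hypothetical helper: compose the package name from the
# miopenkernels-<arch>-<num cu> pattern described above.
kernels_package() {
    printf 'miopenkernels-%s-%s\n' "$1" "$2"
}

# Print (do not run) the install command for a gfx906 GPU with 60 CUs.
echo "apt-get install $(kernels_package gfx906 60)"
```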

Not installing the precompiled kernels package does not affect the functioning of MIOpen, since MIOpen compiles these kernels on the target machine the first time each kernel is run. However, the compilation step may significantly increase the startup time for different operations.

The script `utils/install_precompiled_kernels.sh`, provided as part of MIOpen, automates the above process: it queries the user's machine for the GPU architecture and then installs the appropriate package. It may be invoked as:
```
./utils/install_precompiled_kernels.sh
```

To embed the precompiled kernels package, configure cmake using the `MIOPEN_BINCACHE_PATH` flag.
Example:
```
CXX=/opt/rocm/llvm/bin/clang++ cmake -DMIOPEN_BINCACHE_PATH=/path/to/package/install -DMIOPEN_EMBED_BUILD=On ..
```

#### Using the URL to a kernels binary
Alternatively, the flag `MIOPEN_BINCACHE_PATH` can be set to a URL that points to the kernels binary.
Example:
```
CXX=/opt/rocm/llvm/bin/clang++ cmake -DMIOPEN_BINCACHE_PATH=/URL/to/binary -DMIOPEN_EMBED_BUILD=On ..
```

Precompiled kernels packages are installed in `/opt/rocm/miopen/share/miopen/db`.
An example with the architecture gfx900 with 56 compute units:
```
CXX=/opt/rocm/llvm/bin/clang++ cmake -DMIOPEN_BINCACHE_PATH=/opt/rocm/miopen/share/miopen/db/gfx900_56.kdb -DMIOPEN_EMBED_BUILD=On ..
```
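
Before configuring, it can help to confirm that the `.kdb` file is actually present. The sketch below assumes files are named `<arch>_<num cu>.kdb`, as in the gfx900_56.kdb example; `kdb_path` and the `DB_DIR` variable are hypothetical names introduced for illustration.

```shell
# Assumption: .kdb files are named <arch>_<num cu>.kdb, as in the
# gfx900_56.kdb example; DB_DIR defaults to the install location above.
DB_DIR="${DB_DIR:-/opt/rocm/miopen/share/miopen/db}"

# Hypothetical helper: print the expected .kdb path for an arch/CU pair.
kdb_path() {
    printf '%s/%s_%s.kdb\n' "$DB_DIR" "$1" "$2"
}

KDB=$(kdb_path gfx900 56)
if [ -f "$KDB" ]; then
    # Print the configure line rather than running it.
    echo "cmake -DMIOPEN_BINCACHE_PATH=$KDB -DMIOPEN_EMBED_BUILD=On .."
else
    echo "precompiled kernels not found at $KDB; install the package first" >&2
fi
```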


As of ROCm 3.8 / MIOpen 2.7, precompiled kernels binaries are located at [repo.radeon.com](http://repo.radeon.com/rocm/miopen-kernel/).
For example, for the architecture gfx906 with 60 compute units:
```
CXX=/opt/rocm/llvm/bin/clang++ cmake -DMIOPEN_BINCACHE_PATH=http://repo.radeon.com/rocm/miopen-kernel/rel-3.8/gfx906_60.kdb -DMIOPEN_EMBED_BUILD=On ..
```

### Full configuration line:
Putting it all together, building MIOpen statically, and embedding the performance database, find-db, and the precompiled kernels binary:
```
CXX=/opt/rocm/llvm/bin/clang++ cmake -DMIOPEN_BINCACHE_PATH=/path/to/package/install -DMIOPEN_EMBED_BUILD=On -DMIOPEN_EMBED_DB=gfx900_56 ..
```

After configuration is complete, run:
```
make -j
```





5 changes: 3 additions & 2 deletions doc/src/find_and_immediate.md
@@ -167,13 +167,14 @@ MIOpen provides a set of Find modes which are used to accelerate the Find calls.
- `NORMAL`, or `1`: Normal Find: This is the full Find mode call, which will benchmark all the solvers and return a list.
- `FAST`, or `2`: Fast Find: Checks the [Find-Db](https://rocmsoftwareplatform.github.io/MIOpen/doc/html/finddb.html) for an entry. If there is a Find-Db hit, use that entry. If there is a miss, utilize the Immediate mode fallback. Start-up times are expected to be faster, but GPU performance may be worse.
- `HYBRID`, or `3`, or unset `MIOPEN_FIND_MODE`: Hybrid Find: Checks the [Find-Db](https://rocmsoftwareplatform.github.io/MIOpen/doc/html/finddb.html) for an entry. If there is a Find-Db hit, use that entry. If there is a miss, use the existing Find machinery. Slower start-up times than Fast Find, but no GPU performance drop.
- `FAST_HYBRID`, or `4`: Fast Hybrid Find: Checks the [Find-Db](https://rocmsoftwareplatform.github.io/MIOpen/doc/html/finddb.html) for an entry. If there is a Find-Db hit, use that entry. If there is a miss, use the existing Find machinery while skipping slow-compiling kernels. Faster start-up times than Hybrid Find, but no GPU performance drop.

As of MIOpen 2.6, the default mode is set to `HYBRID` mode as default. To run the full `NORMAL` Find mode, set the environment as:
As of MIOpen 2.7, `HYBRID` is the default Find mode. To run the full `NORMAL` Find mode, set the environment as:
```
export MIOPEN_FIND_MODE=NORMAL
```
Or,
```
export MIOPEN_FIND_MODE=1
```
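
The name and number forms listed above are interchangeable. As a sketch, a small shell function can translate between them and reject unknown values before exporting; `find_mode_number` is a hypothetical helper, not something MIOpen provides.

```shell
# Hypothetical helper: translate a Find mode name to its numeric value,
# following the list above (NORMAL=1, FAST=2, HYBRID=3, FAST_HYBRID=4).
find_mode_number() {
    case "$1" in
        NORMAL|1)      echo 1 ;;
        FAST|2)        echo 2 ;;
        HYBRID|3)      echo 3 ;;
        FAST_HYBRID|4) echo 4 ;;
        *) echo "unknown Find mode: $1" >&2; return 1 ;;
    esac
}

# Select Fast Hybrid Find by number; the name FAST_HYBRID is equivalent.
MIOPEN_FIND_MODE=$(find_mode_number FAST_HYBRID)
export MIOPEN_FIND_MODE
```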


1 change: 1 addition & 0 deletions doc/src/index.rst
@@ -16,6 +16,7 @@ Sources and binaries can be found at `MIOpen's GitHub site <https://github.com/R
releasenotes
citation
install
embed
driver
DebugAndLogging
cache
19 changes: 18 additions & 1 deletion doc/src/releasenotes.md
@@ -2,7 +2,24 @@
## MIOpen Release notes


### 09/14/2020 [ 2.7.0 ]

### 10/28/2020 [ 2.8.0 ]

This release provides additional bug fixes and support for embedded build using MIOpen as a static library.

- Fixed workspace size calculation for GEMM group convolutions
- Fixed performance regression for M/N
- Fixed issue with faulty compiler option
- Fixed typo in components dependency variable in CMakeLists.txt
- Fixed issues with COMgr backed online compilation for HIP kernels
- Added cmake flag for embedding system databases when building a static library
- Added a way to disable building MIOpenDriver when building a static library
- Added CC compiler detection in ROCm environment
- Known issue: This release may show warnings for "obsolete configs" in the performance database. This can be fixed by rerunning tuning on a specific network; [see tuning documentation](https://rocmsoftwareplatform.github.io/MIOpen/doc/html/perfdatabase.html#miopen-find-enforce)



### 09/18/2020 [ 2.7.0 ]

- This release contains a new reduction API; see [API documentation](https://rocmsoftwareplatform.github.io/MIOpen/doc/html/apireference.html) for more information. Additional features for embedded builds have been added, and further support for 3D convolutional networks.

4 changes: 2 additions & 2 deletions driver/driver.hpp
@@ -130,7 +130,7 @@ void PadBufferSize(size_t& sz, int datatype_sz)
printf(
"Supported Base Arguments: conv[fp16|int8|bfp16], CBAInfer[fp16], pool[fp16], lrn[fp16], "
"activ[fp16], softmax[fp16], bnorm[fp16], rnn[fp16], gemm, ctc, dropout[fp16], "
"tensorop[fp16]\n");
"tensorop[fp16], reduce[fp16]\n");
exit(0);
}

@@ -150,7 +150,7 @@ std::string ParseBaseArg(int argc, char* argv[])
arg != "softmax" && arg != "softmaxfp16" && arg != "bnorm" && arg != "bnormfp16" &&
arg != "rnn" && arg != "rnnfp16" && arg != "gemm" /*&& arg != "gemmfp16"*/ && arg != "ctc" &&
arg != "dropout" && arg != "dropoutfp16" && arg != "tensorop" && arg != "tensoropfp16" &&
arg != "--version")
arg != "reduce" && arg != "reducefp16" && arg != "--version")
{
printf("Invalid Base Input Argument\n");
Usage();
9 changes: 9 additions & 0 deletions driver/main.cpp
@@ -39,6 +39,7 @@
#include "ctc_driver.hpp"
#include "dropout_driver.hpp"
#include "tensorop_driver.hpp"
#include "reduce_driver.hpp"
#include "miopen/config.h"

int main(int argc, char* argv[])
@@ -165,6 +166,14 @@ int main(int argc, char* argv[])
{
drv = new TensorOpDriver<float16, float>();
}
else if(base_arg == "reduce")
{
drv = new ReduceDriver<float, float>();
}
else if(base_arg == "reducefp16")
{
drv = new ReduceDriver<float16, float>();
}
else
{
printf("Incorrect BaseArg\n");