Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bump MIOpen version to 3.1.0 and update CI docker #2519

Merged
merged 38 commits into from
Dec 20, 2023
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
309309b
Update docker and miopen version
atamazov Nov 4, 2023
61e1721
update FIN to develop
atamazov Nov 4, 2023
58c4d97
update a few more requirements
junliume Nov 11, 2023
39572c0
overrisde existing installed files
junliume Nov 12, 2023
4a011d1
purge CK before installing new
junliume Nov 12, 2023
58f8d80
fix hip tidy issues
junliume Nov 13, 2023
7eefad5
add a few missing ones
junliume Nov 13, 2023
ed1a732
adopt review opinion
junliume Nov 13, 2023
8053a0c
Update CMakeLists.txt
junliume Nov 14, 2023
f958b46
fix issues typo and hiprtc header
junliume Nov 14, 2023
08a1ca3
Merge branch 'develop' into bump_version_fin
junliume Nov 14, 2023
699cf08
Keep base docker at Ubuntu 20.04
junliume Nov 17, 2023
3e847da
Revert limitations on limits header
junliume Nov 17, 2023
72eb180
add fixes
umangyadav Dec 5, 2023
10e9581
changes that works for MIGraphX
umangyadav Dec 5, 2023
f498789
rever changes for checknumerics
umangyadav Dec 5, 2023
8bc23eb
Merge branch 'develop' into fp8_fix
umangyadav Dec 5, 2023
69c0d99
Formatting
umangyadav Dec 5, 2023
aa09f36
Merge branch 'fp8_fix' into bump_version_fin
junliume Dec 5, 2023
cf02a64
update docker file
junliume Dec 5, 2023
ebeef1d
avoid ldd conflicts
junliume Dec 6, 2023
ee34b45
update CK commit hash
junliume Dec 6, 2023
dab5d3b
Merge branch 'develop' into bump_version_fin
junliume Dec 12, 2023
c6b4352
update dockerfile
junliume Dec 15, 2023
b35b497
Merge branch 'develop' into bump_version_fin
junliume Dec 15, 2023
9990a5d
update dockerfile
junliume Dec 16, 2023
5b00521
Merge branch 'develop' into bump_version_fin
junliume Dec 16, 2023
0133956
workaround build issues of CK
junliume Dec 17, 2023
dc55bba
fix the real issue in compiling rocMLIR
junliume Dec 17, 2023
0d48eb3
bump CK commit hash
junliume Dec 17, 2023
19b1a95
WA for Issue 2600 and turn on smoke tests by default
junliume Dec 17, 2023
592d5cb
ROCm 6.0 replaces all __HIP_PLATFORM_HCC__ with __HIP_PLATFORM_AMD__
junliume Dec 19, 2023
e74ed6d
Merge branch 'develop' into bump_version_fin
junliume Dec 19, 2023
f92cd4c
Update src/comgr.cpp
junliume Dec 19, 2023
b81d3a5
fix clang format issue
junliume Dec 19, 2023
c43de0c
Merge branch 'develop' into bump_version_fin
junliume Dec 19, 2023
b4d1236
Bump CK commit hash to d0f355a31a341b0a885ff65231781f332a20cc5f
junliume Dec 19, 2023
41c6172
Merge branch 'develop' into bump_version_fin
junliume Dec 20, 2023
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 14 additions & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,7 @@ if(NOT WIN32 AND NOT APPLE)
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -s")
endif()

rocm_setup_version(VERSION 3.00.0)
rocm_setup_version(VERSION 3.1.0)

list( APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake )
include(TargetFlags)
Expand Down Expand Up @@ -756,6 +756,19 @@ enable_cppcheck(
knownConditionTrueFalse
shadowFunction
moduloofone
###################################################################
# TODO Code Quality WORKAROUND ROCm 6.0 &&
# Ubuntu 22.04 && cppcheck 2.12.1 update
###################################################################
duplInheritedMember
constParameterCallback
constParameterReference
constParameterPointer
constVariableReference
constVariablePointer
useStlAlgorithm
uselessOverride
unusedScopedObject
FORCE
SOURCES
addkernels/
Expand Down
20 changes: 13 additions & 7 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -18,16 +18,16 @@ DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
ENV APT_KEY_DONT_WARN_ON_DANGEROUS_USAGE=DontWarn
RUN curl -fsSL https://repo.radeon.com/rocm/rocm.gpg.key | gpg --dearmor -o /etc/apt/trusted.gpg.d/rocm-keyring.gpg

RUN wget https://repo.radeon.com/amdgpu-install/5.7.1/ubuntu/focal/amdgpu-install_5.7.50701-1_all.deb --no-check-certificate
RUN wget https://repo.radeon.com/amdgpu-install/.6.0/ubuntu/focal/amdgpu-install_6.0.60000-1_all.deb --no-check-certificate
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
./amdgpu-install_5.7.50701-1_all.deb
./amdgpu-install_6.0.60000-1_all.deb

# Add rocm repository
RUN export ROCM_APT_VER=5.7.1;\
RUN export ROCM_APT_VER=6.0;\
echo $ROCM_APT_VER &&\
sh -c 'echo deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/amdgpu/$ROCM_APT_VER/ubuntu focal main > /etc/apt/sources.list.d/amdgpu.list' &&\
sh -c 'echo deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/$ROCM_APT_VER focal main > /etc/apt/sources.list.d/rocm.list'
sh -c 'echo deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/amdgpu/.$ROCM_APT_VER/ubuntu focal main > /etc/apt/sources.list.d/amdgpu.list' &&\
sh -c 'echo deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/rocm-keyring.gpg] https://repo.radeon.com/rocm/apt/.apt_$ROCM_APT_VER focal main > /etc/apt/sources.list.d/rocm.list'
RUN sh -c "echo deb http://mirrors.kernel.org/ubuntu focal main universe | tee -a /etc/apt/sources.list"

RUN amdgpu-install -y --usecase=rocm --no-dkms
Expand Down Expand Up @@ -96,11 +96,17 @@ RUN tar zxvf /tmp/ccache.tar.gz -C /tmp/ && mkdir /tmp/ccache-${CCACHE_COMMIT}/b
cd /tmp/ccache-${CCACHE_COMMIT}/build && \
cmake -DZSTD_FROM_INTERNET=ON -DHIREDIS_FROM_INTERNET=ON .. && make -j install && rm -rf /tmp/*
RUN ccache -s

# purge existing composable kernel installed with ROCm
# hence cannot use autoremove since it will remove more components
RUN apt-get update && \
DEBIAN_FRONTEND=noninteractive apt-get purge -y --allow-unauthenticated \
composablekernel-dev
ARG COMPILER_LAUNCHER=""
RUN if [ "$USE_FIN" = "ON" ]; then \
rbuild prepare -s fin -d $PREFIX -DAMDGPU_TARGETS=${GPU_ARCH} -DCMAKE_CXX_COMPILER_LAUNCHER="${COMPILER_LAUNCHER}"; \
rbuild prepare -s fin -d $PREFIX -DGPU_TARGETS=${GPU_ARCH} -DCMAKE_CXX_COMPILER_LAUNCHER="${COMPILER_LAUNCHER}"; \
else \
rbuild prepare -s develop -d $PREFIX -DAMDGPU_TARGETS=${GPU_ARCH} -DCMAKE_CXX_COMPILER_LAUNCHER="${COMPILER_LAUNCHER}"; \
rbuild prepare -s develop -d $PREFIX -DGPU_TARGETS=${GPU_ARCH} -DCMAKE_CXX_COMPILER_LAUNCHER="${COMPILER_LAUNCHER}"; \
fi

RUN ccache -s
Expand Down
2 changes: 1 addition & 1 deletion dev-requirements.txt
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
ROCmSoftwarePlatform/rocm-recipes@d7b71f8ff71572833c8cf15b74279dd034e66f9d
-f requirements.txt
danmar/cppcheck@2.9
danmar/cppcheck@2.12.1
google/googletest@v1.14.0
2 changes: 1 addition & 1 deletion docs/DebugAndLogging.md
Original file line number Diff line number Diff line change
Expand Up @@ -94,7 +94,7 @@ Direct Solutions:
* `MIOPEN_DEBUG_CONV_DIRECT_OCL_FWD11X11` - `ConvOclDirectFwd11x11`.
* `MIOPEN_DEBUG_CONV_DIRECT_OCL_FWDGEN` - `ConvOclDirectFwdGen`.
* `MIOPEN_DEBUG_CONV_DIRECT_OCL_FWD` - `ConvOclDirectFwd`.
* `MIOPEN_DEBUG_CONV_DIRECT_OCL_FWD1X1` - `ConvOclDirectFwd`.
* `MIOPEN_DEBUG_CONV_DIRECT_OCL_FWD1X1` - `ConvOclDirectFwd1x1`.
* `MIOPEN_DEBUG_CONV_DIRECT_OCL_WRW2` - `ConvOclBwdWrW2<n>` (where n = `{1,2,4,8,16}`), and `ConvOclBwdWrW2NonTunable`.
* `MIOPEN_DEBUG_CONV_DIRECT_OCL_WRW53` - `ConvOclBwdWrW53`.
* `MIOPEN_DEBUG_CONV_DIRECT_OCL_WRW1X1` - `ConvOclBwdWrW1x1`
Expand Down
2 changes: 1 addition & 1 deletion fin
Submodule fin updated from ae2ff1 to 044f5e
2 changes: 1 addition & 1 deletion requirements.txt
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
sqlite3@3.43.2 -DCMAKE_POSITION_INDEPENDENT_CODE=On
boost@1.83 -DCMAKE_POSITION_INDEPENDENT_CODE=On --build -DCMAKE_CXX_FLAGS=" -std=c++14 -Wno-enum-constexpr-conversion -Wno-deprecated-builtins -Wno-deprecated-declarations "
ROCmSoftwarePlatform/half@10abd99e7815f0ca5d892f58dd7d15a23b7cf92c --build
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Q] Where we'll take half after this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about apt-get install half ? :)

Now if we use amdgpu-install -y --usecase=rocm as part of the dockerfile script, both half and composablekernel will be pre-installed. Then if we try to install it via building dependencies there will be errors.

Copy link
Contributor

@atamazov atamazov Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junliume I recommend keeping this, but commenting this out with # (and maybe putting some short explanation). This may be useful for clients who use previous version of ROCm which does not install half by default.

# ROCmSoftwarePlatform/half@10abd99e7815f0ca5d892f58dd7d15a23b7cf92c --build
ROCmSoftwarePlatform/rocMLIR@rocm-5.5.0 -H sha256:a5f62769d28a73e60bc8d61022820f050e97c977c8f6f6275488db31512e1f42 -DBUILD_FAT_LIBROCKCOMPILER=1 -DCMAKE_IGNORE_PATH=/opt/conda/envs/py_3.9 -DCMAKE_IGNORE_PREFIX_PATH=/opt/conda
nlohmann/json@v3.11.2 -DJSON_MultipleHeaders=ON -DJSON_BuildTests=Off
ROCmSoftwarePlatform/FunctionalPlus@v0.2.18-p0
Expand Down
2 changes: 1 addition & 1 deletion src/convolution.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -289,7 +289,7 @@ ConvolutionDescriptor::GetForwardOutputTensorWithLayout(const TensorDescriptor&
}
}

std::size_t out_c;
std::size_t out_c = 0;
std::vector<std::size_t> out_lens(spatial_dim + 2);

auto out_spatial = boost::adaptors::slice(out_lens, 2, 2 + spatial_dim);
Expand Down
2 changes: 1 addition & 1 deletion src/rnn_api.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -529,7 +529,7 @@ static void LogCmdRNN(const miopenTensorDescriptor_t* xDesc,
const int seqLength,
const RNNDir_t dir)
{
if(miopen::IsLoggingCmd())
if(miopen::IsLoggingCmd() && seqLength > 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Notice] line 549 miopen::deref(xDesc[seqLength - 1]) is UB when seqLength == 0. Shall we MIOPEN_THROW in the callers (miopenRNNForwardTraining() etc)?

/cc @shurale-nkn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is internal function for API logging. If user passed seqLength == 0 to API this is UB at any RNN_API function because this is empty zero sized tensor.

{
std::string mode;
miopenRNNMode_t rnnMode = miopen::deref(rnnDesc).rnnMode;
Expand Down
2 changes: 1 addition & 1 deletion test/na_train.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -802,7 +802,7 @@ struct na_fusion_driver : test_driver

std::size_t input_n, input_c, input_h, input_w;
std::tie(input_n, input_c, input_h, input_w) = miopen::tien<4>(input.desc.GetLengths());
this->tolerance = 80 * float(input.desc.GetElementSize());
this->tolerance = 80 * double(input.desc.GetElementSize());
ptr_activdesc = GetManagedActivDesc();
miopenSetActivationDescriptor(ptr_activdesc.get(), activ_mode, alpha, beta, gamma);
auto&& handle = get_handle();
Expand Down