Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create a build test CI for the HIP Examples #147

Merged
merged 3 commits into from
Aug 14, 2024

Conversation

dgaliffiAMD
Copy link
Collaborator

@dgaliffiAMD dgaliffiAMD commented Aug 2, 2024

The scope of this change is limited to the HIP-Basic folder.
There was not enough disk space on the basic GitHub Runners to install the full ROCm stack and run the Library examples.

@dgaliffiAMD dgaliffiAMD force-pushed the dgaliffi/build-workflow branch 8 times, most recently from 809217a to 566395e Compare August 2, 2024 23:37
@dgaliffiAMD dgaliffiAMD marked this pull request as draft August 6, 2024 18:47
@dgaliffiAMD dgaliffiAMD force-pushed the dgaliffi/build-workflow branch 11 times, most recently from 5532f2b to 3e0ea75 Compare August 7, 2024 13:18
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
@dgaliffiAMD dgaliffiAMD changed the title Build Workflow Create a build test CI for the HIP Examples Aug 7, 2024
@dgaliffiAMD dgaliffiAMD marked this pull request as ready for review August 7, 2024 16:58
@Beanavil
Copy link
Collaborator

Beanavil commented Aug 9, 2024

Thanks for adding this! Perhaps we can have one workflow for each relevant folder (Applications, HIP-Basic and Libraries), that works fine for me.

@dgaliffiAMD
Copy link
Collaborator Author

Thanks for adding this! Perhaps we can have one workflow for each relevant folder (Applications, HIP-Basic and Libraries), that works fine for me.

That's amazing. Thanks for the suggestion, @Beanavil

Thanks @Beanavil

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
This package shouldn't be required for these workflows.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
@dgaliffiAMD dgaliffiAMD merged commit 80958ca into develop Aug 14, 2024
8 checks passed
@dgaliffiAMD dgaliffiAMD deleted the dgaliffi/build-workflow branch August 14, 2024 13:16
Copy link

@amd-isparry amd-isparry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this requires quite a lot of work.

wget https://repo.radeon.com/amdgpu-install/${{ env.ROCM_VERSION }}/ubuntu/jammy/amdgpu-install_${{ env.AMDGPU_INSTALLER_VERSION }}_all.deb
apt-get -y install ./amdgpu-install_${{ env.AMDGPU_INSTALLER_VERSION }}_all.deb &&
apt-get update -qq &&
apt-get install -y \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I missing something? The point of the installer that you are getting in line 38 is to install rROCm, so why are you guessing what parts needs to be installed manually here?

apt-get install -y build-essential g++ glslang-tools \
python3 python3-pip libglfw3-dev libvulkan-dev locales wget
python3 -m pip install --upgrade pip
python3 -m pip install cmake

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things like cmake can be installed in various ways, in particular via the package manager and via pip. You should be prepared to handle errors if you try and install one way when the other way has previously been used.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each run starts with a fresh docker image, so this shouldn't be an issue. You are correct, I could have used apt-get for cmake too. I think I chose pip because I'll generally get a newer version.

run: |
cd Applications && mkdir build && cd build
cmake -DGPU_ARCHITECTURES=all -S ..
cmake --build . -j 4

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 4? Why not 56 if you have a machine with 56 cores?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the job uses ubuntu I think it can be done like so:

Suggested change
cmake --build . -j 4
cmake --build . -j `nproc`

jobs:
build:
name: "Build HIP Examples"
runs-on: ubuntu-latest

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without me looking, I wonder how "ubuntu-latest" interacts with "ubuntu:22.04" a few lines further down. I think this needs a explanation. Perhaps a README file?

apt-get install -y build-essential g++ glslang-tools \
python3 python3-pip libglfw3-dev libvulkan-dev locales wget
python3 -m pip install --upgrade pip
python3 -m pip install cmake

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See previous comment on where you get cmake from.

apt-get -y install ./amdgpu-install_${{ env.AMDGPU_INSTALLER_VERSION }}_all.deb &&
apt-get update -qq &&
apt-get -y install rocm-dev rocm-llvm-dev
echo "/opt/rocm/bin" >> $GITHUB_PATH

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/opt/rocm/bin might not exist, particularly if you are bypassing the installer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we're still using apt install rocm-dev. This will create it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/opt/rocm is managed by update-alternatives. This is needed for the use case of wanting to have more than one version of ROCm installed at the same time on the same machine - think very large multi-user machines. /opt/rocm therefor may or may not point to the current version of ROCm that you are wanting to test.

If you are going to fall back on the "We are starting this in a fresh docker container so it will all be OK", then you need to spell out this assumption and that this code is of no value when testing the multi-version rocm use case. Otherwise it just looks like the code is wrong because you have missed this case.

apt-get update -qq &&
apt-get -y install rocm-dev rocm-llvm-dev
echo "/opt/rocm/bin" >> $GITHUB_PATH
echo "ROCM_PATH=/opt/rocm" >> $GITHUB_ENV

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be needed. If something fails when it is missing then it is a bug and should be reported. Consider this another test :-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I give this a try.

apt-get -y install rocm-dev rocm-llvm-dev
echo "/opt/rocm/bin" >> $GITHUB_PATH
echo "ROCM_PATH=/opt/rocm" >> $GITHUB_ENV
echo "LD_LIBRARY_PATH=/opt/rocm/lib:${LD_LIBRARY_PATH}" >> $GITHUB_ENV

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong in many ways. It should not be needed, ROCm should be able to find its own libraries, /opt/rocm may not exist or it can point to the wrong place. If there is no existing LD_LIBRARY_PATH then you stick a spurious ":" on the end.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I give this a try.

echo "/opt/rocm/bin" >> $GITHUB_PATH
echo "ROCM_PATH=/opt/rocm" >> $GITHUB_ENV
echo "LD_LIBRARY_PATH=/opt/rocm/lib:${LD_LIBRARY_PATH}" >> $GITHUB_ENV
apt-get autoclean

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scripts should not be doing autoclean. The administrator of the machine may be keeping some package that was originally automatically installed but is no longer strictly required around. You have no right to delete that package.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This workflow is running in a docker image on a basic GitHub runner. This was done to save space. We were having issues installing all of rocm on these basic runners, due to lack of disk space.

wget https://repo.radeon.com/amdgpu-install/${{ env.ROCM_VERSION }}/ubuntu/jammy/amdgpu-install_${{ env.AMDGPU_INSTALLER_VERSION }}_all.deb
apt-get -y install ./amdgpu-install_${{ env.AMDGPU_INSTALLER_VERSION }}_all.deb &&
apt-get update -qq &&
apt-get install -y \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Installer the installer and then ignore it????

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The installer is being used to setup the repo files.
This CIs are just running on the basic GitHub runners and I found that they were running out of disk space when trying to install all of rocm. Instead, we've had to take a more piecemeal approach.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the runners then! Are they self hosted runners?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, not. It is one of the basic runners.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants