-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a build test CI for the HIP Examples #147
Conversation
809217a
to
566395e
Compare
5532f2b
to
3e0ea75
Compare
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
3e0ea75
to
05df594
Compare
Thanks for adding this! Perhaps we can have one workflow for each relevant folder (Applications, HIP-Basic and Libraries), that works fine for me. |
Thanks @Beanavil Signed-off-by: David Galiffi <David.Galiffi@amd.com>
This package shouldn't be required for these workflows. Signed-off-by: David Galiffi <David.Galiffi@amd.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this requires quite a lot of work.
wget https://repo.radeon.com/amdgpu-install/${{ env.ROCM_VERSION }}/ubuntu/jammy/amdgpu-install_${{ env.AMDGPU_INSTALLER_VERSION }}_all.deb | ||
apt-get -y install ./amdgpu-install_${{ env.AMDGPU_INSTALLER_VERSION }}_all.deb && | ||
apt-get update -qq && | ||
apt-get install -y \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I missing something? The point of the installer that you are getting in line 38 is to install rROCm, so why are you guessing what parts needs to be installed manually here?
apt-get install -y build-essential g++ glslang-tools \ | ||
python3 python3-pip libglfw3-dev libvulkan-dev locales wget | ||
python3 -m pip install --upgrade pip | ||
python3 -m pip install cmake |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things like cmake can be installed in various ways, in particular via the package manager and via pip. You should be prepared to handle errors if you try and install one way when the other way has previously been used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each run starts with a fresh docker image, so this shouldn't be an issue. You are correct, I could have used apt-get for cmake too. I think I chose pip because I'll generally get a newer version.
run: | | ||
cd Applications && mkdir build && cd build | ||
cmake -DGPU_ARCHITECTURES=all -S .. | ||
cmake --build . -j 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 4? Why not 56 if you have a machine with 56 cores?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that the job uses ubuntu I think it can be done like so:
cmake --build . -j 4 | |
cmake --build . -j `nproc` |
jobs: | ||
build: | ||
name: "Build HIP Examples" | ||
runs-on: ubuntu-latest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without me looking, I wonder how "ubuntu-latest" interacts with "ubuntu:22.04" a few lines further down. I think this needs a explanation. Perhaps a README file?
apt-get install -y build-essential g++ glslang-tools \ | ||
python3 python3-pip libglfw3-dev libvulkan-dev locales wget | ||
python3 -m pip install --upgrade pip | ||
python3 -m pip install cmake |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See previous comment on where you get cmake from.
apt-get -y install ./amdgpu-install_${{ env.AMDGPU_INSTALLER_VERSION }}_all.deb && | ||
apt-get update -qq && | ||
apt-get -y install rocm-dev rocm-llvm-dev | ||
echo "/opt/rocm/bin" >> $GITHUB_PATH |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/opt/rocm/bin might not exist, particularly if you are bypassing the installer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we're still using apt install rocm-dev
. This will create it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/opt/rocm is managed by update-alternatives. This is needed for the use case of wanting to have more than one version of ROCm installed at the same time on the same machine - think very large multi-user machines. /opt/rocm therefor may or may not point to the current version of ROCm that you are wanting to test.
If you are going to fall back on the "We are starting this in a fresh docker container so it will all be OK", then you need to spell out this assumption and that this code is of no value when testing the multi-version rocm use case. Otherwise it just looks like the code is wrong because you have missed this case.
apt-get update -qq && | ||
apt-get -y install rocm-dev rocm-llvm-dev | ||
echo "/opt/rocm/bin" >> $GITHUB_PATH | ||
echo "ROCM_PATH=/opt/rocm" >> $GITHUB_ENV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be needed. If something fails when it is missing then it is a bug and should be reported. Consider this another test :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I give this a try.
apt-get -y install rocm-dev rocm-llvm-dev | ||
echo "/opt/rocm/bin" >> $GITHUB_PATH | ||
echo "ROCM_PATH=/opt/rocm" >> $GITHUB_ENV | ||
echo "LD_LIBRARY_PATH=/opt/rocm/lib:${LD_LIBRARY_PATH}" >> $GITHUB_ENV |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is wrong in many ways. It should not be needed, ROCm should be able to find its own libraries, /opt/rocm may not exist or it can point to the wrong place. If there is no existing LD_LIBRARY_PATH then you stick a spurious ":" on the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I give this a try.
echo "/opt/rocm/bin" >> $GITHUB_PATH | ||
echo "ROCM_PATH=/opt/rocm" >> $GITHUB_ENV | ||
echo "LD_LIBRARY_PATH=/opt/rocm/lib:${LD_LIBRARY_PATH}" >> $GITHUB_ENV | ||
apt-get autoclean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scripts should not be doing autoclean. The administrator of the machine may be keeping some package that was originally automatically installed but is no longer strictly required around. You have no right to delete that package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This workflow is running in a docker image on a basic GitHub runner. This was done to save space. We were having issues installing all of rocm on these basic runners, due to lack of disk space.
wget https://repo.radeon.com/amdgpu-install/${{ env.ROCM_VERSION }}/ubuntu/jammy/amdgpu-install_${{ env.AMDGPU_INSTALLER_VERSION }}_all.deb | ||
apt-get -y install ./amdgpu-install_${{ env.AMDGPU_INSTALLER_VERSION }}_all.deb && | ||
apt-get update -qq && | ||
apt-get install -y \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Installer the installer and then ignore it????
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The installer is being used to setup the repo files.
This CIs are just running on the basic GitHub runners and I found that they were running out of disk space when trying to install all of rocm. Instead, we've had to take a more piecemeal approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the runners then! Are they self hosted runners?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, not. It is one of the basic runners.
The scope of this change is limited to the HIP-Basic folder.
There was not enough disk space on the basic GitHub Runners to install the full ROCm stack and run the Library examples.