Building on Void Linux #117
Hmm, never seen that error myself. One thing to try quickly is to add "-fopenmp" to CFLAGS and LDFLAGS. Another possibility is that the OpenMP libraries are not installed on your Linux distro? Do you have for example
This is an automake usage failure: autoconf detects that OpenMP is not installed, but the project doesn't do anything with that result. Remove your ucx builddir, issue Building this revealed another issue that breaks the build on my machine; I filed openucx/ucx#10041 to fix that.
Well, it's more than just deps, in fact. Void makes the bold choice of symlinking Of course a simpler alternative is to tell everyone with a distro that doesn't symlink
I may have a couple of bashisms as well; is the "declare -A DICTIONARY_PATCHED_PROJECTS" thing from babs.sh working OK with dash?
While you are working on the Void Linux dependencies, here is one problem to think about. Now that there is the binfo/extra apps folder for add-ons, it would be nice to have some dependency-handling support for them as well. (I consider those apps to be slightly less supported, with no full guarantee that they will work everywhere.) For example, binfo/extra/corectrl has quite a few dependencies, and so far I have only put the dependencies I found on Mageia and Fedora into the comments of that binfo file, expecting the user to install them manually before building the app. In addition, on Mageia 9, for example, I needed to backport one app myself to a newer version to be able to build it. I'm not sure whether I should print some advice for that or just let the builder figure it out.
Your own bashisms are always fine, because... you're using bash. :P If the script uses Note that bash is installed on Void Linux and is even the default shell, it just chooses not to use it for
For the extra deps, there are basically two choices:
Note that for rolling distros like Arch and Manjaro, if there is no dependency on ROCm and there is a package for it, it's basically not worth building from scratch (
I was about to come back to this thread to mention this. I did follow your advice @jeroen-mostert and Also, yeah, it's best to rename this thread at this point.
For now the build fails on rocBLAS_Tensile/rocBLAS; the latter complains that it can't find Tensile ( For now this is the list of dependencies I've recorded (which is not complete, but may allow others to proceed with a build):
Note that although
I see. Here's the list of packages I installed. I took Arch Linux's list (in hindsight, not the best idea) and replaced dependencies that had different or wrong names. Whether this list of packages is exhaustive remains to be seen.
As both Ubuntu and Arch call for this dependency in some way, I also added this line for Void (Void has no official package).
There's a lot of fluff in the list; for example all
Also the rocBLAS build continues after installing Note that some deps we still have left in There's still something weird going on where rocBLAS builds Tensile (and succeeds) but then hipBLASLt and hipSPARSELt also try to build it, and fail, giving a very strange error involving gcc headers ("non-virtual member function marked 'override' hides virtual member function"). I dread having to do a full rebuild; maybe @phriv can catch up before I have to retry. If not, I'll get there when I get there; it's been a long day's work. :P
hipBLASLt and hipSPARSELt use a different version of Tensile (something called tensilelite), and these two projects are mostly for MI-level cards at the moment.
Ah, so they're actually different code bases. That explains the seemingly redundant builds, though not why they fail (the errors have nothing to do with the gfx arches but are some include snag from pulling in Personally I would be fine pretending these cards don't exist and not building these projects either (let AMD have their proprietary builds for that), but I understand it would be nicer to have them anyway, as long as they want to work. :P
Well, it took a while, but I found it. There are two CMakeLists that specify
I have a successful build, with the caveat that the latest DeepSpeed doesn't build for my gfx1030 -- it chokes on a
Alright, after a day of non-stop compiling (rip my poor 7700k), I have a working SDK. All in-suite tests passed. There are two minor issues (probably related to whatever weird stuff Void does to shell routing). Invoking The only other (possible) issue is I get when I run the PyTorch test. The following is dumped to the terminal before the test passes:
What shell are you using? The PyTorch warning is expected; you will get it no matter what card you use (I have an actual gfx1030). It doesn't inhibit acceleration and is due to the way PyTorch loads this stuff at runtime. It does look scary, so it would be better if it didn't do that; I don't know if there's a way to circumvent it. Maybe there's one specific binary we shouldn't be compiling with hipcc (or maybe it's unavoidable because comgr checks Python itself -- we might get cheeky and build our Python with hipcc. :P)
Alright, I have an update on the issue. It is not a shell issue whatsoever; it's a terminal issue. I usually use tilix, which closes upon sourcing the file, whereas sourcing the file in xterm or xfce-terminal causes no issue. I'm looking into this. I know you have to export some variables to get parts of tilix's functionality working on Xfce; this may be related. Regardless, the terminal closing looks like a local issue and not a distro issue.
The file does have a guard at the top that exits the script if you're not sourcing it but attempting to execute it directly. The guard uses a bash-specific check, but the file has a shebang at the top that explicitly invokes bash, so that's OK. This should never cause the outer terminal to exit, because that code should be skipped when you source the file, but it's the only thing I can think of that could force a premature exit -- maybe tilix is performing some sort of "optimization" where it fails to put bash in the right mood. The rest is just plain
- Add the necessary dependencies
- Include necessary patches for systems where /bin/sh is not bash (including but not limited to Void Linux)
- Fix an incompatibility in hipBLASLt between Clang 17 and GCC 13
- Take the latest version of ucc (the latest release is compatible with ROCm 6.x)
- Remove building the "native" architecture from ucc -- this doesn't work in headless setups. The default list should be complete.

Signed-off-by: Jeroen Mostert <jeroen.mostert@cm.com>
Update: it will be fixed with openucx/ucx#10043.
Thanks for the update. Once openucx/ucx#10043 is applied upstream, we will jump to that version in rocm sdk builder.
I'm thinking we can close this one? The necessary patches have not all been integrated into upstream repos yet, but that doesn't prevent us from building (and will likely take a very long time for inactive projects like
What about the ucx patch? I think it's now merged? If you can verify that the upstream version now builds, we could jump to that version.
The thing with that is that we're currently checking out v1.17.0 explicitly. The necessary change was merged (openucx/ucx@e9fac97), but there are many other changes between 1.17.0 and that commit. So it's not just a matter of verifying that it builds, but also that it works, and moving to an unreleased commit just for this change looks like a bad risk/reward ratio to me. It makes more sense to wait for v1.18.0 and remove the patch when we switch to that.
Ah, I mixed it up with ucc, for which we are currently picking the latest version. You are right that it's better to wait for 1.18.0. I will close this now as you suggested. (I have not been able to test Void Linux myself.)
I am currently getting a patch for Void Linux up and running and hadn't had any issues until I had to compile UCX.
So far I'm getting the following errors:
/media/LI/rocm_sdk_builder/src_projects/ucx/src/tools/perf/perftest.c:222: error: ignoring '#pragma omp barrier' [-Werror=unknown-pragmas]
  222 | #pragma omp barrier
/media/LI/rocm_sdk_builder/src_projects/ucx/src/tools/perf/perftest.c:224: error: ignoring '#pragma omp master' [-Werror=unknown-pragmas]
  224 | #pragma omp master
/media/LI/rocm_sdk_builder/src_projects/ucx/src/tools/perf/perftest.c:242: error: ignoring '#pragma omp barrier' [-Werror=unknown-pragmas]
  242 | #pragma omp barrier
Searching through Stack Overflow suggests passing '-fopenmp' to gcc, but I'll admit I don't know automake/autoconf well enough to figure out which arguments I need to edit to do so.