Add conda-forge CI #38
I opened this PR to reproduce the test failures in conda-forge/staged-recipes#21894 (comment), but unfortunately I can't reproduce them here. I need to think more about this.
Hi @traversaro! I don't understand either.
I was able to reproduce the problem by using the exact same Docker image that conda-forge uses to build packages. That image is based on CentOS 7, which is rather old, so I suspect a kernel or glibc issue that somehow interacts with the pytorch/jax/adam stack.
Wow. Thanks a lot @traversaro for the investigation! What can be done?
I am trying to understand a bit more. I added a few tests that helped narrow down the problem:
For all tests, the Linux kernel version is 5.15.0-1037-azure.
Actually, the kernel is always the same in Docker (it is the host's kernel), so the main thing we can check is the glibc version.
Thanks to the Fedora Docker images (the Ubuntu images for releases like 16.04 or 17.04 do not work), it is possible to bisect the glibc version: the problem appears with glibc <= 2.25 and disappears with glibc >= 2.26.
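Because every container shares the host kernel, the glibc version is what actually varies between images. A minimal sketch of a helper that records it when bisecting, assuming Python's standard `platform.libc_ver()` is enough to detect glibc inside the container; the function names and the `LAST_FAILING` constant are hypothetical. Note that the thread later traced the failures to uninitialized memory, so the range below is an observed correlation, not the root cause.

```python
import platform
from typing import Optional, Tuple

# Bisect result reported in this thread (hypothetical constant name):
# failures seen with glibc <= 2.25, none with glibc >= 2.26.
LAST_FAILING = (2, 25)


def parse_glibc(version: str) -> Optional[Tuple[int, int]]:
    """Turn a string such as '2.17' into (2, 17); return None if unparseable."""
    parts = version.split(".")
    if len(parts) < 2:
        return None
    try:
        return int(parts[0]), int(parts[1])
    except ValueError:
        return None


def in_reported_failing_range(version: Tuple[int, int]) -> bool:
    """True if (major, minor) is at or below the last failing glibc version."""
    return version <= LAST_FAILING


if __name__ == "__main__":
    # platform.libc_ver() returns e.g. ('glibc', '2.17') on a glibc system,
    # and empty strings when it cannot detect the C library.
    lib, ver = platform.libc_ver()
    parsed = parse_glibc(ver) if lib == "glibc" else None
    if parsed is None:
        print("glibc version not detected")
    else:
        print(f"glibc {ver}: in reported failing range = {in_reported_failing_range(parsed)}")
```

Comparing `(major, minor)` tuples keeps the check correct for versions like 2.9 vs 2.25, which a plain string comparison would get wrong.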
Given that we narrowed the problem down to glibc <= 2.25, I cleaned up the PR and re-enabled the Windows and macOS tests. Once CI is happy we can merge; just remember to use "Squash and merge", as otherwise there will be 26 useless commits in the history.
macOS CI is again angry about the gravity tests; something is fishy.
Bingo: the problem was much simpler. The tests were using uninitialized memory, as the Twist variables were initialized as:
and then used as if their value were zero, which sometimes it was not. The fix is simply to use instead:
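A minimal sketch of this class of bug, assuming (hypothetically) that a Twist is stored as a 6-element NumPy array. `np.empty` allocates the array without setting its entries, so a test that silently expects zeros passes or fails depending on whatever bytes were already in memory; `np.zeros` makes the zero value explicit. The helper names are invented for illustration and are not the project's code.

```python
import numpy as np


def make_twist_uninitialized() -> np.ndarray:
    # np.empty allocates the buffer but does NOT initialize it:
    # the 6 entries hold whatever values were already in that memory.
    return np.empty(6)


def make_twist_zero() -> np.ndarray:
    # np.zeros explicitly sets every entry to 0.0, so code that
    # assumes a zero twist gets one on every platform and every run.
    return np.zeros(6)


twist = make_twist_zero()
print(twist)
```

This also explains why such a failure can look platform-dependent: the leftover memory contents differ between allocators, libc versions and runs, which matches how the failures first appeared to track the glibc version.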
I ported the fix found during this debugging to #40, so let's close this PR. If necessary, I will propose a conda-forge-based CI in a different PR.