Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add conda-forge CI #38

Closed
wants to merge 32 commits into from
Closed

Add conda-forge CI #38

wants to merge 32 commits into from

Conversation

traversaro
Copy link
Contributor

No description provided.

@traversaro
Copy link
Contributor Author

I opened this PR to reprodude the test failures in conda-forge/staged-recipes#21894 (comment), but unfortunatly I can't, I need to thing more about this.

@Giulero
Copy link
Collaborator

Giulero commented May 5, 2023

Hi @traversaro! I don't understand either. gravity_force uses the rnea. I would aspect at least also bias_force and coriolis_force to fail at least.

@traversaro
Copy link
Contributor Author

I was able to reproduce the problem by using the exact same docker image used in conda-forge to build packages. That Docker Image is based on Centos 7, that is rather old, so I guess this could be a Kernel or glibc problem that somehow is interacting with the pytorch/jax/adam stack.

@Giulero
Copy link
Collaborator

Giulero commented May 13, 2023

Wow. Thanks a lot @traversaro for the investigation! What can be done?

@traversaro
Copy link
Contributor Author

traversaro commented May 13, 2023

Wow. Thanks a lot @traversaro for the investigation! What can be done?

I am trying to understand a bit more. I added a few tests that helped to understand the problem:

OS glib version Working?
CentOS 7 2.17
Fedora 25 2.24
Fedora 26 2.25
Fedora 27 2.26
Fedora 28 2.27
Fedora 29 2.28
Ubuntu 16.04 2.23
Ubuntu 18.04 2.27
Ubuntu 20.04 2.31
Ubuntu 22.04 2.35

For all tests, the Linux version is 5.15.0-1037-azure .

I guess this could be a Kernel or glibc problem that somehow is interacting with the pytorch/jax/adam stack.

Actually the kernel is always the same in Docker (the one of the host) so the main thing we can check is the glibc version.

@traversaro
Copy link
Contributor Author

Thanks to Fedora docker images (the Ubuntu one for distros like 16.40 or 17.04 are not working), it is possible to bisect the glibc versions, and it seems that the problem appear for glibc <= 2.25 and disappears for glibc >= 2.26 .

@traversaro
Copy link
Contributor Author

Given that we debugged the problem with glibc <= 2.25, I cleaned up the PR, and re-enabled Windows and macOS tests. I think once CI is happy we can merge, just rember to Squash and merge, as otherwise there are 26 useless commit in the history.

@traversaro
Copy link
Contributor Author

Given that we debugged the problem with glibc <= 2.25, I cleaned up the PR, and re-enabled Windows and macOS tests. I think once CI is happy we can merge, just rember to Squash and merge, as otherwise there are 26 useless commit in the history.

macOS CI is again angry about gravity tests, something is fishy.

@traversaro
Copy link
Contributor Author

Bingo: the problem was much simpler. The test were using some not initialized memory, as the Twist were initialized as:

vb = idyntree.Twist()

and used if their value was zero, that sometime it was not. The fix is simply to use instead:

vb = idyntree.Twist.Zero()

@traversaro
Copy link
Contributor Author

I ported the fix obtained after the debug in #40, let's close this PR, if necessary I will propose a conda-forge-based CI in a different PR.

@traversaro traversaro closed this May 14, 2023
@traversaro traversaro deleted the add-conda-forge-ci branch May 15, 2023 07:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants