Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Use of constant in tests/slate/test_hdg_poisson.py::test_hdg_convergence causes errors #3802

Closed
JDBetteridge opened this issue Oct 16, 2024 · 5 comments · Fixed by #3808
Assignees
Labels

Comments

@JDBetteridge
Copy link
Member

Describe the bug
There is something strange happening in the tests/slate/test_hdg_poisson.py::test_hdg_convergence test if run in parallel, but not in the "default pytest order". This is seen in the test suite cleanup PR #3385 when using pytest-split:

https://github.com/firedrakeproject/firedrake/actions/runs/11355091421/job/31583739399?pr=3385

This error goes away if a function in the real space is used

Steps to Reproduce
Steps to reproduce the behavior:
This is quite difficult you have to run

pytest \
    --splitting-algorithm least_duration \
    --splits 4 \
    --group ??? \
    --timeout=1800 \
    --timeout-method=thread \
    -o faulthandler_timeout=1860 \
    -m "parallel[3] and not broken" \
    -v tests

With ??? replaced with 1 to 4 and one of the runs should fail. Reproducing the error locally has proven tricky though.

Expected behavior
This test should pass with a Constant it seems something in some part of the test suite is throwing off the numbering 😕

Error message
https://github.com/firedrakeproject/firedrake/actions/runs/11355091421/job/31583739399?pr=3385

Environment:

  • CI

Additional Info
Add any other context about the problem here.

@connorjward
Copy link
Contributor

Can you share the current "fix" for this so it's clear where I need to look?

@wence-
Copy link
Contributor

wence- commented Oct 16, 2024

This line

tau = Constant(1)

wants to be a function in Real, I suspect.

My guess would be Constants are not renumbered correctly by UFL and one process gets out of sync?

@JDBetteridge
Copy link
Member Author

The fix was in this commit: 911b0ad

@connorjward
Copy link
Contributor

This line

tau = Constant(1)

wants to be a function in Real, I suspect.

My guess would be Constants are not renumbered correctly by UFL and one process gets out of sync?

I think it's very likely to be an issue with out-of-sync ranks. We also do some renumbering tricks in PyOP2 so the problem isn't necessarily from UFL. I will investigate.

@connorjward
Copy link
Contributor

Fixed in #3808. We were hardcoding constant names in SLATE (which I wrote some time ago).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants