Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: No warning/check when resolution is not a power of 2 in 1D with auto-decomposition #299

Closed
nscepi opened this issue Dec 12, 2024 · 6 comments
Assignees
Labels
bug Something isn't working question Further information is requested

Comments

@nscepi
Copy link

nscepi commented Dec 12, 2024

Describe the issue:

There is no warning/check when the resolution is not a power of 2 in 1D problem and when using auto-decomposition, see here https://github.com/idefix-code/idefix/blob/d0e82202ca90d0b664c71582962c2ae255bc6528/src/grid.cpp#L166C7-L167C32. Apparently, there used to be a warning/check but it was changed in the commit dbe3405. @neutrinoceros do you remember why?

Error message:

No response

runtime information:

I am using the version d0e8220 of Idefix.

@nscepi nscepi added the bug Something isn't working label Dec 12, 2024
@neutrinoceros
Copy link
Collaborator

neutrinoceros commented Dec 12, 2024

If I recall correctly, there is no ambiguity because resources do not need to be spread across multiple dimensions, so automatic decomposition is perfectly achievable in 1D regarless of box size, so I made sure it worked (or so I hope) and removed the warning. Is it causing you any trouble in production ?

(since I don't think there's a bug for now, I'm going to re-triage this as a question, but feel free to overwrite my judgment if there's an actual problem here)

@neutrinoceros neutrinoceros added question Further information is requested and removed bug Something isn't working labels Dec 12, 2024
@nscepi
Copy link
Author

nscepi commented Dec 12, 2024

I think this is a bug! If you take, for example, the sod HD test problem and run it with a grid of 500 you get exactly the same result whether or not you use MPI, as you would expect. If you now take a grid of 501 you get an error of 2.5e-4 on the density (measured using the L2 norm as in idfx_test.py) between the serial and MPI run... When plotting the two solutions I see that the mismatch between serial and mpi is coming from the last cell of the (total) domain. The issue appears with other test problems as well.

I haven't got into the details of the MPI implementation so I don't know if IDEFIX is supposed to be able to handle subdomains of different sizes or not. If it's not, there should be a flag in 1D as well. If it is, there is something wrong.

@neutrinoceros
Copy link
Collaborator

Can you provide more detail on how to reproduce ? If I understand correctly, you are implying that you are using automatic domain decomposition in all your tests, but I'm not sure that's what you mean. Did you try the same test with manual domain decomposition too ?

@glesur
Copy link
Contributor

glesur commented Dec 12, 2024

I think the issue is that Idefix is unable to generate correct unbalanced domain decomposition, even in 1d. It not a problem of knowing how to balance the number of processes in each direction, it's how to decompose a grid that can't be divided by the number of processes.

It doesn't mean that your commit @neutrinoceros need to be reverted, it's just that we need to implement the possibility to have unbalanced domain decomposition (this takes place in the datablock constructor)

@glesur glesur added the bug Something isn't working label Dec 13, 2024
@nscepi
Copy link
Author

nscepi commented Dec 13, 2024

After talking with @neutrinoceros I will clarify my previous posts.

There are two separate but related issues:

  1. Idefix cannot handle for now unbalanced domain decomposition as @glesur pointed out.
  2. As a result, there should be an error popping up when the resolution of the grid is not a multiple of the number of mpi processes. This condition is checked for manual decomposition but not for automatic decompostion. I suggest to do it for both. I will make a pull request with a potential fix.

Note that 2) will have no reason to be once 1) is fixed.

@glesur
Copy link
Contributor

glesur commented Dec 16, 2024

Issue is fixed by #301

@glesur glesur closed this as completed Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants