Performance improvement by using private declaration for gfortran #2850
Comments
As discussed in the telco: indeed, the symbol is set to be private, and that works. However, the symbol table is never used to actually create code. In gocean 1p0, around line 120:
At this stage,
It still uses f2pygen. With #2834 we switch to the PSyIR Fortran backend, which should add the private attribute from the symbol table.
The following patch fixes the issue, but it entirely changes how the source code is created: at the moment most of the source code is first lowered, written by a
While this patch looks small, internally it changes a lot: it just lowers the whole container and converts it to a string using a FortranWriter. This is then returned (and the generator layer has some tests to see if it got an fparser tree (which needs
This all seems to work fine ... except, of course, that it will break pretty much any gocean test we have, since the FortranWriter output is different from the fparser
FWIW, here is the patch to fix the output behaviour:
Thanks. I have added the patch to this ticket. Surprisingly, there doesn't seem to be any clash with #2834. Great job with #2834 btw ❤️
During our training, one of my team members added a private declaration for all module-inlined kernels and got a significant speedup. I verified this, so we now have this confirmed for gfortran 8.4, 11.4, and 14-something. Timing (of my game-of-life test in the training):
The assembly output indicates that, without the private declaration, only two of the four kernels are inlined. Without private:
So there are still two calls left. Adding the private declaration (details below):
Test case: https://github.com/stfc/PSyclone/tree/1623_add_training/tutorial/training/gocean/2.6-GameOfLife-fuse/solution
(just modify the Makefile to use inline.py instead of fuse_loops.py).
Then manually add:
When also using fuse, it gets even worse: fusing the first three loops (the fourth one can't be fused) together with inlining results in a runtime of 30 seconds. Adding the above private declarations brings the runtime down to 7.5 seconds.
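For anyone who wants to reproduce the manual step without editing the generated file by hand, here is a minimal sketch of a text-level post-processing helper. It is not part of PSyclone and does not use its API: the function name `add_private_declarations`, the module name and the kernel name are all hypothetical, and it assumes the first `implicit none` in the file is the module-level one.

```python
import re


def add_private_declarations(module_source, kernel_names):
    """Return the Fortran source with one 'private :: <name>' line per kernel,
    inserted directly after the first 'implicit none' statement.

    Assumption: the first 'implicit none' belongs to the module's
    specification part (true for the hypothetical example below).
    """
    lines = module_source.splitlines()
    for index, line in enumerate(lines):
        if re.match(r"\s*implicit\s+none\b", line, re.IGNORECASE):
            # Reuse the indentation of the 'implicit none' line.
            indent = line[: len(line) - len(line.lstrip())]
            declarations = [f"{indent}private :: {name}" for name in kernel_names]
            return "\n".join(lines[: index + 1] + declarations + lines[index + 1:])
    raise ValueError("no 'implicit none' statement found")


# Hypothetical stand-in for a generated PSy-layer module with one inlined kernel.
EXAMPLE = """module psy_example
  implicit none
contains
  subroutine example_kernel_code(i, j)
    integer, intent(in) :: i, j
  end subroutine example_kernel_code
end module psy_example"""

if __name__ == "__main__":
    print(add_private_declarations(EXAMPLE, ["example_kernel_code"]))
```

This is only a stop-gap for experiments; once the code is created from the symbol table, the private attribute should come from there instead.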