Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvement by using private declaration for gfortran #2850

Open
hiker opened this issue Jan 15, 2025 · 4 comments
Open

Performance improvement by using private declaration for gfortran #2850

hiker opened this issue Jan 15, 2025 · 4 comments

Comments

@hiker
Copy link
Collaborator

hiker commented Jan 15, 2025

During our training, one of my team members added a private declaration for all module-inlined kernel, and got a significant speedup. I verified this, so we have now this confirmed for gfortran 8.4, 11.4, and 14-something. Timing (of my game-of-life test in the training):

No inlining (-O3):                                      38.933 seconds
(module) inlining:                                      27.603 seconds
(module) inlining + private declaration of all kernels: 10.159 seconds

Looking at the assembly output indicates that without the private declaration only two of four kernels are inlined. Without private:

~/work/psyclone/tutorial/training/gocean/2.6-GameOfLife-fuse/solution$ gfortran -S  -c -O3 -I/home/joerg/work/psyclone/tutorial/training/gocean/gol-lib -I/home/joerg/work/psyclone/external/dl_esm_inf/finite_difference/src time_step_alg_mod_psy.f90
~/work/psyclone/tutorial/training/gocean/2.6-GameOfLife-fuse/solution$ grep call time_step_alg_mod_psy.s 
	call	__psy_time_step_alg_mod_MOD_compute_die_code
	call	__psy_time_step_alg_mod_MOD_count_neighbours_code
~/work/psyclone/tutorial/training/gocean/2.6-GameOfLife-fuse/solution$ 

So there are still two calls left. Adding the private directive (details below):

~/work/psyclone/tutorial/training/gocean/2.6-GameOfLife-fuse/solution$ gfortran -S  -c -O3 -I/home/joerg/work/psyclone/tutorial/training/gocean/gol-lib -I/home/joerg/work/psyclone/external/dl_esm_inf/finite_difference/src time_step_alg_mod_psy.f90
~/work/psyclone/tutorial/training/gocean/2.6-GameOfLife-fuse/solution$ grep call time_step_alg_mod_psy.s 
~/work/psyclone/tutorial/training/gocean/2.6-GameOfLife-fuse/solution$ 

Test case: https://github.com/stfc/PSyclone/tree/1623_add_training/tutorial/training/gocean/2.6-GameOfLife-fuse/solution
(just modify the Makefile to use inline.py instead of fuse_loops.py).
Then manually add:

    private:: count_neighbours_code, compute_born_code, compute_die_code, combine_code

When also using fuse, it gets even worse: fusing the first three loops (the fourth one can't) with inlining results in a runtime of 30 seconds. Adding the above private declarations brings down the runtime to 7.5 seconds.

@hiker
Copy link
Collaborator Author

hiker commented Jan 23, 2025

As discussed in the telco: indeed, the symbol is set to be private, and that works. But: the symbol table is never used to actually create code. In gocean 1p0, around line 120:

        # create an empty PSy layer module
        psy_module = ModuleGen(self.name)
        # include the kind_params module
        psy_module.add(UseGen(psy_module, name="kind_params_mod"))
        # include the field_mod module
        psy_module.add(UseGen(psy_module, name="field_mod"))
        self.invokes.gen_code(psy_module)
        return psy_module.root

At this stage, self.container.symbol_table contains a RoutineSymbol (set to private), but it is just not used. So the private declaration of the module-inlined routine symbol is missing.

@sergisiso
Copy link
Collaborator

It still uses f2pygen, with #2834 we switch to the psyir fortran backend which should add the private attribute from the symbol table.

@hiker
Copy link
Collaborator Author

hiker commented Jan 23, 2025

The following patch fixes the issue, but it entirely changes how the source code is created: atm most of the source code first lowered, written by a FortranWriter(), which is then parsed by fparser. The rest of the source code (top level module info) is also created using fparser UseGen etc calls. Then this combined fparser tree is converted to a string.

While this patch looks small, internally it changes a lot: it just lowers the whole container and converts it to a string using a FortranWriter. This is then returned (and the generator layer has some tests to see if it got a fparser tree (which needs str() to convert to a string), or if it already has a string (e.g. for line length processing).

This all seems to work fine ... except of course that it will break pretty much any gocean test we have, since the FortranWriter output is different from the fparser str() output (e.g. keywords lower case vs upper case, different use of spaces, ...)

FWIW, here the patch to fix the output behaviour:

one/generator.py b/src/psyclone/generator.py
index f4e3583fc..3643d2d8a 100644
--- a/src/psyclone/generator.py
+++ b/src/psyclone/generator.py
@@ -605,10 +605,16 @@ def main(arguments):
             # Limit the line length of the output Fortran to ensure it conforms
             # to the 132 characters mandated by the standard.
             fll = FortLineLength()
-            psy_str = fll.process(str(psy))
+            if isinstance(psy, str):
+                psy_str = fll.process(psy)
+            else:
+                psy_str = fll.process(str(psy))
             alg_str = fll.process(str(alg))
         else:
-            psy_str = str(psy)
+            if isinstance(psy, str):
+                psy_str = psy
+            else:
+                psy_str = str(psy)
             alg_str = str(alg)
         if args.oalg is not None:
             with open(args.oalg, mode='w', encoding="utf8") as alg_file:
diff --git a/src/psyclone/gocean1p0.py b/src/psyclone/gocean1p0.py
index 483be1653..63de23a6c 100644
--- a/src/psyclone/gocean1p0.py
+++ b/src/psyclone/gocean1p0.py
@@ -115,14 +115,9 @@ class GOPSy(PSy):
         :rtype: ast
 
         '''
-        # create an empty PSy layer module
-        psy_module = ModuleGen(self.name)
-        # include the kind_params module
-        psy_module.add(UseGen(psy_module, name="kind_params_mod"))
-        # include the field_mod module
-        psy_module.add(UseGen(psy_module, name="field_mod"))
-        self.invokes.gen_code(psy_module)
-        return psy_module.root
+        from psyclone.psyir.backend.fortran import FortranWriter
+        self.container.lower_to_language_level()
+        return FortranWriter()(self.container)
 
 
 class GOInvokes(Invokes):


@hiker
Copy link
Collaborator Author

hiker commented Jan 23, 2025

Thanks. I have added the patch here to this ticket. There surprisingly doesn't seem to be any clash with #2834. Great job with #2834 btw ❤️
I'll wait for #2834 to be merged in - I won't be doing any other work with this patch, I was just keen to see what else goes wrong ... and then I was quite surprised when nothing went wrong :) We are getting there, it looked really great to see all these gen functions being removed!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants