Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(#Closes #2592 and 2582(?)) Routine has a _symbol attribute. #2596

Merged
merged 55 commits into from
Oct 2, 2024

Conversation

LonelyCat124
Copy link
Collaborator

@sergisiso @arporter I have a very initial implementation of the changes I feel are needed for the Routine node specifically to support this change. I'm a bit concerned about what I needed to do w.r.t to handling the creation of a Routine that doesn't have a parent and adding a parent in later (which is probably something reasonable to be able to do), as I needed to override _parent as both a property and setter.

If either you have time to have a look and let me know how you feel about this implementation before I move much further, and any suggestions/alternatives would be appreciated.

@arporter
Copy link
Member

Thanks @LonelyCat124, I think what you're missing is use of the attach and detach mechanism. See https://psyclone-dev.readthedocs.io/en/latest/psyir.html#the-parent-child-relationship and possibly the section after it on dynamic updates.

@sergisiso
Copy link
Collaborator

@LonelyCat124 I had a look at this, I think overriding the _parent setter is a good possible alternative and it should not cause problems with the ChildrenList automatic parent handling. (the other alternative is using Container._update_signal but this will be executed any time there is ANY update below a container, which could quickly be very expensive, so I prefer to explore your _parent.setter). I belive you also need to delete the symbol when parent is disconnected (the _parent.setter is given a parent different than _parent_node, before discarding the old _parent_node, it needs something like: _parent_node.symbol_table.remove(self._symbol)?)

What I am not sure is about what you did inside the name.setter "if not self._symbol" branch. I think if the Scope already have a symbol with the given name it could be because another routine or import already has it, and should fail. Maybe for new Routines should just set self._symbol = RoutineSymbol("name", NoType()) in the __init__, before calling the supper constructor, would that work?

@sergisiso
Copy link
Collaborator

I think if the Scope already have a symbol with the given name it could be because another routine or import already has it.

I see that one problem with this is that the fparser2reader forward-declares routine symbols (with the _process_routine_symbols calls). Because Fortran can have a Call to a routine that is declared below itself. Maybe we could modify the frontend to put empty Routines in a first pass (of the caller of _process_routine_symbols), so the symbols exist, and then populate them on a second pass.

@LonelyCat124
Copy link
Collaborator Author

I've merged master in (and fixed the conflict) and I now get some failing tests that aren't clear as to why they're failling for me.

One such example is test_generate_container in fparser2_container_test.

The code seems to assume that the container generated from generate_container shouldn't have any children, wheras now it does (the subroutine it contains). Was there a reason this doesn't have a child on master? I guess creating the Routine is now adding it to the parent container, but if you use the full fparser chain it would already do that...

@arporter
Copy link
Member

generate_container()'s days are numbered - #1115. Perhaps they've now run out?

@LonelyCat124
Copy link
Collaborator Author

LonelyCat124 commented Jun 26, 2024

I'm not sure - what was their original use-cases? If they're used in any scripts or any other part of PSyclone its not great to remove them, I can just fix the tests (there are also other failing tests I'm yet to examine).

Edit: grepping in PSyclone only shows the tests using it, so it depends on any other uses externally.

@arporter
Copy link
Member

Edit: grepping in PSyclone only shows the tests using it, so it depends on any other uses externally.

In that case I would remove it. I'm not aware of any external uses of it. @JulienRemy do you use fparser2reader.generate_container()?

@JulienRemy
Copy link
Collaborator

We've never used the attribute I implemented in the closed PR nor fparser2reader.generate_container(). No problem with doing this differently as far as I know!

@LonelyCat124
Copy link
Collaborator Author

Moving this into Routine looks to have some complex side effects w.r.t the DSL frontends.

One example is in GOcean where simply initialising a GOcean PSyIR tree is problematic. From what I understand, it looks like we first get a standard PSyIR tree, and then the raise_psyir_2_gocean_kern_trans takes standard PSyIR nodes and replaces them with the GOcean specific ones (e.g. GOceanKern).

The problem now is that since the Routine has added its' symbol to the GOceanContainer, there are also Call nodes to the RoutineSymbol, so when the transformation attempts to replace the Routine with the GOceanKern, the Routine attempts to remove its own symbol from the Container's symbol table, which fails in _validate_remove_routinesymbol.

I'm not really sure how to resolve this issue - any advice would be appreciated.

@arporter
Copy link
Member

That does sound difficult. Could we perhaps overload the replace_with() method of Node for a Routine and do something special in there?

@LonelyCat124
Copy link
Collaborator Author

Yeah - I'm just not sure what the "special" thing should be. I can't decide if we should force removal of the symbol (somehow) in this case, but I'm not sure what the result of that would be, nor if we have that functionality.

I did wonder about not adding the symbol to the parent if its a GOceanContainer or an LFRicContainer (or perhaps if its not either specifically a Container or FileContainer with type(..) != comparisons), but I'm not sure if that would break lowering of Kerns to Routines, since I'm not sure how they're lowered. It appears like it might work, but I'm not sure when the children of a GOceanContainer are lowered based on its lower_to_language_level call, which doesn't appear to loop through its children and lower them.

@LonelyCat124
Copy link
Collaborator Author

LonelyCat124 commented Jul 11, 2024

There's also some issues with outputs it seems like due to copying the tree.

When copying the Container, it keeps the symbol in the new containers symbol table (i think? Its not clear that this always happens since not every test fails...), but for one of the inline trans tests (that i'm currently looking at) it fails here:

psyir/backend/visitor.py:182: in __call__
    tree_copy = node.root.copy()
psyir/nodes/node.py:1602: in copy
    new_instance._refine_copy(self)
psyir/nodes/scoping_node.py:102: in _refine_copy
    super(ScopingNode, self)._refine_copy(other)
psyir/nodes/node.py:1585: in _refine_copy
    self.children.extend([child.copy() for child in other.children])
psyir/nodes/node.py:261: in extend
    self._set_parent_link(item)
psyir/nodes/node.py:185: in _set_parent_link
    node._parent = self._node_reference
psyir/nodes/routine.py:194: in _parent
    parent.symbol_table.add(self._symbol)

because sub is already in the SymbolTable, which I can only assume was due to copying in, but I'm not sure how I avoid this.

Edit: this happens at the fortran_writer(psyir) call.

@LonelyCat124
Copy link
Collaborator Author

TODO:

  • When replacing a Routine with a Routine - pass the symbol from the original routine to the new one (and adjust name accordingly?).

  • ScopingNode check for copied Routines and remove symbols corresponding to them.

@LonelyCat124
Copy link
Collaborator Author

LonelyCat124 commented Jul 12, 2024

I fixed the above things, just roaming through failing tests and fixing those that I understand that are caused by these changes.

I've hit one that I don't understand how I can have changed the behaviour of the code.
In frontend/fparser2_test.py there is test_deferred_array_size which now fails because:
KeyError: "Symbol table already contains a symbol with name 'n'.

I don't think I modified the behaviour of process_declarations so I'm confused how this passed before - fparser2.py adds the n symbol to the symbol table as an unresolved integer type, and then it fails when it comes across the actual symbol n's declaration.
@sergisiso @arporter Any ideas on what I might have messed up? I'll keep investigating other tests and update to master when I'm done and see if its still an issue.

Edit - will continue documenting tests i don't understand:
Also kernel_module_inline_trans_test.py::test_validate_unsupported_symbol_shadowing fails on "But it is fine if it shadows itself."
I think this change maybe straight up broke ModuleInlineTrans...

Also some tests in tests/psyir/tools/call_tree_utils_test.py are also failing, i don't know what this does so will need to discuss I expect.

Also having issues in test_lfric_adjoint test test_generate_lfric_adjoint_multi_routine as after copying I can't remove the old RoutineSymbol that I want to replace with the new one. Maybe the option here is to remove all the references to things before re-adding them but that seems tricky.

@LonelyCat124
Copy link
Collaborator Author

I also seem to have some failing tests in inline_trans because the order of Routines inside a Container has changed (and doesn't match the input code). I'll try to work out what can have changed to cause this first as it may fix a number of tests.

@LonelyCat124
Copy link
Collaborator Author

LonelyCat124 commented Jul 12, 2024

Ok it again looks like not an easy fix. The test is simple:

    code = (
        "module test_mod\n"
        "contains\n"
        "  subroutine run_it()\n"
        "    integer :: i\n"
        "    real :: a(10)\n"
        "    do i=1,10\n"
        "      call sub(a, i)\n"
        "    end do\n"
        "  end subroutine run_it\n"
        "  subroutine sub(x, ivar)\n"
        "    real, intent(inout), dimension(10) :: x\n"
        "    integer, intent(in) :: ivar\n"
        "    integer :: i\n"
        "    do i = 1, 10\n"
        "      x(i) = 2.0*ivar\n"
        "    end do\n"
        "  end subroutine sub\n"
        "end module test_mod\n")

During fparser, we create empty Routine for run_it and sub and add them to the test_mod Container. The implementation we decided on for filling them was we then add the internal ASTs, and then detach them (to be reconnected to their parent in process_nodes as is currently done).
Unfortunately for this example case this doesn't work as expected since run_it contains a call to sub, when calling routine.detach() for sub it returns a ValueError as its not safe to detach the node as there is a reference to its symbol.

An alternative solution to this would be to not detach Routine nodes as current and return None from the _subroutine_handler, however there is a bad case where a Routine should result in a CodeBlock, but its symbol is called elsewhere in the file, which I expect would mean the Routine fails to detach and adding a CodeBlock of this Routine would result in two implementations of the Routine in the tree, where one is potentially partly filled with the code and the other is a full implementation from a CodeBlock.

I wonder if I should force routine.detach() to work even if its symbol can't be removed, but I'm not sure what downsides this might have? I guess there is the "what if i'm attaching a routine to a container where its symbol already exists" issue, but I can check if its symbol is already present before trying to add it. Any thoughts @sergisiso?

Edit: This inability to remove Routines due to their symbols being prevented from removal also breaks _find_or_create_psyclone_internal_cmp at container.children.extend(dummymod.pop_all_children())

Remaining failing tests are in these files:

tests/psyir/backend/fortran_unsupported_declns_test.py
tests/psyir/frontend/fparser2_part_ref_test.py
tests/psyir/frontend/fparser2_test.py
tests/psyir/tools/call_tree_utils_test.py
tests/psyad/domain/lfric/test_lfric_adjoint.py
tests/domain/common/transformations/raise_psyir_2_alg_trans_test.py
tests/domain/common/transformations/kernel_module_inline_trans_test.py
tests/psyir/transformations/inline_trans_test.py
tests/psyir/frontend/fparser2_select_case_test.py
tests/domain/lfric/transformations/raise_psyir_2_lfric_kern_trans_test.py
tests/domain/lfric/transformations/raise_psyir_2_lfric_alg_trans_test.py
tests/domain/lfric/lfric_kern_test.py

I'd guesstimate that 2/3rds or more fail because Routine.detach() fails at various points as the RoutineSymbol is referenced.

@LonelyCat124
Copy link
Collaborator Author

LonelyCat124 commented Jul 15, 2024

I'm now forcing removal of symbols in ScopingNode._refine_copy, and keeping a list of "symbols_to_fix", but I'm not sure they ever actually need fixing as I think maybe something else is fixing the attachment of the Call nodes to the new Routine nodes, and since the new Routine nodes' symbols are added to the new symbol table its fixed itself.
I'm running the test suite with an assert False statement if I find a Call node whose symbol is in the list of symbols_to_fix, but otherwise I'll work with this for now.

It does have some un-pythonic/OO implementations in various points now, and once this is closer to review (i.e. the tests all work) then I probably need to discuss this with the reviewer probably.

@LonelyCat124
Copy link
Collaborator Author

LonelyCat124 commented Jul 15, 2024

The changes break the test for lines 404-410 in psyir/tools/call_tree_utils.py, I'm not quite sure I understand this code, but I think it becomes impossible to fail to find a (resolved) Routine in a module now so I'm not sure this error can happen? Would need @hiker to check that section I think. test_get_non_local_read_write_info_errors also fails - I assume for a similar reason but I'm not sure.

Still one test failing in fparser2.py. test_deferred_array_size is one, as previously discussed. I also broke some error code in _process_decln which is no longer reachable as the own_routine_symbol tag is no more in existence - I instead did a workaround by searching for the ancestor Routine and checking for return_symbol which seems to resolve those failing tests. @sergisiso or @arporter might be good to discuss and see if we can work out what has changed with the deferred array size declaration/how to fix it .

My remaining concern is that I broke something in adjoint @arporter. The test test_lfric_adjoint::test_generate_lfric_adjoint_multi_routine fails when outputting during symbol_table.deep_copy at this section:

for routine in symbol.routines:
    new_routines.append((new_st.lookup(routine.symbol.name), routine.from_container))
symbol.routines = new_routines

What seems to be happening is there are some symbols that refer to a "routine" test_code_r4 which has (I assume) been adjointed into adj_test_code_r4, and so when it tries to lookup the test_code_r4 symbol that it needs to sort out the GenericInterfaceSymbol it fails because its not been copied into the new symbol table. Its not there because its no longer present in the old one now either. This all happens during _refine_copy of the fortran_writer - it might be best if we can have a discussion about this to work out whats going wrong/what I've changed that has broken expectations.

All the other remaining tests are around kernel_module_inline_trans - I've left those alone for now - I think they're all happening due to symbols already being present in symbol tables but I need to dive into why this happens for some (but not all) tests.

Edit:
adjoint_visitor.py does node.copy() at line 437 which copies the GenericInterfaceSymbol containing the names of routines that will then be adjointed. Need to check what the correct intent here is and what the master branch currently outputs @arporter / me. Best option is maybe to run the lfric_adjoint_multi_routine test? Maybe relates to #1946 in some ways too but I'm not sure.

@LonelyCat124
Copy link
Collaborator Author

I'm going to rerun the integration tests now after the error from alst week. Not sure why project coverage decreases though.

@arporter
Copy link
Member

A drop in coverage often indicates 'indirect changes' - i.e. changes in coverage of lines that aren't directly touched in this PR. If you look at https://app.codecov.io/gh/stfc/PSyclone/pull/2596/indirect-changes you'll see you have a few.

@LonelyCat124
Copy link
Collaborator Author

If this now works with coverage I'll run integration tests once more as I removed what I believe was unreachable code now with this branch, but I want to double check integration tests don't break.

@LonelyCat124
Copy link
Collaborator Author

@sergisiso
I'm still having NEMOv4 fail with this error:

nvfortran -c -o storng.o -I/home/gh_runner/NEMO/cfgs/SPITZ12_passthrough/BLD/inc -i4 -Mr8 -g -O3 -Mflushz -Mfma -Minline -fastsse -Mcray=pointer -Mvect=prefetch,tile,fuse -Mpre -mp=gpu -gpu=managed -I/home/gh_runner/actions-runner/_work/PSyclone-mirror/PSyclone-mirror/lib/profiling/nvidia/ -I/home/aporter/NEMO/xios_trunk/inc -I/not/defiled/build/lib/mct -I/not/defiled/build/lib/psmile.MPI1 -I/apps/packages/netcdf_fortran/4.6.1-nvidia-hpcsdk/include -c /home/gh_runner/NEMO/cfgs/SPITZ12_passthrough/WORK/storng.f90
NVFORTRAN-S-0155-Redefinition of kiss (/home/gh_runner/NEMO/cfgs/SPITZ12_passthrough/WORK/storng.f90: 28)

We tested this file I thought and it was fine - I'll try to have another look at this tomorrow.

@LonelyCat124
Copy link
Collaborator Author

@sergisiso I found the issue, the output contains this:

  subroutine kiss()


  end subroutine kiss
  ! PSyclone CodeBlock (unsupported code) reason:
  !  - PSyclone doesn't yet support 'Contains' inside a Subroutine or Function
  FUNCTION kiss()

What seems to happen is the fparser2.py:5311 doesn't remove the fake Subroutine.

I have a test that checks this subroutine is removed and will fix this up now.

@LonelyCat124
Copy link
Collaborator Author

@sergisiso This should be ready for another look now - it made it past the failing part of integration so should be ok now.

Copy link
Collaborator

@sergisiso sergisiso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LonelyCat124 All the comments have been addressed, all the tests and integration test pass and the documentations builds. This is ready to merge

Reading and grepping the branch I found a bunch of symbol_table.lookup_with_tag('own_routine_symbol') and routine.symbol_table.lookup(routine.name) or similar instances, and I replaced all of these with routine.symbol. This has not broken any test, but I will run a final integration test just in case.

@sergisiso
Copy link
Collaborator

Integration test passed again, there was just some flake8 issues that I introduced myself and I have now fixed.

@sergisiso sergisiso merged commit 6aad1d4 into master Oct 2, 2024
13 checks passed
@sergisiso sergisiso deleted the 2592_Routine_symbol branch October 2, 2024 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants