
Add a decoding mask option to only include subset of grid nodes in m2g #34

Open · wants to merge 50 commits into main
Conversation

@joeloskarsson joeloskarsson commented Oct 21, 2024

Describe your changes

For LAM forecasting it is reasonable to not always predict values in nodes that are considered part of the boundary forcing. In particular, this is a change planned for neural-lam. In terms of the graphs involved, this means that the m2g edges should only connect to a subset of the grid nodes.

This PR introduces an option `decode_mask` that accepts an Iterable of booleans (e.g. a numpy array) indicating which of the grid nodes should be included in the decoding part of the graph (m2g). In the LAM case this allows specifying such a mask with True for the inner-region nodes.

This builds on #32, which should be merged first. Here is a diff for only this PR in the meantime: joeloskarsson/weather-model-graphs@general_coordinates...decoding_mask
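
A minimal sketch of the intended usage (the grid construction and the archetype creation call are illustrative assumptions here; the `decode_mask` keyword is what this PR adds):

```python
import numpy as np
import weather_model_graphs as wmg

# hypothetical 10x10 grid of coordinates, shape (100, 2)
x, y = np.meshgrid(np.arange(10.0), np.arange(10.0))
coords = np.stack([x.ravel(), y.ravel()], axis=1)

# True for the inner-region nodes we decode to, False for
# boundary-forcing nodes that should get no m2g edges
decode_mask = (
    (coords[:, 0] > 1) & (coords[:, 0] < 8)
    & (coords[:, 1] > 1) & (coords[:, 1] < 8)
)

# exact creation function/signature is assumed for illustration
graph = wmg.create.archetype.create_keisler_graph(
    coords=coords, decode_mask=decode_mask
)
```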

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not, update your fork with the changes from the target branch (use pull with the --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe their purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

@sadamov (Contributor) left a comment

Looks good! Agree that this is required. Testing for "less than" seems sufficient 👍

A more general comment: I am slightly worried about the mask being generated in mllam-data-prep and then being used both here and in neural-lam. We probably need some solid checks in neural-lam that the dimensions of the graph, the data tensors and the mask agree (see the sketch below).
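
For illustration, a minimal sketch (entirely hypothetical names; not neural-lam's actual loading code) of the kind of consistency check meant here:

```python
import numpy as np

def check_grid_sizes(n_grid_data: int, n_grid_graph: int, decode_mask: np.ndarray):
    """Hypothetical guard: the data tensors, the graph and the mask must
    all agree on the number of grid points before training starts."""
    assert n_grid_data == n_grid_graph == len(decode_mask), (
        f"inconsistent grid sizes: data={n_grid_data}, "
        f"graph={n_grid_graph}, mask={len(decode_mask)}"
    )
```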

@sadamov sadamov added the enhancement New feature or request label Oct 22, 2024
joeloskarsson commented Nov 19, 2024

Good suggestions all around, @SimonKamuk! Was there something specific introduced by this PR that you think should be added to the README? I was thinking that the example in the README can continue to use the default parameters, i.e. decode_mask=None.

Otherwise this is the plan for this PR now:

EDIT: Given later changes we might want to take another look over this before merging.

@leifdenby leifdenby added this to the v0.3.0 milestone Nov 20, 2024
@SimonKamuk (Contributor) commented:

Well, it's not strictly necessary, and I think it's actually better suited for the documentation notebooks than the README. I just thought it would be nice to describe this feature with an example, to make it easier for people to find. But again, it might not be necessary 😄

@joeloskarsson (Contributor, Author) commented:

I do agree though that it could be nice to have a documentation notebook with an example of using this. Will see if I can put together something small.

@leifdenby (Member) commented:

I just had a quick skim using the diff against #32 that you provided (thanks!). In general this looks great. I will do a proper review once #32 is merged. I hope that's ok!

@joeloskarsson (Contributor, Author) commented:

There is one small problem currently, which is one reason why a6b6137 was introduced.

When using the decoding mask (maybe this could happen also without it, not sure?), not all mesh nodes might be connected to the grid in m2g, i.e. there are mesh nodes that are not an endpoint of any edge in m2g. This is desirable. However, when you split up a graph using

```python
def split_graph_by_edge_attribute(graph, attr):
```

only nodes that are an endpoint of some edge will be included. Node subsetting happens only implicitly through the edge sub-graphing:

```python
subgraphs[edge_value] = graph.copy().edge_subgraph(
    # subgraphs only contain edges that actually have the attribute,
    # and where it has the correct value
    [
        edge
        for edge in graph.edges
        if attr in graph.edges[edge] and graph.edges[edge][attr] == edge_value
    ]
)
```

This means that something like

```python
graph_components = wmg.split_graph_by_edge_attribute(graph=graph, attr="component")
```

will implicitly filter the mesh nodes in m2g to just the ones that are connected to the grid. This messes up the indexing of mesh nodes, and makes graphs saved this way incompatible with neural-lam.

This is why the option to return separate components was introduced: it is unnecessary to merge everything and then split it up again (especially as splitting now breaks the node indices). Do you have any thoughts about a good way to fix this @leifdenby? Or do we want to fix it at all? I am not sure it has to be fixed in this PR, since in a sense it is a more general issue with the edge-attribute-based sub-graphing.
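
To make the node-dropping concrete, here is a minimal standalone sketch (hypothetical node labels) showing that networkx's `edge_subgraph` keeps only nodes incident to the selected edges:

```python
import networkx as nx

g = nx.DiGraph()
# two mesh nodes and two grid nodes; mesh node "m1" has a g2m edge
# but deliberately no m2g edge (as if masked out by decode_mask)
g.add_edge("g0", "m0", component="g2m")
g.add_edge("g1", "m1", component="g2m")
g.add_edge("m0", "g0", component="m2g")

m2g = g.edge_subgraph(
    e for e in g.edges if g.edges[e].get("component") == "m2g"
)
print(sorted(m2g.nodes))  # ['g0', 'm0'] -- mesh node "m1" is silently dropped
```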

@joeloskarsson (Contributor, Author) commented Nov 29, 2024

All changes from #32 are now merged into main and into this branch, so the diff is readable.

@leifdenby (Member) commented Dec 5, 2024

> This messes up the indexing of mesh nodes, and makes graphs saved this way incompatible with neural-lam.

Ok, that sounds like an issue. I don't fully understand why yet, but I believe you. So rather than merging to create a graph that represents the whole encode-process-decode graph of nodes and edges, you introduced return_components in a6b6137#diff-72daea2494478edf48e319a44c42b839b9a6175a03dd1b2fa97ca8db212a79caR184 to instead return g2m, m2m and m2g separately. Is that right?

> This is why the option to return separate components was introduced: it is unnecessary to merge everything and then split it up again (especially as splitting now breaks the node indices). Do you have any thoughts about a good way to fix this @leifdenby? Or do we want to fix it at all? I am not sure it has to be fixed in this PR, since in a sense it is a more general issue with the edge-attribute-based sub-graphing.

The reason why I implemented the graph creation with a merge into one big graph is so that each node could be given a globally (as in across the whole graph) unique ID. I then assumed that this ID could be used for constructing the adjacency matrices for the different components (say g2m, m2m, m2g), which are then saved to individual files. Are you saying that this scheme of having a global index doesn't work when masking out which grid nodes are used in the m2g component? Or is it because the index in the grid is actually not a global index, but instead an index that only runs over grid points maybe?

I might have to create a little toy example to understand the issue properly.

@leifdenby (Member) commented:

> I might have to create a little toy example to understand the issue properly.

Maybe, if you have time, you could add a test/notebook @joeloskarsson with an assert that checks for the indexing you expect? I assume it is the .pt file representing the m2g adjacency matrix that is incorrect.

I tried creating a notebook just now, but I am not quite sure about what the correct answer would be...

@joeloskarsson (Contributor, Author) commented:

Yes, the issue does appear when you split the combined graph. But the problem is not that the indices are wrong per se; it is that you lose some nodes (which then makes the indices wrong for the remaining nodes). So your toy example could be:

  1. Create a graph where not all mesh nodes are connected to the grid in m2g, i.e. there are no edges in m2g that start in these mesh nodes.
  2. Split this graph with `graph_components = wmg.split_graph_by_edge_attribute(graph=graph, attr="component")`.
  3. Now the m2g component will no longer contain the mesh nodes that are not connected to the grid, so e.g. g2m and m2g will have a different number of mesh nodes.
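
A sketch of such a test, following these steps (node labels are hypothetical; `wmg.split_graph_by_edge_attribute` is the function quoted above, which returns a dict keyed by attribute value):

```python
import networkx as nx
import weather_model_graphs as wmg

def test_split_preserves_mesh_nodes():
    g = nx.DiGraph()
    g.add_edge("g0", "m0", component="g2m")
    g.add_edge("g1", "m1", component="g2m")  # "m1" gets no m2g edge
    g.add_edge("m0", "m1", component="m2m")
    g.add_edge("m0", "g0", component="m2g")

    parts = wmg.split_graph_by_edge_attribute(graph=g, attr="component")
    mesh_in_g2m = {n for n in parts["g2m"].nodes if str(n).startswith("m")}
    mesh_in_m2g = {n for n in parts["m2g"].nodes if str(n).startswith("m")}
    # currently fails: "m1" is dropped from the m2g component
    assert mesh_in_g2m == mesh_in_m2g
```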

@leifdenby (Member) commented:

I have dug a bit deeper @joeloskarsson and I have realised where the issue is. I was under the impression that torch_geometric.utils.convert.from_networkx() would use the node labels as the indices in the adjacency matrix (which is why I set the node labels to a globally unique index in https://github.com/mllam/weather-model-graphs/blob/main/src/weather_model_graphs/create/base.py#L176). But I realise now that is not the case; instead the node indices are always in the range [0, n_nodes - 1], as defined in the following line: https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/utils/convert.py#L231. I also realise that was maybe a bad assumption on my part, given that node labels can be strings. This means, as you say, that splitting the graphs before saving will make the indices in the adjacency matrices inconsistent between g2m and m2g.
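
A minimal sketch of the relabelling behaviour (my own toy example; requires torch_geometric):

```python
import networkx as nx
from torch_geometric.utils.convert import from_networkx

g = nx.DiGraph()
g.add_edge(10, 42)  # node labels are arbitrary integers
g.add_edge(42, 7)

data = from_networkx(g)
# edge_index uses each node's position in g.nodes, not its label:
print(data.edge_index)  # tensor([[0, 1], [1, 2]]) -- labels 10/42/7 -> 0/1/2
```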

What I still don't quite understand is what neural-lam assumes about the grid-index values for different parts of the whole encode-process-decode graph. I think (based on the fact that we're having this discussion) that g2m and m2g should share indices so that the same grid-point in either g2m or m2g has the same index in both. But is that also the case for the m2m graph (or subgraphs when using the hierarchical graph)? Or does neural-lam expect that the mesh-nodes start with index value 0?

I have added a notebook with my experimentation here: https://github.com/leifdenby/weather-model-graphs/blob/decoding_mask/docs/decoding_mask.ipynb, and I have started on a test for the decoding mask too: https://github.com/leifdenby/weather-model-graphs/blob/decoding_mask/tests/test_graph_decode_gridpoints_mask.py#L17

@joeloskarsson (Contributor, Author) commented:

> I was under the impression that torch_geometric.utils.convert.from_networkx() would use the node labels as the indices in the adjacency matrix (which is why I set the node labels to a globally unique index in https://github.com/mllam/weather-model-graphs/blob/main/src/weather_model_graphs/create/base.py#L176). But I realise now that is not the case; instead the node indices are always in the range [0, n_nodes - 1], as defined in the following line:

Yes, this is exactly what I also realised. I wish I had had a bit more time to write out an explanation for this; I could have saved you the work 😓 Well well, I hope you feel like you gained some insights along the way. In my view the arbitrariness of the indexing when converting networkx to PyG feels really bad and not very well thought through. But I guess this is all because networkx doesn't really have a notion of an integer node index (closer to how we think of a graph in theory, with sets, but bad for practical work imo 😛).

> I think (based on the fact that we're having this discussion) that g2m and m2g should share indices so that the same grid-point in either g2m or m2g has the same index in both.

I am not sure that is actually necessary, but we could do that. My implementation using this in neural-lam makes the hard decision that the nodes you decode to get the first indices, followed by the nodes only used in g2m. This means it doesn't matter whether you include the masked-out node indices in your m2g graph (you get the same edge_index tensor in the end anyway). On a more conceptual level, one could either argue that m2g should only index into the set of nodes we decode to, or that m2g should index into the set of all grid nodes but only contain edges to the unmasked ones. It is not obvious to me which makes more sense.
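
For illustration, a minimal sketch (my own, with hypothetical values) of the reordering convention described above, where the decoded-to grid nodes get the first indices:

```python
import numpy as np

decode_mask = np.array([True, False, True, True, False])  # hypothetical mask
# decoded-to nodes first, then the nodes only used in g2m
order = np.concatenate([np.flatnonzero(decode_mask), np.flatnonzero(~decode_mask)])
# invert the permutation to map old grid index -> new grid index
new_index = np.empty_like(order)
new_index[order] = np.arange(len(order))
print(new_index)  # [0 3 1 2 4]
```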

> What I still don't quite understand is what neural-lam assumes about the grid-index values for different parts of the whole encode-process-decode graph.

Yeah, there is quite a lot to this. I have made very minimal assumptions w.r.t. node indexing in neural-lam. Given that I know we have some different perspectives on the graph (I have this perspective of separate graphs, rather than one joined graph), I feel like this is something that should be explained better. Maybe I could try to write up an explanation of what neural-lam expects (that is, in mllam/neural-lam#93) when I find some time, or we could sit down and talk it through.
