Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider splits and merges in tdating #349

Merged
merged 13 commits into from
Jul 22, 2024

Conversation

ritvje
Copy link
Contributor

@ritvje ritvje commented Mar 7, 2024

Hi! I've been working on including information on splits and merges of storm cells in tdating. This is a somewhat working version of that. Here currently,

  • a cell at time t is considered to split, if the advected version of the cell overlaps more than 10% with more than one cell at time t+1
  • a cell at time t is considered to be merged, if more than one advected cell (advected from t to t+1) overlaps more than 10% with it

The results are stored as additional columns in the cell dataframes:

  • splitted (bool): cell is considered split
  • split_IDs (list): list of IDs at next timestep that the cell split to
  • merged (bool): cell is considered a merge of multiple cells
  • merged_IDs (list): list of IDs from previous timestep that merged into this cell
  • results_from_split (bool): convenience attribute that is True if the cell is a result of split (i.e. the ID of the cell is present in the split_IDs of some cell at previous timestep)
  • will_merge (bool): convenience attribute that is True if the cell will merge at next timestep (i.e. the ID of the cell is present in the merge_IDs of some cell at next timestep). Will of course be empty, if the next timestep is not tracked

Additionally, this PR contains a bug fix in the pysteps.tracking.tdating.match function. As far as I can tell, in this bug the label 0 was included in possible match IDs, even though 0 denotes pixels that are not part of any cell. So there could be cases, where the advected cell would have e.g. 55% of 0 and 45% of some cell and it would not be matched to that cell. In my opinion, this bug needs to be fixed in the codebase even if the merge/split is not.

The code could probably be improved and e.g. it would benefit from some unit tests. But I'm creating this PR already for discussion especially for @pulkkins and @dnerini since this relates to my current work that you're involved in.

Since 0 in labels means that no cell was identified in that pixel, we need to
ignore 0 when matching. Otherwise we could have a situation where e.g the
cell overlaps 0.55 with 0 and 0.45 with some cell and it would end up unmatched
(`ID_coverage` would be 0 and a new track would be initialized)
If the advected cell overlaps > 10% with more than one cell at next timestep, consider
the advected cell as a split cell. Store information of which IDs the cell split to.
Also for cells at current timestep, mark cells that resulted from splits
If the cell at current timestep is overlapped > 10% by more than one advected cell,
consider it merged and store IDs of previous cells.
Also mark cells from previous timestep is they will merge at next timestep.
@ritvje ritvje requested review from dnerini and pulkkins March 7, 2024 15:41
@dnerini dnerini requested a review from feldmann-m March 7, 2024 19:16
@dnerini
Copy link
Member

dnerini commented Mar 7, 2024

Very nice contribution @ritvje ! I hope to be able to look into it soon, meanwhile let's ping in the original author of this routine

@feldmann-m in case you wants to leave a feedback on this PR

@dnerini
Copy link
Member

dnerini commented Mar 12, 2024

Additionally, this PR contains a bug fix in the pysteps.tracking.tdating.match function. As far as I can tell, in this bug the label 0 was included in possible match IDs, even though 0 denotes pixels that are not part of any cell. So there could be cases, where the advected cell would have e.g. 55% of 0 and 45% of some cell and it would not be matched to that cell. In my opinion, this bug needs to be fixed in the codebase even if the merge/split is not.

@ritvje would it be possible to fix this bug in a separate PR?

@ritvje
Copy link
Contributor Author

ritvje commented Mar 14, 2024

Sure, I made a new PR with that commit.

@dnerini
Copy link
Member

dnerini commented Jul 12, 2024

dear @ritvje, next week @ladc and colleagues are organizing a pysteps hackathon, see https://github.com/orgs/pySTEPS/projects/2

I was thinking that this might be a good opportunity to advance on this PR? what do you think?

@ritvje
Copy link
Contributor Author

ritvje commented Jul 15, 2024

Sure, it would be good to finally advance this! I should have time to work on this PR this week on Wednesday or Thursday (most likely evening-time).

Regarding the failing tests, I guess the main issue is the newly-added columns for the cell dataframe. Like @dnerini suggested, those (or the whole splits/merges analysis, since it's irrelevant if the information is not returned?) can be made optional.

Copy link

codecov bot commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 97.19626% with 3 lines in your changes missing coverage. Please review.

Project coverage is 83.74%. Comparing base (953f799) to head (5c8fdce).
Report is 3 commits behind head on master.

Files Patch % Lines
pysteps/tests/test_feature_tstorm.py 84.21% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #349      +/-   ##
==========================================
+ Coverage   83.52%   83.74%   +0.22%     
==========================================
  Files         159      159              
  Lines       12575    12724     +149     
==========================================
+ Hits        10503    10656     +153     
+ Misses       2072     2068       -4     
Flag Coverage Δ
unit_tests 83.74% <97.19%> (+0.22%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ritvje
Copy link
Contributor Author

ritvje commented Jul 17, 2024

I now made the split/merge analysis optional with the output_splits_merges argument in dating function. Also now the overlap fractions required for matching cells are given as arguments match_frac, split_frac and merge_frac. Before, match_frac was hard-coded to 0.4 (which is now the default value), and split_frac/merge_frac are added in this PR.

Regarding the tests, I added a new argument for the option to get the split/merge output and a some test cases to check that the output is correct. But there are no tests to check the actual functionality.

Copy link
Member

@dnerini dnerini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice!

The best way to document this new feature is to show it in the example gallery. What do you think, could you maybe briefly expand the existing tdating example to showcase this optional feature?

@pulkkins
Copy link
Member

Nice work! Looks good to me.

@dnerini
Copy link
Member

dnerini commented Jul 20, 2024

@ritvje I included a short example and fixed some deprecation warnings. Let me know what do you think. I think this is now ready to be merged and released with the next pysteps version!

@ritvje
Copy link
Contributor Author

ritvje commented Jul 22, 2024

@dnerini, thanks for adding the example! I also think it's ready to be merged.

@dnerini dnerini merged commit 3e433ff into pySTEPS:master Jul 22, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants