Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update rainfarm.py #311

Merged
merged 23 commits into from
Jun 10, 2024
Merged

Update rainfarm.py #311

merged 23 commits into from
Jun 10, 2024

Conversation

simonbeylat
Copy link
Contributor

@simonbeylat simonbeylat commented Sep 21, 2022

Add smoothing operator

see #287

Add smoothing operator
@codecov
Copy link

codecov bot commented Sep 21, 2022

Codecov Report

Attention: Patch coverage is 95.80420% with 6 lines in your changes missing coverage. Please review.

Project coverage is 83.50%. Comparing base (8ece5be) to head (c38f4e1).
Report is 45 commits behind head on master.

Current head c38f4e1 differs from pull request most recent head 8827515

Please upload reports for the commit 8827515 to get more accurate results.

Files Patch % Lines
pysteps/downscaling/rainfarm.py 95.20% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #311      +/-   ##
==========================================
+ Coverage   82.79%   83.50%   +0.71%     
==========================================
  Files         161      159       -2     
  Lines       12339    12562     +223     
==========================================
+ Hits        10216    10490     +274     
+ Misses       2123     2072      -51     
Flag Coverage Δ
unit_tests 83.50% <95.80%> (+0.71%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dnerini dnerini self-requested a review September 22, 2022 19:33
Copy link
Member

@dnerini dnerini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution, @simonbeylat !

First of all, I'd be curious to know if you already tried to run the code with some data? Do you see an improvement with the new smoothing operator?

I had a first look a your code and made few minor comments, mostly concerning "aesthetics". These, however, are important to make the code base readable and usable by others (I like to think that we write code not for machines, but for other humans!). You can find some more details and guidelines in our dedicated developer page:
https://pysteps.readthedocs.io/en/stable/developer_guide/contributors_guidelines.html

You can also find more suggestions to improve your code in the codacy checks included in the test suite of this PR: https://app.codacy.com/gh/pySTEPS/pysteps/pullRequest?prid=10213794

More importantly, I was wondering about the difference between the smoothing operator from @jhardenberg (which you nicely reimplemented in this PR) and the _balanced_spatial_average already used in the code. Aren't they doing the same thing? I wouldn't mind replacing the existing method with a more sophisticated one, we just need to make sure that we are not applying the same thing twice.

Finally, your new code needs to be tested (you'll notice that the coverage would decrease after merging, since we are adding untested lines). I suggest that you have a look at the https://github.com/pySTEPS/pysteps/blob/master/pysteps/tests/test_downscaling_rainfarm.py file and try to add a new test that will run rainfarm with and without smoothing.

pysteps/downscaling/rainfarm.py Outdated Show resolved Hide resolved
pysteps/downscaling/rainfarm.py Outdated Show resolved Hide resolved
pysteps/downscaling/rainfarm.py Outdated Show resolved Hide resolved
pysteps/downscaling/rainfarm.py Outdated Show resolved Hide resolved
pysteps/downscaling/rainfarm.py Outdated Show resolved Hide resolved
pysteps/downscaling/rainfarm.py Show resolved Hide resolved
pysteps/downscaling/rainfarm.py Outdated Show resolved Hide resolved
pysteps/downscaling/rainfarm.py Outdated Show resolved Hide resolved
@jhardenberg
Copy link

@simonbeylat as I said, it is great that you are adding this.

More importantly, I was wondering about the difference between the smoothing operator from @jhardenberg (which you nicely reimplemented in this PR) and the _balanced_spatial_average already used in the code. Aren't they doing the same thing? I wouldn't mind replacing the existing method with a more sophisticated one, we just need to make sure that we are not applying the same thing twice.

Indeed _balanced_spatial_average is a generic function which does convolution with a kernel (I also was going to suggest some pre-made convolution function from scipy or numpy to this end). So to implement Gaussian smoothing, it would be enough to define a Gaussian kernel instead of the tophat which is currently being used in the code and then call _balanced_spatial_average. What is different in your new _smoothconv function is also the treatment of missing values, which is indeed completely missing in _balanced_spatial_average (and which could in theory be added to this function).

Both a tophat and a Gaussian are valid options and it would be fine if the user could select which one to use. In the original Rainfarm we usually also leave the option of no smoothing at all, just making sure that the aggregattion over boxes corresponding to the coarse resolution is conserved, but that is another issue.

Btw I notice that in your code if the smooth option is selected you call both the tophat smoothing and the Gaussian smoothing in sequence. That is a double smoothing, If we use the Gaussian smoothing we do not need to smooth before also with the tophat.

Another comment: in other languages we always implemented smoothing using the equivalent of the scipy.ndimage.convolve option mode='wrap' to treat the boundaries because we implemented it dierctly in Fourier space. There is nothing wrong with the default 'reflect' used in _balanced_spatial_average but obviously it will lead to different results. The new implementation of _smoothconv implictly is using 'wrap' since it operates in Fourier space.

To be clear, my suggestion would indeed be to expand _balanced_spatial_average to treat also missing values and then call it either with a tophat or with a Gaussian kernel as needed. This is much more compact code. No need to have to functions, one should be enough.

A final comment regards speed: as far as I understand it scipy.ndimage.convolve which is used in _balanced_spatial_average does 'regular' convolution with matrix multiplication. The same done in Fourier space (as we do in other languase and as you are now doing in your new function) can be much more efficient. Actually scipy.signal.convolve as of version 0.19.0 chooses automatically the fastest option for you, so it might be better to use that instad of scipy.ndimage.convolve in _balanced_spatial_average.

@simonbeylat have you tried comparing the speed of your new function to that of the current _balanced_spatial_average ? And with that of a _balanced_spatial_average using scipy.signal.convolve ? (should be similar)

@simonbeylat
Copy link
Contributor Author

simonbeylat commented Sep 23, 2022

Hello

First of all, thank you for your feedback!

@dnerini , I also like the idea of having a clear and aesthetic code that could be easily understood by others. I will make some changes by looking at the different documentations you sent me! I will also use the test code before I add the change.

Both a tophat and a Gaussian are valid options and it would be fine if the user could select which one to use. In the original Rainfarm we usually also leave the option of no smoothing at all, just making sure that the aggregattion over boxes corresponding to the coarse resolution is conserved, but that is another issue.

I like @jhardenberg idea where we let the users choose which technique they want to use. I think I can try to "update" the _balanced_spatial_average method to add the treatment of missing values.

So I suggest changing the smooth argument and having 3 options:

  • None: no smooth operatore.
  • 'tophat' (default): use the _balanced_spatial_average method.
  • 'Gaussian': use _smoothconv or _balanced_spatial_average with a Gaussian kernel.

About testing the code and the speed of time. I tested the code using several data (I was using those data when I was using the R version of RainFARM). I didn't see a noticeable improvement but that may be because I was using _balanced_spatial_average and _smoothconv in a row. In terms of speed, both methods were quite fast. I didn't focus on that point.

(out of context, I wanted to introduce myself quickly (as I have never done it before) I have just graduated in a master of data science/engineering. (which ended with an internship in downscaling and bais correction of climate models), and I am actively looking for a PhD in these (or similar) subjects. This development is also a way for me to learn and improve my skills. I wanted to thank you for the help and the feed back you are giving me.)

@dnerini
Copy link
Member

dnerini commented Sep 28, 2022

Excellent, sounds all very good to me! I'll have another closer look once you push the next changes ;-)

And thanks for introducing yourself, always a pleasure to learn more about the person behind a simple name!

simonbeylat and others added 2 commits October 3, 2022 16:17
…d_spatial_average method to add the treatment of missing values, add None option, black autoformatter, add Terzago ref
@dnerini
Copy link
Member

dnerini commented Oct 7, 2022

Thanks @simonbeylat for the new changes, well done!

I noticed that you still have both smoothing methods implemented: _smoothconv and _balanced_spatial_average. From what I gathered in our discussion above with @jhardenberg, we would rather join them into one single method which can accept any arbitrary kernel. In practice, we suggested that you start with the existing _balanced_spatial_average and expand it with the treatment of missing values (as in the new function). Then, based on a input argument kernel, you could build and pass the corresponding kernel function or just skip the smoothing part kernel is set to None.

Does it make sense?

@simonbeylat
Copy link
Contributor Author

Hi @dnerini, @jhardenberg

I managed to add the 2nd pull request which is about the part added by D'onofrio et al in 2014 on spectral fusion. It took me a while but I found something that looks good.

I look forward to your reviews.

@dnerini
Copy link
Member

dnerini commented Nov 2, 2022

hey @simonbeylat just wanted to quickly mention that I haven't forgot about this, it's just a bit of a busy period. I hope to be able to look into your latest changes soon!

@simonbeylat
Copy link
Contributor Author

Hello @dnerini ,

Will you have the time to review the new commit ?

Or should I add the last commit (with the computation of weight) ?

@dnerini
Copy link
Member

dnerini commented Nov 27, 2022

hey @simonbeylat I just created a PR on your branch: simonbeylat#1
let's continue discussing there for now, we'll come back here once those changes are discussed and merged.

@simonbeylat
Copy link
Contributor Author

simonbeylat commented Dec 6, 2022

I had the changes you made and I had small changes because it wasn't working.

I was implementing the _apply_spectral_fusion function, I tried to translate @jhardenberg 's julia version.

But I don't really understand how this version and the version written in D'onofrio's article are equivalent. Also, when I tried the code, it seems to be wrong and gives very large extremes.

The first version I wrote (you should be able to see it by looking at this commit )seemed to give good results and I tried to close to the mathematics written in the article, but I might be wrong.

I think I misunderstood something and translated @jhardenberg's code wrong.
I'll focus on this part.

@dnerini
Copy link
Member

dnerini commented Dec 11, 2022

Hello @simonbeylat thanks for the update, happy to see that we are making some progress on this!

Concerning the _apply_spectral_fusion, it would be perfectly fine for me to follow the maths from the original paper if these give more sensible results. We might then ask for help to @jhardenberg to reimplement his julia/R versions in python in a second step.

@dnerini
Copy link
Member

dnerini commented Jan 23, 2023

hey @simonbeylat, getting back on this PR I realized I might have misunderstood your last message -- are you still working on it or is it in principle complete?

@simonbeylat
Copy link
Contributor Author

Hello, sorry I have several important projects and a big problem with my personal computer where I was working on this project... (as I was doing this in my spare time).
I will work on this project as soon as I can !

@dnerini
Copy link
Member

dnerini commented Jan 23, 2023

Hey @simonbeylat , absolutely, no worries! Just wanted to make sure that you weren't waiting for a feedback from my side ;-)

@simonbeylat
Copy link
Contributor Author

Hi @dnerini, I'm very busy as I started my PhD and so I started many projects, I'm not sure if I can finish the development of rainfarm at the moment. I will try to do it when I have some free time, but I can't know when.

I'm sorry for that.

@ecasellas
Copy link

Hello @dnerini, @jhardenberg and @simonbeylat,

Firstly, thank you very much for providing this tool. We began working with the RainFARM implementation in PySTEPS, and upon discovering this pull request, we decided to try the spectral fusion update by Simon, as it represents an improvement in merging between coarse-scale precipitation fields and fine synthetic precipitation fields. The code we used is in this commit.

The appearance of the field seems to be more realistic compared to the current RainFARM implementation in PySTEPS. However, we have encountered some unrealistic precipitation values at certain grid boundaries. Attached is an example comparing the different RainFARM implementations: From left to right, Spectral fusion in Python using Simon's code (note the unrealistic precipitation at the boundary on the right), the original RainFARM in PySTEPS, and R's implementation of RainFARM.

rainfarm_example

We checked Simon's code, but so far, we haven’t identified anything that could lead to this problem. During the development of the code, did you encounter similar results or problems? We consulted resources online and found suggestions that this issue may be related to spectral leakage. However, we are not experts in this field. One proposed solution is to apply windowing methodologies before calculating the Fast Fourier Transform, although we have found that this does not always resolve the problem. We are not sure if there are additional insights or solutions we may be overlooking.

Thank you once again.

@simonbeylat
Copy link
Contributor Author

Hello @ecasellas,

I'd like to start by apologising for the lack of progress,
I started updating RainFARM in my free time during my master's degree, but when I started my phd (on data assimilation), I didn't have time to get back to it.

Actually the implementation was not completely finished, the initial plan was to come back with the same version of RainFARM as the Terzago et al. 2018 paper.
but I'm quite surprised by the results (even with the boundary issue). Unfortunately I can't manage to finish this job (although I would have loved to)..

I'll be very happy to follow developments if you decide to update the code I've added.

Best,

Simon

@dnerini
Copy link
Member

dnerini commented Apr 26, 2024

Dear @ecasellas many thanks for following up on this! This PR is currently stale, which is a bit of a pity since it's probably not very far from being completed. Since the original author is not able to work on it anymore, I suppose this is now open to anyone motivated to continue and finalize the work!

Concerning the actual issue reported above, the windowing approach that you mention makes sense to me. Another thing worth trying might be padding the field with zeros to mitigate such boundary effects, have you already tried something in this direction?

edit: just out of curiosity, could you add a plot of the original low res field that you downscaled in your example?

@ecasellas
Copy link

Hi @dnerini and @simonbeylat,

Thank you very much for your comments. Since @simonbeylat granted us push permissions to the associated repository, we'll commit the changes to this PR. As Simon mentioned, we also think it's better to keep everything in the same place and to maintain the history of changes and all the great work done by Simon.

We'll return here once the changes are pushed.

Thank you once again.

@ecasellas
Copy link

Hello @dnerini and @simonbeylat,

We've made some updates to this PR, which include:

  • Implementation of spectral fusion, translated from R's RainFARM version. Initially, we used a custom function for radially averaged spectra in Python, but later discovered that the raspd function was already implemented in PySTEPS, as @simonbeylat had done.
  • Adjustment of the consistency condition between the synthetic field and the original field. In the previous version, the original field was zoomed into the synthetic field shape before applying the consistency condition. Here, we aggregate the synthetic field to the original shape, interpolate it to match the synthetic field shape (using np.kron), and finally apply the consistency condition.
  • Added an option to skip smoothing. When aggregating a high-resolution field, it matches the original field if no smoothing kernels (already implemented by @simonbeylat) are applied. This is not the case if any of the available smoothing kernels are used.
  • If spectral_fusion set to True, the noise field is normalized using the shape of the field rather than its standard deviation. This is the same as in R, we tried using standard deviation, but then the results differ.
  • The alpha used in generating the synthetic field is divided by 2, as in the referenced articles. However, in R and Julia implementations, it’s calculated as (alpha + 1) / 4. Dividing alpha by 2 yields results consistent with R, although we're uncertain why this discrepancy exists.
  • Tests for precipitation high-resolution aggregation and spectral fusion. The latter was integrated into the existing test_rainfarm_shape function, with additional parameters included.

We appreciate the excellent work done by @simonbeylat and @dnerini, as it’s much easier to approach a problem when it’s already laid out.

We look forward to your reviews, thank you once again.

RAINFARM

@jhardenberg
Copy link

Hi folks, this is a great development, congrats to everybody!

@ecasellas
Copy link

Hello,

We tried this new version on several episodes and found that something might not be entirely correct in the changes we made. Upon reviewing the code, we realized that we misunderstood the calculation of the synthetic field amplitude. In the papers, alpha is divided by 2, and in our previous implementation, we used:

noise_field_complex = white_noise_field_complex * np.sqrt( freq_array_highres**-alpha / 2 )

However, the factor of 1/2 is already accounted for by np.sqrt. So adding 1/2 was resulting in a double np.sqrt. In the new commit, we removed the "/ 2". We then checked other precipitation events and compared the performance of Python RainFARM and R RainFARM, and the results were equivalent.

P.S. Sorry @jhardenberg for not mentioning you in the previous message, but we also thank you very much for providing RainFARM in R and Julia.

@dnerini
Copy link
Member

dnerini commented May 26, 2024

hi @ecasellas, thanks for the important fix! From my perspective, this looks good to go. I wonder if @simonbeylat and @jhardenberg are planning to review this PR?

edit: @ecasellas the formatting test is failing, can you run black on your code and push the changes to this branch?

@ecasellas
Copy link

Hello @dnerini, black formatter run on rainfarm.py, hope it'll pass the test!

@dnerini dnerini added enhancement New feature or request and removed good first issue Good for newcomers labels Jun 9, 2024
@dnerini
Copy link
Member

dnerini commented Jun 9, 2024

hello @ecasellas apologies for the slow action on this PR... I had a closer look today and pushed mostly cosmetic changes to make the code (hopefully) more readable. I also left one question above if you could look at it? once resolved that conversation I think we'll be ready to merge

@dnerini dnerini merged commit 10416e4 into pySTEPS:master Jun 10, 2024
8 checks passed
@ecasellas
Copy link

hi @dnerini, the code is far more readable now! Thank you very much for your reviews and changes, and for the opportunity to contribute to PySTEPS!

@dnerini
Copy link
Member

dnerini commented Jun 10, 2024

Thank you for completing the work of @simonbeylat and providing a great contribution to pysteps! OSS spirit at its best!

I'll need to look into a problem with the docs that are currently failing to compile, but I hope to be able to release a new version with your work soon!

dnerini added a commit that referenced this pull request Jun 28, 2024
dnerini added a commit that referenced this pull request Jul 2, 2024
* Update RainFARM example following #311
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants