Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate Model Ensembles #156

Open
KumaGIS opened this issue May 22, 2024 · 17 comments
Open

Generate Model Ensembles #156

KumaGIS opened this issue May 22, 2024 · 17 comments

Comments

@KumaGIS
Copy link

KumaGIS commented May 22, 2024

Dear Developers,
I am currently using the LISFLOOD model to assess discharge uncertainty for my study area. I have been working with the test dataset that you previously provided and attempting to perform stochastic modeling.

To begin, I wanted to evaluate the uncertainty related to the forcing data. Specifically, I tried to add random Gaussian noise to the precipitation data by modifying the readmeteo.py module's source code. Despite adding the random noise to the precipitation variable, the resultant discharge values do not show any variation.

I have updated the settings.xml file to enable the Monte Carlo Simulation option. Additionally, I explicitly initialized the constructor of the Lisflood_MonteCarlo.py within the LisfloodModel() in main.py.

The sample directories were created as specified in the EnsMembers section of the settings.xml file, and dis_run.tss files were generated within each folder. However, when I plot these files, there is no variation in the discharge values.

The following code was added to the readmeteo.py module to perturb the precipitation data

mean = 1
std_dev = 0.15
netcdf_noise = np.random.normal(mean , std_dev , self.var.Precipitation.shape)
    
# Adding noice to the precipitation data
self.var.Precipitation = self.var.Precipitation + netcdf_noise
Ensure that no precipitation values are below 0
self.var.Precipitation = np.maximum(self.var.Precipitation, 0)

After applying some random noise as above I checked whether the precipitation values were changed or not. They were successfully perturbed.

I executed only the run_lat_lon.xml file with the perturbed precipitation data. Was that the reason for getting similar discharge output time series values, since it uses the same initial conditions (lzavin.map and avgdis.map) which were created by the pre_run_lat_lon.xml ? Or Do I have to execute pre_run_lat_lon.xml with the Monte Carlo Simulation?

Could you please tell me, what can be the possible issue here and how to solve this? I would greatly appreciate any help or suggestions you can offer to resolve this issue.

Looking forward to hearing from you soon.

Best Regards

@KumaGIS KumaGIS changed the title Stochastic Modelling Generate Model Ensembles May 24, 2024
@StefaniaGrimaldi
Copy link
Contributor

Dear @KumaGIS ,

Thanks for your enquiry.

The MonteCarlo module is not currently used within our operational workflow. Therefore, we cannot guarantee systematic assistance on such type of investigations. Nevertheless, we will do our best to provide support and we hope that, by joining our forces, we will succeed in the analysis!

If our understanding is correct, discharge values in each output folder are bit identical, despite the edits that you added to the source code.

Generally speaking, prerun and run must always use the same meteorological forcings, static maps, and parameters. The prerun computes the initial conditions for the run: https://ec-jrc.github.io/lisflood-code/3_step5_model-initialisation/.
Albeit this is true and required for the correct use of LISFLOOD, it is unlikely the cause of your issue.

We read in your message that you verified the expected changes in the precipitation. How was this verification done? We would like to recommend activating the following output flags in the lisflood xml settings file.

<setoption choice="1" name="repPrecipitationMaps"/>
<setoption choice="1" name="repRainMaps"/>
<setoption choice="1" name="repSnowMaps"/>

You will then find in each output folder the precipitation data used within the simulation: total precipitation, and the segregated components rain and snow. You might need to check whether the options above are already implemented in your .xml settings: you can find an example in https://github.com/ec-jrc/lisflood-code/blob/master/src/lisfloodSettings_reference.xml LINES 100 and 3544. Please do not hesitate to let us know whether you need help with this task.

Furthermore, could you please let us know at which line of the readmeteo.py you added your edits? This information will allow us to try to reproduce your issue.

Finally, are you running the python package or the source code (python src/lisf1.py settings.xml)?

Kind regards,
on behalf of the developers team,
Carlo and Stefania

@KumaGIS
Copy link
Author

KumaGIS commented May 27, 2024

Dear @StefaniaGrimaldi ,
Thank you for your response. I greatly appreciate your reply, even though this question is related to a part that is not currently in use.

  1. I use the source code (python src/lisf1.py settings.xml) for testing purposes. Following your instructions, I enabled the specified options and ran the model. However, as soon as the first simulation ended, the model terminated automatically, and I received the following error message:

File "/home/kumudu/LISFLOOD/lisflood-code/src/lisflood/global_modules/output.py", line 298, in write
return self.writer.write(self._start_date, self._rep_steps)
File "/home/kumudu/LISFLOOD/lisflood-code/src/lisflood/global_modules/output.py", line 159, in write
nf1.variables[self.map_name][step, :, :] = uncompress_array(data)
File "src/netCDF4/_netCDF4.pyx", line 5505, in netCDF4._netCDF4.Variable.setitem
File "src/netCDF4/_netCDF4.pyx", line 5788, in netCDF4._netCDF4.Variable._put
File "src/netCDF4/_netCDF4.pyx", line 2029, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Index exceeds dimension bound.

When I checked the output folders, I noticed that the output rainfall maps and discharge time series data were generated only in the first folder.
Because of this, I used the tp.nc precipitation NetCDF stack file separately and applied the perturbations as mentioned earlier. I then plotted the time series values at the outlet point to verify whether the rainfall values were correctly perturbed. By plotting the time series values of the perturbed rainfall data, I found that each Monte Carlo run generated different precipitation values. However, the issue was that despite the precipitation values being perturbed, this variation was not reflected in the discharge time series values in each folder.

  1. I modified the readmeteo.py module by adding the perturbation part at the end of the dynamic() method.

def dynamic(self):
""" dynamic part of the readmeteo module
read meteo input maps
"""
settings = LisSettings.instance()
option = settings.options
binding = settings.binding

    # ************************************************************
    # ***** READ METEOROLOGICAL DATA *****************************
    # ************************************************************
    if option['readNetcdfStack']:

        step = self.var.currentTimeStep() - self.var.firstTimeStep()
        # date = date_from_step(binding, self.var.currentTimeStep())

        # Read from NetCDF stack files
        
        self.var.Precipitation = self.forcings['PrecipitationMaps'][step] * self.var.DtDay * self.var.PrScaling
        self.var.Tavg = self.forcings['TavgMaps'][step]
        self.var.ETRef = self.forcings['ET0Maps'][step] * self.var.DtDay * self.var.CalEvaporation
        self.var.EWRef =self.forcings['E0Maps'][step] * self.var.DtDay * self.var.CalEvaporation
        

    else:
        # Read from stack of maps in Pcraster format
        
        self.var.Precipitation = readmapsparse(binding['PrecipitationMaps'], self.var.currentTimeStep(), self.var.Precipitation) * self.var.DtDay * self.var.PrScaling
        # precipitation (conversion to [mm] per time step)
        self.var.Tavg = readmapsparse(binding['TavgMaps'], self.var.currentTimeStep(), self.var.Tavg)
        # average DAILY temperature (even if you are running the model on say an hourly time step) [degrees C]
        self.var.ETRef = readmapsparse(binding['ET0Maps'], self.var.currentTimeStep(), self.var.ETRef) * self.var.DtDay * self.var.CalEvaporation
        # daily reference evapotranspiration (conversion to [mm] per time step)
        # potential evaporation rate from a bare soil surface (conversion to [mm] per time step)
        self.var.EWRef = readmapsparse(binding['E0Maps'], self.var.currentTimeStep(), self.var.EWRef) * self.var.DtDay * self.var.CalEvaporation
        # potential evaporation rate from water surface (conversion to [mm] per time step)
        

        
    
    self.var.ESRef = (self.var.EWRef + self.var.ETRef)/2

    if option['TemperatureInKelvin']:
        self.var.Tavg -= 273.15
    
    # Generate random noise with mean 1 and standard deviation 0.1
    # Initialize array to store perturbed data


    # *************************************************************
    # ****** ADD RANDOM NOISE TO METEOROLOGICAL DATA **************
    # *************************************************************
    # Generate random noise with mean 1 and standard deviation 0.1
    mean_pr = 1
    std_dev_pr = 0.15
    
    pr_noise = np.random.normal(mean_pr, std_dev_pr, size=self.var.Precipitation.shape)
    # Adding noice to the precipitation data
    self.var.Precipitation = self.var.Precipitation * pr_noise 
    # Ensure that no precipitation values are below 0
    self.var.Precipitation = np.maximum(self.var.Precipitation, 0)

I understand that you might not be able to provide full support for this, as it is not currently in use. However, if you modify the readmeteo.py file as described above, you could potentially check the output files on your end.
I would greatly appreciate any help or suggestions you can offer to resolve this issue.
Looking forward to hearing from you soon.
Thank you
Best Regards

@KumaGIS
Copy link
Author

KumaGIS commented May 28, 2024

Dear developers,
If above implementation is wrong, could you please tell me what should be the correct steps to enable the montecarlo simulation ? If you are unable to provide detailed information, could you please tell me which parts of the source code should be updated? Then i can work on that.

I would greatly appreciate any help or suggestions you can offer to resolve this issue.
Looking forward to hearing from you soon.
Thank you

@doc78
Copy link
Collaborator

doc78 commented May 29, 2024

Dear @KumaGIS

Can you please provide us your settings XML file and your full output messages on screen when you try to run Lisflood enabling MonteCarlo option?

Thanks
Carlo

@KumaGIS
Copy link
Author

KumaGIS commented Jun 2, 2024

Dear @doc78,
As requested, I have attached the settings XML file used to simulate the LF_lat_lon use case. I encountered several errors but was able to resolve them one by one. Additionally, I have attached a small PDF document with screenshots detailing the errors for your reference.

I currently have the following questions regarding the Monte Carlo simulation:

The Monte Carlo simulation is now working. Initially, I ran the simulation with 10 ensemble members without applying any perturbations, and I enabled additional options such as repSnowMaps and repRainMaps. The time series data files ‘dis_run.tss’ and ‘chanqWin.tss’ were successfully created within each sample folder. However, the rain.nc and snow.nc maps were only created in the first sample folder. Why are these maps not being created in the other sample folders?

In the research paper https://doi.org/10.1016/j.jhydrol.2016.10.041, the LISFLOOD model was successfully used to generate open-loop deterministic simulations by applying random sampling perturbations to the precipitation inputs to obtain probabilistic LISFLOOD simulations with 24 ensembles. Specifically, the precipitation forcing was scaled between 0 and 100 and perturbed with white noise with a mean of 1 and a standard deviation of 0.15 to prevent ensemble deterioration. Could you please advise on how to implement something similar? I understand you might not be able to provide the entire solution, but could you at least outline the main steps?

I would greatly appreciate any help or suggestions you can offer to resolve this issue.
Looking forward to hearing from you soon.

MC_Settings_XML.zip

@KumaGIS
Copy link
Author

KumaGIS commented Jun 4, 2024

dear @doc78, @StefaniaGrimaldi,
I successfully generated the ensemble simulations. However, there's an issue with the output: while the time series data is correctly created in the designated sample folders, the maps are only generated in the first sample folder. Is there a way to resolve this? I attempted to modify the output.py source code, but it still didn't work.

I would greatly appreciate any help or suggestions you can offer to resolve this issue.
Looking forward to hearing from you soon.

@doc78
Copy link
Collaborator

doc78 commented Jun 5, 2024

Dear @KumaGIS
Unfortunately the new output maps module is not compatible with the MonteCarlo option.
A quick workaround would be to re-initialize the output_maps instance at the first step of each dynamic loop, by adding the following lines after Line 57 in Lisflood_dynamic.py:

       if self.TimeSinceStart == 1 and settings.mc_set:
            # reset flag for netcdf output for all, steps and end
            _ = CDFFlags(uuid.uuid4())  # init CDF flags
           # initialize output maps again to match new folders in MonteCarlo simulation
           self.output_module.output_maps = OutputMapsFactory(self.output_module.var)

Hope it helps to continue your experiments.
Best Regards
Carlo

@KumaGIS
Copy link
Author

KumaGIS commented Jun 7, 2024

Dear @doc78,
Thank you for the update. Now it is working. I have one more question.
Could you please let me know, which version of the LISFLOOD model finally supported data assimilation using EnKF?

@koray-yilmaz
Copy link

Dear @KumaGIS,
I think it would be great help for the community if you could summarize the steps to allow Precipitation perturbation and ensemble LISFLOOD output. Looking forward to the discussions on data assimilation! Best.

@KumaGIS
Copy link
Author

KumaGIS commented Jun 17, 2024

Dear @StefaniaGrimaldi, @doc78
Sure @koray-yilmaz. I will try my best to contribute to the data assimilation part.

I have one final question: What was the last version of the Lisflood model that supported the data assimilation framework?

@StefaniaGrimaldi
Copy link
Contributor

Dear @KumaGIS,

the work on data assimilation with LISFLOOD has been performed quite some time ago: before the introduction of the versioning of the source code and before its open source publication. We do not have the source code that supported the data assimilation framework. We only have the part of the code that is still included in this repository (and which is not supported in the current version due to time and resources constraints).

Your willingness to develop your own data assimilation routine and to share your developments is highly appreciated! Thank you!

Kind regards,
on behalf of the developers team,
Stefania

@KumaGIS
Copy link
Author

KumaGIS commented Jun 24, 2024

Dear @doc78 @StefaniaGrimaldi @domeniconappo
Thank you for your support until now. I am a PhD student and I believe the Lisflood model can be used for data assimilation if I work on the model for a bit longer. I have a humble request. I will do the hard work and try to make the data assimilation framework work again. However, sometimes I need some guidance because certain parts of the model are a bit difficult to understand.

I don't have a team to work with and have to do everything alone. If I face some difficulties, I will ask for your assistance, and if possible, please try to help me however you can.

Thank you for your understanding and support
Best regards

@KumaGIS
Copy link
Author

KumaGIS commented Jun 28, 2024

Dear @doc78 @StefaniaGrimaldi @domeniconappo,

I appreciate your support until now. I have one thing to clarify. When it comes to assimilating streamflow data (discharge data), which state variable should I choose: DischargeMaps (ChanQAvg), ChanQState (ChanQ), or another state variable?

I would greatly appreciate any help or suggestions you can offer regarding this matter.
I look forward to hearing from you soon.

Thank you
Best Regards

@KumaGIS
Copy link
Author

KumaGIS commented Jul 16, 2024

Dear @doc78 @StefaniaGrimaldi @domeniconappo,

I appreciate your support until now. I have one thing to clarify. When it comes to assimilating streamflow data (discharge data), which state variable should I choose: DischargeMaps (ChanQAvg), ChanQState (ChanQ), or another state variable?

I would greatly appreciate any help or suggestions you can offer regarding this matter.
I look forward to hearing from you soon.

Thank you
Best Regard

@StefaniaGrimaldi
Copy link
Contributor

Dear @KumaGIS,

DischargeMaps (ChanQAvg) is the average discharge value in the time interval DtSec.
ChanQState (ChanQ) is the discharge value of the last DtSeChannel.

As an example, if your model use daily time steps (DtSec = 86400), and 6hourly computational steps for the routing module (DtSeChannel = 21600), DischargeMaps (ChanQAvg) represent the modelled daily average discharge values.

We hope that our answer helps,
on behalf of the developers team,
kind regards,
Stefania

@KumaGIS
Copy link
Author

KumaGIS commented Jul 18, 2024

Dear @StefaniaGrimaldi , @doc78 ,

Thank you for the quick response. That makes sense.

However, I noticed that "ChanQAvg," which I should update as the state variable, does not appear among the relevant state variables in the stateVar.py module.

In your user documentation, both ChanQ and ChanQAvg are listed as state variables. Yet, the stateVar.py module only includes ChanQ. Since we are assimilating streamflow data using ChanQAvg/DischargeMaps as the state variable, shouldn't it be included in the stateVar.py module? Or can we simply add the ChanQAvg statevariable into the statVar.py module?

I would greatly appreciate any assistance or suggestions you can provide on this matter. I look forward to your response.

Thank you.

@KumaGIS
Copy link
Author

KumaGIS commented Jul 18, 2024

Dear @StefaniaGrimaldi , @doc78

I have a question regarding the final state update process. I developed a sample Lisflood_EnKF.py script and within the resume() method in the EnKF.py module, we update the model state (discharge) based on the estimated model state (calculated using EnKF). After updating, we need to replace the old model state with the updated one right?

In this context, I am updating the model state (ChanQAvg), which is indicated by "dis". How can I replace the old "dis" state map with the updated "dis" state map when the stateVar.py module does not include the ChanQAvg state variable?

I hope my question is clear. I would greatly appreciate any assistance or suggestions you can provide on this matter. I look forward to your response.

Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants