Skip to content

Commit

Permalink
NAM precipitation (bcgov#3096)
Browse files Browse the repository at this point in the history
  • Loading branch information
dgboss authored Sep 7, 2023
1 parent f3a3e0c commit c2a85a3
Show file tree
Hide file tree
Showing 5 changed files with 307 additions and 78 deletions.
29 changes: 25 additions & 4 deletions api/app/jobs/common_model_fetchers.py
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,20 @@ def apply_data_retention_policy():
delete_weather_station_model_predictions(session, oldest_to_keep)


def accumulate_nam_precipitation(nam_cumulative_precip: list[float], prediction: ModelRunGridSubsetPrediction, model_run_hour: int):
""" Calculate overall cumulative precip and cumulative precip for the current prediction. """
# 00 and 12 hour model runs accumulate precipitation in 12 hour intervals, 06 and 18 hour accumulate in
# 3 hour intervals
nam_accumulation_interval = 3 if model_run_hour == 6 or model_run_hour == 18 else 12
cumulative_precip = nam_cumulative_precip
prediction_precip = prediction.apcp_sfc_0 or [0.0, 0.0, 0.0, 0.0]
current_precip = numpy.add(nam_cumulative_precip, prediction_precip)
if prediction.prediction_timestamp.hour % nam_accumulation_interval == 0:
# If we're on an 'accumulation interval', update the cumulative precip
cumulative_precip = current_precip
return (cumulative_precip, current_precip)


class ModelValueProcessor:
""" Iterate through model runs that have completed, and calculate the interpolated weather predictions.
"""
Expand All @@ -178,7 +192,7 @@ def __init__(self, session, station_source: StationSourceEnum = StationSourceEnu
self.stations = get_stations_synchronously(station_source)
self.station_count = len(self.stations)

def _process_model_run(self, model_run: PredictionModelRunTimestamp):
def _process_model_run(self, model_run: PredictionModelRunTimestamp, model_type: ModelEnum):
""" Interpolate predictions in the provided model run for all stations. """
logger.info('Interpolating values for model run: %s', model_run)
# Iterate through stations.
Expand All @@ -188,7 +202,7 @@ def _process_model_run(self, model_run: PredictionModelRunTimestamp):
index, self.station_count,
station.code, station.name)
# Process this model run for station.
self._process_model_run_for_station(model_run, station)
self._process_model_run_for_station(model_run, station, model_type)
# Commit all the weather station model predictions (it's fast if we line them all up and commit
# them in one go.)
logger.info('commit to database...')
Expand Down Expand Up @@ -320,7 +334,8 @@ def _calculate_delta_precip(self, station, model_run, prediction, station_predic

def _process_model_run_for_station(self,
model_run: PredictionModelRunTimestamp,
station: WeatherStation):
station: WeatherStation,
model_type: ModelEnum):
""" Process the model run for the provided station.
"""
# Extract the coordinate.
Expand Down Expand Up @@ -353,9 +368,15 @@ def _process_model_run_for_station(self,
query = get_model_run_predictions_for_grid(
self.session, model_run, grid)

nam_cumulative_precip = [0.0, 0.0, 0.0, 0.0]
# Iterate through all the predictions.
prev_prediction = None

for prediction in query:
# NAM model requires manual calculation of cumulative precip
if model_type == ModelEnum.NAM:
nam_cumulative_precip, prediction.apcp_sfc_0 = accumulate_nam_precipitation(
nam_cumulative_precip, prediction, model_run.prediction_run_timestamp.hour)
if (prev_prediction is not None
and prev_prediction.prediction_timestamp.hour == 18
and prediction.prediction_timestamp.hour == 21):
Expand Down Expand Up @@ -384,7 +405,7 @@ def process(self, model_type: ModelEnum):
logger.info('model %s', model)
logger.info('model_run %s', model_run)
# Process the model run.
self._process_model_run(model_run)
self._process_model_run(model_run, model_type)
# Mark the model run as interpolated.
self._mark_model_run_interpolated(model_run)
refresh_morecast2_materialized_view(self.session)
113 changes: 92 additions & 21 deletions api/app/jobs/model_data.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
- [Linear Interpolation](#linear-interpolation)
- [From grids to fire weather stations](#from-grids-to-fire-weather-stations)
- [To 12:00 PST (13:00 PDT)](#to-1200-pst-1300-pdt)
- [Calculating predicted 24 hour precipitation](#calculating-predicted-24-hour-precipitation)
- [Cronjobs](#cronjobs)


Expand Down Expand Up @@ -66,7 +67,7 @@ The GDPS, RDPS, and HRDPS models have different cycle lengths, different resolut

#### Relevant Weather Variables

The GDPS, RDPS, and HRDPS weather variables that we currently fetch for use in Morecast are `TMP_TGL_2m` (temperature), `RELH_TGL_2m` (relative humidity), `WIND_TGL_10m` (wind speed), `WDIR_TGL_10m` (wind direction), and `APCP_SFC_0` (accumulated precipitation).
The GDPS, RDPS, and HRDPS weather variables that we currently fetch for use in Morecast are `TMP_TGL_2m` (temperature), `RELH_TGL_2m` (relative humidity), `WIND_TGL_10m` (wind speed), `WDIR_TGL_10m` (wind direction), and `APCP_SFC_0` (accumulated precipitation since the start of the model run).

### NOAA

Expand All @@ -79,7 +80,7 @@ Bottom latitude: 48

#### GFS

We fetch GFS model data on a 0.25° scale (roughly equivalent to 24km at the centre of BC).
We fetch GFS model data on a 0.25° scale (roughly equivalent to 24km at the centre of BC).

##### Model Run Cycles

Expand All @@ -102,28 +103,86 @@ NAM model data is published on a Polar Stereographic projection at a 32km resolu

##### Model Run Cycles

The NAM model makes predictions for every hour for the first 36 hours, and then on a 3-hourly schedule to 84 hours [https://mag.ncep.noaa.gov/model-guidance-model-parameter.php?group=Model%20Guidance&model=NAM&area=NAMER&ps=area#]. As with the GFS model, we sometimes need to fetch data for 2 different hour offsets and then perform linear interpolation in order to have a weather prediction for 12:00 PST. The file naming convention that NAM uses indicates the model runtime as hours in UTC, and indexes that hour as 00. All subsequent hours in the model run are referenced as offsets of the 00 hour index. This means that for the 00:00 UTC model cycle, 20:00 UTC (or 12:00 PST) is referenced as hour 20; but for the 12:00 UTC model cycle, 20:00 UTC is referenced as hour 8 (20:00 - 12:00). Therefore, to fetch the noon PST data we need, the NAM cronjob pulls the following model hours:
The NAM model makes predictions for every hour for the first 36 hours, and then on a 3-hourly schedule to 84 hours [https://mag.ncep.noaa.gov/model-guidance-model-parameter.php?group=Model%20Guidance&model=NAM&area=NAMER&ps=area#]. As with the GFS model, we sometimes need to fetch data for 2 different hour offsets and then perform linear interpolation in order to have a weather prediction for 12:00 PST. The file naming convention that NAM uses indicates the model runtime as hours in UTC, and indexes that hour as 00. All subsequent hours in the model run are referenced as offsets of the 00 hour index. This means that for the 00:00 UTC model cycle, 20:00 UTC (or 12:00 PST) is referenced as hour 20; but for the 12:00 UTC model cycle, 20:00 UTC is referenced as hour 8 (20:00 - 12:00). Therefore, to fetch the noon PST data we need, the NAM cronjob pulls the following model hours. noon hours are equivalent to 20:00 UTC. before_noon hours are 18:00 UTC and used for linear interpolation along with after_noon hours which are 21:00 UTC. Accumulation hours represent the prediction hours required in oorder to calcaulte accumulative precipitation throughout the model run.

- For the 00:00 UTC model run time:
- Day 0: hour 20 (20:00 UTC = 12:00 PST)
- Day 1: hours 42 and 45, to interpolate for hour 44 (20:00 UTC)
- Day 2: hours 66 and 69, to interpolate for hour 68 (20:00 UTC)
- accumulation_hours = 0, 12, 24, 36, 48, 60
- noon = 20
- before_noon = 18, 42, 66
- after_noon = 21, 45, 69
- For the 06:00 UTC model run time:
- Day 0: hour 14 (6+14 = 20:00 UTC)
- Day 1: hours 36 and 39, to interpolate for hour 38 (38-24=14 hours, 14+06:00 = 20:00 UTC of the next day)
- Day 2: hours 60 and 63, to interpolate for hour 62 (20:00 UTC of 2 days later)
- accumulation_hours = 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66
- noon = 14
- before_noon = 12, 36, 60
- after_noon = 15, 39, 63
- For the 12:00 UTC model run time:
- Day 0: hour 8 (12+8 = 20:00 UTC)
- Day 1: hour 32 (12+32-24 = 20:00 UTC of the next day)
- Day 2: hours 54 and 57, to interpolate for hour 56 (56-48 hours = 8 hours, 8+12:00 = 20:00 UTC of 2 days later)
- accumulation_hours = 0, 12, 24, 36, 48, 60, 72
- noon = 8, 32
- before_noon = 6, 30, 54, 78
- after_noon = 9, 33, 57, 81
- For the 18:00 UTC model run time:
- Day 0: hour 2 (18+2 = 20:00 UTC)
- Day 1: hour 26 (18+26-24 = 20:00 UTC of the next day)
- Day 2: hours 48 and 51, to interpolate for hour 50 (50-48 hours = 2 hours, 2 + 18:00 = 20:00 UTC of 2 days later)
- accumulation_hours = 0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75
- noon = [2, 26]
- before_noon = [0, 24, 48, 72]
- after_noon = [3, 27, 51, 75]

Note that the accumulation hours are required in order for us to calculate the cumulative precipitation across the model run which is needed when we subsequently calculate cumulative 24 hour precipitation.

#### Relevant Weather Variables

To be consistent with the weather variables that we pull from the Environment Canada models, for GFS we use comparable weather variables. There is a slight exception for wind direction and speed: Env Canada models output the separate U and V components for wind, but they also output the calculated wind speed and direction using those U, V components. NOAA only outputs the U,V components for wind - we must do the wind direction and wind speed calculations ourselves.
To be consistent with the weather variables that we pull from the Environment Canada models, for GFS we use comparable weather variables. There is a slight exception for wind direction and speed: Env Canada models output the separate U and V components for wind, but they also output the calculated wind speed and direction using those U, V components. NOAA only outputs the U,V components for wind - we must do the wind direction and wind speed calculations ourselves. The NOAA grib files for GFS contain two layers related to cumulative precipitation. Both layers are named 'apcp_sfc_0' but accumulate precipitation differently. Raster band 6 accumulates precip in 3 hour increments for odd hours and six hour increments for even hours, but we actually ignore this band. We use band 7 which accumulats precipitation across the model run.

The NAM grib file does not contain a band for cumulative precipitation since the start of the model. This requires us to use band 6 which accumulates precipitation slightly differently for model run hours 00 and 12 versus 06 and 18. The 00 and 12 hour model runs contain cumulative precipitation in 12 hour cycles.

For the 00 and 12 hour model runs:

| Hour from start | Cumulative precip hours |
| --------------- | ----------------------- |
| 1 | 0 - 1 |
| 2 | 0 - 2 |
| 3 | 0 - 3 |
| ... | ... - ... |
| 12 | 0 - 12 |
| 13 | 12 - 13 |
| 14 | 12 - 14 |
| ... | ... - ... |
| 24 | 12 - 24 |
| 25 | 24 - 25 |
| ... | ... - ... |
| 36 | 24 - 36 |
| 39 | 36 - 39 |
| 42 | 36 - 42 |
| 45 | 36 - 45 |
| 48 | 36 - 48 |
| 51 | 48 - 51 |
| ... | ... - ... |

For the 06 and 18 hour model runs precipitation accumlates in 3 hour intervals:

| Hour from start | Cumulative precip hours |
| --------------- | ----------------------- |
| 1 | 0 - 1 |
| 2 | 0 - 2 |
| 3 | 0 - 3 |
| 4 | 3 - 4 |
| 5 | 3 - 5 |
| 6 | 3 - 6 |
| 7 | 6 - 7 |
| 8 | 6 - 8 |
| 9 | 6 - 9 |
| 10 | 9 - 10 |
| 11 | 9 - 11 |
| 12 | 9 - 12 |
| 13 | 12 - 13 |
| ... | ... - ... |
| 36 | 33 - 36 |
| 39 | 36 - 39 |
| 42 | 39 - 42 |
| 45 | 42 - 45 |
| 48 | 45 - 48 |
| ... | ... - ... |



## Downloading Data

Expand All @@ -135,22 +194,34 @@ There is a lot of data to download and analyze for these weather models, so to e

Once model data has been downloaded, we store the model data for "grid subsets" relevant to our use in our `wps` Postgres database in the `model_run_grid_subset_predictions` table. Each row in this table stores temperature, RH, precipitation, wind speed, and wind direction data, as we need for Morecast. For each of these weather variables, each row contains an array of 4 data points, corresponding to the 4 corners of one square of the grid. The `prediction_model_grid_subset_id` in this table corresponds to a grid cell that contains one of BC's fire weather station locations, as stored in the `prediction_model_grid_subsets` table. When our backend inserts data into the `model_run_grid_subset_predictions` table, it also adds metadata pertinent to the specific model run.

Once the `model_run_grid_subset_predictions` table has been populated for a given model run, our backend then performs linear interpolation based on the geographic location of each weather station to calculate an approximated value for each weather variable.
Our back-end used to perform spatial interpolation using the four values in the grid subset. We no longer follow this process. In order to align with SpotWX, we now use the value from the grid cell in the grib file in which the weather station is located. These values are stored in the `weather_station_model_predictions` table.

TODO - We need to amend our back-end code to only pull the necessary value from the grib files instead of pulling these grid subsets (4 values).

~~Once the `model_run_grid_subset_predictions` table has been populated for a given model run, our backend then performs linear interpolation based on the geographic location of each weather station to calculate an approximated value for each weather variable.~~

For example, a small portion of the grid for a temperature raster band of a GRIB file might look like the image below. In this example, each point on the grid is spaced 5km apart. You can see that a fire weather station is located within one cell of this grid, where the 4 corners of the cell have temperature values of 3.1°, 2.4°, 2.7°, and 2.7°C. In order to approximate the predicted temperature at the weather station's exact location, we perform linear interpolation of the 4 temperature values listed above, where the weight of each data point is inversely proportional to the distance from the weather station to that point on the grid. Consequently, the 2.4°C from the top right corner of the cell will have the highest weight as it is closest to the weather station, and the 2.7°C from the bottom left corner will have the lowest weight as it is the furthest away.
~~For example, a small portion of the grid for a temperature raster band of a GRIB file might look like the image below. In this example, each point on the grid is spaced 5km apart. You can see that a fire weather station is located within one cell of this grid, where the 4 corners of the cell have temperature values of 3.1°, 2.4°, 2.7°, and 2.7°C. In order to approximate the predicted temperature at the weather station's exact location, we perform linear interpolation of the 4 temperature values listed above, where the weight of each data point is inversely proportional to the distance from the weather station to that point on the grid. Consequently, the 2.4°C from the top right corner of the cell will have the highest weight as it is closest to the weather station, and the 2.7°C from the bottom left corner will have the lowest weight as it is the furthest away.~~

![Location-based linear interpolation for a weather station](../../../docs/images/Grid_wx_station.png)
~~![Location-based linear interpolation for a weather station](../../../docs/images/Grid_wx_station.png)~~

This linear interpolation process is repeated for each of the weather variables in a model run, and for each of the weather stations in BC. These calculations are stored in the `weather_station_model_predictions` table.
~~This linear interpolation process is repeated for each of the weather variables in a model run, and for each of the weather stations in BC. These calculations are stored in the `weather_station_model_predictions` table.~~

At the present time we are only performing linear interpolation from grid subsets to weather stations based on geographic distance (because it is the simplest method of interpolation). Forecasters have requested that we also perform linear interpolation based on station elevation as the results may be more accurate - this is a pending story in our backlog.
~~At the present time we are only performing linear interpolation from grid subsets to weather stations based on geographic distance (because it is the simplest method of interpolation).~~

Forecasters have requested that we also perform linear interpolation based on station elevation as the results may be more accurate - this is a pending story in our backlog.

### To 12:00 PST (13:00 PDT)

Fire weather forecasters in the BC Wildfire Service create daily "noon forecasts", where 'noon' refers to 12:00 PST, but during the fire weather season, most of BC is on Daylight Savings Time, so 12:00 PST = 13:00 PDT = 20:00 UTC. As explained above in the [NOAA section](#noaa), there is no model data specifically for 20:00 UTC, so instead we fetch the data for 18:00 and 21:00 UTC, and then perform additional linear interpolation (on top of the interpolation done for weather station locations) to predict weather behaviour for 20:00 UTC. This means that the modelled weather values for 21:00 UTC are weighted twice as heavily as those for 18:00 UTC, since the former is twice as close to our target time of 20:00.

This time-based linear interpolation is done as part of the data analysis process performed when the model data is retrieved from our third-party sources. See `process_grib.py`

### Calculating predicted 24 hour precipitation

For any given date, Morecast 2 displays the precipitation predicted to fall in the 24 hours previous to that date. For example, for August 30, 2023, the predicted precip covers a time period from August 29, 2023 20:00 UTC to August 30, 2023 20:00 UTC. The first 24 hours of a numerical weather model run present a challenge because there is no way to calculate the predicted 24 hour precip purely from the model run data. Consider a HRDPS model run on August 30 at 12:00 UTC. Morecast 2 needs to display the predicted precipitation for August 30 at 20:00, but at that point we only have 8 hours of accumulated precipitation (from 12:00 to 20:00) from our model run, we are potentially missing precipitation that could have fallen from Aug 29 20:00 to August 30 12:00 which means we're missing 16 hours of data. To work around this, we pull actual precip values for that 16 hour period from our hourly_actuals table which scrapes observed values from the WF1 API.

Continuing on with the above example, we also need to display 24 hour predicted cumulative precip for August 31 20:00. Since our model run contains data from more than 24 hours ago, we can perform simple subtraction of accumulated precipitation to get our 24 hour predicted cumulative precip (Aug 31 20:00 - Aug 30 20:00).

## Cronjobs

We have created a separate cronjob for each model, to retrieve the data from source at various times of day, depending on how often each model is run. Our cronjobs are set to execute at randomly selected minutes within the set hour, as a courtesy to our third-party data sources so that their servers are not overwhelmed with many simultaneous requests.
Loading

0 comments on commit c2a85a3

Please sign in to comment.