Showing 53 changed files with 897 additions and 1,096 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,22 +1,20 @@ | ||
.. _filters: | ||
|
||
######### | ||
Filters | ||
######### | ||
Filters | ||
======= | ||
|
||
.. warning:: | ||
|
||
This is still a work in progress. Some of the filters may be renamed | ||
later. | ||
This is still a work in progress. Some of the filters may be renamed later. | ||
|
||
Filters are used to modify the data or metadata in a dataset. | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
:maxdepth: 1 | ||
|
||
filters/select | ||
filters/rename | ||
filters/rotate_winds | ||
filters/unrotate_winds | ||
filters/noop | ||
filters/empty | ||
filters/select | ||
filters/rename | ||
filters/rotate_winds | ||
filters/unrotate_winds | ||
filters/noop | ||
filters/empty |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,5 @@ | ||
####### | ||
empty | ||
####### | ||
empty | ||
===== | ||
|
||
The ``empty`` filter is for debugging purposes. It always returns an | ||
empty set of fields. | ||
The ``empty`` filter is for debugging purposes. It always returns an empty set of | ||
fields. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,4 @@ | ||
###### | ||
noop | ||
###### | ||
noop | ||
==== | ||
|
||
The ``noop`` filter is for debugging purposes. It returns its input | ||
unchanged. | ||
The ``noop`` filter is for debugging purposes. It returns its input unchanged. |
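As a rough illustration only (this is not the package's implementation), the two debugging filters can be thought of as functions over a list of fields. The function names and the dict-based field representation below are assumptions made for this sketch:

```python
# Hypothetical sketch: a "field" is modelled as a plain dict, and a filter
# is a function that maps a list of fields to a new list of fields.


def noop_filter(fields):
    """Return the input fields unchanged (mirrors the ``noop`` behaviour)."""
    return fields


def empty_filter(fields):
    """Ignore the input and return no fields (mirrors the ``empty`` behaviour)."""
    return []


# "2t" is only an illustrative variable name.
fields = [{"variable": "2t", "date": "2020-01-01T00:00"}]
noop_result = noop_filter(fields)
empty_result = empty_filter(fields)
```

Placing either function at the end of a chain of filters is a simple way to check how the rest of the chain behaves with unchanged or empty input.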
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,2 @@ | ||
######## | ||
rename | ||
######## | ||
rename | ||
====== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,2 @@ | ||
############## | ||
rotate_winds | ||
############## | ||
rotate_winds | ||
============ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,2 @@ | ||
######## | ||
select | ||
######## | ||
select | ||
====== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,2 @@ | ||
############### | ||
unrotate_winds | |
############### | ||
unrotate_winds | |
============== |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,24 +1,22 @@ | ||
######################## | ||
Handling missing dates | ||
######################## | ||
Handling missing dates | ||
====================== | ||
|
||
By default, the package will raise an error if there are missing dates. | ||
|
||
Missing dates can be handled by specifying a list of dates in the | ||
configuration file. The dates should be in the same format as the dates | ||
in the time series. The missing dates will be filled with ``np.nan`` values. | |
Missing dates can be handled by specifying a list of dates in the configuration file. | ||
The dates should be in the same format as the dates in the time series. The missing | ||
dates will be filled with ``np.nan`` values. | |
|
||
.. literalinclude:: yaml/missing_dates.yaml | ||
:language: yaml | ||
:language: yaml | ||
|
||
*Anemoi* will ignore the missing dates when computing the | ||
:ref:`statistics <gathering_statistics>`. | ||
*Anemoi* will ignore the missing dates when computing the :ref:`statistics | ||
<gathering_statistics>`. | ||
|
||
You can retrieve the list of indices corresponding to the missing dates by | |
accessing the ``missing`` attribute of the dataset object. | ||
You can retrieve the list of indices corresponding to the missing dates by accessing the | |
``missing`` attribute of the dataset object. | ||
|
||
.. literalinclude:: ../using/code/missing_.py | ||
:language: python | ||
:language: python | ||
|
||
If you access a missing index, the dataset will throw a | ||
``MissingDateError``. | ||
If you access a missing index, the dataset will throw a ``MissingDateError``. |
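The behaviour described above can be sketched with a toy class. Only the names ``missing`` and ``MissingDateError`` come from the text; the class itself, its constructor and the sample values are hypothetical and do not reflect the real dataset object:

```python
# Hypothetical toy dataset, used only to illustrate the ``missing``
# attribute and the ``MissingDateError`` exception described above.


class MissingDateError(Exception):
    """Raised when the data for a missing date is accessed."""


class ToyDataset:
    def __init__(self, values, missing):
        self._values = values        # one entry per date
        self.missing = set(missing)  # indices of the missing dates

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        if index in self.missing:
            raise MissingDateError(f"Date at index {index} is missing")
        return self._values[index]


ds = ToyDataset(values=[10.0, None, 12.0, None], missing=[1, 3])

# Skip the missing indices instead of triggering the error.
available = [ds[i] for i in range(len(ds)) if i not in ds.missing]
```

Checking ``missing`` before indexing avoids wrapping every access in a ``try``/``except`` block.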
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,14 @@ | ||
######################### | ||
Handling missing values | ||
######################### | ||
Handling missing values | ||
======================= | ||
|
||
When handling data for machine learning models, missing values (NaNs) | |
can pose a challenge, as models require complete data to operate | |
effectively and may crash otherwise. Ideally, all fields contain | |
complete data. However, NaNs occur naturally in some cases, for example | |
in variables that are only relevant on land or at sea, such as sea | |
surface temperature (`sst`). By default, data containing NaNs is | |
rejected as invalid. To accept NaNs and compute statistics that take | |
them into account, include the `allow_nans` key in the configuration, | |
as in the following example: | |
When handling data for machine learning models, missing values (NaNs) can pose a | |
challenge, as models require complete data to operate effectively and may crash | |
otherwise. Ideally, all fields contain complete data. However, NaNs occur naturally in | |
some cases, for example in variables that are only relevant on land or at sea, such as | |
sea surface temperature (`sst`). By default, data containing NaNs is rejected as | |
invalid. To accept NaNs and compute statistics that take them into account, include | |
the `allow_nans` key in the configuration, as in the following example: | |
|
||
.. literalinclude:: yaml/nan.yaml | ||
:language: yaml | ||
:language: yaml |
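The text does not show how the package computes statistics when NaNs are allowed. As a minimal sketch of the general idea, assuming the NaNs are simply left out of the computation, the helper below (a hypothetical function, not part of the package) ignores them:

```python
import math


def nan_aware_stats(values):
    """Compute simple statistics over the non-NaN values only.

    Hypothetical helper for illustration; the actual statistics code of
    the package is not shown in this document.
    """
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        raise ValueError("all values are NaN")
    return {
        "minimum": min(valid),
        "maximum": max(valid),
        "mean": sum(valid) / len(valid),
        "count": len(valid),
    }


# Toy sea surface temperature values, with NaNs standing in for land points.
sst = [290.0, float("nan"), 292.0, float("nan"), 294.0]
stats = nan_aware_stats(sst)
```

Without the filtering step, a single NaN would propagate into the mean and make it NaN as well.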
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,154 +1,138 @@ | ||
.. _building-introduction: | ||
|
||
############## | ||
Introduction | ||
############## | ||
|
||
The `anemoi-datasets` package allows you to create datasets for training | ||
data-driven weather models. The datasets are built using a `recipe` | ||
file, which is a YAML file that describes sources of meteorological | ||
fields as well as the operations to perform on them, before they are | ||
written to a zarr file. The input of the process is a range of dates and | ||
some options to control the layout of the output. Statistics will be | ||
computed as the dataset is built, and stored in the metadata, with other | |
information such as the locations of the grid points, the list of | |
variables, etc. | ||
Introduction | ||
============ | ||
|
||
The `anemoi-datasets` package allows you to create datasets for training data-driven | ||
weather models. The datasets are built using a `recipe` file, which is a YAML file that | ||
describes sources of meteorological fields as well as the operations to perform on them, | ||
before they are written to a zarr file. The input of the process is a range of dates and | ||
some options to control the layout of the output. Statistics will be computed as the | ||
dataset is built, and stored in the metadata, with other information such as the | |
locations of the grid points, the list of variables, etc. | ||
|
||
.. figure:: ../schemas/recipe.png | ||
:alt: Building datasets | ||
:align: center | ||
:alt: Building datasets | ||
:align: center | ||
|
||
********** | ||
Concepts | ||
********** | ||
Concepts | ||
-------- | ||
|
||
date | ||
Throughout this document, the term `date` refers to a date and time, | ||
not just a date. A training dataset covers a continuous range of | |
dates with a given frequency. Missing dates are still part of the | |
dataset, but the data are missing and marked as such using NaNs. | |
Dates are always in UTC, and refer to the date at which the data is | |
valid. For accumulations and fluxes, that would be the end of the | ||
accumulation period. | ||
Throughout this document, the term `date` refers to a date and time, not just a | ||
date. A training dataset covers a continuous range of dates with a given | |
frequency. Missing dates are still part of the dataset, but the data are missing and | |
marked as such using NaNs. Dates are always in UTC, and refer to the date at which the | |
data is valid. For accumulations and fluxes, that would be the end of the | ||
accumulation period. | ||
|
||
variable | ||
A `variable` is a meteorological parameter, such as temperature, wind, | |
etc. Multilevel parameters are treated as separate variables, one for | ||
each level. For example, temperature at 850 hPa and temperature at | ||
500 hPa will be treated as two separate variables (`t_850` and | ||
`t_500`). | ||
A `variable` is a meteorological parameter, such as temperature, wind, etc. Multilevel | |
parameters are treated as separate variables, one for each level. For example, | ||
temperature at 850 hPa and temperature at 500 hPa will be treated as two separate | ||
variables (`t_850` and `t_500`). | ||
|
||
field | ||
A `field` is a variable at a given date. It is represented by an array | |
of values at each grid point. | ||
A `field` is a variable at a given date. It is represented by an array of values at | |
each grid point. | ||
|
||
source | ||
A `source` is a software component that, given a list of dates and | |
variables, returns the corresponding fields. Examples of sources are | |
ECMWF's MARS archive, a collection of GRIB or NetCDF files, a | |
database, etc. See :ref:`sources` for more information. | |
A `source` is a software component that, given a list of dates and variables, | |
returns the corresponding fields. Examples of sources are ECMWF's MARS archive, a | |
collection of GRIB or NetCDF files, a database, etc. See :ref:`sources` for more | |
information. | |
|
||
filter | ||
A `filter` is a software component that takes as input the output of | ||
a source or the output of another filter, and can modify the fields and/or | |
their metadata. For example, typical filters are interpolations, | ||
renaming of variables, etc. See :ref:`filters` for more information. | ||
A `filter` is a software component that takes as input the output of a source or the | ||
output of another filter, and can modify the fields and/or their metadata. For example, | |
typical filters are interpolations, renaming of variables, etc. See :ref:`filters` | ||
for more information. | ||
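The concepts above can be illustrated with a small sketch. The helper names (``date_range``, ``variable_name``) and the ``2t`` parameter are assumptions made for this example; only the ``t_850`` and ``t_500`` naming comes from the text:

```python
from datetime import datetime, timedelta, timezone


def date_range(start, end, frequency_hours):
    """Return every date from start to end (inclusive) at a fixed frequency."""
    step = timedelta(hours=frequency_hours)
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += step
    return dates


def variable_name(param, level=None):
    """Name a variable; a multilevel parameter gets one name per level."""
    return param if level is None else f"{param}_{level}"


# Dates carry a time component and are expressed in UTC.
dates = date_range(
    datetime(2020, 1, 1, 0, tzinfo=timezone.utc),
    datetime(2020, 1, 1, 18, tzinfo=timezone.utc),
    frequency_hours=6,
)

# One surface variable plus temperature on two pressure levels.
variables = [variable_name("2t")] + [variable_name("t", lvl) for lvl in (850, 500)]

# A field is identified by one variable at one date.
field_keys = [(v, d) for v in variables for d in dates]
```

With four dates and three variables, the sketch enumerates twelve fields, each of which would hold an array of values at the grid points.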
|
||
************ | ||
Operations | ||
************ | ||
Operations | ||
---------- | ||
|
||
In order to build a training dataset, sources and filters are combined | ||
using the following operations: | ||
In order to build a training dataset, sources and filters are combined using the | ||
following operations: | ||
|
||
join | ||
The join is the process of combining data from several sources. Each | |
source is expected to provide different variables at the same dates. | ||
The join is the process of combining data from several sources. Each source is expected | |
to provide different variables at the same dates. | ||
|
||
pipe | ||
The pipe is the process of transforming fields using filters. The | ||
first step of a pipe is typically a source, a join or another pipe. | ||
The following steps are filters. | ||
The pipe is the process of transforming fields using filters. The first step of a | ||
pipe is typically a source, a join or another pipe. The following steps are filters. | ||
|
||
concat | ||
The concatenation is the process of combining different sets of | ||
operations that handle different dates. This is typically used to | |
build a dataset that spans several years, when several sources | |
are involved, each providing a different period. | ||
The concatenation is the process of combining different sets of operations that | |
handle different dates. This is typically used to build a dataset that spans several | |
years, when several sources are involved, each providing a different period. | |
|
||
Each operation is considered as a :ref:`source <sources>`, therefore | ||
operations can be combined to build complex datasets. | ||
Each operation is considered as a :ref:`source <sources>`, therefore operations can be | ||
combined to build complex datasets. | ||
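To make the composition idea concrete, here is a minimal sketch in which a source is modelled as a function from a list of dates to a dict of fields. All names below are hypothetical and the dates are plain integers for brevity; the real operations are configured in the recipe file, not written this way:

```python
# Hypothetical model: a source is a callable taking a list of dates and
# returning a dict of fields keyed by (variable, date).


def make_source(variables):
    def source(dates):
        return {(v, d): 0.0 for v in variables for d in dates}
    return source


def join(*sources):
    """Combine sources that provide different variables at the same dates."""
    def joined(dates):
        result = {}
        for source in sources:
            result.update(source(dates))
        return result
    return joined


def pipe(first, *filters):
    """Apply filters, in order, to the output of a first step."""
    def piped(dates):
        fields = first(dates)
        for apply_filter in filters:
            fields = apply_filter(fields)
        return fields
    return piped


def concat(*parts):
    """Combine (predicate, source) pairs, each handling different dates."""
    def concatenated(dates):
        result = {}
        for handles, source in parts:
            result.update(source([d for d in dates if handles(d)]))
        return result
    return concatenated


def rename_t(fields):
    """Toy filter that renames variable ``t`` to ``temperature``."""
    return {("temperature" if v == "t" else v, d): x for (v, d), x in fields.items()}


# Every operation returns a callable with the same shape as a source,
# so operations can be nested freely.
combined = pipe(join(make_source(["2t"]), make_source(["t"])), rename_t)
dataset = concat((lambda d: d < 2000, combined), (lambda d: d >= 2000, combined))
result = dataset([1999, 2000])
```

Because ``join``, ``pipe`` and ``concat`` all return something that behaves like a source, each can be passed to any of the others.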
|
||
***************** | ||
Getting started | ||
***************** | ||
Getting started | ||
--------------- | ||
|
||
First recipe | ||
============ | ||
~~~~~~~~~~~~ | ||
|
||
The simplest `recipe` file must contain a ``dates`` section and an | ||
``input`` section. The latter must contain a `source`. In this case, the | |
source is ``mars``: | |
The simplest `recipe` file must contain a ``dates`` section and an ``input`` section. | ||
The latter must contain a `source`. In this case, the source is ``mars``: | |
|
||
.. literalinclude:: yaml/building1.yaml | ||
:language: yaml | ||
:language: yaml | ||
|
||
To create the dataset, run the following command: | ||
|
||
.. code:: console | ||
.. code-block:: console | ||
$ anemoi-datasets create recipe.yaml dataset.zarr | ||
$ anemoi-datasets create recipe.yaml dataset.zarr | ||
Once the build is complete, you can inspect the dataset using the | ||
following command: | ||
Once the build is complete, you can inspect the dataset using the following command: | ||
|
||
.. code:: console | ||
.. code-block:: console | ||
$ anemoi-datasets inspect dataset.zarr | ||
$ anemoi-datasets inspect dataset.zarr | ||
.. literalinclude:: yaml/building1.txt | ||
:language: console | ||
:language: console | ||
|
||
Adding a second source | ||
====================== | ||
~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
To add a second source, you need to use the ``join`` operation. In this | |
example, we add pressure level variables to the previous example: | ||
To add a second source, you need to use the ``join`` operation. In this example, we add | |
pressure level variables to the previous example: | ||
|
||
.. literalinclude:: yaml/building2.yaml | ||
:language: yaml | ||
:language: yaml | ||
|
||
This will build the following dataset: | ||
|
||
.. literalinclude:: yaml/building2.txt | ||
:language: console | ||
:language: console | ||
|
||
.. note:: | ||
|
||
Please note that the pressure-level parameters are named | |
`param_level`. This is the default behaviour. See | ||
:ref:`remapping_option` for more information. | ||
Please note that the pressure-level parameters are named `param_level`. This is the | |
default behaviour. See :ref:`remapping_option` for more information. | ||
|
||
Adding some forcing variables | ||
============================= | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
When training data-driven models, some forcing variables may be | |
required such as the solar radiation, the time of day, the day in the | ||
year, etc. | ||
When training data-driven models, some forcing variables may be required such as the | |
solar radiation, the time of day, the day in the year, etc. | ||
|
||
These are provided by the ``forcings`` source. In this example, we add a | |
few of them. The `template` option is used to point to another source, | |
in this case the first instance of ``mars``. This source is used to get | |
information about the grid points, as some of the forcing variables are | ||
grid dependent. | ||
These are provided by the ``forcings`` source. In this example, we add a few of them. | |
The `template` option is used to point to another source, in this case the first | |
instance of ``mars``. This source is used to get information about the grid points, as | ||
some of the forcing variables are grid dependent. | ||
|
||
.. literalinclude:: yaml/building3.yaml | ||
:language: yaml | ||
:language: yaml | ||
|
||
This will build the following dataset: | ||
|
||
.. literalinclude:: yaml/building3.txt | ||
:language: console | ||
:language: console | ||
|
||
See :ref:`forcing_variables` for more information about forcing | ||
variables. | ||
See :ref:`forcing_variables` for more information about forcing variables. |