This repository contains an intake catalogue for accessing data from the EUREC4A field campaign, stored on: 1) AERIS, 2) Munich University (via OPeNDAP), 3) a zarr-backed object store (using MinIO) at https://minio.denby.eu, 4) NOAA's Physical Sciences Lab (via OPeNDAP) and 5) data linked via IPFS.
To use it, you will need to install intake, xarray, intake-xarray, zarr, pydap, requests and s3fs:
pip install intake xarray intake-xarray zarr pydap s3fs requests
The catalogue (and underlying data) can then be accessed directly from Python:
>>> from intake import open_catalog
>>> cat = open_catalog("https://raw.githubusercontent.com/eurec4a/eurec4a-intake/master/catalog.yml")
You can list the available sources with:
>>> list(cat)
['radiosondes', 'barbados', 'dropsondes', 'halo', 'p3', 'specmacs']
>>> list(cat.radiosondes)
['atalante_meteomodem',
'atalante_vaisala',
'bco',
'meteor',
'ms_merian',
'ronbrown']
Then load up a dask-backed xarray.Dataset so that you have access to all the available variables and attributes in the dataset:
>>> ds = cat.radiosondes.ronbrown.to_dask()
>>> ds
<xarray.Dataset>
Dimensions: (alt: 3100, nv: 2, sounding: 329)
Coordinates:
* alt (alt) int16 0 10 20 30 40 50 ... 30950 30960 30970 30980 30990
flight_time (sounding, alt) datetime64[ns] dask.array<chunksize=(83, 775), meta=np.ndarray>
lat (sounding, alt) float32 dask.array<chunksize=(83, 1550), meta=np.ndarray>
lon (sounding, alt) float32 dask.array<chunksize=(83, 1550), meta=np.ndarray>
sounding_id (sounding) |S1000 dask.array<chunksize=(165,), meta=np.ndarray>
Dimensions without coordinates: nv, sounding
Data variables:
N_gps (sounding, alt) float32 dask.array<chunksize=(83, 1550), meta=np.ndarray>
N_ptu (sounding, alt) float32 dask.array<chunksize=(83, 1550), meta=np.ndarray>
alt_bnds (alt, nv) int16 dask.array<chunksize=(3100, 2), meta=np.ndarray>
...
You can then slice and access the data as if you had it available locally.
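For example, here is a minimal sketch continuing the session above (the variable and coordinate names are taken from the ronbrown dataset listing shown above; because the dataset is dask-backed, the remote data are only fetched once .compute() or .values is called):

>>> profile = ds.isel(sounding=0).sel(alt=slice(0, 2000))  # lazily select the first sounding below 2 km
>>> profile.N_gps.mean().compute()                         # the selected data is only downloaded at this point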
If you would like to add a data source, please fork this repository, follow the intake documentation to create an entry in catalog.yml (or a separate YAML file if you are adding many new data sources) and finally make a pull request. Tests are run automatically on pull requests to ensure that all defined data sources can be accessed.
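For orientation, a new catalog.yml entry might look roughly like the sketch below; the source name, description and URL are hypothetical, and the available drivers and arguments are described in the intake and intake-xarray documentation:

sources:
  my_new_source:
    description: Short description of the new dataset
    driver: zarr
    args:
      urlpath: "https://example.com/path/to/dataset.zarr"  # hypothetical location
      consolidated: true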