-
Notifications
You must be signed in to change notification settings - Fork 5
Introduction of Reading and Writing NetCDF files
Please find the French version of this page here.
Veuillez trouver la version française de cette page ici.
This is an introduction to create NetCDF files with Matlab, Octave, and Python. The emphasize here is on setting dimensions, variables, and attributes such that they fulfil the NetCDF standard called CF convention. Files following this convention are supported by most tools to visualize and perturb NetCDF files.
Tools to visualize NetCDF files are, for example, Panoply (all OS), NcView (only UNIX), QGIS (all OS), ArcGIS (only Windows). Possibilities to read, write and perturb NetCDF files are Matlab, Octave, Python, Fortran, C, etc. There are also helpful command line toolboxes that allow to change NetCDF files such as CDO and NCO.
For more helpful scripts on NetCDF and shape files also check out the Candex library developed by Shervan Gharari (U of Sask).
Structure of this article:
- Scripts used for examples
- Example
- Write NetCDF files in Matlab or Octave
- Write NetCDF files in Python
- Read NetCDF files in Matlab or Octave
- Read NetCDF files in Python
- Read NetCDF files in R
- Check NetCDF file against CF-Convention
All the scripts that will be used for the examples below are available in the utility scripts. The following sections will explain them step-by-step.
- write with:
- read with:
Let's assume a field of precipitation data called pre
. Our domain is X=4
(columns) by Y=2
(rows) representing 8 grid cells. The precipitation time series contains T=3
time points:
pre = [ % timestep t=1: 2018-01-01 00:00:00
[ [ -9999.0, 1.0, 2.0, 1.0 ],
[ 1.0, 1.0, 0.0, 0.0 ] ],
% timestep t=2: 2018-01-01 06:00:00
[ [ -9999.0, 0.0, 0.0, 0.0 ],
[ 0.0, 0.0, 0.0, 0.0 ] ],
% timestep t=3: 2018-01-01 18:00:00
[ [ -9999.0, 4.0, 4.0, 1.0 ],
[ 1.0, 1.0, -9999.0, 0.0 ] ] ]
Apart from the data, in our case pre
, other related variables can be defined in a NetCDF. As an example, the latitude and longitude of the domains for pre
can be specified as:
lat = [ [ 50.0, 50.0, 50.0, 50.0 ],
[ 49.0, 49.0, 49.0, 49.0 ] ]
lon = [ [ -110.0, -109.0, -108.0, -107.0 ],
[ -110.0, -109.0, -108.0, -107.0 ] ]
The latitude and longitude values are not necessarily needed to be in regular format as this example provides.
Other variable related to cells such as grid_ID
can be specify in the NetCDF file.
grid_ID = [ [ 120.0, 121.0, 125.0, 130.0 ],
[ 122.0, 124.0, 140.0, 131.0 ] ]
Here we provide a simple script for writing a NetCDF with above mentioned examples. Readers can just copy past the scripts part by part into Matlab Workspace and follow the progress.
The commands are all for Matlab but can be used in Octave as well. In Octave you have to replace all netcdf.<function>
commands to netcdf_<function>
.
The full script is available as link or download. We will now go step-by-step through that script.
First, we need to load some modules and initialize the variables in Matlab/Octave.
close all;
clear all;
clc;
pkg load io; % only in Octave
pkg load netcdf; % only in Octave
T_data = [ 0; 6; 18];
lat_data = [ 50.0, 50.0, 50.0, 50.0 ;
49.0, 49.0, 49.0, 49.0 ];
lon_data = [ -110.0, -109.0, -108.0, -107.0;
-110.0, -109.0, -108.0, -107.0 ];
% timestep t=1: 2018-01-01 00:00:00
pre (:,:,1) = [ -9999.0, 1.0, 2.0, 1.0;
1.0, 1.0, 0.0, 0.0 ];
% timestep t=2: 2018-01-01 06:00:00
pre (:,:,2) = [ -9999.0, 0.0, 0.0, 0.0 ;
0.0, 0.0, 0.0, 0.0 ];
% timestep t=3: 2018-01-01 18:00:00
pre (:,:,3) = [ -9999.0, 4.0, 4.0, 1.0 ;
1.0, 1.0, -9999.0, 0.0 ];
grid_ID = [ 120.0, 121.0, 125.0, 130.0 ;
122.0, 124.0, 140.0, 131.0 ];
Now we need to create a new NetCDF file:
ncid = netcdf.create('NetCDF_Matlab.nc','NETCDF4'); % Matlab
ncid = netcdf_create('NetCDF_Octave.nc','NETCDF4'); % Octave
The next step is to create dimensions which specify the number, and the size of the time variables, spatial variables and all the other variables. In our example, the dimensions are X=2
, Y=4
, and T=3
. Be aware that you have to replace netcdf.defDim
by netcdf_defDim
if you use Octave.
dimid_X = netcdf.defDim(ncid,'nlon',2);
dimid_Y = netcdf.defDim(ncid,'nlat',4);
dimid_T = netcdf.defDim(ncid,'time',3);
Usually the time dimension is set as unlimited
such that it is easier to append additional data later. An unlimited dimension is created like this (instead of the command above):
dimid_T = netcdf.defDim(ncid,'time', netcdf.getConstant('NC_UNLIMITED'));
The time variable is expressed as a time steps since a starting point of the time. The reference date is specified in the units
attribute for time following the standard pattern [seconds/hours/days] since YYYY-MM-DD HH:MM:SS
. There is zero flexibility in setting the units of the time variable. If you make changes in there or decide not to follow this standard, you will have problems to use this NetCDF file in applications. In our example, the time steps are the following:
% timestep t=1: 2018-01-01 00:00:00
% timestep t=2: 2018-01-01 06:00:00
% timestep t=3: 2018-01-01 18:00:00
There is an unlimited amount of possibilities to set the time. The following examples are all giving the same results:
hours since 2018-01-01 00:00:00 --> T_data = [ 0; 6; 18]
days since 2018-01-01 00:00:00 --> T_data = [ 0.0; 0.25; 0.75]
days since 2017-01-01 00:00:00 --> T_data = [365.0; 365.25; 365.75]
A good rule otherwise thumb is set the unit such that the time variable will contain only integers. Hence, personally I would prefer the first option but it is up to you.
The code for setting the time variable is:
% Variables
time_varid = netcdf.defVar(ncid,'time','NC_INT', dimid_T);
T_data = [ 0; 6; 18];
% Attributes
netcdf.putAtt(ncid,time_varid ,'long_name','time');
netcdf.putAtt(ncid,time_varid ,'units','hours since 2018-01-01 00:00:00');
netcdf.putAtt(ncid,time_varid ,'calendar','gregorian');
netcdf.putAtt(ncid,time_varid ,'standard_name','time');
netcdf.putAtt(ncid,time_varid ,'axis','T');
% Write data
netcdf.endDef(ncid);
% if dimid_T = netcdf.defDim(ncid,'time',3); is used
netcdf.putVar(ncid,time_varid,T_data)
% if dimid_T = netcdf.defDim(ncid,'time', netcdf.getConstant('NC_UNLIMITED')); is used
netcdf.putVar(ncid,time_varid,[0],[3],T_data)
Here we work on variables which identify the spatial extent of the data. It is good practice to add spatial information to every data you have. Even , for example, streamflow data show be georeferenced since the streamflow garage has a spatial location. In our example the data are for specific latitude and longitude values. The way to encode spatial information in NetCDF is well described in the standard. Similar to time it is highly recommended to use the below described attributes. You are theoretically allowed to set the attributes as you want but it is very likely that these NetCDFs are not supported by various software and programs. The long_name
attribute is where you can put whatever you want and add all information necessary. You can also add additional attributes if necessary.
% Variables
netcdf.reDef(ncid);
lat_varid = netcdf.defVar(ncid,'lat','NC_DOUBLE',[dimid_X dimid_Y]);
lon_varid = netcdf.defVar(ncid,'lon','NC_DOUBLE',[dimid_X dimid_Y]);
% Attributes
netcdf.putAtt(ncid,lat_varid,'long_name', 'latitude');
netcdf.putAtt(ncid,lon_varid,'long_name', 'longitude');
netcdf.putAtt(ncid,lat_varid,'units', 'degrees_north');
netcdf.putAtt(ncid,lon_varid,'units', 'degrees_east');
netcdf.putAtt(ncid,lat_varid,'standard_name','latitude');
netcdf.putAtt(ncid,lon_varid,'standard_name','longitude');
% Write data
netcdf.endDef(ncid);
netcdf.putVar(ncid,lat_varid,lat_data)
netcdf.putVar(ncid,lon_varid,lon_data)
The next step is to form the variables one by one using all the dimensions specified in previous step. Creating variables consists of three steps:
- create the variable structure with dimensions,
- set attributes and metadata for a variable (good practice is to set at least the attributes
long_name
andunits
), and - set actual variable values.
For our case we have only have the two variable pre(x,y,t)
and grid_ID(x,y)
.
% Variable
netcdf.reDef(ncid);
pre_varid = netcdf.defVar(ncid,'pre', 'NC_DOUBLE',[dimid_X dimid_Y dimid_T]);
grid_ID_varid = netcdf.defVar(ncid,'grid_ID','NC_DOUBLE',[dimid_X dimid_Y]);
% Attributes
netcdf.putAtt(ncid,pre_varid, 'long_name', 'Precipitation');
netcdf.putAtt(ncid,grid_ID_varid, 'long_name', 'Grid ID (numbering grid cells from 1 to N)');
netcdf.putAtt(ncid,pre_varid, 'units', 'mm');
netcdf.putAtt(ncid,grid_ID_varid, 'units', '1');
netcdf.putAtt(ncid,pre_varid, 'coordinates','lon lat');
netcdf.putAtt(ncid,grid_ID_varid, 'coordinates','lon lat');
% Write data
netcdf.endDef(ncid);
netcdf.putVar(ncid,pre_varid, pre)
netcdf.putVar(ncid,grid_ID_varid,grid_ID)
Finally we set some global attributes for the NetCDF file.
netcdf.reDef(ncid);
netcdf.putAtt(ncid,netcdf.getConstant('NC_GLOBAL'),'Conventions','CF-1.6');
netcdf.putAtt(ncid,netcdf.getConstant('NC_GLOBAL'),'License', 'The data were written by me. They are under GPL.');
netcdf.putAtt(ncid,netcdf.getConstant('NC_GLOBAL'),'history', ['Created ',datestr(datetime('now'))]);
netcdf.putAtt(ncid,netcdf.getConstant('NC_GLOBAL'),'source', 'Written by CaSPAr test script (https://github.com/kckornelsen/CaSPAR_Public).');
netcdf.endDef(ncid);
To make sure that all the changes are properly saved into the NetCDF file one can close writing the files:
netcdf.close(ncid)
The full script is available as link or download. We will now go step-by-step through that script.
Load the needed packages first:
import netCDF4 as nc4 # to work with netCDFs
import numpy as np # to perform numerics
import time
We then prepare the data. The defined values are just for the purpose of demonstration. You might get them as model output or read them from another file:
pre = np.array([ [ [ -9999.0, 1.0, 2.0, 1.0 ],
[ 1.0, 1.0, 0.0, 0.0 ] ],
[ [ -9999.0, 0.0, 0.0, 0.0 ],
[ 0.0, 0.0, 0.0, 0.0 ] ],
[ [ -9999.0, 4.0, 4.0, 1.0 ],
[ 1.0, 1.0, -9999.0, 0.0 ] ] ])
lat_data = np.array( [ [ 50.0, 50.0, 50.0, 50.0 ],
[ 49.0, 49.0, 49.0, 49.0 ] ])
lon_data = np.array( [ [ -110.0, -109.0, -108.0, -107.0 ],
[ -110.0, -109.0, -108.0, -107.0 ] ])
grid_ID = np.array( [ [ 120.0, 121.0, 125.0, 130.0 ],
[ 122.0, 124.0, 140.0, 131.0 ] ])
T_data = np.array( [ 0,6,18 ])
Now we need to create a new NetCDF file:
ncid = nc4.Dataset("NetCDF_Python.nc", "w", format="NETCDF4")
The next step is to create dimensions which specify the number, and the size of the time variables, spatial variables and all the other variables. In our example, the dimensions are X=2
, Y=4
, and T=3
.
dimid_X = ncid.createDimension('nlon',2)
dimid_Y = ncid.createDimension('nlat',4)
dimid_T = ncid.createDimension('time',3)
Usually the time dimension is set as unlimited
such that it is easier to append additional data later. An unlimited dimension is created like this (instead of the command above):
dimid_T = ncid.createDimension('time',None)
Some details on time variables in NetCDF is given in the Matlab section.
The code for setting the time variable in Python is:
# Variables
time_varid = ncid.createVariable('time','i4',('time',),zlib=True)
# Attributes
time_varid.long_name = 'time'
time_varid.units = 'hours since 2018-01-01 00:00:00'
time_varid.calendar = 'gregorian'
time_varid.standard_name = 'time'
time_varid.axis = 'T'
# Write data
time_varid[:] = T_data
Details on the spatial variables can be found in the Matlab section. Just be aware that changes in the setup below (even changing lower to upper case) might lead to the situation that your NetCDF files are not supported by other applications anymore.
# Variables
lat_varid = ncid.createVariable('lat','f8',('nlon','nlat',),zlib=True)
lon_varid = ncid.createVariable('lon','f8',('nlon','nlat',),zlib=True)
# Attributes
lat_varid.long_name = 'latitude'
lon_varid.long_name = 'longitude'
lat_varid.units = 'degrees_north'
lon_varid.units = 'degrees_east'
lat_varid.standard_name = 'latitude'
lon_varid.standard_name = 'longitude'
# Write data
lat_varid[:] = lat_data
lon_varid[:] = lon_data
Finally, we can create the other variables (here grid_ID(x,y)
and pre(time,x,y)
). Creating variables consists of three steps:
- create the variable structure with dimensions,
- set attributes and metadata for a variable (good practice is to set at least the attributes
long_name
andunits
), and - set actual variable values.
The zlib
attribute turns on the NetCDF internal file compression and results in much smaller files than without the attribute. netcdf.defVarDeflate
is similar to zlib
in Matlab.
# Variable
pre_varid = ncid.createVariable('pre', 'f8',('time','nlon','nlat',),zlib=True)
grid_ID_varid = ncid.createVariable('grid_ID','f8',( 'nlon','nlat',),zlib=True)
# Attributes
pre_varid.long_name = 'Precipitation'
grid_ID_varid.long_name = 'Grid ID (numbering grid cells from 1 to N)'
pre_varid.units = 'mm'
grid_ID_varid.units = '1'
pre_varid.coordinates = 'lon lat'
grid_ID_varid.coordinates = 'lon lat'
# Write data
pre_varid[:] = pre
grid_ID_varid[:] = grid_ID
Finally we set some global attributes for the NetCDF file.
ncid.Conventions = 'CF-1.6'
ncid.License = 'The data were written by me. They are under GPL.'
ncid.history = 'Created ' + time.ctime(time.time())
ncid.source = 'Written by CaSPAr test script (https://github.com/kckornelsen/CaSPAR_Public).'
ncid.close()
The full script is available as link or download. We will now go step-by-step through that script.
To read a NetCDF file in Matlab or Octave, firstly, we should find out the NetCDF variables and their attribute. To do so we can use the command ncdisp
for the NetCDF file that we created (it can be any NetCDF file):
ncdisp('NetCDF_Matlab.nc'); % in Matlab and Octave
The function returns the global attributes, dimensions, variables, and their attributes as follow:
Format:
netcdf4
Global Attributes:
Conventions = 'CF-1.6'
License = 'The data were written by me. They are under GPL.'
history = 'Created 24-Oct-2018 10:09:06'
source = 'Written by CaSPAr test script (https://github.com/kckornelsen/CaSPAR_Public).'
Dimensions:
nlon = 2
nlat = 4
time = 3
Variables:
time
Size: 3x1
Dimensions: time
Datatype: int32
Attributes:
long_name = 'time'
units = 'hours since 2018-01-01 00:00:00'
calendar = 'gregorian'
standard_name = 'time'
axis = 'T'
lat
Size: 2x4
Dimensions: nlon,nlat
Datatype: double
Attributes:
long_name = 'latitude'
units = 'degrees_north'
standard_name = 'latitude'
lon
Size: 2x4
Dimensions: nlon,nlat
Datatype: double
Attributes:
long_name = 'longitude'
units = 'degrees_east'
standard_name = 'longitude'
pre
Size: 2x4x3
Dimensions: nlon,nlat,time
Datatype: double
Attributes:
long_name = 'Precipitation'
units = 'mm'
coordinates = 'lon lat'
grid_ID
Size: 2x4
Dimensions: nlon,nlat
Datatype: double
Attributes:
long_name = 'Grid ID (numbering grid cells from 1 to N)'
units = '1'
coordinates = 'lon lat'
As you can see this NetCDF has four varibales pre
, lat
, lon
and grid_ID
which are defined in the beginign of the example described in the beginning of this page.
To read one of the varibale we use ncread
:
lat = ncread('NetCDF_Matlab.nc','lat') % in Matlab and Octave
This should return:
lat =
50 50 50 50
49 49 49 49
The NetCDF format allows the user to read a particular part of the NetCDF file without reading the entire variable. As an example lets say we would like to have the variable pre
for the lat=50.0
and lon=-108.0
for all the time steps. Firstly, we should find the location of the desired lat
and lon
. This location in Matlab is [1 3]
. To read the variable then we can use the ncread
:
pre_lat_lon = ncread('NetCDF_Matlab.nc','pre',[1 3 1],[1 1 3]);
in which [1 3 1]
is the index where ncread
start reading and [1 1 3]
is the steps of the reading meaning that the the function will read one step in lat
(the same lat
), one lon
(the same lon
) and three steps in time
. This result in:
pre_lat_lon(:,:,1) =
2
pre_lat_lon(:,:,2) =
0
pre_lat_lon(:,:,3) =
4
which can be converted into an array by the squeeze
function:
pre_lat_lon_time_series = squeeze (pre_lat_lon)
which results in:
pre_lat_lon_time_series =
2
0
4
In Matlab time is read similar to other variables described above. To read the time we can use:
time = ncread('NetCDF_Matlab.nc','time');
This function give back the minutes, hours, days from the start of the time described in the NetCDF file. Once can find the start of the time in the NetCDF file by looking into the time
variable attributes. In our example the time increments are in hours and the hour 0
is defined as 2018-01-01 00:00:00
. To have all the time steps in a vector format one can write the following:
start_time = [2018 1 1 0 0 0]; % defining [year month day hour minute second]
date_num = datenum(start_time); % changing the start_time to days from [1 1 1 0 0 0]
date_all_num = date_num + double(time)./24; % hours to days
date_all_vec = datevec (date_all_num)
this should result in:
date_all_vec =
2018 1 1 0 0 0
2018 1 1 6 0 0
2018 1 1 18 0 0
The full script is available as link or download. We will now go step-by-step through that script.
To read a NetCDF file in Python we can use the following commands (we make sure that the necessary packages are loaded):
import netCDF4 as nc4 # to work with netCDFs
import numpy as np # to perform numerics
import time
ncid = nc4.Dataset('NetCDF_Python.nc', 'r')
print(ncid)
the command return back the general attributes plus the general information on the variables.
<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
Conventions: CF-1.6
License: The data were written by me. They are under GPL.
history: Created Wed Oct 24 13:21:50 2018
source: Written by CaSPAr test script (https://github.com/kckornelsen/CaSPAR_Public).
dimensions(sizes): nlon(2), nlat(4), time(3)
variables(dimensions): int32 time(time), float64 lat(nlon,nlat), float64 lon(nlon,nlat), float64 pre(time,nlon,nlat), float64 grid_ID(nlon,nlat)
groups:
To access the attributes of a variable we can use:
print(ncid.variables['lat'])
which returns:
<class 'netCDF4._netCDF4.Variable'>
float64 lat(nlon, nlat)
long_name: latitude
units: degrees_north
standard_name: latitude
unlimited dimensions:
current shape = (2, 4)
filling on, default _FillValue of 9.969209968386869e+36 used
To read the variable lat
we can add a [:]
to the end of the above mentioned command:
lat = ncid.variables['lat'][:]
print(lat)
which returns
[[50. 50. 50. 50.]
[49. 49. 49. 49.]]
As the NetCDF format allows the user to read a particular part of the NetCDF file without reading the entire variable and similar to the above mentioned example for Matlab, here we read the vairbale pre
for the lat=50.0
and lon=-108.0
for all the time steps. Firstly, we should find the location of the desired lat
and lon
. This location in Python is [0 2]
. Please notice that the indexing for Python start from 0
on contrary to Matlab which starts from 1
. To read the variable then we can use the ncid.variables[]
:
pre_lat_lon = ncid.variables['pre'][:, 0, 2]
print(pre_lat_lon)
which returns:
[2. 0. 4.]
The colon :
in the first dimension of [:, 0, 2]
indicates that we want to read the entire time steps. 0
is the index for the lat
and 2
is the index of the lon
.
The full script is available as link or download. We will now go step-by-step through that script.
To read a NetCDF file in R we can use the following commands (we make sure that the necessary packages are loaded):
library(ncdf4) # to work with netCDFs
ncin <- nc_open("NetCDF_R.nc") # open file
ncid # view header (no data!)
the command return back the general attributes plus the general information on the variables.
File NetCDF_R.nc (NC_FORMAT_NETCDF4):
4 variables (excluding dimension variables):
double lat[Y,X] (Chunking: [4,2]) (Compression: shuffle,level 4)
long_name: latitude
units: degrees_north
standard_name: latitude
double lon[Y,X] (Chunking: [4,2]) (Compression: shuffle,level 4)
long_name: longitude
units: degrees_east
standard_name: longitude
double pre[Y,X,time] (Chunking: [4,2,1]) (Compression: shuffle,level 4)
long_name: Precipitation
units: mm
coordinates: lon lat
double grid_ID[Y,X] (Chunking: [4,2]) (Compression: shuffle,level 4)
long_name: Grid ID (numbering grid cells from 1 to N)
units: 1
coordinates: lon lat
3 dimensions:
X Size:2
Y Size:4
time Size:3 *** is unlimited ***
long_name: time
units: hours since 2018-01-01 00:00:00
calendar: gregorian
standard_name: time
axis: T
4 global attributes:
Conventions: CF-1.6
License: The data were written by me. They are under GPL.
history: Created Fri Jun 21 15:10:36 2019
source: Written by CaSPAr test script (https://github.com/kckornelsen/CaSPAR_Public).
To access the attributes of a variable we can use:
ncatt_get(ncin,"lat")
which returns:
$long_name
[1] "latitude"
$units
[1] "degrees_north"
$standard_name
[1] "latitude"
To read the variable lat
we can use the command:
lat = ncvar_get(ncin,"lat")
lat
which returns
[,1] [,2]
[1,] 50 49
[2,] 50 49
[3,] 50 49
[4,] 50 49
As the NetCDF format allows the user to read a particular part of the NetCDF file without reading the entire variable and similar to the above mentioned example for Matlab, here we read the variable pre
for the lat=50.0
and lon=-108.0
for all the time steps. Firstly, we should find the location of the desired lat
and lon
. This location in Python is [3,1]
. Please notice that the indexing for R start from 1 on contrary to Python which starts from 0. To read the variable then we can use the ncvar_get()
:
pre_lat_lon = ncvar_get(ncin,"pre")[3, 1, ]
pre_lat_lon
which returns:
[1] 2 0 4
The blank in the third dimension of [3, 1, ]
indicates that we want to read the entire time steps. 1
is the index for the lat
and 3
is the index of the lon
.
To check whether or not a written NetCDF file follows the Climate and Forecasts (CF) Metadata Convention, one can go to this website upload the NetCDF file and check the errors and warnings.
© 2017-2023 - Canadian Surface Prediction Archive - caspar.data@uwaterloo.ca
Funded under the Floodnet program.
Table of Contents: