Skip to content

Introduction of Reading and Writing NetCDF files

Julie edited this page Mar 30, 2023 · 8 revisions

Please find the French version of this page here.
Veuillez trouver la version française de cette page ici.

This is an introduction to create NetCDF files with Matlab, Octave, and Python. The emphasize here is on setting dimensions, variables, and attributes such that they fulfil the NetCDF standard called CF convention. Files following this convention are supported by most tools to visualize and perturb NetCDF files.

Tools to visualize NetCDF files are, for example, Panoply (all OS), NcView (only UNIX), QGIS (all OS), ArcGIS (only Windows). Possibilities to read, write and perturb NetCDF files are Matlab, Octave, Python, Fortran, C, etc. There are also helpful command line toolboxes that allow to change NetCDF files such as CDO and NCO.

For more helpful scripts on NetCDF and shape files also check out the Candex library developed by Shervan Gharari (U of Sask).

Structure of this article:

  1. Scripts used for examples
  2. Example
  3. Write NetCDF files in Matlab or Octave
  4. Write NetCDF files in Python
  5. Read NetCDF files in Matlab or Octave
  6. Read NetCDF files in Python
  7. Read NetCDF files in R
  8. Check NetCDF file against CF-Convention

Scripts used for examples

All the scripts that will be used for the examples below are available in the utility scripts. The following sections will explain them step-by-step.

Example

Let's assume a field of precipitation data called pre. Our domain is X=4 (columns) by Y=2 (rows) representing 8 grid cells. The precipitation time series contains T=3 time points:

    pre = [ % timestep t=1:  2018-01-01 00:00:00
            [ [ -9999.0, 1.0,     2.0, 1.0 ],
              [     1.0, 1.0,     0.0, 0.0 ] ],

            % timestep t=2:  2018-01-01 06:00:00
            [ [ -9999.0, 0.0,     0.0, 0.0 ],
              [     0.0, 0.0,     0.0, 0.0 ] ],

            % timestep t=3:  2018-01-01 18:00:00
            [ [ -9999.0, 4.0,     4.0, 1.0 ],
              [     1.0, 1.0, -9999.0, 0.0 ] ] ]

Apart from the data, in our case pre, other related variables can be defined in a NetCDF. As an example, the latitude and longitude of the domains for pre can be specified as:

    lat = [ [ 50.0, 50.0, 50.0, 50.0 ],
            [ 49.0, 49.0, 49.0, 49.0 ] ]
    lon = [ [ -110.0, -109.0, -108.0, -107.0 ],
            [ -110.0, -109.0, -108.0, -107.0 ] ]

The latitude and longitude values are not necessarily needed to be in regular format as this example provides. Other variable related to cells such as grid_ID can be specify in the NetCDF file.

   grid_ID = [ [ 120.0, 121.0, 125.0, 130.0 ],
               [ 122.0, 124.0, 140.0, 131.0 ] ]

Write NetCDF files in Matlab or Octave

Here we provide a simple script for writing a NetCDF with above mentioned examples. Readers can just copy past the scripts part by part into Matlab Workspace and follow the progress.

The commands are all for Matlab but can be used in Octave as well. In Octave you have to replace all netcdf.<function> commands to netcdf_<function>.

The full script is available as link or download. We will now go step-by-step through that script.

Getting started

First, we need to load some modules and initialize the variables in Matlab/Octave.

close all;
clear all;
clc;
pkg load io;       % only in Octave
pkg load netcdf;   % only in Octave

T_data   =  [ 0; 6; 18];
lat_data =  [ 50.0, 50.0, 50.0, 50.0 ;
              49.0, 49.0, 49.0, 49.0 ];
lon_data =  [ -110.0, -109.0, -108.0, -107.0;
              -110.0, -109.0, -108.0, -107.0 ];

% timestep t=1:  2018-01-01 00:00:00
pre (:,:,1) = [ -9999.0, 1.0,     2.0, 1.0;
                    1.0, 1.0,     0.0, 0.0 ];

% timestep t=2:  2018-01-01 06:00:00
pre (:,:,2) = [ -9999.0, 0.0,     0.0, 0.0 ;
                    0.0, 0.0,     0.0, 0.0 ];

% timestep t=3:  2018-01-01 18:00:00
pre (:,:,3) = [ -9999.0, 4.0,     4.0, 1.0 ;
                   1.0, 1.0, -9999.0, 0.0 ];

grid_ID     = [ 120.0, 121.0, 125.0, 130.0 ;
                122.0, 124.0, 140.0, 131.0 ];

Create NetCDF file

Now we need to create a new NetCDF file:

ncid = netcdf.create('NetCDF_Matlab.nc','NETCDF4');   % Matlab
ncid = netcdf_create('NetCDF_Octave.nc','NETCDF4');   % Octave

Create dimensions

The next step is to create dimensions which specify the number, and the size of the time variables, spatial variables and all the other variables. In our example, the dimensions are X=2, Y=4, and T=3. Be aware that you have to replace netcdf.defDim by netcdf_defDim if you use Octave.

dimid_X = netcdf.defDim(ncid,'nlon',2);
dimid_Y = netcdf.defDim(ncid,'nlat',4);
dimid_T = netcdf.defDim(ncid,'time',3);

Usually the time dimension is set as unlimited such that it is easier to append additional data later. An unlimited dimension is created like this (instead of the command above):

dimid_T = netcdf.defDim(ncid,'time', netcdf.getConstant('NC_UNLIMITED'));

Create time variable and its attributes

The time variable is expressed as a time steps since a starting point of the time. The reference date is specified in the units attribute for time following the standard pattern [seconds/hours/days] since YYYY-MM-DD HH:MM:SS. There is zero flexibility in setting the units of the time variable. If you make changes in there or decide not to follow this standard, you will have problems to use this NetCDF file in applications. In our example, the time steps are the following:

% timestep t=1:  2018-01-01 00:00:00
% timestep t=2:  2018-01-01 06:00:00
% timestep t=3:  2018-01-01 18:00:00

There is an unlimited amount of possibilities to set the time. The following examples are all giving the same results:

hours since 2018-01-01 00:00:00   --> T_data = [    0;      6;     18]
days  since 2018-01-01 00:00:00   --> T_data = [  0.0;   0.25;   0.75]
days  since 2017-01-01 00:00:00   --> T_data = [365.0; 365.25; 365.75]

A good rule otherwise thumb is set the unit such that the time variable will contain only integers. Hence, personally I would prefer the first option but it is up to you.

The code for setting the time variable is:

% Variables
time_varid = netcdf.defVar(ncid,'time','NC_INT', dimid_T);
T_data     = [    0;      6;     18];

% Attributes
netcdf.putAtt(ncid,time_varid ,'long_name','time');
netcdf.putAtt(ncid,time_varid ,'units','hours since 2018-01-01 00:00:00');
netcdf.putAtt(ncid,time_varid ,'calendar','gregorian');
netcdf.putAtt(ncid,time_varid ,'standard_name','time');
netcdf.putAtt(ncid,time_varid ,'axis','T');

% Write data
netcdf.endDef(ncid);
% if dimid_T = netcdf.defDim(ncid,'time',3); is used
netcdf.putVar(ncid,time_varid,T_data)
% if dimid_T = netcdf.defDim(ncid,'time', netcdf.getConstant('NC_UNLIMITED')); is used
netcdf.putVar(ncid,time_varid,[0],[3],T_data)

Create spatial variables and their attributes

Here we work on variables which identify the spatial extent of the data. It is good practice to add spatial information to every data you have. Even , for example, streamflow data show be georeferenced since the streamflow garage has a spatial location. In our example the data are for specific latitude and longitude values. The way to encode spatial information in NetCDF is well described in the standard. Similar to time it is highly recommended to use the below described attributes. You are theoretically allowed to set the attributes as you want but it is very likely that these NetCDFs are not supported by various software and programs. The long_name attribute is where you can put whatever you want and add all information necessary. You can also add additional attributes if necessary.

% Variables
netcdf.reDef(ncid);
lat_varid = netcdf.defVar(ncid,'lat','NC_DOUBLE',[dimid_X dimid_Y]);
lon_varid = netcdf.defVar(ncid,'lon','NC_DOUBLE',[dimid_X dimid_Y]);

% Attributes
netcdf.putAtt(ncid,lat_varid,'long_name',    'latitude');
netcdf.putAtt(ncid,lon_varid,'long_name',    'longitude');
netcdf.putAtt(ncid,lat_varid,'units',        'degrees_north');
netcdf.putAtt(ncid,lon_varid,'units',        'degrees_east');
netcdf.putAtt(ncid,lat_varid,'standard_name','latitude');
netcdf.putAtt(ncid,lon_varid,'standard_name','longitude');

% Write data
netcdf.endDef(ncid);
netcdf.putVar(ncid,lat_varid,lat_data)
netcdf.putVar(ncid,lon_varid,lon_data)

Create all other variables

The next step is to form the variables one by one using all the dimensions specified in previous step. Creating variables consists of three steps:

  1. create the variable structure with dimensions,
  2. set attributes and metadata for a variable (good practice is to set at least the attributes long_name and units), and
  3. set actual variable values.

For our case we have only have the two variable pre(x,y,t) and grid_ID(x,y).

% Variable
netcdf.reDef(ncid);
pre_varid     = netcdf.defVar(ncid,'pre',    'NC_DOUBLE',[dimid_X dimid_Y dimid_T]);
grid_ID_varid = netcdf.defVar(ncid,'grid_ID','NC_DOUBLE',[dimid_X dimid_Y]);

% Attributes
netcdf.putAtt(ncid,pre_varid,      'long_name',  'Precipitation');
netcdf.putAtt(ncid,grid_ID_varid,  'long_name',  'Grid ID (numbering grid cells from 1 to N)');
netcdf.putAtt(ncid,pre_varid,      'units',      'mm');
netcdf.putAtt(ncid,grid_ID_varid,  'units',      '1');
netcdf.putAtt(ncid,pre_varid,      'coordinates','lon lat');
netcdf.putAtt(ncid,grid_ID_varid,  'coordinates','lon lat');

% Write data
netcdf.endDef(ncid);
netcdf.putVar(ncid,pre_varid,    pre)
netcdf.putVar(ncid,grid_ID_varid,grid_ID)

Set global attributes

Finally we set some global attributes for the NetCDF file.

netcdf.reDef(ncid);
netcdf.putAtt(ncid,netcdf.getConstant('NC_GLOBAL'),'Conventions','CF-1.6');
netcdf.putAtt(ncid,netcdf.getConstant('NC_GLOBAL'),'License',    'The data were written by me. They are under GPL.');
netcdf.putAtt(ncid,netcdf.getConstant('NC_GLOBAL'),'history',    ['Created ',datestr(datetime('now'))]);
netcdf.putAtt(ncid,netcdf.getConstant('NC_GLOBAL'),'source',     'Written by CaSPAr test script (https://github.com/kckornelsen/CaSPAR_Public).');
netcdf.endDef(ncid);

Close NetCDF file

To make sure that all the changes are properly saved into the NetCDF file one can close writing the files:

netcdf.close(ncid)

Write NetCDF files in Python

The full script is available as link or download. We will now go step-by-step through that script.

Getting started

Load the needed packages first:

import netCDF4 as nc4            # to work with netCDFs
import numpy   as np             # to perform numerics
import time

We then prepare the data. The defined values are just for the purpose of demonstration. You might get them as model output or read them from another file:

pre      = np.array([ [ [ -9999.0, 1.0,     2.0, 1.0 ],
                        [     1.0, 1.0,     0.0, 0.0 ] ],
                      [ [ -9999.0, 0.0,     0.0, 0.0 ],
                        [     0.0, 0.0,     0.0, 0.0 ] ],
                      [ [ -9999.0, 4.0,     4.0, 1.0 ],
                        [     1.0, 1.0, -9999.0, 0.0 ] ] ])

lat_data = np.array(  [ [ 50.0, 50.0, 50.0, 50.0 ],
                        [ 49.0, 49.0, 49.0, 49.0 ] ])
lon_data = np.array(  [ [ -110.0, -109.0, -108.0, -107.0 ],
                        [ -110.0, -109.0, -108.0, -107.0 ] ])

grid_ID  = np.array(  [ [ 120.0, 121.0, 125.0, 130.0 ],
                        [ 122.0, 124.0, 140.0, 131.0 ] ])

T_data  = np.array(     [ 0,6,18 ])

Create NetCDF file

Now we need to create a new NetCDF file:

ncid = nc4.Dataset("NetCDF_Python.nc", "w", format="NETCDF4")

Create dimensions

The next step is to create dimensions which specify the number, and the size of the time variables, spatial variables and all the other variables. In our example, the dimensions are X=2, Y=4, and T=3.

dimid_X = ncid.createDimension('nlon',2)
dimid_Y = ncid.createDimension('nlat',4)
dimid_T = ncid.createDimension('time',3)

Usually the time dimension is set as unlimited such that it is easier to append additional data later. An unlimited dimension is created like this (instead of the command above):

dimid_T = ncid.createDimension('time',None)

Create time variable and its attributes

Some details on time variables in NetCDF is given in the Matlab section.

The code for setting the time variable in Python is:

# Variables
time_varid = ncid.createVariable('time','i4',('time',),zlib=True)

# Attributes
time_varid.long_name     = 'time'
time_varid.units         = 'hours since 2018-01-01 00:00:00'
time_varid.calendar      = 'gregorian'
time_varid.standard_name = 'time'
time_varid.axis          = 'T'

# Write data
time_varid[:] = T_data

Create spatial variables and their attributes

Details on the spatial variables can be found in the Matlab section. Just be aware that changes in the setup below (even changing lower to upper case) might lead to the situation that your NetCDF files are not supported by other applications anymore.

# Variables
lat_varid = ncid.createVariable('lat','f8',('nlon','nlat',),zlib=True)
lon_varid = ncid.createVariable('lon','f8',('nlon','nlat',),zlib=True)

# Attributes
lat_varid.long_name      = 'latitude'
lon_varid.long_name      = 'longitude'
lat_varid.units          = 'degrees_north'
lon_varid.units          = 'degrees_east'
lat_varid.standard_name  = 'latitude'
lon_varid.standard_name  = 'longitude'

# Write data
lat_varid[:] = lat_data
lon_varid[:] = lon_data

Create all other variables

Finally, we can create the other variables (here grid_ID(x,y) and pre(time,x,y)). Creating variables consists of three steps:

  1. create the variable structure with dimensions,
  2. set attributes and metadata for a variable (good practice is to set at least the attributes long_name and units), and
  3. set actual variable values.

The zlib attribute turns on the NetCDF internal file compression and results in much smaller files than without the attribute. netcdf.defVarDeflate is similar to zlib in Matlab.

# Variable
pre_varid     = ncid.createVariable('pre',    'f8',('time','nlon','nlat',),zlib=True)
grid_ID_varid = ncid.createVariable('grid_ID','f8',(       'nlon','nlat',),zlib=True)

# Attributes
pre_varid.long_name       = 'Precipitation'
grid_ID_varid.long_name   = 'Grid ID (numbering grid cells from 1 to N)'
pre_varid.units           = 'mm'
grid_ID_varid.units       = '1'
pre_varid.coordinates     = 'lon lat'
grid_ID_varid.coordinates = 'lon lat'

# Write data
pre_varid[:]     = pre
grid_ID_varid[:] = grid_ID

Set global attributes

Finally we set some global attributes for the NetCDF file.

ncid.Conventions = 'CF-1.6'
ncid.License     = 'The data were written by me. They are under GPL.'
ncid.history     = 'Created ' + time.ctime(time.time())
ncid.source      = 'Written by CaSPAr test script (https://github.com/kckornelsen/CaSPAR_Public).'

Close NetCDF file

ncid.close()

Read NetCDF files in Matlab or Octave

The full script is available as link or download. We will now go step-by-step through that script.

Read variables

To read a NetCDF file in Matlab or Octave, firstly, we should find out the NetCDF variables and their attribute. To do so we can use the command ncdisp for the NetCDF file that we created (it can be any NetCDF file):

ncdisp('NetCDF_Matlab.nc'); % in Matlab and Octave

The function returns the global attributes, dimensions, variables, and their attributes as follow:

Format:
           netcdf4
Global Attributes:
           Conventions = 'CF-1.6'
           License     = 'The data were written by me. They are under GPL.'
           history     = 'Created 24-Oct-2018 10:09:06'
           source      = 'Written by CaSPAr test script (https://github.com/kckornelsen/CaSPAR_Public).'
Dimensions:
           nlon = 2
           nlat = 4
           time = 3
Variables:
    time
           Size:       3x1
           Dimensions: time
           Datatype:   int32
           Attributes:
                       long_name     = 'time'
                       units         = 'hours since 2018-01-01 00:00:00'
                       calendar      = 'gregorian'
                       standard_name = 'time'
                       axis          = 'T'
    lat
           Size:       2x4
           Dimensions: nlon,nlat
           Datatype:   double
           Attributes:
                       long_name     = 'latitude'
                       units         = 'degrees_north'
                       standard_name = 'latitude'
    lon
           Size:       2x4
           Dimensions: nlon,nlat
           Datatype:   double
           Attributes:
                       long_name     = 'longitude'
                       units         = 'degrees_east'
                       standard_name = 'longitude'
    pre
           Size:       2x4x3
           Dimensions: nlon,nlat,time
           Datatype:   double
           Attributes:
                       long_name   = 'Precipitation'
                       units       = 'mm'
                       coordinates = 'lon lat'
    grid_ID
           Size:       2x4
           Dimensions: nlon,nlat
           Datatype:   double
           Attributes:
                       long_name   = 'Grid ID (numbering grid cells from 1 to N)'
                       units       = '1'
                       coordinates = 'lon lat'

As you can see this NetCDF has four varibales pre, lat, lon and grid_ID which are defined in the beginign of the example described in the beginning of this page. To read one of the varibale we use ncread:

lat = ncread('NetCDF_Matlab.nc','lat') % in Matlab and Octave

This should return:

lat =

    50    50    50    50
    49    49    49    49

The NetCDF format allows the user to read a particular part of the NetCDF file without reading the entire variable. As an example lets say we would like to have the variable pre for the lat=50.0 and lon=-108.0 for all the time steps. Firstly, we should find the location of the desired lat and lon. This location in Matlab is [1 3]. To read the variable then we can use the ncread:

pre_lat_lon = ncread('NetCDF_Matlab.nc','pre',[1 3 1],[1 1 3]);

in which [1 3 1] is the index where ncread start reading and [1 1 3] is the steps of the reading meaning that the the function will read one step in lat (the same lat), one lon (the same lon) and three steps in time. This result in:

pre_lat_lon(:,:,1) =

     2


pre_lat_lon(:,:,2) =

     0


pre_lat_lon(:,:,3) =

     4

which can be converted into an array by the squeeze function:

pre_lat_lon_time_series = squeeze (pre_lat_lon)

which results in:

pre_lat_lon_time_series =

     2
     0
     4

Read time

In Matlab time is read similar to other variables described above. To read the time we can use:

time = ncread('NetCDF_Matlab.nc','time');

This function give back the minutes, hours, days from the start of the time described in the NetCDF file. Once can find the start of the time in the NetCDF file by looking into the time variable attributes. In our example the time increments are in hours and the hour 0 is defined as 2018-01-01 00:00:00. To have all the time steps in a vector format one can write the following:

start_time = [2018 1 1 0 0 0]; % defining [year month day hour minute second]
date_num = datenum(start_time); % changing the start_time to days from [1 1 1 0 0 0]
date_all_num = date_num + double(time)./24; % hours to days
date_all_vec = datevec (date_all_num)

this should result in:

date_all_vec =

        2018           1           1           0           0           0
        2018           1           1           6           0           0
        2018           1           1          18           0           0

Read NetCDF files in Python

The full script is available as link or download. We will now go step-by-step through that script.

Read variables

To read a NetCDF file in Python we can use the following commands (we make sure that the necessary packages are loaded):

import netCDF4 as nc4            # to work with netCDFs
import numpy   as np             # to perform numerics
import time
ncid = nc4.Dataset('NetCDF_Python.nc', 'r')
print(ncid)

the command return back the general attributes plus the general information on the variables.

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    Conventions: CF-1.6
    License: The data were written by me. They are under GPL.
    history: Created Wed Oct 24 13:21:50 2018
    source: Written by CaSPAr test script (https://github.com/kckornelsen/CaSPAR_Public).
    dimensions(sizes): nlon(2), nlat(4), time(3)
    variables(dimensions): int32 time(time), float64 lat(nlon,nlat), float64 lon(nlon,nlat), float64 pre(time,nlon,nlat), float64 grid_ID(nlon,nlat)
    groups:

To access the attributes of a variable we can use:

print(ncid.variables['lat'])

which returns:

<class 'netCDF4._netCDF4.Variable'>
float64 lat(nlon, nlat)
    long_name: latitude
    units: degrees_north
    standard_name: latitude
unlimited dimensions:
current shape = (2, 4)
filling on, default _FillValue of 9.969209968386869e+36 used

To read the variable lat we can add a [:] to the end of the above mentioned command:

lat = ncid.variables['lat'][:]
print(lat)

which returns

[[50. 50. 50. 50.]
 [49. 49. 49. 49.]]

As the NetCDF format allows the user to read a particular part of the NetCDF file without reading the entire variable and similar to the above mentioned example for Matlab, here we read the vairbale pre for the lat=50.0 and lon=-108.0 for all the time steps. Firstly, we should find the location of the desired lat and lon. This location in Python is [0 2]. Please notice that the indexing for Python start from 0 on contrary to Matlab which starts from 1. To read the variable then we can use the ncid.variables[]:

pre_lat_lon = ncid.variables['pre'][:, 0, 2]
print(pre_lat_lon)

which returns:

[2. 0. 4.]

The colon : in the first dimension of [:, 0, 2] indicates that we want to read the entire time steps. 0 is the index for the lat and 2 is the index of the lon.

Read NetCDF files in R

The full script is available as link or download. We will now go step-by-step through that script.

Read variables

To read a NetCDF file in R we can use the following commands (we make sure that the necessary packages are loaded):

library(ncdf4)                   # to work with netCDFs
ncin <- nc_open("NetCDF_R.nc")   # open file
ncid                             # view header (no data!)

the command return back the general attributes plus the general information on the variables.

File NetCDF_R.nc (NC_FORMAT_NETCDF4):

     4 variables (excluding dimension variables):
        double lat[Y,X]   (Chunking: [4,2])  (Compression: shuffle,level 4)
            long_name: latitude
            units: degrees_north
            standard_name: latitude
        double lon[Y,X]   (Chunking: [4,2])  (Compression: shuffle,level 4)
            long_name: longitude
            units: degrees_east
            standard_name: longitude
        double pre[Y,X,time]   (Chunking: [4,2,1])  (Compression: shuffle,level 4)
            long_name: Precipitation
            units: mm
            coordinates: lon lat
        double grid_ID[Y,X]   (Chunking: [4,2])  (Compression: shuffle,level 4)
            long_name: Grid ID (numbering grid cells from 1 to N)
            units: 1
            coordinates: lon lat

     3 dimensions:
        X  Size:2
        Y  Size:4
        time  Size:3   *** is unlimited ***
            long_name: time
            units: hours since 2018-01-01 00:00:00
            calendar: gregorian
            standard_name: time
            axis: T

    4 global attributes:
        Conventions: CF-1.6
        License: The data were written by me. They are under GPL.
        history: Created Fri Jun 21 15:10:36 2019
        source: Written by CaSPAr test script (https://github.com/kckornelsen/CaSPAR_Public).

To access the attributes of a variable we can use:

ncatt_get(ncin,"lat")

which returns:

$long_name
[1] "latitude"

$units
[1] "degrees_north"

$standard_name
[1] "latitude"

To read the variable lat we can use the command:

lat = ncvar_get(ncin,"lat")
lat

which returns

     [,1] [,2]
[1,]   50   49
[2,]   50   49
[3,]   50   49
[4,]   50   49

As the NetCDF format allows the user to read a particular part of the NetCDF file without reading the entire variable and similar to the above mentioned example for Matlab, here we read the variable pre for the lat=50.0 and lon=-108.0 for all the time steps. Firstly, we should find the location of the desired lat and lon. This location in Python is [3,1]. Please notice that the indexing for R start from 1 on contrary to Python which starts from 0. To read the variable then we can use the ncvar_get():

pre_lat_lon = ncvar_get(ncin,"pre")[3, 1, ]
pre_lat_lon

which returns:

[1] 2 0 4

The blank in the third dimension of [3, 1, ] indicates that we want to read the entire time steps. 1 is the index for the lat and 3 is the index of the lon.

Check NetCDF file against CF-Convention

To check whether or not a written NetCDF file follows the Climate and Forecasts (CF) Metadata Convention, one can go to this website upload the NetCDF file and check the errors and warnings.