Standard (ASCII) format for station data

Joaquin Bedia edited this page Mar 7, 2022 · 11 revisions

Weather station data are most often stored as text/csv files rather than NetCDF. This page describes the standard format for observational datasets supported by loadeR, which is the same format defined within the COST Action VALUE. The VALUE ECA&D dataset, which contains weather data for 86 stations spread over Europe, is used as the example and is available for download:

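library(loadeR)  # provides dataInventory()
value <- tempfile(fileext = ".zip")
# download.file() needs the URL scheme to be stated explicitly
download.file("http://www.value-cost.eu/sites/default/files/VALUE_ECA_86_v2.zip", 
              destfile = value)
# Data inventory
di <- dataInventory(dataset = value)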

NOTE: To see examples of available station data go to section 3.2. Accessing and loading station data.

To explore the data format in detail, the "VALUE_ECA_86_v2.zip" file is decompressed next:

valuefiles <- tempdir()
unzip(value, exdir = valuefiles)

Station data and metadata are stored as a collection of comma-separated text files that strictly follow this structure:

stations.txt

This file contains the information about the weather stations. The columns station_id, longitude and latitude are the minimum information required to define a station dataset, so they are mandatory, and their names must match exactly the ones shown in this example. The remaining columns (name, altitude and source in this case) are examples of optional metadata that can also be included in this file. A dataset can carry as many metadata columns as needed.

head(read.table(paste0(valuefiles, "/VALUE_ECA_86_v2/stations.txt"), sep = ",", header = TRUE))

##   station_id,      name, longitude,  latitude, altitude, source
## 1     000012,      GRAZ, 15.450000, 47.083100,      366,  ECA&D
## 2     000013, INNSBRUCK, 11.400000, 47.266700,      577,  ECA&D
## 3     000014,  SALZBURG, 13.000000, 47.800000,      437,  ECA&D
## 4     000015, SONNBLICK, 12.950000, 47.050000,     3106,  ECA&D
## 5     000016,      WIEN, 16.350000, 48.233100,      198,  ECA&D
## 6     000017,     UCCLE,  4.366400, 50.800000,      100,  ECA&D
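
Because the mandatory column names must match exactly, it can help to check them before loading a dataset. The following is a minimal sketch in base R, not a loadeR function: the helper name check_stations is hypothetical, and the small stations.txt it reads is written to a temporary directory using two rows from the example above. Reading station_id as character is an assumption made here so that the leading zeros of the codes are kept.

```r
# Hypothetical helper (not part of loadeR): verifies the mandatory columns
check_stations <- function(file) {
  st <- read.table(file, sep = ",", header = TRUE, strip.white = TRUE,
                   colClasses = c(station_id = "character"),
                   stringsAsFactors = FALSE)
  mandatory <- c("station_id", "longitude", "latitude")
  missing.cols <- setdiff(mandatory, names(st))
  if (length(missing.cols) > 0) {
    stop("Missing mandatory column(s): ", paste(missing.cols, collapse = ", "))
  }
  st
}

# Small demo file in a temporary directory (rows copied from the example)
demo.dir <- tempfile("stations_demo")
dir.create(demo.dir)
writeLines(c("station_id,name,longitude,latitude,altitude,source",
             "000012,GRAZ,15.450000,47.083100,366,ECA&D",
             "000013,INNSBRUCK,11.400000,47.266700,577,ECA&D"),
           file.path(demo.dir, "stations.txt"))

st <- check_stations(file.path(demo.dir, "stations.txt"))
str(st)
```

A file lacking any of the three mandatory columns makes the helper stop with an error that names the missing columns.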

variables.txt

This file describes the variables contained in the dataset: their identifier (variable_id), name (longname), units of measure (unit), the code used to identify missing data (missing_code), and any other optional information (e.g. type).

NOTE: The use of special characters in 'variable_id' codes is discouraged.

head(read.table(paste0(valuefiles, "/VALUE_ECA_86_v2/variables.txt"), sep = ",", header = TRUE))

##   variable_id,                                    longname, unit, missing_code,        type
## 1      precip, Total_precipitation_accumulated_in_24_hours,   mm,          NaN, observation
## 2        tmax,                   Daily_maximum_temperature, degC,          NaN, observation
## 3        tmin,                   Daily_minimum_temperature, degC,          NaN, observation
## 4       tmean,                      Daily_mean_temperature, degC,          NaN, observation
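
The missing_code field tells a reader which string marks a gap in the data files. As an illustrative sketch in base R (not necessarily how loadeR handles it internally), the code below looks up the missing code of a variable and passes it to read.table through na.strings. The temporary directory, the two station columns and the data values are invented for the example.

```r
# Made-up miniature dataset written to a temporary directory
demo.dir <- tempfile("variables_demo")
dir.create(demo.dir)
writeLines(c("variable_id,longname,unit,missing_code",
             "tmin,Daily_minimum_temperature,degC,NaN"),
           file.path(demo.dir, "variables.txt"))
writeLines(c("YYYYMMDD,000012,000013",
             "19610101,-4.7,NaN",
             "19610102,NaN,-2"),
           file.path(demo.dir, "tmin.txt"))

# Read the metadata as character so the code is kept as a literal string
vars <- read.table(file.path(demo.dir, "variables.txt"), sep = ",",
                   header = TRUE, colClasses = "character",
                   strip.white = TRUE)
miss <- vars$missing_code[vars$variable_id == "tmin"]

# Use the code as the NA marker when reading the data file
tmin <- read.table(file.path(demo.dir, "tmin.txt"), sep = ",",
                   header = TRUE, na.strings = miss,
                   check.names = FALSE, strip.white = TRUE)
colSums(is.na(tmin))
```

Setting check.names = FALSE keeps the station codes as column names; with the default setting, read.table prefixes names that start with a digit with an X (for instance X000012). R already parses a literal NaN as not-a-number, so passing the code explicitly matters most for datasets that use a different marker.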

Data files

Each variable is stored in a separate text file named after its variable_id field in the variables.txt file (e.g. tmin.txt). The first column of the file contains the observation dates in the format YYYYMMDD. Sub-daily data, which are less common in downscaling applications, can be given time records in the format YYYYMMDDHH. The remaining columns (2 to n) contain the observed series at each station, in the order given by the station codes in the first line (header) of the file. This is a (truncated) example file for the minimum daily temperature data of this dataset:

head(read.table(paste0(valuefiles, "/VALUE_ECA_86_v2/tmin.txt"), sep = ",", header = TRUE))

##   YYYYMMDD, X000012, X000013, X000014, X000015, X000016, X000017, X000021,
## 1 19610101,    -4.7,    -4.8,    -5.2,   -13.7,    -1.7,     1.2,    -1.4,
## 2 19610102,    -1.2,      -2,    -1.2,   -13.2,    -0.3,       2,     0.3,
## 3 19610103,    -4.3,    -2.7,    -3.2,   -13.2,    -1.5,       2,    -0.6,
## 4 19610104,     0.8,    -2.5,    -7.5,   -14.6,     0.2,     2.6,     5.5,
## 5 19610105,     0.8,      -5,    -6.1,   -17.4,     0.5,     1.1,     3.7,
## 6 19610106,    -4.4,      -7,    -6.9,   -18.4,    -1.4,       2,     1.2,
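
The date column can be converted to R date classes with base functions. The following is a minimal sketch: parse_stamp is a hypothetical helper name, and the UTC time zone used for sub-daily records is an assumption, since the format description does not state one.

```r
# Hypothetical helper: converts YYYYMMDD or YYYYMMDDHH stamps to date classes
parse_stamp <- function(x) {
  x <- as.character(x)
  n <- unique(nchar(x))
  if (length(n) != 1 || !(n %in% c(8, 10))) {
    stop("Expected YYYYMMDD or YYYYMMDDHH time stamps")
  }
  if (n == 8) {
    as.Date(x, format = "%Y%m%d")                    # daily records
  } else {
    as.POSIXct(x, format = "%Y%m%d%H", tz = "UTC")   # sub-daily records (assumed UTC)
  }
}

daily <- parse_stamp(c(19610101, 19610102))
subdaily <- parse_stamp(c("1961010100", "1961010106"))
print(daily)
print(subdaily)
```

Daily stamps come back as Date objects and sub-daily stamps as POSIXct objects, so both can be used directly for subsetting or plotting time series.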

print(sessionInfo())

## R version 3.2.3 (2015-12-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.3 LTS
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] magrittr_1.5    formatR_1.2     tools_3.2.3     htmltools_0.2.6
##  [5] yaml_2.1.13     stringi_0.4-1   rmarkdown_0.6.1 knitr_1.10.5   
##  [9] stringr_1.0.0   digest_0.6.8    evaluate_0.7