-
Notifications
You must be signed in to change notification settings - Fork 0
/
02_data_collection.Rmd
50 lines (32 loc) · 1.9 KB
/
02_data_collection.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
## Data Processing
Data are often inconsistent, incomplete, incorrect, or misspelled. [Data cleaning]((https://www.practicereproducibleresearch.org/core-chapters/7-glossary.html#munging))
is essential.
For data cleaning you may use a GUI (Graphical User Interface) based tool like
OpenRefine [http://openrefine.org/](http://openrefine.org/) or choose a programmatic
approach.
In the following we describe how data can be imported into the [R](https://r-project.org)-Programming
Environment, which can be used for data cleaning, aggregation and visualisation.
`r knitcitations::citep(manual["Grolemund_2017"])`
### Logger Devices
The R package [kwb.logger](https://github.com/kwb-r.kwb.logger) `r knitcitations::citep(manual["Sonnenberg_2018"])`
helps to import raw data from loggers used in different KWB projects into the
software [R](https://r-project.org) `r knitcitations::citep(citation())`, which is used for
data processing (e.g. data cleaning, aggregation and visualisation).
```{block2, type = 'rmdtip'}
For details, which loggers currently are supported by the R packages [kwb.logger](https://kwb-r.github.io/kwb.logger)
please check the [documentation website](https://kwb-r.github.io/kwb.logger/reference/index.html).
```
### Spreadsheets
General recommendations for working with EXCEL spreadsheets is given in the
[FAQs](#faq-excel) section.
#### Import Data From One Excel File
* Save the original file in the rawdata zone.
#### Import Data From Many Excel Files
##### Files Are In the Same Format
Import Excel files of the same format by
* defining a function that is able to read the data from that file
* calling this function in a loop for each file to import.
##### Files Are In Different Formats
We developed a general approach of importing data from many Excel files in which
the formats (e.g. more than one table area within one sheet, differing numbers
of header rows) differ from file to file.