-
Notifications
You must be signed in to change notification settings - Fork 51
/
Copy path012_rstudio.Rmd
executable file
·233 lines (154 loc) · 15.6 KB
/
012_rstudio.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
<style>@import url(style.css);</style>
[Introduction to Data Analysis](index.html "Course index")
# 1.2. Installing RStudio
Working with R is easier when you run it through [RStudio](http://rstudio.org/), a software interface that adds many useful features to it. Let's review some basics of both R and RStudio together, run our first R script—and produce our first plot—a random heatmap.
## Installation
Use [this link][rs-dl] to download RStudio. Installation should then be pretty straightforward. You will be installing the Desktop version -- the other one is for running RStudio on a server, while you will be running it as a desktop program.
[rs-dl]: http://www.rstudio.com/ide/download/desktop "RStudio Desktop"
RStudio comes with [excellent documentation][rs-docs] available through the Help menu. The same menu will also show you [keyboard shortcuts][rs-kb], which will quickly come in handy. The sections below cover only the essentials, which should be enough to get you started.
[rs-docs]: http://www.rstudio.com/ide/docs/ "RStudio documentation"
[rs-kb]: http://www.rstudio.com/ide/docs/using/keyboard_shortcuts "RStudio keyboard shortcuts"
A few more things about the RStudio software:
- __Rstudio requires a 64-bit processor.__ Users with older 32-bit processors like older Apple MacBooks from circa 2009 will have to use an old version that is not available from the website any more. Ask us for it _right now_.
- __RStudio is frequently improved__—it was actually updated on the day we started the class! You can choose to update it periodically, or you can stick with the version of RStudio that you installed: it should not have any impact on the course.
- __RStudio can crash from time to time.__ It might sometimes have to "abort your R session", which means crashing R without crashing RStudio itself. Please let us know when that happens. You will generally be able to quickly restart a new R session.
- __RStudio is very quickly developing__ into a powerhouse for R, with [more features][rs-ide] than we will be able to show. One of the most recent developments allows to [build web applications][rs-shiny] on top of R code, directly from RStudio.
[rs-ide]: http://www.rstudio.com/ide/ "RStudio features"
[rs-shiny]: http://www.rstudio.com/shiny/ "RStudio: Shiny"
## Interface
When you open RStudio for the first time, you see something like the screenshot below, composed of three quadrants. Each quadrant is called a "pane" of the RStudio window interface:
[![RStudio interface.](images/rstudio-interface-1.png)](images/rstudio-interface-1.png)
The red circle designates the Console window, where you get to type functions and see their results. The blue circle designates the Workspace, where your objects appear. The purple circle designates a bunch of other windows for files, plots, packages and help.
## Running scripts
Most of the time, you will have one more pane open on the top-left side of the screen to view, edit and run R scripts. Press ⌘-⇧-N (Mac) or Ctrl-⇧-N (Win) to open a new R script. You can also use the File menu to select "New > R Script", or use this icon:
![RStudio new script.](images/rstudio-new-script.png)
After opening the new R script, copy-paste the entire code snippet below into it:
```{r matrix-heatmap, eval=FALSE}
# Create a vector of 400 random values.
m <- runif(400)
# Print the first values of the vector.
head(m)
# Turn the vector into a 20 x 20 matrix.
m <- matrix(m, nrow = 20)
# Check the class of the matrix object.
class(m)
# Name the matrix rows with letters.
rownames(m) <- letters[1:20]
# Plot the matrix as a heatmap.
heatmap(m)
```
Finally, select the whole code and run it. You will see results get printed out (in black) in the Console, along with the code (in blue). The `m` object will also appear in the Workspace, and a random heatmap will appear in the bottom-right pane:
[![RStudio interface.](images/rstudio-interface-2.png)](images/rstudio-interface-2.png)
## Setting the working directory
Let's now save the script. Click the "Files" tab in the bottom-right pane. Choose the folder where you store your student stuff, and create a new folder called `IDA` for this course. Then set it as the working directory through the More menu, as shown below:
![RStudio working directory.](images/rstudio-setwd.png)
The working directory is the default location on your computer where R looks for stuff. RStudio copied that location to the Console when you completed the step described above, as if you had run the `setwd()` function, which sets the working directory to a given folder:
```{r setwd, eval=FALSE}
setwd("~/Documents/Teaching/IDA")
```
You can now save your script: focus on it by clicking anywhere in its window, press ⌘-S (Mac) or Ctrl-S (Win), and RStudio will offer to save it in your `IDA` folder. Save it there under a sensible name like `heatmap-demo.R`. R scripts should end with the `.R` extension.
Finally, set your course folder as the default working directory in RStudio's preferences. Open the preferences with ⌘-, (Mac) or from "Tools > Options". The very first preference is the default working directory, which should be set to your `IDA` folder:
![RStudio working directory.](images/rstudio-setwd-pref.png)
The folder path, or location, printed above is one on my own (Mac OS X) system. Your computer will show a different one. RStudio also shows that path on top of the Console window. Tip: always opt for simple folder paths.
Last but not least: __never move your `IDA` folder once you have set it up__, or RStudio will not be able to find it. This will cause useless trouble, like running into cascades of errors because your scripts cannot find the course datasets.
## Tab auto-completion
Now that the script is saved, take a minute to explore its code by navigating its lines with the keyboard arrow keys. Switch back and forth from the Source Editor (the script) to the Console (its results) by pressing Ctrl-1 and Ctrl-2 to change the cursor focus.
Your goal for the first month of class is actually to get as much practice with keyboard navigation and shortcuts as you can, so that you get used to running code from a script file while tinkering it from the Console.
A nifty feature of RStudio (and R) in that regard is tab auto-completion. Type in the first letters of a function and press `Tab`: possible function names will appear. The example below shows how to reach the `install.packages()` function through tab auto-completion:
![RStudio auto-completion.](images/rstudio-autocompletion.png)
To get the auto-completion window to open, just type in a few letters like `ins` and then (gently) hit the `Tab` key. The auto-completion feature also shows you the basic function description and syntax, which you can also get in the documentation pages.
## Packages
R is extremely modular: it comes with a set of "base" functions that can be supplemented with user-contributed ones, most often installed through the [CRAN][cran] online central repository from which you previously downloaded R itself. This is a core feature of R, as its open source nature has encouraged the development of thousands of user-contributed packages.
[cran]: http://cran.r-project.org/ "CRAN"
Let's first take a look at which packages come with R by default and where packages are installed:
```{r default-packages}
# See the default packages.
getOption("defaultPackages")
# See where packages are stored.
.libPaths()
```
Remember that R is case sensitive: the functions above require that the `O` in `getOption()` and `P` in `.libPaths()` are typed in UPPERCASE. Also note the somewhat unusual trailing dot at the beginning of the `.libPaths()` function: if you omit to type `.` to call the `.libPaths()` function, R will return an error.
Let's install a few packages straight away. To do that quickly, your best option is simply to copy-paste the code below into a new R script, and to run it as you previously did with the random heatmap example.
```{r install-packages, tidy=FALSE}
# Create a list of essential packages.
list <- c("foreach", "knitr", "devtools", "ggplot2", "downloader", "reshape")
# Select those that are not installed.
list <- list[!list %in% installed.packages()[, 1]]
# Install the missing packages.
if(length(list))
lapply(list, install.packages, repos = "http://cran.us.r-project.org")
```
Note that the last two lines form a logical statement and should be run together (i.e. consecutively) to work as intended. The result will be a lot of red ink: R is going to diagnose which of these packages you do not yet have installed, and then install them. You need to be online for this to work. Please ask us for help if you encounter any difficulties at that stage.
Once installed, the additional packages can be loaded (initialized) with the `library()` or `require()` functions. Each package brings its own functionalities: `ggplot2` package is used for [plotting neat graphics][ggplot2], `downloader` is used to download data from online sources. You will have to initialize the packages every time that you use their functions in R, as shown here:
[ggplot2]: http://ggplot2.org/ "ggplot2"
```{r load-packages, results='hide'}
# Load ggplot2 package.
library(ggplot2)
# Load downloader package.
require(downloader)
```
The "loading ..." messages that you might get by running the functions above are not errors but simply a sign that other packages are loaded at the same time in the background. R prints errors, messages and warnings (non-fatal errors) in the same color, which is indeed confusing. This is one of R's idiosyncracies: you cannot trust the colors, as red ink does not always stand for "problem".
A possibility is that you get _no result at all_, and this is behaviour is generalizable from installing packages to executing code overall: when you run a function, no result can be good news and mean that everything wnet alright. Reversely, getting a result does _not_ mean that it systematically is the correct one (your function could contain a mistake, for instance).
## Drawing and saving plots
One last functionality that we want to show as part of the RStudio interface has to do with plots. Let's take inspiration from a [quick introduction to `ggplot2`][ec-ggplot2], a graphics package that you should have [previously][011] installed when trying out package installation in R.
[ec-ggplot2]: http://blog.echen.me/2012/01/17/quick-introduction-to-ggplot2/ "Quick Introduction to ggplot2 (Edwin Chen)"
[011]: 011_r.html
```{r ggplot2-load-package}
# In case you have not yet installed ggplot2, this will.
if(!require(ggplot2)) {
# Install the ggplot2 package.
install.packages("ggplot2")
# Load the ggplot2 package.
library(ggplot2)
}
```
[iris]: http://stat.ethz.ch/R-manual/R-patched/library/datasets/html/iris.html "Edgar Anderson's Iris Data (R Documentation)"
We'll be looking at some example data included in the R software. The data describes [Edgar Anderson's set of fifty flowers][iris], studied by famous statistician Ronald Fisher in an article published in 1936. The data itself were collected by Edgar Anderson and published a year before. The `head()` function shows you the top rows of the data:
```{r ggplot2-load-data}
# Look at first rows of a default dataset.
head(iris)
```
[qplot]: http://docs.ggplot2.org/current/qplot.html qplot (ggplot2)
Let's now use the 'quick plot' functionality of the `ggplot2` package. The [`qplot()` function][qplot] requires three arguments here: the data to use, one variable to plot on the horizontal x-axis, and another to plot on the vertical y-axis. The order of the arguments is, as usual, irrelevant, as long as you identify `x` and `y` properly in the function call.
```{r ggplot2-qplot}
# Quick plot.
qplot(data = iris, x = Sepal.Length, y = Petal.Length)
```
[ggplot2-tutorial]: http://inundata.org/2013/04/10/a-quick-introduction-to-ggplot2/ "A quick introduction to ggplot2 (Karthik Ram)"
We will come back to how `ggplot2` works in due time (here's [another quick introduction][ggplot2-tutorial] showing more plots if you want a preview). For the moment, let's simply improve the plot by adding a bunch of options that will add some color, sizing and transparency effects to the plot that opened in the bottom-right pane. The next code block is just one long declaration to do that:
```{r ggplot2-3, tidy = FALSE}
# Select all lines below to run properly.
qplot(data = iris,
x = Sepal.Length,
y = Petal.Length,
color = Species,
size = Petal.Width,
alpha = I(0.7))
```
[wiki-raster]: https://en.wikipedia.org/wiki/Raster_graphics "Raster graphics (Wikipedia)"
[wiki-vector]: https://en.wikipedia.org/wiki/Vector_graphics "Vector graphics (Wikipedia)"
This plot can be saved to PNG (a [bitmap][wiki-raster] format) or PDF (a [vector][wiki-vector] format) from the Export menu. You will want to use PNG if you are saving the plot for screen viewing, e.g. posting on a Web page. Otherwise, you will prefer to export the plot as PDF, as shown below.
![RStudio export plot.](images/rstudio-save-plot.png)
RStudio will open a window to ask you for a few details, such as the filename for the exported graphic. The default saving location will be your working directory. After the plot is exported, it will open in your default PDF viewer, so that you can check everything went well.
## Help pages
Type in `help.start()` to access the index of documentation pages in R. If you do not understand a specific function like `qplot()`, then type its name with a question mark in front of it, as in `?qplot`. If that does not work, search with keywords by typing two question marks, as in `??heatmap`, to perform a fuzzy search that will match documentation from all installed packages.
Help pages open in the bottom-right pane of RStudio. Try these examples out:
```{r help-calls, eval=FALSE, tidy=FALSE}
# Function help.
?setwd
# Keyword search.
??heatmap
# Package docs.
help(package = "downloader")
```
R documentation pages contain precise instructions on the syntax of each function: cover the "Description", "Usage" and "Arguments" sections first, and turn to the "Details" section if the text sends you to it. Always check out the "Examples" section, which might (or might not, hence textbooks and tutorials) lead you to understand how the function works in context.
Reading technical documentation is a skill in itself, and not everything can be answered from there, so looking for help elsewhere is frequent practice. Note, however, that this is also a skill in itself and that there is an etiquette to getting help online: if you plan to use R-help or Stack Overflow to ask questions, turn first to [Roger Peng][rp-help], [Jeromy Anglim][ja-qa] or [Eric Raymond][er-qa].
[rp-help]: https://www.youtube.com/watch?v=ZFaWxxzouCY "How to Get [R] Help (Roger Peng)"
[er-qa]: http://catb.org/~esr/faqs/smart-questions.html "How To Ask Questions The Smart Way (Eric Steven Raymond)"
[ja-qa]: http://jeromyanglim.blogspot.com.au/2011/03/how-to-ask-me-statistics-question.html "How to Ask Me a Statistics Question (Jeromy Anglim)"
Note, finally, that typing the name of a function without its brackets will also show some form of help, which is generally useful only if you know a bit more about low-level programming in R. Here's a simple example showing how the `median()` function works internally, from which you can learn that it allows an option called `na.rm` and belongs to the base `stats` package preinstalled in R:
```{r help-function}
median
```
Last, do not hate the documentation: _everyone_ finds it difficult to read, and there are good reasons why the technical documentation is written how it is. It is not really meant to teach you R but rather to document it like a flight manual documents a plane: as fully and exactly as possible.
> __Next__: [Practice](013_practice.html).