forked from bioinformatics-core-shared-training/r-intro
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathweek1.Rmd
764 lines (532 loc) · 25.6 KB
/
week1.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
---
title: "Data types and structures"
---
> #### Learning objectives
>
> * Install R and RStudio
> * Install the tidyverse collection of R packages
> * To introduce Rstudio that we will be using, to write R scripts, in this course
> * To introduce data types and data structures
# Installing software you will need
Before starting this course you will need to ensure that your computer is set up with the required software.
If you have any difficulty installing any of this software then please contact one of the trainers for help.
# R and RStudio
**R** and **RStudio** are separate downloads and installations.
**R** is the underlying statistical computing environment. The base R system and a very large collection of packages that give you access to a huge range of statistical and analytical functionality are available from
[CRAN](https://cran.r-project.org), the Comprehensive R Archive Network.
However, using R alone is no fun. **RStudio** is a graphical integrated
development environment (IDE) that makes using R much easier and more
interactive. You need to install R before you install RStudio.
On this course we will be making use of a brilliant collection of packages designed for data science called the **`tidyverse`** that make it much easierand more fun to work with your data. After installing R and RStudio, follow the instructions at the bottom of this page to install the `tidyverse`.
## Windows
#### If you already have R and RStudio installed
* Open RStudio, and click on "Help" > "Check for updates". If a new version is
available, quit RStudio, and download the latest version for RStudio.
To check which version of R you are using, start RStudio and the first thing
that appears in the console indicates the version of R you are running.
Alternatively, you can type `sessionInfo()`, which will also display which
version of R you are running. Go on the
[CRAN website](https://cran.r-project.org/bin/windows/base/) and check whether
a more recent version is available. If so, please download and install it. You
can [check here](https://cran.r-project.org/bin/windows/base/rw-FAQ.html#How-do-I-UNinstall-R_003f) for more information on how to remove old versions from your system if you wish to do so.
#### If you don't have R and RStudio installed
* Download R from the [CRAN website](https://cran.r-project.org/bin/windows/base/release.htm).
* Run the `.exe` file that was just downloaded
* Go to the [RStudio download page](https://posit.co/download/rstudio-desktop/#download)
* Under *Installers* select **RStudio-2023.xx.y-zzz.EXE - Windows 10/11** (where x, y, and z represent version numbers)
* Double click the file to install it
* Once it's installed, open RStudio to make sure it works and you don't get any error messages.
## macOS
#### If you already have R and RStudio installed
* Open RStudio, and click on "Help" > "Check for updates". If a new version is available, quit RStudio, and download the latest version for RStudio.
To check the version of R you are using, start RStudio and the first thing that
appears on the terminal indicates the version of R you are running.
Alternatively, you can type `sessionInfo()`, which will also display which
version of R you are running. Go on the
[CRAN website](https://cran.r-project.org/bin/macosx/) and check whether a more
recent version is available. If so, please download and install it.
#### If you don't have R and RStudio installed
* The first thing you should do is check what chip your Mac has
* How to check chip type on Mac? Click on apple logo on top left, then chick on About This Mac.
![Intel chip](./images/mac_intel.jpg)
![M chip](./images/Mac_Mchip.png)
* Download R from
the [CRAN website](https://cran.r-project.org/bin/macosx/).
![](./images/R_mac_types.jpg)
* Select appropriate R `.pkg` file for the latest R version
* For M type chip: download R-x.y.z-arm64.pkg (where x, y, and z represent version numbers)
* For inter chip: download R-x.y.z.pkg (where x, y, and z represent version numbers)
* Double click on the downloaded file to install R
* It is also a good idea to install [XQuartz](https://www.xquartz.org/) (needed
by some packages)
* Go to the [RStudio download page](https://posit.co/download/rstudio-desktop/#download)
* Under *Installers* select **RStudio-2023.xx.y-zzz.DMG - Mac OS 11+ (64-bit)** (where x, y, and z represent version numbers)
* Double click the file to install RStudio
* Once it's installed, open RStudio to make sure it works and you don't get any error messages.
## Linux
* Follow the instructions for your distribution
from [CRAN](https://cloud.r-project.org/bin/linux), they provide information
to get the most recent version of R for common distributions. For most
distributions, you could use your package manager (e.g., for Debian/Ubuntu run
`sudo apt-get install r-base`, and for Fedora `sudo yum install R`), but we
don't recommend this approach as the versions provided by this are
usually out of date. In any case, make sure you have at least R 3.3.1.
* Go to the
[RStudio download page](https://posit.co/download/rstudio-desktop/#download)
* Under *Installers* select the version that matches your distribution, and
install it with your preferred method (e.g., with Debian/Ubuntu `sudo dpkg -i RSTUDIO-2023.xx.y-zzz-AMD64.DEB` at the terminal).
* Once it's installed, open RStudio to make sure it works and you don't get any error messages.
# Tidyverse
After installing R and RStudio, please install the `tidyverse` packages.
* After starting RStudio, at the console type:
`install.packages("tidyverse")`
(look for the 'Console' tab and type at the `>` prompt)
* You can also do this by going to Tools -> Install Packages and typing the names of the packages separated by a comma.
# Introduction to R and RStudio
# Why learn R?
* R involves creating & using scripts which makes the steps you used in your analysis clear and can be inspected by someone else for feedback and error-checking.
* R code is great for reproducibility. An increasing number of journals and funding agencies expect analyses to be reproducible, so knowing R will give you an edge with these requirements.
* R integrates with other tools to generate manuscripts from your code.
This document (RMarkdown a .Rmd file) is a case in point.
* R is interdisciplinary and extensible and has thousands of installable packages to extend its capabilities. R has packages for image analysis, GIS, time series, population genetics, and a lot more.
* R scales well to work on data of all shapes and sizes.
* R can connect to spreadsheets, databases, and many other data formats, on your computer or on the web.
* R produces high-quality graphics suitable for publication in journals or the web.
* R has a large and welcoming community - Thousands use R daily and many of them are willing to help you through websites such as Stack Overflow or the RStudio community.
* Not only is R free, but it is also open-source and cross-platform.
---
# Rstudio a brief tour
Rstudio provides us with a friendly interface to the R statistical progrmming language.
It consists of four main "Panes". These can be re-sized and moved around to suit how
you like to work.
<img src="images/Rstudioataglance.png" alt="Rstudio screenshot" width="90%">
## Editing pane
By default the top left-hand pane is one for creating, editing & running R scripts.
<img src="images/Rstudio_scripting.png" alt="Rstudio edit pane screenshot" width="90%">
A script is an R program that you have written. A good practice is for that script to
perform only one role in your analysis workflow and so you may have several R scripts which you call, in a particular sequence, to analyse your data.
As you will see, a script is basically a text file that contains R commands and
(ideally) comments to explain what the codes function is (as a documentation process).
As well as R scripts, there are many types of Rstudio document including Markdown files which we will use in the teaching of this course. These can provide interactive workbooks or pdf and web documents to name but a few possible outcomes.
## Console
Coming down the screen to the bottom left-hand pane we find the console window. This is where we can find output produced by running our R scripts.
<img src="images/Rstudio_console.png" alt="Rstudio screenshot" width="90%">
We can also try out snippets of R code here. Those of you who have only used graphical interfaces like Windows or MacOS where you click on commands using a mouse may find this aspect of R somewhat different. We type in commands to R using the command line.
This area can also be used like a calculator. Let's just type in something like `23 + 45` followed by the return key and see what happens. You should get the following:
```
> 23 + 45
[1] 68
```
Now 68 is clearly the answer but what is that 1 in brackets?
Here is another example to explain. If we type `1:36` and press enter, what happens?
R generates output counting from 1 to 36 but cannot fit all the output on one line and so starts another like this:
```
> 1:36
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36
```
Now we have two lines beginning with a number in square brackets. Note that the
number of values displayed on each line may differ on your computer; it largely
depends on the width of your console pane and the font size. Try creating a
larger sequence of numbers, e.g. `1:100`, if all 36 numbers fit on a single line
in your case.
This is just R helping us to keep tabs on which number we are looking at. `[1]` denotes that the line starts with the first result and the `[26]` denotes that this line starts with the 26th number. Let's try another one and generate a sequence incrementing in steps of 2:
```
> 1:36 * 2
[1] 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50
[26] 52 54 56 58 60 62 64 66 68 70 72
```
There are other tabs on this pane but we shall not be covering these on this course.
## Environment
Next we move to the top right-hand corner pane. Here we have even more tabs (of which we will only consider two Environment and History).
<img src="images/Rstudio_env.png" alt="Rstudio screenshot" width="80%">
Environment keeps track on R variables which we create (more on those later) and their contents. History is like a tally role of all the R commands we have entered in our session.
## Files, Plots & Help
Our final bottom right-hand pane also has several tabs. The Files tab is a file explorer that enable us to move around our directories and select which files we wish to work on. We can also change the default working directory that Rstudio will use.
<img src="images/Rstudio_filexpl.png" alt="Rstudio screenshot" width="80%">
The Plots tab is where any graphs that we create in R will appear. We can move through them using the arrow buttons and the export button will convert them to different graphics formats e.g. for publication in a paper or for the web.
The Packages tab shows which R packages are installed (These expand R's functionallity and again will be covered later) and can also install new packages.
The Help tab is a _massively useful_ tab which enables you to search R help index to get help pages on R functions and provide example code to help you use them in your R scripts.
---
# Our first look at the R language
Our overall goal for this course is to give you the ability to import your data into R, select a subset
of the data most of interest for a given analysis, carry out an analysis to summarize these data and
create visualizations of the data.
First though, let us consider __“What is Data?”__
Data comes in many forms: Numbers (Integers and decimal values) or alphabetical (characters or lines of text). Clearly a computer (or R) needs a way of representing this wide range of data with it’s diverse properties.
## Data types in R
![Image source:https://www.javatpoint.com/r-data-types](./images/r-programming-data-types.png)
* R has 6 basic data types
* __character__
* character is nothing but a string
* anything between two quotes (Single or double quotes): "a", 'cat'
* Quoted numbers are character type: '3.14'
* __numeric__ or __double__: Set of all real numbers.
* A number. Could be a integer or a real number: 3.14, 1.45765, 5, 1000000
* __integer__: Whole numbers. 5L (the L tells R to store this as an integer)
* __logical__: TRUE, FALSE
* __complex__: 1+4i (complex numbers with real and imaginary parts)
* __raw__:
* The last two data types are rarely used in practice
* Different types of data are needed in (any) programming for a variety of reasons:
* Processing instructions are different for different data types. For instance mathematical operations (+, -, x and /) are only meaningful for numbers.
```{r, eval=FALSE}
11 + 3 # Operation of addition performed correctly
"11" + 3 # gives error
```
* Efficient storage: Integer type uses less memory when compared to decimal (numeric) data type.
### R Arithmetic Operators
Mathematical operations such as addition and multiplication are performed using various operators. Here is a list of R's arithmetic operators.
* Addition: +
* Subtraction: -
* Multiplication: *
* Division: /
* Exponent: ^
* Modulus (Remainder from division): %%
```{r}
5*5 # 5 times 5
7/3 # 7 divided by 3
7%%3 # reminder of 7 divided by 3
```
## Data structures in R
![Image source:http://venus.ifca.unican.es/Rintro/dataStruct.html](./images/dataStructures.png)
* R has many data structures. These include
* Atomic vector
* data frame
* matrix
* list
* factors
### Atomic vector
* This is the fundamental data structure in R
* All the other data structures built on vectors
* `Understanding this structure is absolutely essential`
![Image source:https://www.geeksforgeeks.org/r-vector/](./images/Vectors-in-R.jpg)
```{r}
x <- 100 # create first vector
```
* "x" is an `object or a variable`, both of which are used interchangeably in this course.
* `<-` An assignment operator one can also use `=` instead of `<-`
* Meaning of above expression is x gets 100
```{r}
x <- 100
x = 100
```
* After a variable/object is created, it can be used as many times as needed.
```{r}
y <- 10
y*y
y + y
100 + y
```
* Variable values can be replaced
```{r}
x <- "Tom"
x
x <- "Jerry"
x
```
* variable names can not start with a number or a special characters like (_, -, etc)
```{r, eval=FALSE}
2x <- 100 # gives error
_x <- 100 # gives error
```
* only allowed special characters in variable names are "_" and "." any other special character throws an error]
```{r, eval=FALSE}
my-name <- "Chandra" # throws error
my_name <- "Chandra" # no error
my.name <- "Chandra" # no error
```
* R is case sensitive
* COUNTRY and country are two different variables
```{r}
COUNTRY <- "United Kingdom"
country <- "India"
```
* `c()` function should be used to create a vector that holds more than one value
* c stands for combine
* values are separated by ","
* When creating single or multiple values as a beginner, it would be advisable to use the function `c()`
```{r, eval=FALSE}
x <- 100
x <- c(100) # same like above
x <- 100, 200 # gives error
x <- c(100, 200) # no error
```
* ":" operator that generates a range of values
```{r}
x <- c(1:100) # create values from 1 to 100
```
* The values in a vector can be of one and only one type, such as numerics, integers, characters, logic, complexes, or raw data.
* When we attempt to mix different data types in a single vector, R automatically converts the data types, this phenomenon is called [Coercion](https://www.zigya.com/blog/what-is-coercion-in-r/)
```{r}
x <- c(1,2,3,4)
typeof(x)
x <- c(1,"2",3,4)
typeof(x)
y <- c(TRUE, FALSE, TRUE, 1L)
y
typeof(y)
z <- c(TRUE, FALSE, FALSE)
typeof(z)
z <- c(TRUE, FALSE, "FALSE")
z
typeof(z)
```
* Logical values TRUE and FALSE are internally represented as 1 and 0. Therefore mathematical operations can be performed on these vectors.
```{r}
y <- c(TRUE, FALSE, TRUE)
y
sum(y)
mean(y)
```
* Vectorization in R
* It is important to note that most of R's functions are **vectorized**, which means that they operate on all elements of a vector without looping through each element one by one. Coding becomes more concise, easier to read, and less prone to errors as a result.
```{r}
x <- c(1,2,3,4,5)
x * 5 # same as x * c(5)
x + 1 # same as x + c(1)
```
* Vectorization is powerful, quick and concise, but leads to confusion when vector lengths are different.
```{r}
x <- c(1,2,3,4,5,6)
y <- c(1,2,3,4,5,6)
x + y
```
```{r}
x <- c(1,2,3,4,5)
y <- c(1,2)
x + y
```
![](./images/vec_recycling.png)
* How to access values from a vector?
* Use `[]` subscript operator
* within `[]` one can give any of the following
* Vector of index numbers
* Vector of logical values
```{r}
vec <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
vec
```
```{r}
# extract 4th value from the vector var
vec[4]
vec[c(4)] # good idea to use c() function even for single value
```
```{r}
# extract 4th and 7th values from the vector var
vec[c(4,7)]
```
* One can inverse using "-" symbol
```{r}
# extract all the values except 4th and 7th value
vec[c(-4,-7)]
```
* Extracting values using logical vectors
```{r}
y <- c( 5, 8, 10)
y[c(FALSE, TRUE, FALSE)] # extract second element
```
* Logic vectors are not created manually in reality
* Logic vectors are the outputs of comparison operators
* comparison operators
* Equal to ==
* Not equal to !=
* Greater than >
* Less than <
* Greater than or equal to >=
* Less than or equal to <=
```{r}
x <- c(10, 20, 30, 40)
x == 20
x > 20
keep <- x > 20
keep
x[keep]
x[ x > 20 ] # The equivalent of x[keep]
```
* By using logical operators one can create complex expression for more complex subsetting
* There are 3 logical operators in R
* & : logical and
* | : logical or
* ! : logical not
![](./images/truth_table.jpg)
* For instance if you want to extract all the values in a vector that are greater than 10 but less than 40
```{r}
x <- c(10, 20, 30, 40)
x[x > 10] # get all the values > 10
x[x < 40] # get all the values < 40
x[ x > 10 & x < 40] # get all the values > 10 and < 40
```
```{r}
x[ x == 20] # get values that are equal to 20
x[ !x == 20] # equivalent to x[ x != 20]
```
* It is possible to selectively replace vector values.
```{r}
x <- c(10, 20, 30)
x[2]
x[2] <- 1000
x
```
### Functions and their arguments
Functions are a fundamental building block of R. Functions are "canned scripts" that automate more complicated sets of commands including operations assignments, etc. Many functions are predefined, or can be made available by importing R packages (more on that later). A function usually takes one or more inputs called arguments. Functions often (but not always) return a value. A typical example would be the function `round()`. The input (the argument) must be a number, and the return value (in fact, the output) is the rounded number. Executing a function (‘running it’) is called calling the function. An example of a function call is:
```{r}
pi <- 3.141593
round(pi)
```
* `round` is a function that takes at lest one number and returns a number that rounded to the nearest integer.
* In R, all functions have the same syntax, which is function name followed by `()`
* Depends on a function within "()" you supply zero to many arguments
* The names and numbers of arguments vary from function to function
* How to know what augments function has?
* Get help of that function
* use `args()` function
* How to get help in R?
* How to get help in R details: https://www.r-project.org/help.html
* By typing `?` or `help()` followed by the name of the function in the console, for example to get help with the round function, type "?round" in the console
* In Rstudio under the help tab one can search for a given function
* On general Google search, for instance "round + r function + documentation"
```{r}
?round
help(round) # equivalent to ?round
```
* Use `args()` to view the arguments of a function
```{r}
args(round)
```
* Using the help or the args function, you can see that `round()` takes exactly two arguments
* x: a numeric vector
* digits: integer indicating the number of decimal places to round
```{r}
round(x=pi, digits = 0)
round(x=pi, digits = 2)
round(x=pi, digits = 4)
```
* As long as you use argument names order of the arguments does not matter.
```{r}
round( digits = 4, x=pi)
```
* Some useful math/stat functions in R
* max(): maximum value in a numeric vector
* min(): minimum value in a numeric vector
* range(): vector of min and max
* sum(): sum of a vector
* mean(): mean of a vector
* median(): median of a vector
* var(): variance of a vector
* sd(): standard deviation of a vector
* sort(): sorted version of a vector
* length(): length of an object
* cor(): correlation of x and y
* data type conversion functions: as."datatype" family of functions are useful for converting one data type to other
* as.numeric()
* as.character()
* as.integer()
### Missing data
As R was designed to analyze datasets, it includes the concept of missing data (which is uncommon in other programming languages). Missing data are represented in vectors as NA.
When doing operations on numbers, most functions will return NA if the data you are working with include missing values. This feature makes it harder to overlook the cases where you are dealing with missing data. You can add the argument na.rm = TRUE to calculate the result while ignoring the missing values.
```{r}
heights <- c(2, 4, 4, NA, 6)
mean(heights)
max(heights)
mean(heights, na.rm = TRUE)
max(heights, na.rm = TRUE)
```
If your data include missing values, you may want to become familiar with the function is.na() See below for examples.
```{r}
## Extract those elements which are not missing values.
heights[!is.na(heights)]
```
# Challenges
:::exercise
1. You have given a list of tumour volumes from 5 patients, 2.1, 1.9, 2.6, 1.8, and 3 $cm^3$. Convent this data into a R vector and get the following characteristics of data by applying some of the functions listed above.
a. How many observations we have?
b. What is the mean tumour volume?
c. How many patients has tumour volume less than 2 $cm^3$?
<details><summary>Answer</summary>
```{r ex_1, purl=FALSE}
# Challenge 1 a: How many observations we have?
tumour_vol <- c(2.1, 1.9, 2.6, 1.8,3)
length(tumour_vol)
# Challenge 1 b: What is the mean tumour volume?
mean(tumour_vol)
# Challenge 1 c: How many patients has tumour volume less than 2
sum(c(2.1, 1.9, 2.6, 1.8,3) < 2)
```
</details>
:::
:::exercise
2. You have given two vectors of observations data1 <- c(10, 9, 7, 6, 7, 3, 7, 5, 6, 6) and data2 <- c(5, 2, 10, 7, 2, 5, 1, 5, 3, 4).
a. use the function `cor()` to get the correlation coefficient
b. Can you identify the default correlation method the `cor()` function uses? For help use `help()` or `?`
c. Can you get Spearman correlation coefficient for these two vectors?
<details><summary>Answer</summary>
```{r ex_2, purl=FALSE}
# Challenge 2 a: use the function `cor()` to get the correlation coefficient
data1 <- c(10, 9, 7, 6, 7, 3, 7, 5, 6, 6)
data2 <- c(5, 2, 10, 7, 2, 5, 1, 5, 3, 4)
cor(x=data1, y=data2)
# Challenge 2 b: Can you identify the default correlation method the `cor()` function uses?
?cor
# According to the help in the 'cor' file, if no method is specified by default the function will use the 'pearson' method.
# Challenge 2 c: Can you get Spearman correlation coefficient for these two vectors?
cor(x=data1, y=data2, method = "spearman")
```
</details>
:::
:::exercise
3. You have given a logical vector, logi_vec <- c(TRUE, FALSE, TRUE, TRUE). Can you apply math functions like `sum()`, `mean()` on this logical vector, if so what is the output of sum() and mean()?
<details><summary>Answer</summary>
```{r ex_3, purl=FALSE}
# Challenge 3:
logi_vec <- c(TRUE, FALSE, TRUE, TRUE)
# Mathematical functions can be applied to logical vectors. Internally, the logical values TURE and FALSE are represented as 1 and 0, respectively.
sum(logi_vec)
mean(logi_vec)
```
</details>
:::
:::exercise
4. What is the output of following operations and explain your logic behind it.
a. c(5, 2, 9, 1, 13) * c(2)
b. c(5, 2, 9, 1, 13) * c(1,2)
c. c(5, 2, 9, 1, 13) + c(1,2,3,4,5)
<details><summary>Answer</summary>
```{r ex_4, purl=FALSE}
# Challenge 4 a: c(5, 2, 9, 1, 13) * c(2)
c(5, 2, 9, 1, 13) * c(2)
# Since the shorter vector has only one value, every value of the longer vector is multiplied by the value of the shorter vector.
# Challenge 4 b: c(5, 2, 9, 1, 13) * c(1,2)
c(5, 2, 9, 1, 13) * c(1,2)
# Since the shorter vector has only two values, these two values are sequentially recycled to multiply the longer vector's values.
# Challenge 4 c: c(5, 2, 9, 1, 13) + c(1,2,3,4,5)
c(5, 2, 9, 1, 13) + c(1,2,3,4,5)
# Due to the fact that both vectors have the same length, values are sequentially added together
```
</details>
:::
:::exercise
5. From the vector c(23, 12, 41, 65, 23, 6), can you extract those values that are equal to 23 or less than 15?
<details><summary>Answer</summary>
```{r ex_5, purl=FALSE}
vec <- c(23, 12, 41, 65, 23, 6)
vec[ vec == 23 | vec < 15]
```
</details>
:::
:::exercise
6. "month.name" is a in-built R vector
a. What is the index number of "April" in month.name vector? hist: "which" function may help you.
b. Extract all the months from April to December
<details><summary>Answer</summary>
```{r ex_6, purl=FALSE}
# Challenge 6 a: What is the index number of "April" in month.name vector? hist: "which" function may help you.
which(month.name == "April")
# Challenge 6 b:Extract all the months from April to December
month.name[which(month.name == "April"):length(month.name)]
```
</details>
:::
#### Credit
These instructions were adapted from [Data Carpentry](https://datacarpentry.org)
course materials.