-
Notifications
You must be signed in to change notification settings - Fork 10
/
home_projects.Rmd
229 lines (186 loc) · 7.95 KB
/
home_projects.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
---
title: "Projects"
output:
bookdown::html_document2:
highlight: textmate
toc: false
toc_float:
collapsed: true
smooth_scroll: true
print: false
toc_depth: 4
number_sections: false
df_print: default
code_folding: none
self_contained: false
keep_md: false
encoding: 'UTF-8'
css: "assets/lab.css"
include:
after_body: assets/footer-lab.html
---
```{r,child="assets/header-lab.Rmd"}
```
# {.tabset .tabset-fade}
## Datasets
Hands-on analysis of actual data is the best way to learn R programming. This page contains some data sets that you can use to explore what you have learned in this course. For each data set, a brief description as well as download instructions are provided.
<div class="alert alert-info">
<strong> Try to focus on using the tools from the course to explore the data, rather than worrying about producing a perfect report with a coherent analysis workflow.</strong>
</div>
On the last day you will present your Rmd file (or rather, the resulting html report) and share with the class what your data was about.
---
### Palmer penguins 🐧
- This is a data set containing a series of measurements for three species of penguins collected in the Palmer station in Antarctica.
- Data description: <https://vincentarelbundock.github.io/Rdatasets/doc/heplots/peng.html>
<details>
<summary>Download instructions</summary>
```{r, warning=F, message=F}
penguins <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/heplots/peng.csv", header = T, sep = ",")
str(penguins)
```
</details>
---
### Drinking habits 🍷
- Data from a national survey on the drinking habits of american citizens in 2001 and 2002.
- Data description: <https://vincentarelbundock.github.io/Rdatasets/doc/stevedata/nesarc_drinkspd.html>
<details>
<summary>Download instructions</summary>
```{r}
library(dplyr)
# this will download the csv file directly from the web
drinks <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/stevedata/nesarc_drinkspd.csv", header = T, sep = ",")
# the lines below will take a sample from the full data set
set.seed(seed = 2)
drinks <- sample_n(drinks, size = 3000, replace = F)
# and here we check the structure of the data
str(drinks)
```
</details>
---
### Car crashes 🚗
- Data from car accidents in the US between 1997-2002.
- Data description: <https://vincentarelbundock.github.io/Rdatasets/doc/DAAG/nassCDS.html>
<details>
<summary>Download instructions</summary>
```{r}
library(dplyr)
# this will download the csv file directly from the web
crashes <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/DAAG/nassCDS.csv", header = T, sep = ",")
# the lines below will take a sample from the full data set
set.seed(seed = 2)
crashes <- sample_n(crashes, size = 3000, replace = F)
# and here we check the structure of the data
str(crashes)
```
</details>
---
### Gapminder health and wealth 📈
- This is a collection of country indicators from the Gapminder dataset for the years 2000-2016.
- Data description: <https://vincentarelbundock.github.io/Rdatasets/doc/dslabs/gapminder.html>
<details>
<summary>Download instructions</summary>
```{r}
library(dplyr)
# this will download the csv file directly from the web
gapminder <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/dslabs/gapminder.csv", header = T, sep = ",")
# here we filter the data to remove anything before the year 2000
gapminder <- gapminder |> filter(year >= 2000)
# and here we check the structure of the data
str(gapminder)
```
</details>
---
### StackOverflow survey 🖥️
- This is a downsampled and modified version of one of StackOverflow's annual surveys where users respond to a series of questions related to careers in technology and coding.
- Data description: <https://vincentarelbundock.github.io/Rdatasets/doc/modeldata/stackoverflow.html>
<details>
<summary>Download instructions</summary>
```{r}
library(dplyr)
# this will download the csv file directly from the web
stackoverflow <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/modeldata/stackoverflow.csv", header = T, sep = ",")
# the lines below will take a sample from the full data set
set.seed(2)
stackoverflow <- sample_n(stackoverflow, size = 3000)
# and here we check the structure of the data
str(stackoverflow)
```
</details>
---
### Doctor visits 🤒
- Data on the frequency of doctor visits in the past two weeks in Australia for the years 1977 and 1978.
- Data description: <https://vincentarelbundock.github.io/Rdatasets/doc/AER/DoctorVisits.html>
<details>
<summary>Download instructions</summary>
```{r}
library(dplyr)
# this will download the csv file directly from the web
doctor <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/AER/DoctorVisits.csv", header = T, sep = ",")
# the lines below will take a sample from the full data set
set.seed(2)
doctor <- sample_n(doctor, size = 3000)
# and here we check the structure of the data
str(doctor)
```
</details>
---
### Video Game Sales 🎮
- This data set contains sales figures for video games titles released in 2001 and 2002.
- Data description: <https://mavenanalytics.io/data-playground?order=date_added%2Cdesc&search=Video%20Game%20Sales>
- Click on "Preview Data" and "VG Data Dictionary" to see the description for each column.
<details>
<summary>Download instructions</summary>
```{r, warning=F, message=F}
library(dplyr)
library(lubridate)
# this will download the file to your working directory
download.file(url = "https://maven-datasets.s3.amazonaws.com/Video+Game+Sales/Video+Game+Sales.zip", destfile = "video_game_sales.zip")
# this will unzip the file and read it into R
videogames <- read.table(unz(filename = "vgchartz-2024.csv", "video_game_sales.zip"), header = T, sep = ",", quote = "\"", fill = T)
# this will select rows corresponding to years 2001 and 2002
videogames <- filter(videogames, year(as_date(release_date)) %in% c(2001,2002))
# and here we check the structure of the data
str(videogames)
```
</details>
---
### LEGO Sets 🏗️
- This data set contains the description of all LEGO sets released from 2000 to 2009.
- Data description: <https://mavenanalytics.io/data-playground?order=date_added%2Cdesc&search=lego>
- Click on "Preview Data" and "VG Data Dictionary" to see the description for each column.
<details>
<summary>Download instructions</summary>
```{r, warning=F, message=F}
library(dplyr)
# this will download the file to your working directory
download.file(url = "https://maven-datasets.s3.amazonaws.com/LEGO+Sets/LEGO+Sets.zip", destfile = "lego.csv.zip")
# this will unzip the file and read it into R
lego <- read.table(unz(filename = "lego_sets.csv", "lego.csv.zip"), header = T, sep = ",", quote = "\"", fill = T)
# this will select rows corresponding to years 2000-2009
lego <- filter(lego, year %in% seq(2000,2009,1))
# and here we check the structure of the data
str(lego)
```
</details>
---
### Shark attacks 🦈
- This data set contains information on shark attack records from all over the world.
- Data description: <https://mavenanalytics.io/data-playground?order=date_added%2Cdesc&search=shark>
- Click on "Preview Data" and "VG Data Dictionary" to see the description for each column.
<details>
<summary>Download instructions</summary>
```{r, warning=F, message=F}
library(dplyr)
# this will download the file to your working directory
download.file(url = "https://maven-datasets.s3.amazonaws.com/Shark+Attacks/attacks.csv.zip", destfile = "attacks.csv.zip")
# this will unzip the file and read it into R
sharks <- read.table(unz(filename = "attacks.csv", "attacks.csv.zip"), header = T, sep = ",", quote = "\"", fill = T)
# the lines below will take a sample from the full data set
set.seed(seed = 2)
sharks <- sample_n(sharks, size = 3000, replace = F)
str(sharks)
```
</details>
***
## Sample project report
<p><iframe style="border: none;" title="workshop" src="https://nbisweden.github.io/workshop-r/2410-canvas/Gapminder_project.html" width="100%" height="900px"></iframe></p>