-
Notifications
You must be signed in to change notification settings - Fork 31
/
RamnivasS-TidyverseCreate.Rmd
120 lines (100 loc) · 4.06 KB
/
RamnivasS-TidyverseCreate.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
---
title: 'Data 607 : Assignment 2 - Tidyverse Create'
author: "Ramnivas Singh"
date: "02/14/2021"
output:
html_document:
theme: default
highlight: espress
toc: true
toc_depth: 4
toc_float:
collapsed: false
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Overview
Task here is to Create an example. Using one or more TidyVerse packages, and any dataset from fivethirtyeight.com or Kaggle, create a programming sample “vignette” that demonstrates how to use one or more of the capabilities of the selected TidyVerse package with your selected dataset.
I picked up 'COVID-19 World Vaccination Progress' dataset from [Kaggle](https://www.kaggle.com/gpreda/covid-world-vaccination-progress)
## Summary
The worldwide endeavor to create a safe and effective COVID-19 vaccine is bearing fruit. A handful of vaccines now have been authorized around the globe; many more remain in development.The biggest vaccination campaign in history is underway. More than 172 million doses have been administered across 77 countries, according to data collected by Bloomberg. The latest rate was roughly 5.92 million doses a day. To bring this pandemic to an end, a large share of the world needs to be immune to the virus.
in this assignment, Kaggle dataset will be used for analysis to apply Tidyverse capabilities.
## Tidyverse
The tidyverse is a coherent system of packages for data manipulation, exploration and visualization that share a common design philosophy. Tidyverse packages are intended to make statisticians and data scientists more productive by guiding them through workflows that facilitate communication, and result in reproducible work products.
```
Data Wrangling and Transformation
* dplyr
* tidyr
* stringr
* forcats
Data Import and Management
* tibble
* readr
Functional Programming
* purrr
Data Visualization and Exploration
* ggplot2
```
More information on tidyverse can be found [here](https://tidyverse.org)
## Data Collection
Data set from [Kaggle](https://www.kaggle.com/gpreda/covid-world-vaccination-progress) is used for this assignment.
## Load tidyverse packages
```{r}
library(tidyverse) # Load all "tidyverse" libraries.
# OR
# library(readr) # Read tabular data.
# library(tidyr) # Data frame tidying functions.
# library(dplyr) # General data frame manipulation.
# library(ggplot2) # Flexible plotting.
library(viridis) # Viridis color scale.
```
## Load Vaccination Progress data
```{r}
#Source: https://www.kaggle.com/gpreda/covid-world-vaccination-progress/download
url='https://raw.githubusercontent.com/rnivas2028/MSDS/Data607/Tidyverse/country_vaccinations.csv'
country_vaccinations <- read.csv(url(url))
```
## Tidyverse capabilities
### glimpse
To look at the variable names and types
```{r}
glimpse(country_vaccinations)
```
### summary
To get an overview of data set
```{r}
summary(country_vaccinations)
```
### select
To get selected columns from a large data sets with many columns
```{r}
head(country_vaccinations%>%select(country, total_vaccinations, people_vaccinated, people_fully_vaccinated),5)
```
### rename
To rename columns in a data set
```{r}
head(rename(country_vaccinations, 'Total Vaccinations'=total_vaccinations),5)
```
### sample_n
To pick random sample from the data set
```{r}
head(sample_n(country_vaccinations, 5))
```
### group_by
To group data set into smaller data groups
```{r}
by_country <- group_by(country_vaccinations, country)
summarise <- summarise(by_country, count = n(),
country_vaccinations_mean = mean(total_vaccinations, na.rm = TRUE))
by_country <-head(summarise %>% arrange(desc(country_vaccinations_mean)))
```
### ggplot
```{r}
ggplot(by_country, aes(x=country_vaccinations_mean, y=country)) + geom_point()
ggplot(data=by_country, aes(x=(reorder(country, country_vaccinations_mean)), y = country_vaccinations_mean))+
geom_bar(stat="identity", fill="#FF6600")+ coord_flip()+
labs(title="Average of country vaccinations", x= "Country", y = "Country Vaccinations Mean")+
geom_text(aes(label=round(country_vaccinations_mean, digits = 2)))+
theme(plot.title=element_text(hjust=0.5))
```