-
Notifications
You must be signed in to change notification settings - Fork 31
/
data-607-tidyverse-TNS.Rmd
80 lines (56 loc) · 2.94 KB
/
data-607-tidyverse-TNS.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
---
title: "DATA-607 TIDYVERSE ASSIGNMENT"
author: "Tage N Singh"
date: "`r Sys.Date()`"
output:
prettydoc::html_pretty:
theme: cayman
highlight: vignette
vignette: >
%\VignetteIndexEntry{Vignette Title}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
This vignette will provide a brief exploration of the "tidyverse" package using its built-in libraries
======================================================================================================
The tidyverse is a powerful collection of R packages that are actually data tools All packages of the tidyverse share an underlying common APIs as listed below.
-------------------------------------------------------------------------------
Lists
- ggplot2, which implements the grammar of graphics. You can use it to visualize your data.
- dplyr is a grammar of data manipulation. You can use it to solve the most common data manipulation challenges.
- tidyr helps you to create tidy data or data where each variable is in a column, each observation is a row end each value is a cell.
- readr is a fast and friendly way to read rectangular data.
- purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.
- tibble is a modern re-imaginging of the data frame.
- stringr provides a cohesive set of functions designed to make working with strings as easy as posssible
- forcats provide a suite of useful tools that solve common problems with factors.
Now for the package
-------------------
```{r setup}
#Loading just the "tidyverse" library !
library(tidyverse)
```
```{r tidyverse usage-1}
# For this exercise we are using a dataset from kaggle.com which contain
#information about avacado sales in various cities in the USA
# ---------- Using the "rear" function "read_csv" -------
avacado <- data.frame(read_csv(file = "https://raw.githubusercontent.com/tagensingh/SPS-DATA607-TIDYVERSE/main/avocado.csv"))
# ---------- Using the "tibble" function to look at a snapshot of the dataframe -------
tibble(avacado)
```
```{r tidyverse usage-2}
# ---------- Using the "arrange" function from "dplyr"to sort the dataframe by date-------
avacado_date <- arrange(avacado,Date)
tibble(avacado_date)
# ---------- Using the "ggplot" function from "ggplot2"to chart the pricing density of avacados-------
# Histogram overlaid with kernel density curve
ggplot(avacado, aes(x=AveragePrice)) +
geom_histogram(aes(y=..density..), # Histogram with density instead of count on y-axis
binwidth=.1,
colour="black", fill="white") +
geom_density(alpha=.1, fill="#FF6666")+# Overlay with transparent density plot
ggtitle("Avacados Pricing Density")
```
As is shown above, the tidyverse packages is one of the most versatile packages in the R sphere....
Test for Git
------------------------------------------------------------------------------------------------