diff --git a/docs/contributing.html b/docs/contributing.html index 911db0a..ba5eb89 100644 --- a/docs/contributing.html +++ b/docs/contributing.html @@ -2,7 +2,7 @@ - + @@ -93,7 +93,7 @@ Contributing diff --git a/docs/index.html b/docs/index.html index ae67d70..abe0bae 100644 --- a/docs/index.html +++ b/docs/index.html @@ -2,7 +2,7 @@ - + @@ -153,7 +153,7 @@ Contributing @@ -186,7 +186,7 @@

DaSL Data Snacks

+
Categories
All (7)
Python (1)
R (4)
Tables (1)
Visualization (2)
ggplot2 (1)
graphics (2)
@@ -227,7 +227,7 @@
Categories
-
+

@@ -248,7 +248,7 @@

-
+

@@ -269,49 +269,46 @@

-
+

-Large Data Work: Intro to parquet files in R +Working with ggplot2: A Short Guide

- +
-
+
-
+
-
+
-
+

diff --git a/docs/listings.json b/docs/listings.json index 793ba9e..eca7830 100644 --- a/docs/listings.json +++ b/docs/listings.json @@ -2,7 +2,7 @@ { "listing": "/index.html", "items": [ - "/r_snacks/parquet.html", + "/r_snacks/ggplot.html", "/r_snacks/naniar.html", "/python_snacks/wordcloud.html", "/r_snacks/patchwork.html", diff --git a/docs/python_snacks/wordcloud.html b/docs/python_snacks/wordcloud.html index e9bcb28..96935e1 100644 --- a/docs/python_snacks/wordcloud.html +++ b/docs/python_snacks/wordcloud.html @@ -2,7 +2,7 @@ - + @@ -41,8 +41,6 @@ - - - - + @@ -98,7 +95,7 @@ Contributing
@@ -157,75 +154,48 @@

Visualize text frequency with {wordcloud}

Show your data

We download US Presidential State of the Union speeches, from Washington to Obama, as a demo dataset.

-
-
+
from wordcloud import WordCloud
+import matplotlib.pyplot as plt
 
-
- -
+with open('state_union_part1.txt', 'r') as file: + text1 = file.read() + +with open('state_union_part2.txt', 'r') as file: + text2 = file.read() + +text = text1 + text2 + +print("Here's an example section of the text:", text[6030:6400])

Demonstrate wordcloud

-
-
+
# Generate a word cloud image
+wordcloud = WordCloud(max_font_size=40).generate(text)
 
-
- -
+# Display the generated image: +plt.imshow(wordcloud, interpolation='bilinear') +plt.axis("off") +plt.show()
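A word cloud sizes each word by how often it appears in the text. As a rough sketch of that counting step, using only the standard library: the tokenizer below (lowercase, letters and apostrophes only) and the `sample` string are assumptions for illustration, and WordCloud's own tokenization and stopword filtering differ.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n most common words with their counts."""
    # Assumption: a simple lowercase tokenizer; WordCloud's own differs.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)


# Hypothetical sample text, not taken from the speeches.
sample = "the union and the states and the people"
print(word_frequencies(sample, top_n=2))
```

Words with higher counts here are the ones a word cloud would draw larger, up to whatever limit max_font_size sets.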

Your turn!

What happens when you change the max_font_size?

-
-
+
wordcloud2 = WordCloud(max_font_size=______).generate(text)
 
+# Display the generated image:
+plt.imshow(wordcloud2, interpolation='bilinear')
+plt.axis("off")
+plt.show()
- -
-
- - - - -
-
-
-
-
-
-
- - + + + + + + + + + + + + + + + + + + + + + + + +
+
+ +
+ +
+
+
+

Working with ggplot2: A Short Guide

+
+
R
+
Visualization
+
ggplot2
+
+
+
+ + +
+ +
+
Author
+
+

Lore

+
+
+ +
+
Published
+
+

September 30, 2024

+
+
+ + +
+ + +
+ + + + +
+ + + + + +
+

Introduction to ggplot2

+

In this short guide, we’ll explore how to use ggplot2 to create visualizations using the cars dataset. This dataset contains the speed of cars and the distances taken to stop.

+
+

Basic Scatter Plot

+

Let’s start with a simple scatter plot of speed versus stopping distance:

+
+
#| autorun: false
+library(ggplot2)
+
+# Load the cars dataset
+data(cars)
+
+# Basic scatter plot with ggplot2
+ggplot(cars, aes(x = speed, y = dist)) +
+  geom_point() +
+  labs(title = "Speed vs Stopping Distance",
+       x = "Speed (mph)",
+       y = "Stopping Distance (ft)")
+
+
+
+

Adding a Smoothing Line

+

We can add a trend line to the scatter plot to visualize the relationship between speed and stopping distance.

+
+
#| autorun: false
+ggplot(cars, aes(x = speed, y = dist)) +
+  geom_point() +
+  geom_smooth(method = "lm", se = FALSE) +
+  labs(title = "Speed vs Stopping Distance with Trend Line",
+       x = "Speed (mph)",
+       y = "Stopping Distance (ft)")
+
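The trend line drawn by geom_smooth(method = "lm") is an ordinary least-squares fit of y on x. For readers who want to see the arithmetic, here is a minimal sketch in plain Python. The function name `fit_line` and the five data points are hypothetical, chosen for illustration; they are not the cars data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical points for illustration; not the cars dataset.
speed = [4, 8, 12, 16, 20]
dist = [5, 17, 29, 41, 53]

slope, intercept = fit_line(speed, dist)
print(slope, intercept)
```

The slope is the estimated change in stopping distance per unit of speed, which is what the straight line in the plot shows.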
+
+
+

Customizing the Plot

+

Finally, let’s customize the colors and theme of the plot.

+
+
#| autorun: false
+ggplot(cars, aes(x = speed, y = dist)) +
+  geom_point(color = "blue") +
+  geom_smooth(method = "lm", se = FALSE, color = "red") +
+  labs(title = "Customized Speed vs Stopping Distance",
+       x = "Speed (mph)",
+       y = "Stopping Distance (ft)") +
+  theme_minimal()
+
+ + +
+
+ +

Citation

BibTeX citation:
@online{2024,
+  author = {, Lore},
+  title = {Working with Ggplot2: {A} {Short} {Guide}},
+  date = {2024-09-30},
+  langid = {en}
+}
+
For attribution, please cite this work as:
+Lore. 2024. “Working with Ggplot2: A Short Guide.” +September 30, 2024. +
+ +
+ + + + + \ No newline at end of file diff --git a/docs/r_snacks/gtsummary.html b/docs/r_snacks/gtsummary.html index 0c5a23f..454a0bf 100644 --- a/docs/r_snacks/gtsummary.html +++ b/docs/r_snacks/gtsummary.html @@ -2,7 +2,7 @@ - + @@ -41,8 +41,6 @@ - - - - + @@ -98,7 +95,7 @@ Contributing @@ -184,14 +181,11 @@

Learning Objectives

Our Cohort: Penguins

We’re going to use the palmerpenguins dataset as our example cohort. As a reminder, here are the first few rows of this dataset.

-
-
+
library(palmerpenguins)
+library(gtsummary)
+library(dplyr)
 
-
- -
+gt::gt(head(penguins))
@@ -199,38 +193,29 @@

Summary Table of

{gtsummary} lets you build up a summary demographics table with dplyr commands and special summarization commands.

Here, we’re

-
-
- -
- -
+
penguins |>
+  select(species, island, bill_length_mm) |>
+  tbl_summary()

Comparing Groups

-
-
- -
- -
+
penguins |>
+  tbl_summary(include=c(island, bill_length_mm),
+              by=species,
+              missing="no")
+

We can also add N’s and P-values:

-
-
- -
- -
+
penguins |>
+  tbl_summary(include=c(island, bill_length_mm),
+              by=species,
+              missing="no") |>
+  add_n() |>
+  add_p()
+

Here you can see we did a chi-squared test to look at combinations of island and species, and a Kruskal-Wallis rank sum test to compare bill_length_mm across species.
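To give a feel for what a Kruskal-Wallis rank sum test measures, here is a minimal sketch of its H statistic in plain Python. The function name `kruskal_wallis_h` and the example groups are hypothetical, and the sketch omits the tie correction and the p-value lookup that statistical packages typically apply, so it is not a substitute for add_p().

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    # Pool all values, remembering which group each one came from.
    pooled = sorted((value, g) for g, sample in enumerate(groups) for value in sample)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        # Find the run of tied values and give each one the average rank.
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += avg_rank
        i = j
    total = sum(rs ** 2 / len(sample) for rs, sample in zip(rank_sums, groups))
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical measurements for three groups; not the penguins data.
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

A larger H means the groups' ranks are more separated than chance alone would suggest.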

This is just the tip of the iceberg for {gtsummary}. You can also output to Microsoft Word for further tweaks.

@@ -239,34 +224,10 @@

Comparing Groups

Packages Used

-
-
- -
- -
+
sessionInfo()
- - -
-
- -
-
- -
-
-

Citation

BibTeX citation:
@online{laderas2024,
@@ -279,17 +240,6 @@ 

Packages Used

Laderas, Ted. 2024. “Make Your Table 1 with {Gtsummary}.” September 4, 2024.
- - - - - - + @@ -98,7 +95,7 @@ Contributing @@ -169,14 +166,10 @@

On this page

Our Dataset

-
-
+
data(penguins)
+library(gt)
 
-
- -
+gt(head(penguins))
@@ -188,25 +181,12 @@

Visualizi

My favorite way to look for these patterns is a package called {naniar} written by my friend Nick Tierney. naniar visualizes rows of data as lines in a rectangle. Columns are represented by line sections.

Let’s take a look at the missing values in the penguins data.

-
-
- -
- -
+
library(naniar)
+vis_miss(penguins)

What I like about this visual representation is that it lets you see the association of missing values as holes in the visualization, as well as percent missing values in each variable. In this example, you can see that some penguins are missing information such as sex.
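The percent-missing figure for each variable is a simple count. A minimal sketch in plain Python, assuming rows are dictionaries and `None` marks a missing value; the function name `percent_missing` and the four rows are hypothetical and are not the penguins data.

```python
def percent_missing(rows):
    """Return {column: percent of rows where the value is None}."""
    columns = rows[0].keys()
    n = len(rows)
    return {col: 100 * sum(r[col] is None for r in rows) / n for col in columns}


# Hypothetical rows; None marks a missing value.
rows = [
    {"species": "Adelie", "sex": "male"},
    {"species": "Adelie", "sex": None},
    {"species": "Gentoo", "sex": "female"},
    {"species": "Gentoo", "sex": None},
]
print(percent_missing(rows))
```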

-
-
- -
- -
+
gg_miss_upset(penguins)

In this example, reading the combinations from left to right, we can see:

    @@ -220,26 +200,28 @@

    -
    -
    - -
    - -
    +
    ggplot(airquality,
    +       aes(x = Ozone,
    +           y = Solar.R)) +
    + geom_miss_point() +
    +  
    +  ##everything past this point is just 
    +  #to explain the visualization
    +  theme_minimal() +
    +  geom_vline(xintercept=0) +
    +  geom_hline(yintercept = 0) +
    +  annotate("text",x=-5 ,y=150, label= "missing ozone", angle=90) +
    +  annotate("text", y=-15, x=75, label="missing Solar.R") +
    +  annotate("text", y=-20, x=-20, label="missing\nboth") +
    +  annotate("text", y=150, x=75, label="no missing data")

    In this plot, the missing values are represented by red points that are below the zero line for both axes (they are jittered so they don’t all occupy the same line). Specifically, the points on the left side have values for Solar.R but are missing values for Ozone. In this case, the points are distributed across the entire range of Solar.R. Note that this isn’t the case for missing values of Solar.R, which are represented in the lower right of the plot. These missing values are not distributed evenly across Ozone, showing a bias towards lower values of Ozone.

    ful when you facet on a categorical variable, to look for conditioned randomness, MAR/MNAR.

    -
    -
    - -
    - -
    +
    ggplot(airquality,
    +       aes(x = Ozone,
    +           y = Solar.R)) +
    + geom_miss_point() + facet_wrap(~Month)

    Here we can see a possible bias in missing values by the month (compare month=6 to month=9).

@@ -248,23 +230,6 @@

I

I’ve barely scratched the surface of all you can do with {naniar}. Nick has come up with all sorts of visualizations to address issues with missing values. I especially like the visualizations he’s added around imputations, which is one way to address missing values. Check his package out!

- - -
-
- -
-
- -
-
-

Citation

BibTeX citation:
@online{laderas2024,
@@ -277,17 +242,6 @@ 

I Laderas, Ted. 2024. “What’s Missing with `{Naniar}`.” September 22, 2024.

- - - - - - + @@ -98,7 +95,7 @@ Contributing @@ -191,39 +188,35 @@

What is {patchwork

Penguins Data

Just a quick reminder of the penguins data:

-
-
+
#| edit: false
+data(penguins)
+library(gt)
 
-
- -
+gt(head(penguins))

Let’s start with two plots

Let’s make two different views of the palmerpenguins data. The first is a bar plot of the penguin species:

-
-
+
#| autorun: false
+#| warning: false
+library(palmerpenguins)
+library(ggplot2)
 
-
- -
+penguin_species <- ggplot(penguins, aes(y=species, fill=species)) + + geom_bar() + +penguin_species

Let’s do a histogram of penguin bill_length_mm, colored by species:

-
-
+
#| autorun: false
+#| warning: false
+penguin_bill_length <- ggplot(penguins, aes(y=bill_length_mm, fill=species)) +
+  geom_histogram(bins=20)
 
-
- -
+penguin_bill_length
@@ -232,99 +225,67 @@

Composing Plots together

The {patchwork} package has two basic operations. + composes the plots side by side, and / stacks one plot on top of the other.

Let’s try out a side by side composition:

-
-
- -
- -
+
#| autorun: false
+#| warning: false
+library(patchwork)
+penguin_species + penguin_bill_length

Let’s try stacking the plots on top of each other:

-
-
- -
- -
+
#| autorun: false
+#| warning: false
+penguin_species / penguin_bill_length

We can remove the legends from both:

-
-
- -
- -
+
#| autorun: false
+#| warning: false
+(penguin_species + theme(legend.position="none")) /
+  (penguin_bill_length + theme(legend.position="none"))

Side by side and Stacked

How about three figures? We can compose them with a combination of + and /:

-
-
+
#| autorun: false
+#| warning: false
+penguin_island <- ggplot(penguins, aes(y=island)) +
+  geom_bar()
 
-
- -
+(penguin_species + penguin_island) / penguin_bill_length

An equivalent syntax uses | (the pipe character), which does the same thing as +:

-
-
- -
- -
+
#| autorun: false
+#| warning: false
+(penguin_species | penguin_island) / penguin_bill_length

Plot Labeling

You can automatically label plots in your figure using plot_annotation():

-
-
- -
- -
+
#| autorun: false
+#| warning: false
+(penguin_species + penguin_island) / penguin_bill_length + 
+    plot_annotation(tag_levels="A")

Finally, let’s add a title for our figure:

-
-
- -
- -
+
#| autorun: false
+#| warning: false
+(penguin_species + penguin_island) / penguin_bill_length + 
+  plot_annotation(tag_levels="A") +
+  plot_annotation(title="Penguins are Very Surprising")

Try it out!

Try out a different combination of plots, such as one plot on top and another on the bottom. Or make your own penguins plot and compose it with the others.

-
-
- -
- -
+
#| autorun: false
+#| warning: false
+
@@ -332,23 +293,6 @@

Go Further

This is just the tip of the iceberg. You can learn way more about {patchwork} at Thomas Lin Pedersen’s website: https://patchwork.data-imaginist.com/index.html

- - -
-
- -
-
- -
-
-
@@ -362,17 +306,6 @@

Go Further

Laderas, Ted. 2024. “Compose Plots with {Patchwork}.” September 13, 2024. - -