This is the repository for all of my contributions to #TidyTuesday, an initiative by the R4DS online learning community. All the data and related articles can be found in the official #TidyTuesday GitHub repo.
Each folder in this repository contains example visualizations and the code I wrote to create them. So far, I have worked on the following datasets:
This week's data comes from Kaggle. It covers all Olympic competitors from Athens 1896 to Rio 2016, including, of course, the great champions who won gold, silver, and bronze medals time and again. My unit chart below shows the 12 athletes who won the most gold medals over these 120 years. Each square represents one medal.
Check out the code here.
This week's dataset was provided by The Trust for Public Land. It contains ratings of public parks across the United States based on a number of criteria. I created a lollipop plot to show the ratings for the five largest cities in the US.
Check out the code here.
This dataset comes from the Great Lakes Fishery Commission. I had been looking for an opportunity to create sunburst plots, and this seemed like the perfect dataset for it because of its nested structure (lakes, then years, then fish species). Below is the static plot for Lake Erie.
Since the dataset includes figures for all five Great Lakes, I wrote the plotting code as a function, so the same kind of plot can be made for any of them just by specifying the name of the lake.
Check out the code here.
This week's data comes from the Ask a Manager Survey. I wanted to see how salaries differ by age and race, so I created boxplots for each of these groups. The dataset is quite imbalanced, since most respondents were white and female. Still, it gives some interesting insights into how salary levels vary with these factors.
Check out the code here.
This is a really interesting dataset from the Water Point Data Exchange, but I just couldn't find the time to work on it properly. Nevertheless, I quickly created this histogram to show what proportion of the installed water sources were still functional at the time of the visit. It was great to see that so many of them are still in use!
Check out the code here.
This week's dataset contains information on CEOs of S&P 1500 firms from the 1990s through 2020. It documents their departures from the company and the reasons for leaving. Credit goes to Gentry et al. by way of Data Is Plural.
I created a combined plot that shows different aspects of the data in several panels. The main panel shows a regression inspired by Julia Silge's TidyTuesday contribution for this week. The two smaller panels show (1) how many times each company has changed its CEO and (2) the reasons for the turnover.
Check out the code here.
This week's data came from Kaggle, with credit to Shivam Bansal.
I was interested in where the movies and TV shows on Netflix come from, so I used the gghalves package to create rain cloud plots for the five countries with the most movies and shows. These plots convey the number of data points while also showing their distribution. I also customized the theme to match Netflix's branding.
Check out the code here.
I tried my hand at animated maps, using the maps and gganimate packages to show the establishment of post offices in the mainland United States between 1639 and 2000. The data was provided by Cameron Blevins and Richard W. Helbock.
Check out the code here.
I created treemaps to show the breakdown of forest area loss by its different drivers for the years 2001 to 2013. I then used the gifski package to combine the PNG files into a single animated GIF. The data was provided by Our World in Data.
Check out the code here.