Merge pull request #5 from willtownes/main
make preview_datasets more flexible and add sprinters dataset
Showing 8 changed files with 2,167 additions and 11 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,3 +6,6 @@ _site/ | |
|
||
# Ignore metadata, since it is auto-generated | ||
metadata.json | ||
|
||
# RStudio project data | ||
.Rproj.user |
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
Version: 1.0 | ||
|
||
RestoreWorkspace: Default | ||
SaveWorkspace: Default | ||
AlwaysSaveHistory: Default | ||
|
||
EnableCodeIndexing: Yes | ||
UseSpacesForTab: Yes | ||
NumSpacesForTab: 2 | ||
Encoding: UTF-8 | ||
|
||
RnwWeave: Sweave | ||
LaTeX: pdfLaTeX |
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,98 @@ | ||
--- | ||
title: Fastest 100m sprint times | ||
author: Will Townes | ||
date: August 30, 2024 | ||
description: Explore the 1,000 fastest times ever recorded for the 100m sprint, in men's and women's track, cleaning the data and conducting EDA. | ||
categories: | ||
- data cleaning | ||
- EDA | ||
- split/apply/combine | ||
data: | ||
year: 2021 | ||
files: | ||
- sprint_m.txt | ||
- sprint_w.txt | ||
--- | ||
|
||
## Motivation | ||
|
||
This data comes from a compilation of official times for runners in 100m sprint | ||
events around the world. | ||
|
||
## Data | ||
|
||
There are two files, one for men and one for women. Each contains the top 1,000 | ||
times (as of 2021) for men or women in the 100m sprint. Each row represents one | ||
time. | ||
|
||
Both data files are in tab-separated format. | ||
|
||
### Data preview | ||
|
||
```{r, echo=FALSE, results="asis"} | ||
source("../preview_dataset.R") | ||
rt <- function(x) { | ||
read.table(file.path("../data", x), sep = "\t", quote = "", header = TRUE) | ||
} | ||
datasets <- sapply(c("sprint_m.txt", "sprint_w.txt"), rt, simplify = FALSE) | ||
preview_datasets(datasets = datasets) | ||
``` | ||
|
||
### Variable descriptions | ||
|
||
| Variable | Description | | ||
|----|-------------| | ||
| Rank | Ranking of the time | | ||
| Time | Number of seconds to complete the 100m sprint. Times have an "A" suffix if the event was at an altitude greater than 1,000 meters above sea level (e.g. "10.36A"). | | ||
| Wind | Wind speed in meters per second. Positive values indicate a tailwind. | | |
| Name | Name of the sprinter. | | ||
| Country | Country the sprinter represented. | | ||
| Birthdate | When the sprinter was born (format DD.MM.YY) | | ||
| City | Where the race took place. | | ||
| Date | When the race took place (format DD.MM.YYYY) | | ||
|
||
The format is the same in both data files. | ||
|
||
## Questions | ||
|
||
This dataset provides many opportunities for data cleaning, wrangling, and EDA. | ||
For instance, in data cleaning: | ||
|
||
1. The `Time` column contains a suffix to indicate if the event was at altitude. | ||
Create a new `Altitude` column that is true if the event was at altitude, and | ||
adjust `Time` to contain only the time as a number. | ||
2. Some of the text columns contain special characters encoded as [HTML | ||
entities](https://developer.mozilla.org/en-US/docs/Glossary/Character_reference). | ||
For instance, ü is written `&uuml;`. Use a package that can decode these to | |
convert the text back to a readable format. | ||
3. Convert the `Birthdate` and `Date` columns to your programming language's | ||
date objects (such as `datetime.date` in Python or `Date` objects in R). How | ||
do you handle two-digit years and determine which century they occurred in? | ||
|
||
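As a rough starting point for items 1 and 3, the sketch below uses base R on a toy data frame. The rows are invented and only mimic the documented formats, and `parse_birthdate` is a hypothetical helper, not part of this repository. It assumes every sprinter was born in the 1900s or 2000s and picks the 2000s only when that year falls before the race year. Decoding HTML entities (item 2) is left out because base R has no built-in decoder.

```r
# Toy rows that mimic the documented formats; the values are invented.
sprint <- data.frame(
  Time = c("9.99", "10.36A"),
  Birthdate = c("21.08.86", "05.03.02"),
  Date = c("16.08.2009", "12.06.2021"),
  stringsAsFactors = FALSE
)

# Flag times with the altitude suffix, then strip it and convert to numeric.
sprint$Altitude <- grepl("A$", sprint$Time)
sprint$Time <- as.numeric(sub("A$", "", sprint$Time))

# Race dates carry four-digit years, so they parse unambiguously.
sprint$Date <- as.Date(sprint$Date, format = "%d.%m.%Y")

# Hypothetical helper: expand a two-digit birth year to four digits.
# Assumption: use 20YY only if that year is earlier than the race year,
# otherwise fall back to 19YY.
parse_birthdate <- function(x, race_date) {
  day_month <- substr(x, 1, 6)  # "DD.MM."
  yy <- as.integer(substr(x, 7, 8))
  race_year <- as.integer(format(race_date, "%Y"))
  year <- ifelse(2000 + yy < race_year, 2000 + yy, 1900 + yy)
  as.Date(paste0(day_month, year), format = "%d.%m.%Y")
}
sprint$Birthdate <- parse_birthdate(sprint$Birthdate, sprint$Date)
```

Using the race date as the reference avoids relying on a fixed cutoff year, which would drift as new sprinters enter the list.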
|
||
For data wrangling and split-apply-combine operations: | ||
|
||
1. Find the fastest time achieved by runners of each country. | ||
2. Find the fastest time recorded each year. Is this increasing or decreasing | ||
over time? | ||
3. Which sprinters have the most times in the top 1,000? | ||
|
||
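One possible way to approach these three questions in base R is sketched below. It assumes `Time` has already been cleaned to a number and `Date` converted to a `Date` object. The rows, runner names, and country codes are invented for illustration.

```r
# Toy data standing in for a cleaned file; all values are invented.
sprint <- data.frame(
  Time = c(9.80, 9.90, 9.85, 9.95),
  Name = c("Runner A", "Runner B", "Runner A", "Runner C"),
  Country = c("AAA", "BBB", "AAA", "BBB"),
  Date = as.Date(c("2019-06-01", "2019-07-01", "2020-06-01", "2020-07-01")),
  stringsAsFactors = FALSE
)

# Split by country, apply min, combine into a data frame.
fastest_by_country <- aggregate(Time ~ Country, data = sprint, FUN = min)

# Same pattern, grouped by the calendar year of the race.
sprint$Year <- as.integer(format(sprint$Date, "%Y"))
fastest_by_year <- aggregate(Time ~ Year, data = sprint, FUN = min)

# Count entries per sprinter, most frequent first.
entries_per_sprinter <- sort(table(sprint$Name), decreasing = TRUE)
```

Plotting `fastest_by_year$Time` against `fastest_by_year$Year` is one way to judge the trend over time.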
|
||
For EDA: | ||
|
||
1. Is the race time correlated with the wind speed? Do tailwinds make runners | ||
faster? | ||
2. Do certain cities have faster average times than others? | ||
|
||
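A minimal base R sketch for a first look at both questions follows, again on invented rows with `Time` assumed to be numeric already. Because the files hold only the top 1,000 times, any correlation describes this selected sample and not sprint races in general.

```r
# Toy data standing in for a cleaned file; all values are invented.
sprint <- data.frame(
  Time = c(9.80, 9.90, 9.85, 9.95),
  Wind = c(1.8, 0.2, 1.0, -0.5),
  City = c("City X", "City Y", "City X", "City Y"),
  stringsAsFactors = FALSE
)

# Pearson correlation between time and wind, skipping rows with missing values.
# A negative value means stronger tailwinds go with shorter times.
wind_cor <- cor(sprint$Time, sprint$Wind, use = "complete.obs")

# Average time per city, fastest city first.
mean_by_city <- sort(tapply(sprint$Time, sprint$City, mean))
```

A scatter plot of `Time` against `Wind` is a useful companion to the single correlation number.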
|
||
## References | ||
|
||
Data scraped from Peter Larsson's [Track and Field all-time Performances | ||
Homepage](http://www.alltime-athletics.com): | ||
|
||
- [Men's 100 meters](http://www.alltime-athletics.com/m_100ok.htm) | ||
- [Women's 100 meters](http://www.alltime-athletics.com/w_100ok.htm) | ||
|
||
Copyright held by Peter Larsson, email: `kl78vc` at `alltime-athletics.com`. | ||
Approval to redistribute was granted in September 2024. |