This repository houses an invited presentation for Kennesaw State's DS 7900 Applied Project in Analytics and Data Science.
I use the marathon data from the New York Times article "What Good Marathons and Bad Investments Have in Common."
The article links to the full data set of almost ten million records as a csv file on box.com. I have removed a few columns and provided two formats on Dropbox.
You can find the same data in .feather and .parquet formats in this repository's arrow folder.
- initial_setup.R provides the script that drops columns from the original source.
- create_arrow.R provides an example of converting a large file from .sas7bdat to .feather and .parquet. The results are in arrow.
- data_digest.R provides size and parsing time for each format.
- create_arrow.py provides an example of converting a large file from .sas7bdat to .feather and .parquet. The results are in py_arrow.
- data_digest.R provides sizes and parsing for .sas7bdat and .parquet.
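The conversion step described in the list above can be sketched roughly as follows. This is a minimal illustration, not the contents of create_arrow.py: the function names `arrow_targets` and `convert_sas_to_arrow` and the file name in the usage note are hypothetical, and the sketch assumes pandas and pyarrow are installed.

```python
from pathlib import Path


def arrow_targets(source, out_dir="py_arrow"):
    """Return the .feather and .parquet output paths for a source file."""
    stem = Path(source).stem
    out = Path(out_dir)
    return {
        "feather": out / f"{stem}.feather",
        "parquet": out / f"{stem}.parquet",
    }


def convert_sas_to_arrow(source, out_dir="py_arrow"):
    """Read a .sas7bdat file and write it as Feather and Parquet."""
    # Imported here so the path helper above works without pandas installed.
    # DataFrame.to_feather and DataFrame.to_parquet also need pyarrow.
    import pandas as pd

    targets = arrow_targets(source, out_dir)
    Path(out_dir).mkdir(parents=True, exist_ok=True)

    # read_sas loads the whole file into memory here; very large files
    # may need its chunksize argument instead.
    df = pd.read_sas(source, format="sas7bdat")
    df.to_feather(targets["feather"])
    df.to_parquet(targets["parquet"])
    return targets
```

Calling `convert_sas_to_arrow("marathon.sas7bdat")` (a hypothetical file name) would write `py_arrow/marathon.feather` and `py_arrow/marathon.parquet`.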
The explore_bigdata.R file provides a short example.
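The size and parsing-time comparison that the file list attributes to data_digest.R can be approximated in Python with the standard library alone. The sketch below is an assumption about what such a comparison looks like, not a port of that script; `digest` and `read_bytes` are hypothetical names, and `read_bytes` is only a stand-in reader.

```python
import os
import time


def digest(path, reader):
    """Return the file size in bytes and the seconds reader(path) takes."""
    size = os.path.getsize(path)
    start = time.perf_counter()
    reader(path)  # the result is discarded; only the elapsed time is kept
    elapsed = time.perf_counter() - start
    return {"file": os.path.basename(path), "bytes": size, "seconds": elapsed}


def read_bytes(path):
    """Stand-in reader: load the file's raw contents into memory."""
    with open(path, "rb") as fh:
        return fh.read()
```

With pandas and pyarrow installed, passing `pd.read_feather` or `pd.read_parquet` as `reader` would time the actual parsing of each format, so the same helper can be applied to every file in the arrow folder.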
GitHub Pages Slideshow with Remark
This template is made from Remark, an open-source tool to help create and display slideshows from markdown. For questions, see Remark's documentation.
The most important things to know are:
- Enable GitHub Pages from master for the slides to work
- Once enabled, the slides will be visible at https://USERNAME.github.io/REPOSITORY-NAME/#1, like https://brianamarie.github.io/slideshow-on-pages/#1
- Edit the index.html file to edit the slides
- Slides are separated by ---
- Presenter notes go after ??? within one slide
- Toggle presenter notes during the presentation with P
- Read the full guide to remark markdown
- Press C to clone a display, then press P to switch to presenter mode. Open the help menu with h