BYUIDSconsulting/analytics-collaboration-hathaway

Data Science is about communication

This repository houses an invited presentation for Kennesaw State's DS 7900 Applied Project in Analytics and Data Science.

Data parsing example

I use the marathon data from the New York Times article "What Good Marathons and Bad Investments Have in Common."

The article's authors link to the full dataset of almost ten million records as a CSV file on box.com. I removed a few columns and provide the result in two formats on Dropbox.

You can find the same data in .feather and .parquet formats in this repository's arrow folder.

R scripts

  • initial_setup.R provides the script that drops columns from the original source.
  • create_arrow.R provides an example of converting a large file from .sas7bdat to .feather and .parquet. The results are in the arrow folder.
  • data_digest.R reports the file size and parsing time for each format.

Python scripts

  • create_arrow.py provides an example of converting a large file from .sas7bdat to .feather and .parquet. The results are in the py_arrow folder.
  • data_digest.R reports file sizes and parsing times for .sas7bdat and .parquet.
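
For orientation, here is a minimal sketch of the two steps these scripts describe: converting a .sas7bdat file to .feather and .parquet, then measuring file size and parsing time. It is not the code in create_arrow.py or the digest script. The function names convert_sas and digest are hypothetical, and the conversion assumes pandas and pyarrow are installed.

```python
import os
import time
from typing import Any, Callable, Dict


def convert_sas(sas_path: str, out_dir: str) -> Dict[str, str]:
    """Write .feather and .parquet copies of a .sas7bdat file.

    Assumption: pandas and pyarrow are installed.
    """
    import pandas as pd  # imported lazily so digest() works without pandas

    df = pd.read_sas(sas_path, format="sas7bdat")
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(sas_path))[0]
    paths = {
        "feather": os.path.join(out_dir, stem + ".feather"),
        "parquet": os.path.join(out_dir, stem + ".parquet"),
    }
    df.to_feather(paths["feather"])
    df.to_parquet(paths["parquet"])
    return paths


def digest(path: str, reader: Callable[[str], Any]) -> Dict[str, float]:
    """Return the file size in megabytes and the seconds reader(path) takes."""
    size_mb = os.path.getsize(path) / 1e6
    start = time.perf_counter()
    reader(path)  # the parsed result is discarded; only the time matters
    seconds = time.perf_counter() - start
    return {"size_mb": size_mb, "seconds": seconds}
```

To compare formats, call digest once per file with the matching reader, for example pandas.read_parquet for a .parquet file and pandas.read_feather for a .feather file. A single run is noisy, so repeat the call a few times before drawing conclusions.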

Data exploration example

The explore_bigdata.R file provides a short example.

GitHub Pages Slideshow with Remark

This template is made from Remark, an open-source tool to help create and display slideshows from markdown. For questions, see Remark's documentation.

The most important things to know are:

  • Enable GitHub Pages from the master branch for the slides to work
  • Once enabled, the slides will be visible at https://USERNAME.github.io/REPOSITORY-NAME/#1, like https://brianamarie.github.io/slideshow-on-pages/#1
  • Edit the index.html file to edit the slides
  • Slides are separated by a line containing ---
  • Add presenter notes after ??? within a slide
  • Toggle presenter notes during presentation with P
  • Read the full guide to remark markdown
  • Press C to clone a display, then press P to switch to presenter mode. Open the help menu with H
