1. Introduction to Data Science (presentation, tex)
- What can Big Data do for you?
- What is Big Data?
- Implications for Statistics and Computation
- What is Data Science?
- Prerequisites
2. Get your own (Big) Data (presentation, tex)
- Scrape web pages and PDFs (scripts)
- Image to Text (Python script using Tesseract)
- Image to Text in R using the Abbyy FineReader Cloud OCR
- Image to Text in R using the Captricity API
- Web Scraping/API Applications:
- Get Social Networking Data
- Regular Expressions
- Pre-process text data
- Assignment
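Scraping, regular expressions and pre-processing usually meet in a small text-cleaning function. Below is a minimal Python sketch that assumes scraped text arrives as a plain string. The `preprocess` function and the tiny stopword list are hypothetical illustrations, not part of the course scripts.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}

URL_RE = re.compile(r"https?://\S+")    # a URL runs until the next whitespace
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")  # anything that is not a lowercase letter, digit, or space


def preprocess(text: str) -> list[str]:
    """Lowercase, drop URLs and punctuation, split on whitespace, remove stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)       # remove URLs before stripping punctuation
    text = NON_WORD_RE.sub(" ", text)  # replace punctuation with spaces
    return [tok for tok in text.split() if tok not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess("The scraped page https://example.com/x lists 3 Candidates!"))
```

Removing URLs before punctuation matters: if punctuation went first, fragments such as `https` and `example` would survive as tokens.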
3. Databases and SQL (presentation, tex)
- What are databases?
- Relational Model
- Relational Algebra
- Basic SQL
- Views
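Python's built-in `sqlite3` module lets you try the relational model, basic SQL and views without running a database server. This is a minimal sketch. The `students` and `grades` tables and the `high_scores` view are hypothetical examples.

```python
import sqlite3

# In-memory database; table, column, and view names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Relational model: two relations linked by a key.
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE grades (student_id INTEGER REFERENCES students(id), "
    "course TEXT, score REAL)"
)
cur.executemany("INSERT INTO students VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
cur.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [(1, "SQL", 90), (1, "R", 80), (2, "SQL", 70)],
)

# Basic SQL: a join followed by grouping and aggregation.
rows = cur.execute(
    "SELECT s.name, AVG(g.score) FROM students s "
    "JOIN grades g ON g.student_id = s.id "
    "GROUP BY s.name ORDER BY s.name"
).fetchall()

# A view stores the query, not its result; it is re-evaluated each time it is used.
cur.execute(
    "CREATE VIEW high_scores AS "
    "SELECT s.name, g.course, g.score FROM students s "
    "JOIN grades g ON g.student_id = s.id WHERE g.score >= 80"
)
high = cur.execute("SELECT * FROM high_scores ORDER BY score DESC").fetchall()

print(rows)
print(high)
conn.close()
```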
4a. Introduction to Introduction to Statistical Learning
4b. Introduction to Statistical Learning (presentation, tex)
- How to learn from data?
- Nearest Neighbors
- When you don't have good neighbors
- Assessing model fit
- Clarification about Big Data
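Nearest-neighbor learning and one simple check of model fit fit into a few lines of plain Python. This sketch assumes Euclidean distance, a majority vote among the k nearest training points, and held-out accuracy as the fit measure. `knn_predict`, `accuracy` and any toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the majority label among the k training points nearest to x (Euclidean)."""
    # Sort (distance, label) pairs so the closest training points come first.
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    top_labels = [label for _, label in dists[:k]]
    return Counter(top_labels).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Assess fit on held-out data: the share of test points classified correctly."""
    preds = [knn_predict(train_X, train_y, x, k) for x in test_X]
    return sum(p == y for p, y in zip(preds, test_y)) / len(test_y)
```

To compare fits, vary `k` and re-check accuracy on held-out data. Training accuracy is misleading here: with `k=1`, every training point is its own nearest neighbor.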
5. Supervised Methods
6. Unsupervised Methods
- PCA, CA
- k-means (presentation, tex)
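k-means is commonly fit with Lloyd's algorithm. The algorithm alternates two steps: assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster. This is a minimal plain-Python sketch that assumes Euclidean distance and initializes from k randomly chosen data points. `kmeans` and its parameters are hypothetical names, not the course's code.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean updates.

    Returns the final centroids and the labels from the last assignment step.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for each point.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels
```

The result depends on the starting centroids. Practical implementations therefore usually run several random starts and keep the solution with the lowest within-cluster sum of squares.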
7. Presenting Analyses
- ggplot2 in brief
- Examples of ggplot2 in action
- Some Applications
- From paper to digital (presentation, tex)
- Text as Data
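A common way to treat text as data is the bag-of-words document-term matrix. Each row is a document and each column counts one vocabulary term. This minimal sketch assumes simple lowercase whitespace tokenization. `document_term_matrix` is a hypothetical helper.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return a sorted vocabulary and a row of term counts per document (bag of words)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    # Counter returns 0 for terms a document does not contain.
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


if __name__ == "__main__":
    vocab, matrix = document_term_matrix(["the cat sat", "the cat saw the dog"])
    print(vocab)
    print(matrix)
```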
Books
- Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. ISBN: 0387848576
- John Zelle. Python Programming: An Introduction to Computer Science. ISBN: 1590282418
- Hadley Wickham. ggplot2: Elegant Graphics for Data Analysis (Use R!). ISBN: 0387981403
Released under a Creative Commons license.