xiangzhemeng/EPFL-Applied-Data-Analysis-2017

Applied Data Analysis Homework & Project

EPFL CS-401 Applied data analysis

Team Members: Shengzhao LEI, Tao Sun, Xiangzhe Meng

Project Data Story Link: Amazon review data analysis

Abstract

The Applied Data Analysis course teaches the basic techniques and practical skills needed to make sense of a variety of data, using some of the most acclaimed software tools in the data science world, such as pandas, scikit-learn, and Spark.

This course covers the fundamental steps of the data science pipeline:

  • Data Acquisition

Variety as one of the main challenges in big data: structured, semi-structured, unstructured; Data sources: open, public (scraping, parsing, and down-sampling); Dataset fusion, filtering, slicing & dicing; Data granularities and aggregations

  • Data Wrangling

Data manipulation, array programming, dataframes; The many sources of data problems (and how to fix them): missing data, incorrect data, inconsistent representations; Schema alignment, data reconciliation; Data quality testing with crowdsourcing

  • Data Interpretation

Distribution fitting, statistical significance; Co-occurrence grouping (market-basket analysis); Machine learning in practice (supervised and unsupervised, feature engineering, more data vs. advanced algorithms, curse of dimensionality, etc.); Text mining: vector space model, topic models, word embedding; Social network analysis (influencers, community detection, etc.)

  • Data Visualization

  • Reporting
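
As a small illustration of two topics from the outline above (fixing inconsistent representations during wrangling, and co-occurrence grouping during interpretation), here is a minimal sketch. It is not taken from the course material or this repository's homework: the records in `RAW_ROWS` are made up, the helper names `clean_item`, `build_baskets`, and `pair_counts` are hypothetical, and it uses only the Python standard library rather than pandas so that it stays self-contained.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records (basket id, item); made-up data for illustration.
RAW_ROWS = [
    (1, "Milk"), (1, " bread "), (1, None),
    (2, "milk"), (2, "Bread"), (2, "eggs"),
    (3, "EGGS"), (3, "milk"),
]


def clean_item(item):
    """Normalize case and whitespace; return None for missing or empty values."""
    if item is None:
        return None
    item = item.strip().lower()
    return item or None


def build_baskets(rows):
    """Group cleaned items by basket id, dropping missing values."""
    baskets = {}
    for basket_id, item in rows:
        cleaned = clean_item(item)
        if cleaned is not None:
            baskets.setdefault(basket_id, set()).add(cleaned)
    return baskets


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for items in baskets.values():
        # Sorting makes each pair appear in one canonical order.
        counts.update(combinations(sorted(items), 2))
    return counts


baskets = build_baskets(RAW_ROWS)
counts = pair_counts(baskets)
for pair, n in counts.most_common():
    print(pair, n)
```

With real data, the same steps would more likely be written with pandas dataframes; the plain-Python version is only meant to make the cleaning and counting logic explicit.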
