Learn to use Anaconda to manage packages and environments for use with Python
LESSON TWO Jupyter Notebooks
Learn to use this open-source web application to combine explanatory text, math equations, code, and visualizations in one sharable document
LESSON THREE Data Analysis Process
Learn about the key steps of the data analysis process. Investigate multiple datasets using Python and Pandas.
LESSON FOUR Pandas and NumPy: Case Study 1
Perform the entire data analysis process on a dataset. Learn to use NumPy and Pandas to wrangle, explore, analyze, and visualize data
LESSON FIVE Pandas and NumPy: Case Study 2
Perform the entire data analysis process on a dataset. Learn more about NumPy and Pandas to wrangle, explore, analyze, and visualize data
LESSON SIX Programming Workflow for Data Analysis
Learn how to carry out analysis outside Jupyter notebooks using IPython or the command line interface.
Course 2: Practical Statistics
LESSON ONE Simpson’s Paradox
Examine a case study to learn about Simpson’s Paradox
LESSON TWO Probability
Learn the fundamental rules of probability.
LESSON THREE Binomial Distribution
Learn about the binomial distribution, where each observation represents one of two outcomes. Derive the probability of a binomial distribution.
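As a minimal sketch of the idea in this lesson, the probability of exactly k successes in n independent trials can be computed from the standard binomial formula. The function name `binomial_pmf` and the coin-flip numbers are illustrative assumptions, not part of the course material.

```python
from math import comb


def binomial_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p (standard binomial formula)."""
    if not 0 <= k <= n:
        return 0.0
    # Number of ways to place k successes, times the probability of each way
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: exactly 3 heads in 5 flips of a fair coin
prob = binomial_pmf(5, 3, 0.5)
print(prob)
```

Summing `binomial_pmf(n, k, p)` over every k from 0 to n should give 1, which is a quick sanity check on the formula.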
LESSON FOUR Conditional Probability
Learn about conditional probability, i.e., when events are not independent.
LESSON FIVE Bayes Rule
Build on conditional probability principles to understand Bayes' rule. Derive Bayes' theorem.
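The rule in this lesson can be sketched in a few lines, assuming a single binary hypothesis and a single positive test result. The function name `posterior` and the probabilities below are hypothetical values chosen only for illustration.

```python
def posterior(prior, p_pos_given_true, p_pos_given_false):
    """P(hypothesis | positive result) via Bayes' rule.

    prior              -- P(hypothesis)
    p_pos_given_true   -- P(positive | hypothesis is true)
    p_pos_given_false  -- P(positive | hypothesis is false)
    """
    # Total probability of a positive result (law of total probability)
    evidence = p_pos_given_true * prior + p_pos_given_false * (1 - prior)
    return p_pos_given_true * prior / evidence


# Hypothetical numbers: 1% prior, 90% true-positive rate, 10% false-positive rate
print(posterior(0.01, 0.9, 0.1))
```

With these assumed inputs the posterior stays small even after a positive result, because the prior is low relative to the false-positive rate.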
LESSON SIX Standardizing
Convert distributions into the standard normal distribution using the Z-score. Compute proportions using standardized distributions.
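A short sketch of standardizing, assuming the population mean and standard deviation are known. The helper names `z_score` and `proportion_below` and the example values are assumptions for illustration; `statistics.NormalDist` is part of the Python standard library (3.8+).

```python
from statistics import NormalDist


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


def proportion_below(z):
    """Proportion of a standard normal distribution that falls below z."""
    return NormalDist().cdf(z)


# Hypothetical example: a value of 110 from a distribution with mean 100, SD 10
z = z_score(110, 100, 10)
print(z, proportion_below(z))
```

The proportion above a value is `1 - proportion_below(z)`, and the proportion between two values is the difference of their two `proportion_below` results.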
LESSON SEVEN Sampling Distributions and Central Limit Theorem
Use normal distributions to compute probabilities. Use the Z-table to look up the proportions of observations above, below, or in between values.
LESSON EIGHT Confidence Intervals
Estimate population parameters from sample statistics using confidence intervals.
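One possible sketch of a confidence interval for a population mean, assuming a normal approximation (z critical value) rather than whatever method the lesson itself uses. The function name `mean_confidence_interval` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Uses the normal approximation: mean +/- z * s / sqrt(n).
    For small samples a t critical value would be more appropriate.
    """
    n = len(sample)
    sample_mean = mean(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample values, for illustration only
low, high = mean_confidence_interval([10, 12, 14, 16, 18])
print(low, high)
```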
LESSON NINE Hypothesis Testing
Use critical values to make decisions on whether or not a treatment has changed the value of a population parameter.
LESSON TEN T-Tests and A/B Tests
Test the effect of a treatment or compare the difference in means for two groups when we have small sample sizes.
LESSON ELEVEN Regression
Build a linear regression model to understand the relationship between independent and dependent variables. Use linear regression results to make a prediction.
LESSON TWELVE Multiple Linear Regression
Use multiple linear regression results to interpret coefficients for several predictors.
LESSON THIRTEEN Logistic Regression
Use logistic regression results to make a prediction about the relationship between categorical dependent variables and predictors.
Course 3: Data Wrangling
LESSON ONE Intro to Data Wrangling
Identify each step of the data wrangling process (gathering, assessing, and cleaning). Wrangle a CSV file downloaded from Kaggle using fundamental gathering, assessing, and cleaning code.
LESSON TWO Gathering Data
Gather data from multiple sources, including gathering files, programmatically downloading files, web-scraping data, and accessing data from APIs. Import data of various file formats into pandas, including flat files (e.g. TSV), HTML files, TXT files, and JSON files. Store gathered data in a PostgreSQL database.
LESSON THREE Assessing Data
Assess data visually and programmatically using pandas. Distinguish between dirty data (content or “quality” issues) and messy data (structural or “tidiness” issues). Identify data quality issues and categorize them using metrics: validity, accuracy, completeness, consistency, and uniformity.
LESSON FOUR Cleaning Data
Identify each step of the data cleaning process (defining, coding, and testing). Clean data using Python and pandas. Test cleaning code visually and programmatically using Python.
Course 4: Data Visualization with Python
LESSON ONE Data Visualization in Data Analysis
Understand why visualization is important in the practice of data analysis. Know what distinguishes exploratory analysis from explanatory analysis, and the role of data visualization in each.
LESSON TWO Design of Visualizations
Interpret features in terms of level of measurement. Know different encodings that can be used to depict data in visualizations. Understand various pitfalls that can affect the effectiveness and truthfulness of visualizations.
LESSON THREE Univariate Exploration of Data
Use bar charts to depict distributions of categorical variables. Use histograms to depict distributions of numeric variables. Use axis limits and different scales to change how your data is interpreted.
LESSON FOUR Bivariate Exploration of Data
Use scatterplots to depict relationships between numeric variables. Use clustered bar charts to depict relationships between categorical variables. Use violin plots and bar charts to depict relationships between categorical and numeric variables. Use faceting to create plots across different subsets of the data.
LESSON FIVE Multivariate Exploration of Data
Use encodings like size, shape, and color to encode values of a third variable in a visualization. Use plot matrices to explore relationships between multiple variables at the same time. Use feature engineering to capture relationships between variables.
LESSON SIX Explanatory Visualizations
Understand what it means to tell a compelling story with data. Choose the best plot type, encodings, and annotations to polish your plots. Create a slide deck using a Jupyter Notebook to convey your findings.
LESSON SEVEN Visualization Case Study
Apply your knowledge of data visualization to a dataset involving the characteristics of diamonds and their prices.