Udacity Data Analytics Nanodegree Project 1 - Exploring Gapminder Datasets using Python and Jupyter Notebook
In this project, I will analyze a dataset and then communicate my findings about it. I will use the Python libraries NumPy, pandas, and Matplotlib to make my analysis easier.
I will need an installation of Python, plus the following libraries:
- pandas
- NumPy
- Matplotlib
- csv (part of the Python standard library, so no separate installation is needed)
After completing the project, I will:
- Know all the steps involved in a typical data analysis process
- Be comfortable posing questions that can be answered with a given dataset and then answering those questions
- Know how to investigate problems in a dataset and wrangle the data into a format I can use
- Have practice communicating the results of my analysis
- Be able to use vectorized operations in NumPy and pandas to speed up my data analysis code
- Be familiar with pandas' Series and DataFrame objects, which let me access my data more conveniently
- Know how to use Matplotlib to produce plots showing your findings
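As a small illustration of the vectorized operations mentioned above, the sketch below compares an element-by-element loop with a single NumPy call on a pandas Series. The country labels and values are made up for illustration and are not real Gapminder figures.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only; not real Gapminder figures.
gdp = pd.Series(
    [1000.0, 2500.0, 40000.0],
    index=["A", "B", "C"],
    name="gdp_per_capita",
)

# Loop version: one Python-level operation per element.
log_gdp_loop = pd.Series([np.log10(v) for v in gdp], index=gdp.index)

# Vectorized version: a single call operates on the whole Series at once
# and keeps the country index.
log_gdp_vec = np.log10(gdp)

# Both approaches give the same numbers; the vectorized one avoids the
# explicit loop and is usually faster on large data.
same = np.allclose(log_gdp_loop, log_gdp_vec)
print(same)
```

The same pattern applies to arithmetic between whole columns of a DataFrame, which is where the speed-up tends to matter most.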
I chose four datasets from Gapminder.
- Female salaried employee
- Female literacy rate
- GDP per capita
- Life expectancy at birth
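Combining several indicators usually means reshaping each table and joining them. The sketch below assumes each file is laid out with one row per country and one column per year; that layout, the column names, and the values are assumptions for illustration, with in-memory text standing in for the downloaded CSV files.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two downloaded files; all values are made up.
life_csv = io.StringIO("country,2000,2001\nA,70.1,70.4\nB,55.0,55.6\n")
gdp_csv = io.StringIO("country,2000,2001\nA,12000,12500\nB,900,950\n")


def to_long(source, value_name):
    """Read a wide country-by-year table and return one row per country-year."""
    wide = pd.read_csv(source)
    long_df = wide.melt(id_vars="country", var_name="year", value_name=value_name)
    # Year labels come from the CSV header, so they arrive as strings.
    long_df["year"] = long_df["year"].astype(int)
    return long_df


life = to_long(life_csv, "life_expectancy")
gdp = to_long(gdp_csv, "gdp_per_capita")

# Keep only country-year pairs that appear in both tables.
combined = life.merge(gdp, on=["country", "year"], how="inner")
print(combined)
```

An inner join drops country-year pairs missing from either table, which is one simple way to handle gaps; other choices (such as an outer join followed by explicit handling of missing values) may suit a given question better.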
I will create a single folder (repository) that contains:
- The report communicating my findings
- Any Python code I wrote as part of my analysis
- The datasets I used
I will pose questions that explore relationships between multiple variables. I aim to analyze at least one dependent variable and three independent variables in my investigation, using NumPy and pandas wherever they are appropriate.
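One possible first look at such relationships is a table of pairwise correlations. The sketch below treats life expectancy as the dependent variable and the other three indicators as independent variables; that choice, the column names, and every value are assumptions made for illustration, not results from the actual data.

```python
import pandas as pd

# Made-up values for five hypothetical countries; not real Gapminder data.
df = pd.DataFrame({
    "life_expectancy": [52.0, 60.0, 68.0, 75.0, 81.0],
    "gdp_per_capita": [800.0, 2500.0, 7000.0, 20000.0, 45000.0],
    "female_literacy_rate": [35.0, 55.0, 78.0, 92.0, 99.0],
    "female_salaried_employees": [10.0, 25.0, 50.0, 75.0, 88.0],
})

dependent = "life_expectancy"

# Pearson correlation of every column with the dependent variable,
# dropping the trivial correlation of the variable with itself.
correlations = df.corr()[dependent].drop(dependent)
print(correlations)
```

A correlation coefficient only summarizes linear association and says nothing about causation, so it works best as a starting point for choosing which relationships to plot and examine more closely.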
Finally, I will write a report that shares the most interesting findings.