In this project, I analyze a dataset of Titanic passengers. I used a Jupyter Notebook running Python 2.7. The external libraries used are NumPy, Pandas, Seaborn, and Matplotlib.
The Anaconda (https://www.anaconda.com) environment I used to create this project is included in the repository (DAND.yml).
This study is an exercise in using the foundations of Data Science to import, study, visualize, and present raw data in a way that is easy for any user to digest and understand.
This study uses passenger data from the ill-fated maiden voyage of the RMS Titanic (1912). The data (and explanation of the data) can be obtained from https://www.kaggle.com/c/titanic/data
First, the raw comma-separated values (.csv) data will be loaded into a Python (Pandas) dataframe.
Second, the data will be explored, mostly by plotting different slices of it. Visualizing the data makes it easier to understand and to generate hypotheses.
Third, the data will be analyzed.
Lastly, the notebook includes a function that lets a user enter their personal information to see their estimated probability of surviving the Titanic disaster.
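The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names (`load_passengers`, `survival_rates`, `estimate_survival`) are hypothetical, and grouping by the Kaggle columns `Sex` and `Pclass` is an assumption about which passenger details matter. The small table in the sketch is made up so the example runs without the dataset; it does not contain real passenger records.

```python
import pandas as pd


def load_passengers(path="titanic-data.csv"):
    """Read the comma-separated passenger data into a pandas DataFrame."""
    return pd.read_csv(path)


def survival_rates(df, by=("Sex", "Pclass")):
    """Return the fraction of survivors in each group defined by `by`."""
    return df.groupby(list(by))["Survived"].mean()


def estimate_survival(df, sex, pclass):
    """Look up the observed survival rate for a given sex and ticket class.

    Returns None when no passenger in the data matches.
    """
    match = df[(df["Sex"] == sex) & (df["Pclass"] == pclass)]
    if match.empty:
        return None
    return float(match["Survived"].mean())


# Made-up rows for illustration only (NOT real passenger records).
# They reuse the Kaggle column names Survived, Pclass, and Sex.
sample = pd.DataFrame({
    "Survived": [1, 1, 0, 1, 0, 0, 0, 1],
    "Pclass": [1, 1, 1, 2, 2, 3, 3, 3],
    "Sex": ["female", "female", "male", "female",
            "male", "male", "female", "male"],
})

rates = survival_rates(sample)
third_class_male = estimate_survival(sample, "male", 3)
```

To run the same sketch on the real data, replace `sample` with `load_passengers()`, which assumes titanic-data.csv is in the working directory.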
- DAND.yml (Anaconda/Python environment)
- titanic-data.csv (the dataset used for the analysis from Kaggle)
- titanic-investigation.ipynb (Jupyter Notebook containing the analysis, for interactive viewing, use, and editing)