This project uses Exploratory Data Analysis (EDA) to identify the key factors that influenced survival during the sinking of the Titanic.
The sinking of the Titanic is one of the best-known maritime disasters in history. In this project, I explore the Titanic dataset to find out which factors most strongly affected a passenger's chances of survival. Using a combination of statistical analysis and data visualization, this repository aims to provide a comprehensive picture of the variables at play.
This analysis uses the Titanic dataset, which includes the following features:
PassengerId
Survived (target variable)
Pclass (passenger class)
Name
Sex
Age
SibSp (number of siblings/spouses aboard)
Parch (number of parents/children aboard)
Ticket
Fare
Cabin
Embarked (port of embarkation)
Data Cleaning: Handling missing values, correcting data types, and ensuring the dataset is ready for analysis.
Exploratory Data Analysis: Generating descriptive statistics and visualizations to understand the distribution and relationships between variables.
Feature Engineering: Creating new features or transforming existing ones to better capture the underlying patterns.
Statistical Analysis: Identifying statistically significant factors affecting survival.
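The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code used in this repository: it assumes pandas, the `toy` rows are invented purely so the snippet runs on its own, and the helper names (`clean_and_engineer`, `chi_square_statistic`) and the derived columns `FamilySize` and `IsAlone` are hypothetical. Median and mode imputation are one common choice among several.

```python
import pandas as pd

# Toy rows invented for illustration only; they are NOT taken from the real dataset.
toy = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0],
    "Pclass":   [3, 1, 3, 3, 2, 1],
    "Sex":      ["male", "female", "female", "male", "female", "male"],
    "Age":      [22.0, 38.0, None, 35.0, 14.0, 54.0],
    "SibSp":    [1, 1, 0, 0, 1, 0],
    "Parch":    [0, 0, 0, 0, 0, 0],
    "Embarked": ["S", "C", None, "S", "C", "S"],
})


def clean_and_engineer(df):
    """Hypothetical helper: impute missing values and derive family features."""
    out = df.copy()
    # Data cleaning: fill missing ages with the median, missing ports with the mode.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode().iloc[0])
    # Feature engineering: family size counts the passenger plus relatives aboard.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    out["IsAlone"] = out["FamilySize"] == 1
    return out


def chi_square_statistic(df, factor, target="Survived"):
    """Pearson chi-square statistic (no continuity correction) for factor vs. target."""
    observed = pd.crosstab(df[factor], df[target]).to_numpy(dtype=float)
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    expected = row_totals * col_totals / observed.sum()
    return float(((observed - expected) ** 2 / expected).sum())


cleaned = clean_and_engineer(toy)
sex_statistic = chi_square_statistic(cleaned, "Sex")
```

The function returns only the test statistic. Turning it into a p-value requires the chi-square distribution, which a statistics library would normally supply, and with so few rows the toy result carries no meaning beyond showing the mechanics.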
Passenger Class: Passengers in higher classes (lower Pclass values, where 1 is first class) had higher survival rates.
Sex: Female passengers had a significantly higher survival rate than male passengers.
Age: Younger passengers had higher survival rates.
Family Size: The number of siblings/spouses (SibSp) and parents/children (Parch) aboard had varying effects on survival chances.
Fare: Higher ticket fares were generally associated with higher survival rates.
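Findings like these typically come from comparing the mean of `Survived` across groups. The sketch below shows one way to do that with pandas. The `toy` rows are invented so the snippet is self-contained and do not reproduce the real figures, the helper `survival_rate_by` is hypothetical, and the age cut-off of 12 years is an arbitrary assumption.

```python
import pandas as pd

# Toy rows invented for illustration only; they are NOT taken from the real dataset.
toy = pd.DataFrame({
    "Survived": [1, 1, 0, 1, 0, 0, 1, 0],
    "Pclass":   [1, 1, 1, 2, 2, 3, 3, 3],
    "Sex":      ["female", "male", "male", "female", "male", "male", "female", "male"],
    "Age":      [29, 40, 55, 8, 30, 22, 4, 45],
})


def survival_rate_by(df, by):
    """Hypothetical helper: share of survivors per group.

    `by` may be a column name or a Series aligned with `df` (e.g. binned ages).
    """
    return df.groupby(by, observed=True)["Survived"].mean()


by_sex = survival_rate_by(toy, "Sex")
by_class = survival_rate_by(toy, "Pclass")

# Continuous variables such as Age or Fare need binning first.
# The 12-year boundary is an assumed cut-off, chosen only for this example.
age_band = pd.cut(toy["Age"], bins=[0, 12, 120], labels=["child", "adult"])
by_age = survival_rate_by(toy, age_band)
```

Because `Survived` is coded 0/1, its group mean is the survival rate directly, so the same helper can be reused for any categorical or binned feature.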