This project performs exploratory data analysis (EDA) on the NYC Taxi Trip dataset and builds several machine learning models to predict trip-level targets such as trip duration and fare amount. The dataset contains detailed records of taxi trips in New York City and gives a view of urban transport patterns.
The EDA examines the data and its features. The models are then trained and evaluated on their predictive performance, with the goal of finding which features matter for trip outcomes and which model predicts them most reliably.
The dataset used in this project includes:
- Pickup and Dropoff Times: Time of the trip's start and end.
- Geospatial Information: Latitude and longitude of the pickup and dropoff locations.
- Passenger Count: Number of passengers in the taxi.
- Trip Distance: Total distance covered in the trip.
The dataset can be accessed from the NYC Taxi & Limousine Commission (TLC) data portal.
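The fields listed above are usually turned into model features before any analysis. The following is a minimal sketch of that step, not the notebook's actual code: the column names (`pickup_datetime`, `dropoff_datetime`, `pickup_latitude`, and so on) and the helper `add_trip_features` are assumptions chosen for illustration, and the real TLC files may name these columns differently.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius used by the haversine formula


def add_trip_features(df):
    """Return a copy of df with trip duration (minutes) and straight-line distance (km).

    Column names are hypothetical; adjust them to match the actual dataset.
    """
    out = df.copy()

    # Duration: difference between dropoff and pickup timestamps, in minutes.
    pickup = pd.to_datetime(out["pickup_datetime"])
    dropoff = pd.to_datetime(out["dropoff_datetime"])
    out["duration_min"] = (dropoff - pickup).dt.total_seconds() / 60.0

    # Haversine great-circle distance between pickup and dropoff coordinates.
    lat1 = np.radians(out["pickup_latitude"])
    lat2 = np.radians(out["dropoff_latitude"])
    dlat = lat2 - lat1
    dlon = np.radians(out["dropoff_longitude"] - out["pickup_longitude"])
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    out["haversine_km"] = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))
    return out


# Two made-up trips, used only to exercise the function.
sample = pd.DataFrame(
    {
        "pickup_datetime": ["2016-01-01 08:00:00", "2016-01-01 09:00:00"],
        "dropoff_datetime": ["2016-01-01 08:15:00", "2016-01-01 09:10:00"],
        "pickup_latitude": [40.0, 40.75],
        "pickup_longitude": [-74.0, -73.99],
        "dropoff_latitude": [41.0, 40.75],
        "dropoff_longitude": [-74.0, -73.99],
    }
)
features = add_trip_features(sample)
print(features[["duration_min", "haversine_km"]])
```

The haversine value is a straight-line distance, so it will generally be shorter than the recorded trip distance, which follows the street network.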
During EDA, the following aspects are analyzed:
- Trip Distribution: The distribution of trip distances, durations, and locations.
- Passenger and Payment Patterns: Average trip costs, the most common pickup and dropoff locations, and the effect of other variables on trip price.
- Temporal Trends: Patterns over time (e.g., the busiest hours for trips).
Visualizations such as histograms, bar charts, and heatmaps are used to explore these features.
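As an illustration of the temporal analysis, the sketch below counts trips per pickup hour with pandas. It is an assumed approach rather than the notebook's code: the column name `pickup_datetime` and the helper `trips_per_hour` are hypothetical.

```python
import pandas as pd


def trips_per_hour(df, time_col="pickup_datetime"):
    """Count trips for each hour of the day (0-23), including hours with no trips."""
    hours = pd.to_datetime(df[time_col]).dt.hour
    # reindex keeps all 24 hours in order and fills missing hours with 0.
    return hours.value_counts().reindex(range(24), fill_value=0)


# Three made-up pickups, used only to exercise the function.
sample = pd.DataFrame(
    {
        "pickup_datetime": [
            "2016-01-01 08:05:00",
            "2016-01-01 08:40:00",
            "2016-01-02 17:30:00",
        ]
    }
)
counts = trips_per_hour(sample)
busiest_hour = int(counts.idxmax())
print(f"Busiest pickup hour: {busiest_hour}")
```

The resulting Series can be passed straight to a bar chart, for example with `counts.plot.bar()` when Matplotlib is installed.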
Multiple machine learning models are applied and evaluated, including:
- Linear Regression
- Decision Trees
- Random Forest
- Gradient Boosting
- K-Nearest Neighbors (KNN)
Each model is trained and validated to predict target variables such as:
- Trip duration
- Trip fare amount
- Passenger count
The performance of each model is compared using metrics like:
- Mean Absolute Error (MAE)
- Root Mean Squared Error (RMSE)
- R² Score
The model performance is analyzed to determine which algorithm provides the most accurate predictions for different types of trips.
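The comparison described above can be sketched with scikit-learn as follows. This is a minimal example under stated assumptions, not the project's actual pipeline: the data is synthetic (a made-up fare that grows linearly with distance), the two features are placeholders, and the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real data: fare depends linearly on distance plus noise.
rng = np.random.default_rng(0)
distance_km = rng.uniform(0.5, 20.0, size=500)
pickup_hour = rng.integers(0, 24, size=500)
X = np.column_stack([distance_km, pickup_hour])
y = 3.0 * distance_km + 2.0 + rng.normal(0.0, 1.0, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The five model families listed in this README, with illustrative settings.
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "KNN": KNeighborsRegressor(n_neighbors=5),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        # RMSE is the square root of the mean squared error.
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": r2_score(y_test, pred),
    }

for name, scores in results.items():
    print(f"{name:18s} MAE={scores['MAE']:.2f} RMSE={scores['RMSE']:.2f} R2={scores['R2']:.3f}")
```

All models are scored on the same held-out split so the metrics are directly comparable. On real data, distance-based models such as KNN would typically also need feature scaling.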
To run the project locally, follow these steps:
- Clone the repository:
git clone https://github.com/your-username/nyc-taxi-trip-prediction.git
- Navigate to the project directory:
cd nyc-taxi-trip-prediction
- Install the necessary dependencies:
pip install -r requirements.txt
- Run the Jupyter notebook:
jupyter notebook
- Python: The main programming language for this project.
- Pandas: Data manipulation and preprocessing.
- NumPy: Numerical operations.
- Matplotlib & Seaborn: Data visualization.
- Scikit-learn: Machine learning models and evaluation.
- Open the notebook eda_nyc_taxi_trip.ipynb to explore the EDA and model-building process.
- You can visualize the data and run the models by executing the code cells.
- Modify the parameters and models to experiment with different predictive techniques.
Contributions to this project are welcome! Feel free to submit pull requests or raise issues.
- Fork the repository.
- Create a new branch for your feature:
git checkout -b new-feature
- Commit your changes:
git commit -m 'Add new feature'
- Push to the branch:
git push origin new-feature
- Open a pull request.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.