Merge pull request #717 from tanuj437/main
Used Car Price Prediction
abhisheks008 committed Jul 21, 2024
2 parents 924825e + a12c7ce commit 94b0ff6
Showing 22 changed files with 54,627 additions and 0 deletions.
57 changes: 57 additions & 0 deletions Used Car Price Prediction/Dataset/README.md
@@ -0,0 +1,57 @@
# Car Price Prediction Dataset

## 📝 Description
Explore the intricate details of car prices with our comprehensive dataset. This dataset captures various attributes of cars, each labeled with its respective price. By analyzing this dataset, you will gain valuable insights into the factors that influence car prices, aiding in accurate and efficient price prediction.

## Key Features
- **Diverse Car Attributes:** Understand the impact of various attributes on car prices, including brand, model, model year, mileage, fuel type, engine type, transmission type, exterior color, interior color, accident history, and title status.
- **High-Quality Data:** Each car is represented with detailed attributes, ensuring that all relevant factors are captured for thorough analysis.
- **Balanced Representation:** While some attributes may have more variations than others, the dataset provides a balanced overview of different car features, facilitating effective training and testing of regression models.

## Data Collection
The data has been meticulously collected and labeled based on actual car listings. This structured approach ensures that each car is accurately described, providing a reliable dataset for training machine learning models.

## Data Attributes
The dataset contains the following attributes for each car:

- **id:** Unique identifier for each car
- **brand:** The brand of the car
- **model:** The specific model of the car
- **model_year:** The year the car model was manufactured
- **milage:** The total mileage of the car in kilometers (the column name is spelled `milage` in the dataset)
- **fuel_type:** The type of fuel used by the car (e.g., Petrol, Diesel, Electric)
- **engine:** The engine type of the car (e.g., V6, V8, Electric)
- **transmission:** The type of transmission in the car (e.g., Automatic, Manual)
- **ext_col:** The exterior color of the car
- **int_col:** The interior color of the car
- **accident:** Indicates whether the car has been in an accident (Yes/No)
- **clean_title:** Indicates whether the car has a clean title (Yes/No)
- **price:** The price of the car (target variable)

## Sample Data
Here are a few sample entries from the dataset:

| id | brand | model | model_year | milage | fuel_type | engine | transmission | ext_col | int_col | accident | clean_title | price |
|----|--------|-------|------------|--------|-----------|--------|--------------|---------|---------|----------|-------------|-------|
| 1 | Toyota | Camry | 2015 | 60000 | Petrol | V6 | Automatic | Black | Grey | No | Yes | 15000 |
| 2 | Ford | F-150 | 2018 | 40000 | Diesel | V8 | Manual | White | Black | Yes | No | 22000 |
| 3 | Tesla | Model S | 2020 | 20000 | Electric | Electric | Automatic | Red | White | No | Yes | 75000 |

## How to Use the Dataset
To use this dataset for training machine learning models, follow these steps:

1. **Download the Dataset:**
The dataset (`train.csv`) is included in the `Dataset` directory of this project and is also available on [Kaggle](https://www.kaggle.com/datasets/zeeshanlatif/used-car-price-prediction-dataset/data?select=train.csv).

2. **Data Preprocessing:**
- Convert categorical columns to category dtype.
- Apply standard scaling for numerical features.
- Apply one-hot encoding for categorical features.
- Split the data into training and test sets for model evaluation.

3. **Model Training:**
Train various machine learning models on the dataset to predict car prices based on the provided attributes.
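
The steps above could be wired together roughly as follows. This is a minimal sketch using scikit-learn, not the notebook's actual code: the helper names `build_pipeline` and `train_and_score`, the 80/20 split, the random seed, and the choice of Ridge as the example model are all assumptions made for illustration. The explicit conversion to the `category` dtype is left out because `OneHotEncoder` accepts string columns directly.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column names follow the "Data Attributes" list in this README.
CATEGORICAL_COLS = ["brand", "model", "fuel_type", "engine", "transmission",
                    "ext_col", "int_col", "accident", "clean_title"]
NUMERICAL_COLS = ["model_year", "milage"]


def build_pipeline(alpha: float = 1.0) -> Pipeline:
    """Scale numeric columns, one-hot encode categorical ones, then fit Ridge."""
    preprocessor = ColumnTransformer([
        ("num", StandardScaler(), NUMERICAL_COLS),
        # handle_unknown="ignore" avoids errors on categories unseen during fit
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
    ])
    return Pipeline([("preprocess", preprocessor), ("model", Ridge(alpha=alpha))])


def train_and_score(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Split the data, fit the pipeline, and return it with its test-set R^2."""
    X = df[CATEGORICAL_COLS + NUMERICAL_COLS]
    y = df["price"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    pipeline = build_pipeline().fit(X_train, y_train)
    return pipeline, pipeline.score(X_test, y_test)
```

A call such as `train_and_score(pd.read_csv("train.csv"))` would then return the fitted pipeline and its held-out R2 score; the file path is a placeholder for wherever the CSV lives on your machine.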

## 📢 Conclusion
The car price prediction dataset provides a comprehensive and well-structured collection of car attributes and prices, facilitating the development of accurate and robust prediction models. By leveraging this dataset, you can gain valuable insights into the factors that influence car prices and improve your predictive modeling capabilities.

54,274 changes: 54,274 additions & 0 deletions Used Car Price Prediction/Dataset/train.csv

Large diffs are not rendered by default.

55 changes: 55 additions & 0 deletions Used Car Price Prediction/Model/README.md
@@ -0,0 +1,55 @@
# Car Price Prediction - Model

## 📝 Description
This folder contains the pre-trained machine learning models and scripts used for predicting car prices based on various attributes. The aim is to accurately estimate the price of a car given its features.

## 📂 Contents
- **used-car-price-prediction.ipynb:** Jupyter Notebook containing the complete process of data preprocessing, model training, evaluation, and visualization.
- **README.md:** This document.
- **ridgemodel.pkl:** Pre-trained Ridge Regression model used for car price prediction.
- **preprocessor.pkl:** Pre-trained data preprocessor.
- **unique_values.pkl:** Precomputed unique values for categorical columns.
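
Files like these are commonly written with `joblib.dump` and read back with `joblib.load`. The notebook code that produced them is not reproduced here, so the following is only a sketch of how such artifacts might be created: `collect_unique_values`, `save_and_reload`, `demo`, and the toy data are hypothetical, and the dict-of-lists layout is an assumption about what `unique_values.pkl` contains.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd


def collect_unique_values(df: pd.DataFrame, categorical_cols: list) -> dict:
    """Map each categorical column to a sorted list of its distinct non-null values."""
    return {
        col: sorted(df[col].dropna().astype(str).unique().tolist())
        for col in categorical_cols
    }


def save_and_reload(obj, path: Path):
    """Persist an object with joblib and load it back as a round-trip check."""
    joblib.dump(obj, path)
    return joblib.load(path)


def demo() -> dict:
    """Build the lookup from toy data and round-trip it through a temporary file."""
    df = pd.DataFrame({
        "fuel_type": ["Petrol", "Diesel", "Petrol", None],
        "transmission": ["Automatic", "Manual", "Automatic", "Manual"],
    })
    values = collect_unique_values(df, ["fuel_type", "transmission"])
    with tempfile.TemporaryDirectory() as tmp:
        return save_and_reload(values, Path(tmp) / "unique_values.pkl")
```

The same `joblib.dump` call works for a fitted preprocessor or model object, which is why all three artifacts can share one loading mechanism.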

## 🎯 Goal
The goal of this car price prediction project is to accurately predict car prices using various machine learning models based on attributes such as brand, model, model year, mileage, fuel type, engine type, transmission type, exterior color, interior color, accident history, and title status.

## 🧮 What I Did
In this car price prediction project, several models were evaluated to find the most effective one for predicting car prices. They are listed in the next section.

## Models Used
- **Linear Regression:** A basic linear approach to modeling the relationship between the dependent variable and one or more independent variables.
- **Ridge Regression:** A linear regression model with L2 regularization to prevent overfitting.
- **Lasso Regression:** A linear regression model with L1 regularization to perform feature selection.
- **Decision Tree:** A model that splits the data into subsets based on feature values, creating a tree-like structure for regression.
- **Gradient Boosting:** An ensemble learning method that builds models sequentially to correct errors of previous models.
- **K-Nearest Neighbors Regressor (KNN):** An instance-based learning algorithm that predicts a sample's value based on the average value of its k-nearest neighbors.
- **XGBoost Regressor:** An optimized gradient boosting library designed for speed and performance.
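
For reference, the regularized objectives behind the Ridge and Lasso entries in this list can be written as below. This is the standard formulation used by scikit-learn's `Ridge` and `Lasso` estimators rather than something taken from the notebook. The notation is assumed here: $X$ is the preprocessed feature matrix, $y$ the vector of prices, $w$ the coefficient vector, $n$ the number of samples, and $\alpha \ge 0$ the regularization strength (the intercept is omitted for brevity).

$$
\text{Ridge:} \quad \min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

$$
\text{Lasso:} \quad \min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

The L2 penalty shrinks all coefficients toward zero without eliminating them, while the L1 penalty can drive some coefficients exactly to zero, which is why Lasso is described as performing feature selection.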

## Data Preprocessing and Augmentation
- **Image Resizing and Normalization:** Not applicable for this dataset.
- **Feature Engineering:** Applied standard scaling for numerical features and one-hot encoding for categorical features.
- **Data Splitting:** Divided data into training and test sets for robust model evaluation.

## 🚀 Models Implemented
- **Linear Regression:** Basic linear approach.
- **Ridge Regression:** Regularized linear regression.
- **Lasso Regression:** Regularized linear regression with feature selection.
- **Decision Tree:** Non-linear tree structure.
- **Gradient Boosting:** Sequential ensemble learning.
- **K-Nearest Neighbors Regressor (KNN):** Instance-based learning.
- **XGBoost Regressor:** Optimized gradient boosting.

## 📈 Performance of the Models
The models were evaluated using mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and R-squared (R2) score. Detailed performance metrics for each model are included in the Jupyter Notebook.

<img width="146" alt="RMSE_cmp" src="https://github.com/user-attachments/assets/b80bd458-93d5-4108-a447-c9dfa7ef75ee">
<img width="150" alt="R2_cmp" src="https://github.com/user-attachments/assets/c9d3824f-6c49-437d-9a42-333206f30800">
<img width="139" alt="MSE_cmp" src="https://github.com/user-attachments/assets/e2dc0348-2baa-490b-be7a-09be09f15ffc">
<img width="154" alt="MAE_cmp" src="https://github.com/user-attachments/assets/aba3bab7-6000-4ec0-943f-7788d03d1efd">
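
As a rough illustration of the four metrics named in this section, the sketch below computes them in plain Python for two equal-length sequences. The function name `regression_metrics` and the numbers in the usage note are made up for illustration; the notebook itself may well rely on scikit-learn's metric functions instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for paired actual and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # same units as the price

    # R2 compares the residual error with the variance around the mean price
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}
```

For example, `regression_metrics([10, 20, 30], [12, 18, 33])` returns a dictionary with one entry per metric. Lower MAE, MSE, and RMSE are better, while an R2 closer to 1 is better.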

## 📢 Conclusion
The car price prediction project demonstrates that various machine learning models can accurately estimate car prices based on their features. Ridge Regression was chosen as the final model for deployment in the web app.

## ✒️ Connect with Me
Tanuj Saxena [LinkedIn](https://www.linkedin.com/in/tanuj-saxena-970271252/)
Binary file added Used Car Price Prediction/Model/preprocessor.pkl
Binary file not shown.
Binary file added Used Car Price Prediction/Model/ridgemodel.pkl
Binary file not shown.
Binary file added Used Car Price Prediction/Model/unique_values.pkl
Binary file not shown.

Large diffs are not rendered by default.

98 changes: 98 additions & 0 deletions Used Car Price Prediction/README.md
@@ -0,0 +1,98 @@
# Used Car Price Prediction
Explore the world of used car price prediction using various machine learning models to accurately estimate car prices based on their attributes. This project focuses on predicting car prices by analyzing attributes such as brand, model, mileage, fuel type, and more.

<img width="920" alt="webapp1" src="https://github.com/user-attachments/assets/37d15871-c6fe-4042-a360-d10b6b816e15">
<img width="922" alt="webapp2" src="https://github.com/user-attachments/assets/93e44e0a-3a8d-4041-82c2-7bbf2ee1f85e">

## 📝 Abstract
The Used Car Price Prediction project aims to estimate the price of used cars based on multiple features. By applying various machine learning models, including regression techniques and ensemble methods, the project seeks to build a robust model for predicting car prices and provide insights into the factors influencing car values.

## 🔍 Methodology
1. **Importing Libraries**

Essential libraries such as NumPy, Pandas, Scikit-Learn, and XGBoost are imported for data manipulation, preprocessing, model training, and evaluation.

2. **Loading the Dataset**

The dataset contains information about used cars, including features such as brand, model, mileage, fuel type, engine type, transmission type, and more. This dataset is used to train and evaluate the prediction models.

3. **Data Preprocessing**

The preprocessing steps include handling missing values, encoding categorical variables, scaling numerical features, and splitting the dataset into training and testing sets.

4. **Training the Models**

Multiple models are implemented, including Linear Regression, Ridge Regression, Lasso Regression, Decision Tree Regressor, Random Forest Regressor, Gradient Boosting Regressor, Support Vector Regressor, Extra Trees Regressor, K-Nearest Neighbors Regressor, and XGBoost Regressor. Each model is trained and evaluated to identify the most effective approach for predicting car prices.

5. **Model Evaluation**

The performance of each model is evaluated using metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R2). Visualization of results helps in comparing the effectiveness of different models.

6. **Web Application**

A Streamlit web application is developed to allow users to input car attributes and get predictions on car prices in real-time. This application utilizes the pre-trained model to provide instant price estimates.
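
Step 3 mentions handling missing values without saying how. One common approach, shown here purely as an assumption and not as the notebook's method, is to fill missing categorical entries with a placeholder label and missing numeric entries with the column median. The helper name `fill_missing` and the `"Unknown"` label are hypothetical.

```python
import pandas as pd


def fill_missing(df: pd.DataFrame, categorical_cols, numerical_cols,
                 placeholder: str = "Unknown") -> pd.DataFrame:
    """Return a copy with missing categoricals labelled and numerics median-filled."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in categorical_cols:
        out[col] = out[col].fillna(placeholder)
    for col in numerical_cols:
        # median is less sensitive to extreme prices or mileages than the mean
        out[col] = out[col].fillna(out[col].median())
    return out
```

In a real workflow the medians would normally be computed on the training split only and then reused for the test split, so that no information leaks from the evaluation data.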


### 📂 Project Directory Structure

```bash
Used Car Price Prediction
|- Dataset
   |- train.csv
   |- README.md

|- Model
   |- used_car_price_prediction.ipynb
   |- README.md
   |- ridgemodel.pkl
   |- preprocessor.pkl
   |- unique_values.pkl

|- Webapp
   |- app.py
   |- README.md

|- images
   |- MAE_cmp.png
   |- MSE_cmp.png
   |- RMSE_cmp.png
   |- R2_cmp.png
   |- car_price_distribution.png
   |- correlation_matrix.png
   |- mileage_vs_price.png
   |- README.md

|- requirments.txt
|- README.md
```
## How to Use
1. **Install Requirements**

Ensure you have the necessary libraries and dependencies installed. The required packages are listed in the requirements file, which is committed in this project as `requirments.txt`.
```bash
pip install -r requirments.txt
```
2. **Download Data**

Ensure the `train.csv` dataset is in the `Dataset` folder. It can also be downloaded from [Kaggle](https://www.kaggle.com/datasets/zeeshanlatif/used-car-price-prediction-dataset/data?select=train.csv).

3. **Run the Jupyter Notebook**

Open the provided Jupyter Notebook file (used_car_price_prediction.ipynb) and run each cell sequentially. Update any file paths or configurations as needed for your environment.

4. **Training and Evaluation**

Train the models and evaluate their performance using the provided data. Analyze the results to determine the best-performing model.

5. **Run the Web Application**

Navigate to the `Webapp` directory and run the Streamlit application to start predicting car prices using the pre-trained model.
```bash
streamlit run app.py
```
6. **Interpret Results**

Use the provided visualizations and metrics to interpret the model’s performance and insights from the data.

Feel free to reach out if you encounter any issues or need further assistance with running the notebook or web application.

## Connect with Me
Tanuj Saxena [LinkedIn](https://www.linkedin.com/in/tanuj-saxena-970271252/)
40 changes: 40 additions & 0 deletions Used Car Price Prediction/Webapp/README.md
@@ -0,0 +1,40 @@
# Car Price Prediction Web App

## Goal 🎯
This project focuses on predicting car prices based on various attributes such as brand, model, model year, mileage, fuel type, engine type, transmission type, exterior color, interior color, accident history, and title status. The goal is to provide an estimate of a car's price using machine learning models.

## Model(s) Used for the Web App 🧮
The model used in this web app is a pre-trained Ridge Regression model, which has been fine-tuned for car price prediction.

## Video Demonstration




https://github.com/user-attachments/assets/b672ae69-ee43-43c8-be83-5b6c45558a43




## How to Run the Web App

### Requirements
Ensure you have the necessary libraries and dependencies installed. You can find the list of required packages in the `requirements.txt` file.

### Installation
1. **Clone the repository:**
```bash
gh repo clone tanuj437/Car-Price-Prediction
cd Car-Price-Prediction
```
2. **Install the Dependencies**
```bash
pip install -r requirements.txt
```
3. **Run the Streamlit app**
```bash
streamlit run app.py
```
### Signature ✒️
Tanuj Saxena

[![LinkedIn](https://img.shields.io/badge/LinkedIn-%230077B5.svg?logo=linkedin&logoColor=white)](https://www.linkedin.com/in/tanuj-saxena-970271252/)
41 changes: 41 additions & 0 deletions Used Car Price Prediction/Webapp/app.py
@@ -0,0 +1,41 @@
from pathlib import Path

import joblib
import pandas as pd
import streamlit as st

# Locate the Model folder relative to this file (Webapp/ and Model/ are siblings),
# so the app also works when launched from inside the Webapp directory
MODEL_DIR = Path(__file__).resolve().parent.parent / 'Model'

# Load the fitted preprocessor and Ridge model
preprocessor = joblib.load(MODEL_DIR / 'preprocessor.pkl')
ridge_model = joblib.load(MODEL_DIR / 'ridgemodel.pkl')

# Load unique values for categorical columns (used as dropdown options)
unique_values = joblib.load(MODEL_DIR / 'unique_values.pkl')

# Define input features
categorical_cols = ['brand', 'model', 'fuel_type', 'engine', 'transmission', 'ext_col', 'int_col', 'accident', 'clean_title']
numerical_cols = ['model_year', 'milage']

# Define the web app
st.title('Car Price Prediction App')

st.write("""
## Predict the price of a car based on its attributes
""")

# Input fields: numeric inputs first, then one dropdown per categorical column
inputs = {}
for col in numerical_cols:
    inputs[col] = st.number_input(f'Enter {col}', min_value=0)
for col in categorical_cols:
    options = unique_values[col]
    inputs[col] = st.selectbox(f'Select {col}', options=options)

# When the user clicks the Predict button
if st.button('Predict'):
    # Build a single-row DataFrame from the collected inputs
    input_df = pd.DataFrame([inputs])

    # Apply the transformations to the input data
    input_transformed = preprocessor.transform(input_df)

    # Make a prediction
    prediction = ridge_model.predict(input_transformed)

    st.write(f'The predicted price of the car is: ${prediction[0]:,.2f}')
Binary file added Used Car Price Prediction/images/MAE_cmp.png
Binary file added Used Car Price Prediction/images/MSE_cmp.png
Binary file added Used Car Price Prediction/images/R2_cmp.png
53 changes: 53 additions & 0 deletions Used Car Price Prediction/images/README.md
@@ -0,0 +1,53 @@
# Images for Car Price Prediction Project

This folder contains visualizations and plots that illustrate different aspects of the car price prediction project. These images provide insights into model performance, data distribution, and feature relationships.

## Contents

### 1. MAE (Mean Absolute Error)
<img width="154" alt="MAE_cmp" src="https://github.com/user-attachments/assets/b8249ac7-2888-42f5-b310-e5698fd06c8b">

This plot shows the Mean Absolute Error for various models used in the car price prediction. Lower MAE values indicate better model performance.

### 2. MSE (Mean Squared Error)
<img width="139" alt="MSE_cmp" src="https://github.com/user-attachments/assets/0cc4f9fa-c0d0-48df-a1d3-db11089525af">

This plot represents the Mean Squared Error for the models. It provides insights into the average squared difference between predicted and actual values.

### 3. RMSE (Root Mean Squared Error)
<img width="146" alt="RMSE_cmp" src="https://github.com/user-attachments/assets/4a5ee698-a234-48a8-b140-ee276c8b509f">

The Root Mean Squared Error plot shows the square root of the Mean Squared Error. RMSE is useful for understanding the average magnitude of prediction errors.

### 4. R2 Score
<img width="150" alt="R2_cmp" src="https://github.com/user-attachments/assets/777e3e75-246c-403f-8004-387918b294bd">

This plot displays the R2 Score for the models. The R2 Score indicates the proportion of variance in the dependent variable that is predictable from the independent variables.

### 5. Car Price Distribution
<img width="590" alt="car_price_distribution" src="https://github.com/user-attachments/assets/d546a94e-f310-407a-b9e7-8c30c7920b39">

This visualization shows the distribution of car prices in the dataset, providing an understanding of the spread and central tendency of car prices.

### 6. Correlation Matrix
<img width="545" alt="correlation_matrix" src="https://github.com/user-attachments/assets/173bd9b7-de3f-4098-99d1-8fc3b8ba164f">

The Correlation Matrix plot highlights the correlation coefficients between different features in the dataset, showing how features are related to each other.
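
A correlation matrix like the one plotted here can be computed with pandas before it is drawn. The snippet below is a small sketch under the assumption that only numeric columns are included; the function name `numeric_correlations` is hypothetical and the plotting step (for example a seaborn heatmap) is left out.

```python
import pandas as pd


def numeric_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Return the Pearson correlation matrix of the numeric columns only."""
    # Text columns such as brand or model cannot be correlated directly,
    # so they are dropped before calling corr()
    return df.select_dtypes(include="number").corr()
```

Each cell of the result lies between -1 and 1: values near 1 mean two features rise together, values near -1 mean one falls as the other rises, and values near 0 indicate little linear relationship.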

### 7. Mileage vs. Price
<img width="578" alt="mileage_vs_price" src="https://github.com/user-attachments/assets/5b338d24-0e2e-4c4a-96b9-81327e5097df">

This scatter plot visualizes the relationship between mileage and car price, helping to understand how mileage affects the price of cars.

## Usage

These images can be used for presentations, reports, or further analysis to better understand the car price prediction model's performance and the dataset's characteristics.

## Contributing

If you have suggestions for additional visualizations or improvements, please feel free to contribute by submitting a pull request.

## Contact

For any questions or further information, please reach out to [Tanuj Saxena](https://www.linkedin.com/in/tanuj-saxena-970271252/).

Binary file added Used Car Price Prediction/images/RMSE_cmp.png
Binary file added Used Car Price Prediction/images/webapp1.png
Binary file added Used Car Price Prediction/images/webapp2.png
Binary file added Used Car Price Prediction/images/wepapprun.mp4
Binary file not shown.
8 changes: 8 additions & 0 deletions Used Car Price Prediction/requirments.txt
@@ -0,0 +1,8 @@
pandas==1.5.3
numpy==1.24.3
scikit-learn==1.2.2
matplotlib==3.7.2
seaborn==0.12.2
xgboost==1.7.6
streamlit==1.24.1
joblib==1.3.2
