Used Cars Price Prediction

This project predicts the prices of used cars with several machine learning models. The data comes from the US Used Cars dataset (3 million), which contains detailed information about used cars in the United States.

Project Overview

The goal of this project is to build and compare multiple regression models to predict the prices of used cars. The models included in this analysis are:

Linear Regression
Decision Trees
Random Forest
eXtreme Gradient Boosting (XGBoost)
Deep Neural Networks (DNN)

Each model is evaluated using Root Mean Squared Error (RMSE) as the key metric, where a lower value means a better fit. GridSearchCV and Hyperband (Keras Tuner) are used to tune hyperparameters and improve performance.
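For readers unfamiliar with the metric, RMSE is the square root of the mean squared difference between actual and predicted values. The following is a minimal sketch in plain Python; the function name `rmse` and the toy numbers are illustrative assumptions, not code or data from the notebooks.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration only (not taken from the dataset).
actual = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 13.0]
score = rmse(actual, predicted)
print(f"RMSE = {score:.4f}")
```

RMSE is expressed in the same units as the target, so its magnitude depends on how the price column is scaled.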

Data Source

The dataset used in this project is publicly available on Kaggle:

US Used Cars dataset (3 million)

Data Cleaning and Preprocessing

Handling Placeholder Characters: The dataset contains placeholder characters represented as --, which are cleaned before further processing.
Missing Values: Missing data is handled column by column:
- Removal of missing entries in certain columns.
- Mean imputation, mode imputation, and multiple imputation techniques are applied where appropriate.
Categorical Encoding: One-hot encoding is applied to convert categorical variables into a numerical format that can be used by machine learning models.
Normalization: Numerical columns are standardized to ensure consistent scaling across features.
Feature Selection: The SelectKBest method is used to identify the most informative features for predictive modeling, enhancing model performance.
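The steps above can be sketched roughly as follows with pandas and scikit-learn. This is an illustration under stated assumptions, not the notebook's actual code: the toy column names (`mileage`, `horsepower`, `body_type`), the choice of `f_regression` as the scoring function, and `k=2` are all hypothetical, and multiple imputation is left out for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.preprocessing import StandardScaler

# Toy frame; column names and values are illustrative, not the dataset's schema.
df = pd.DataFrame({
    "price": [15000.0, 22000.0, 9000.0, 18000.0, None, 12000.0],
    "mileage": [40000.0, None, 90000.0, 30000.0, 50000.0, 70000.0],
    "horsepower": ["150", "--", "110", "200", "130", "120"],
    "body_type": ["Sedan", "SUV", "--", "Sedan", "SUV", "Sedan"],
})

# 1. Treat the "--" placeholder as a missing value.
df = df.replace("--", np.nan)
df["horsepower"] = pd.to_numeric(df["horsepower"])

# 2. Drop rows without a target, then impute the remaining gaps:
#    mean for numeric columns, mode for the categorical column.
df = df.dropna(subset=["price"])
for col in ["mileage", "horsepower"]:
    df[col] = df[col].fillna(df[col].mean())
df["body_type"] = df["body_type"].fillna(df["body_type"].mode()[0])

# 3. One-hot encode the categorical column.
X = pd.get_dummies(df.drop(columns="price"), columns=["body_type"], dtype=float)
y = df["price"]

# 4. Standardize numeric columns to zero mean and unit variance.
num_cols = ["mileage", "horsepower"]
X[num_cols] = StandardScaler().fit_transform(X[num_cols])

# 5. Keep the k features with the highest univariate scores.
selector = SelectKBest(score_func=f_regression, k=2)
X_selected = selector.fit_transform(X, y)
selected_columns = list(X.columns[selector.get_support()])
print(selected_columns)
```

In a real workflow the scaler and selector would normally be fitted on the training split only, so that no information from the test split leaks into preprocessing.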

Model Evaluation

After preprocessing, five regression models are trained and evaluated using RMSE as the key metric. The results are as follows:

Random Forest: Best performance with an RMSE of 0.00265.
XGBoost: Second-best performance with an RMSE of 0.00278.
Decision Tree: RMSE of 0.00282, closely following XGBoost.
Deep Neural Network (DNN): RMSE of 0.00283, slightly higher than Decision Tree.
Linear Regression: RMSE of 0.00393, at least 38% higher than any other model.
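A comparison like the one above can be sketched as a loop that fits each model and records its test RMSE. The example below is an assumption-laden illustration, not the notebook's code: it uses synthetic data from `make_regression`, covers only the three scikit-learn models (XGBoost and the DNN are omitted to keep it dependency-light), and its settings are arbitrary. Because the synthetic data is linear, the ranking it prints says nothing about the results reported here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the preprocessed feature matrix and price target.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hypothetical model settings chosen only for a quick demonstration.
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

rmse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # RMSE is the square root of the mean squared error on held-out data.
    rmse_by_model[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

# Report models from lowest (best) to highest RMSE.
for name, score in sorted(rmse_by_model.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {score:.3f}")
```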

Model Tuning

GridSearchCV: Used for hyperparameter tuning of the XGBoost and Decision Tree models.
Keras Tuner (Hyperband): Applied for optimizing the Deep Neural Network model's parameters.
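As a rough illustration of the grid search step, the sketch below tunes a Decision Tree with GridSearchCV on synthetic data. The parameter grid, the 3-fold cross-validation, and the data are assumptions made for the example; the grids actually used in the notebook may differ, and the Keras Tuner step is not shown.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the preprocessed training data.
X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=0)

# Hypothetical search space; every combination is evaluated.
param_grid = {"max_depth": [3, 5, 8], "min_samples_leaf": [1, 5]}

search = GridSearchCV(
    estimator=DecisionTreeRegressor(random_state=0),
    param_grid=param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
best_rmse = -search.best_score_  # flip the sign back to a positive RMSE
print(best_params, f"cross-validated RMSE = {best_rmse:.3f}")
```

Grid search cost grows with the product of the grid sizes (here 3 x 2 combinations times 3 folds), which is one reason to keep the grid small on a dataset of this size.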

Conclusion

Among the five models tested, Random Forest achieved the lowest RMSE (0.00265). XGBoost, Decision Tree, and DNN performed similarly (0.00278 to 0.00283), while Linear Regression lagged behind at 0.00393.

How to Run the Project

Clone this repository.

git clone https://github.com/asenacak/UsedCarsML.git

Download the US Used Cars dataset (3 million) from Kaggle.
Install the necessary dependencies.
Run the Jupyter notebooks:
- Used_Cars_Data_Cleaning.ipynb: Data preprocessing and cleaning.
- models_usedcars.ipynb: Model building, evaluation, and hyperparameter tuning.

Dependencies

Python 3
Jupyter Notebook
pandas
scikit-learn
XGBoost
TensorFlow/Keras
Keras Tuner
matplotlib

License

This project is licensed under the MIT License.
