Goal:
The goal of this project is to develop a machine learning model that accurately predicts the ex-showroom price of a car based on various relevant features.
Purpose:
The main purpose of this project is to give businesses, car buyers, and sellers insights that help them make informed decisions and optimize their strategies in the competitive automotive industry. These insights come from analyzing a raw dataset with 140 car attributes.
- In this rapidly growing market, the significance of valuable resources is increasing exponentially, as they have the potential to enhance convenience, save time, and simplify everyday life. Among these valuable resources, cars play a pivotal role by enabling efficient transportation and reducing human labour.
- As a result, the automotive industry faces the crucial task of determining appropriate prices for its cars before launching them into the market. This is done by analyzing car features such as mileage, horsepower, body type, and fuel type.
- Analyzing car features to determine pricing helps the automotive industry strike a balance between affordability for customers and profitability for the business. It ensures that the prices of cars align with their attributes, performance, and overall value proposition. This approach also facilitates fair competition and enables customers to make well-informed decisions based on their specific requirements and budget.
- A major challenge in this project was working with a real-world dataset, given limited prior experience in this domain. Real-world datasets differ from the synthetic ones typically used for learning, and they require adapting to the complexity, size, and noise inherent in the data.
- Another challenge faced during the project was the quality and organization of the dataset. The dataset contained a considerable number of missing records, which required careful handling to ensure data completeness. Additionally, the variables in the dataset were not consistently organized, resulting in a messy structure. This lack of uniformity made it difficult to perform meaningful analysis and build a machine-learning model directly.
- Addressing missing values: Attributes with more than 70% missing values were removed to preserve data integrity.
- Splitting dataset: The dataset was split into categorical and numerical dataframes for precise data cleaning.
- Feature selection: 52 relevant features were extracted from the categorical dataframe for accurate analysis.
- Cleaning categorical data: Categorical attributes were cleansed by handling duplicate values, correcting misspelt entries, and filling in missing values using external sources like Google Search.
- Numerical data preprocessing: Various data preprocessing techniques were applied to the numerical dataframe, including handling measurement unit inconsistencies and transforming attributes to a uniform scale for improved modeling.
- Exploratory Data Analysis (EDA): EDA was performed to identify trends, patterns, and relationships between independent variables and the target variable.
- Feature selection: Statistical techniques such as ANOVA, correlation analysis, and chi-square tests were employed to select key features significantly influencing the car price.
- Model building: Multiple algorithms were used to build predictive models, including regression, decision trees, and ensemble techniques.
- Model evaluation: Model performance was assessed using metrics and visualized with residual plots.
- Model comparison and stacking: Different models were compared, and a stacked model was created using the top performers.
- Robust model creation: A robust, accurate model was developed for successful car price prediction.
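The first two steps above (dropping attributes with more than 70% missing values, then splitting the data into categorical and numerical dataframes) can be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the column names, and the toy data are all hypothetical, and the split assumes that column dtype is a good enough proxy for "categorical" versus "numerical".

```python
from typing import Tuple

import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.70) -> pd.DataFrame:
    """Drop columns whose share of missing values exceeds max_missing."""
    missing_share = df.isna().mean()  # fraction of missing values per column
    keep = missing_share[missing_share <= max_missing].index
    return df[keep]


def split_by_dtype(df: pd.DataFrame) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Return (categorical, numerical) dataframes based on column dtype."""
    numerical = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")
    return categorical, numerical


# Toy data standing in for the raw car dataset (all names are hypothetical).
raw = pd.DataFrame(
    {
        "fuel_type": ["Petrol", "Diesel", None, "Petrol", "CNG"],
        "body_type": ["SUV", "Sedan", "Hatchback", None, "SUV"],
        "power_bhp": [110.0, 98.0, np.nan, 82.0, 67.0],
        "rare_option": [None, None, None, None, "Yes"],  # 80% missing
    }
)

cleaned = drop_sparse_columns(raw)
categorical_df, numerical_df = split_by_dtype(cleaned)

print(list(cleaned.columns))
print(list(categorical_df.columns), list(numerical_df.columns))
```

In this toy example only `rare_option` crosses the 70% threshold, so it is the only column removed before the split. On a real dataset, numeric values stored as text (for example with units attached) would land in the categorical dataframe until they are cleaned.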
- The developed predictive model achieved high accuracy and low error rates.
- The model exhibited an impressive R-squared value of 97% and an adjusted R-squared of 96%, indicating its ability to explain and account for the majority of the variance in car ex-showroom prices.
- The model demonstrated precise estimations with a low root mean squared error (RMSE) of 0.0001506, indicating minimal deviation between predicted and actual prices.
- These results validate the effectiveness of the model in accurately estimating car ex-showroom prices, providing valuable insights to both car buyers and sellers in the competitive automotive market.
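For readers who want to see how the three reported metrics relate, here is a small sketch using the standard definitions of R-squared, adjusted R-squared, and RMSE. The function name and the toy values are hypothetical and are not the project's data or evaluation code; the sketch also assumes the adjusted R-squared uses the usual correction for the number of features.

```python
import math


def regression_metrics(y_true, y_pred, n_features):
    """Compute R-squared, adjusted R-squared, and RMSE for a regression fit."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R-squared penalizes R-squared for the number of features used.
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    rmse = math.sqrt(ss_res / n)
    return {"r2": r2, "adjusted_r2": adjusted_r2, "rmse": rmse}


# Toy values for illustration only.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]

metrics = regression_metrics(y_true, y_pred, n_features=2)
print({name: round(value, 4) for name, value in metrics.items()})
```

Note that RMSE is expressed in the units of the target variable, so its magnitude is only meaningful relative to the scale on which prices were modeled.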
- 💻 Python
- 💻 HTML
- 🐼 Pandas
- 📊 Matplotlib
- 📈 Seaborn
- 📈 Statistics
- 🤖 Scikit-learn
- 🧠 Machine Learning
- 📓 Jupyter Notebook
- 🔗 GitHub
- 📊 Power BI
- The project has reached completion, successfully meeting the predefined goals and purposes.
- All project objectives have been accomplished, including end-to-end execution from data collection and preprocessing to model development and evaluation.
Contributions are welcome! If you have any suggestions, bug fixes, or feature additions, please open an issue or submit a pull request.
For any questions or inquiries, please email kumod.aws@gmail.com or reach me on LinkedIn.
Thank you for checking out my repository! I hope you find the projects and code provided helpful and informative. If you have any questions or suggestions, please feel free to reach out.😊