Merge pull request #628 from imsuraj675/readme-branch
[README Enhancement]: Advertisement Click Prediction
Showing 1 changed file with 80 additions and 47 deletions.

@@ -1,48 +1,81 @@
### Project Title
Advertisement Click Prediction
### Aim
Predict whether a user will click on an advertisement based on different attributes and user inputs from the dataset.
### Dataset
https://www.kaggle.com/jahnveenarang/cvdcvd-vd
### Approach
Initially, Exploratory Data Analysis and Data Visualization are performed on the dataset. Then, by applying various algorithms to the dataset, we predict whether the user will click on the advertisement or not. Finally, the accuracies of all algorithms are compared to find the best-fitted model.
### Steps Involved
- All the necessary libraries are imported
- EDA is performed on the data to understand it
- Data visualization is carried out to get meaningful insights
- Correlations between all features are found to understand their relationships
- Categorical features are converted into numerical features using feature mapping
- The dataset is split into training and test data and scaled
- Model Building: we use four algorithms to build the models
  - XGBoost Classifier
  - Random Forest Classifier
  - Gradient Boosting
  - Multi-Layer Perceptron
- After fitting these models, we analyze the confusion matrix and compare the accuracies of all algorithms.
### Data Visualization and Correlation
<img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/gender.png">
<img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/purchased.png">
<img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/age-purchased.png">
<img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/salary-purchased.png">
<img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/purchased-gender.png">
<img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/box-purchased-salary.png">
<img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/box-purchased-age.png">
<img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/correlation.png">
### Accuracies
- XGBoost Classifier - 92%
- Random Forest - 90%
- Gradient Boosting - 90%
- Multi-Layer Perceptron - 87%
<img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/accuracy.png">
### Language Used - Python
### Libraries Used - pandas, seaborn, numpy, matplotlib
### Conclusion
Among all the models, the XGBoost Classifier gave roughly 92% accuracy, making it the best-fitted model.
<hr>

Code contributed by SNEGA S
## **Advertisement Click Prediction**

### 🎯 **Goal**
The objective is to predict whether a user will click on an advertisement based on various attributes and user inputs from the dataset. By analyzing these features, the aim is to develop a model that accurately forecasts user behavior in response to advertisements.

### 🧵 **Dataset**
Link for the dataset used in the project: [`https://www.kaggle.com/jahnveenarang/cvdcvd-vd`](https://www.kaggle.com/jahnveenarang/cvdcvd-vd)

### 🧾 **Description**
We start with *Exploratory Data Analysis (EDA)* and *Data Visualization* to gain insights from the dataset. Then, we apply various machine learning algorithms to predict whether a user will click on an advertisement. Finally, we compare the accuracies of these algorithms to identify the best-performing model.
### 🧮 **What I had done!**
- Imported essential libraries for data manipulation and machine learning.
- Conducted Exploratory Data Analysis (EDA) to comprehend the dataset.
- Visualized data to extract meaningful patterns and insights.
- Assessed feature correlations to understand interdependencies.
- Converted categorical features into numerical formats via feature mapping.
- Split the dataset into training and testing sets and applied scaling techniques.
- Implemented and trained four machine learning models: **XGBoost**, **Random Forest**, **Gradient Boosting**, and **Multi-Layer Perceptron**.
- Evaluated the models using confusion matrices and compared their accuracies to determine the best-performing model.

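The preprocessing steps above (feature mapping, train/test split, scaling) can be sketched as follows. This is a minimal illustration, not the project's notebook: the column names (`Gender`, `Age`, `EstimatedSalary`, `Purchased`) are assumed from the dataset description and the in-memory frame stands in for the actual CSV.

```python
# Sketch of the preprocessing pipeline; column names are assumed, and the
# small inline DataFrame stands in for pd.read_csv(...) on the real dataset.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 35, 26, 27],
    "EstimatedSalary": [19000, 20000, 43000, 57000],
    "Purchased": [0, 0, 1, 1],
})

# Feature mapping: convert the categorical column to numerical values
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})

X = df.drop(columns="Purchased")
y = df["Purchased"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scale features so distance- and gradient-based models behave well
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
```

Note that the scaler is fitted on the training split only and then applied to the test split, which avoids leaking test-set statistics into training.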
### 🚀 **Models Implemented**
We implemented the following algorithms for their distinct advantages in handling various aspects of the dataset:

- **XGBoost Classifier**: Known for its high performance and efficiency in handling large datasets with complex patterns.
- **Random Forest Classifier**: Effective in reducing overfitting and providing reliable feature importance insights.
- **Gradient Boosting**: Powerful for capturing intricate data relationships and improving accuracy through boosting techniques.
- **Multi-Layer Perceptron**: Capable of capturing non-linear relationships through its neural-network architecture.

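A minimal sketch of the fit-and-compare loop described above, using scikit-learn on a synthetic stand-in dataset (the real project uses the Kaggle data). The XGBoost Classifier comes from the separate `xgboost` package and would slot into the same `models` dict as `xgboost.XGBClassifier()`; it is omitted here to keep the sketch self-contained.

```python
# Train several classifiers and compare their test accuracies; the synthetic
# dataset is a stand-in for the project's preprocessed advertisement data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = make_classification(n_samples=400, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Multi-Layer Perceptron": MLPClassifier(max_iter=1000, random_state=42),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    accuracies[name] = accuracy_score(y_test, pred)
    print(name, confusion_matrix(y_test, pred), sep="\n")

best = max(accuracies, key=accuracies.get)
print("Best model:", best)
```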
### 📚 **Libraries Needed**
- Language Used
  - Python
- Libraries Used
  - Pandas
  - Seaborn
  - Numpy
  - Matplotlib

### 📊 **Exploratory Data Analysis Results**
<table>
<tr>
<td><img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/gender.png"></td>
<td><img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/purchased.png"></td>
</tr>
<tr>
<td><img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/age-purchased.png"></td>
<td><img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/salary-purchased.png"></td>
</tr>
<tr>
<td><img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/box-purchased-salary.png"></td>
<td><img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/box-purchased-age.png"></td>
</tr>
<tr>
<td><img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/purchased-gender.png"></td>
<td><img width=70% src='https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/correlation.png'></td>
</tr>
</table>

### 📈 **Performance of the Models based on the Accuracy Scores**
<table>
<tr>
<td style="padding-right: 20px; vertical-align: top;">
<ul style="list-style-type: disc; margin: 0;">
<li>XGBoost Classifier - 92%</li>
<li>Random Forest - 90%</li>
<li>Gradient Boosting - 90%</li>
<li>Multi-Layer Perceptron - 87%</li>
</ul>
</td>
<td style="vertical-align: top;">
<img src="https://github.com/snega16/ML-Crate/blob/snega16/Advertisement%20Click%20Prediction/Images/accuracy.png" alt="Accuracy comparison of the four models" style="max-width: 200px; max-height: 200px;">
</td>
</tr>
</table>

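For readers connecting the confusion-matrix evaluation to the accuracy scores above, accuracy is simply (TP + TN) / total. The matrix values below are illustrative only, not the project's actual results.

```python
# Derive an accuracy score from a confusion matrix; the numbers here are
# made up to show the arithmetic, not taken from the project's models.
import numpy as np

cm = np.array([[50, 4],    # [[TN, FP],
               [ 4, 42]])  #  [FN, TP]]
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()
print(f"accuracy = {accuracy:.2%}")  # prints "accuracy = 92.00%"
```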
### 📢 **Conclusion**
Among all the models tested, the **XGBoost Classifier** achieved the highest accuracy, approximately **92%**, making it the best-performing model for predicting advertisement clicks. This demonstrates its effectiveness in handling the dataset and providing reliable predictions.

### ✒️ **Your Signature**
Created by [Suraj Kashyap](https://github.com/imsuraj675) as a part of SSOC'24.