AIR Quality Prediction Model #45

Closed
wants to merge 5 commits into from
9 changes: 9 additions & 0 deletions ML/Projects/Air_Quality_Prediction_Model/DATASET/README.md
@@ -0,0 +1,9 @@

## DATASET

https://www.kaggle.com/rohanrao/air-quality-data-in-india


- Only the **city_day.csv** file from this dataset is used.
- The file has 16 columns and 29,531 rows.
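
The stated shape can be sanity-checked after downloading the file. The following is a minimal sketch using only the standard library; the helper name `csv_shape` is hypothetical, and it assumes the first line of the file is a header row.

```python
import csv


def csv_shape(path):
    """Return (data_rows, columns) for a CSV file, not counting the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)              # first line is assumed to be the header
        n_rows = sum(1 for _ in reader)    # count the remaining data rows
    return n_rows, len(header)
```

Calling `csv_shape("city_day.csv")` on the downloaded file should return `(29531, 16)` if the figures above hold for the version of the dataset you have.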


99 changes: 99 additions & 0 deletions ML/Projects/Air_Quality_Prediction_Model/README.md
Owner

Follow each and every heading of readme template.

@@ -0,0 +1,99 @@



# **Air Quality Prediction**



## **AIM**

- Build an air quality prediction model, starting from a basic approach and working up to the highest possible accuracy.


## **DATASET**

https://www.kaggle.com/rohanrao/air-quality-data-in-india



## **DESCRIPTION**



The project implements 3-4 algorithms on the same data and compares their accuracy scores to find the best-fitting one.



## **WHAT I DID**

* Analyzed the data for correlations and missing values, plotted several charts, and compared the pre-COVID and post-COVID periods.
* Trained models with the following algorithms:
> * SVM
> * Random Forest
> * XGBoost
* XGBoost performed the best, with 100% accuracy.
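
The analysis steps above can be sketched with pandas roughly as follows. This is an illustration under assumptions, not the notebook's actual code: the column names (`City`, `Date`, `PM2.5`, `NO2`, `AQI`) are assumed to match `city_day.csv`, the values are made up, and the 2020-01-01 cutoff between the two periods is an assumption.

```python
import pandas as pd

# Tiny stand-in for city_day.csv: column names are assumed, values are made up.
df = pd.DataFrame(
    {
        "City": ["Delhi", "Delhi", "Delhi", "Delhi"],
        "Date": ["2019-04-01", "2019-04-02", "2020-04-01", "2020-04-02"],
        "PM2.5": [120.0, None, 60.0, 55.0],
        "NO2": [50.0, 48.0, 20.0, None],
        "AQI": [300.0, 280.0, 150.0, 140.0],
    }
)
df["Date"] = pd.to_datetime(df["Date"])

# Missing values per column.
missing_per_column = df.isna().sum()

# Pairwise correlation between pollutant columns and AQI.
correlation = df[["PM2.5", "NO2", "AQI"]].corr()

# Label each row as pre- or post-COVID (cutoff date is an assumption).
is_post = df["Date"] >= pd.Timestamp("2020-01-01")
df["Period"] = is_post.map({False: "pre_covid", True: "post_covid"})

# Mean pollutant levels per period (NaN values are skipped by default).
period_means = df.groupby("Period")[["PM2.5", "NO2", "AQI"]].mean()
```

With the real file, the `DataFrame` literal would be replaced by `pd.read_csv("city_day.csv")`.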


## **MODELS USED**

> 1. RBF SVM : SVMs perform well on classification problems when the dataset is not too large. A Support Vector Machine can also be used for regression while keeping the main feature that characterizes the algorithm (the maximal margin).
> 2. Random Forest : It **averages many decision trees trained on random subsets of the data**, which usually gives higher accuracy than a single tree. Some implementations can handle missing values and keep their accuracy even when a large proportion of the data is missing. Adding more trees does not cause the model to overfit.
> 3. XGBoost : XGBoost is **a library for developing fast and high performance gradient boosting tree models**. It achieves strong performance on a wide range of difficult machine learning tasks.
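
A comparison of these models might look like the sketch below. It runs on synthetic data from `make_classification` rather than the real dataset, the helper name `compare_models` and all hyperparameters are illustrative assumptions, and XGBoost is only included when the `xgboost` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, seed=0):
    """Fit each model on one train split and return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        # RBF SVMs are sensitive to feature scale, so standardize first.
        "RBF SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    try:
        from xgboost import XGBClassifier  # optional dependency

        models["XGBoost"] = XGBClassifier(n_estimators=100, random_state=seed)
    except ImportError:
        pass  # skip XGBoost if the package is not installed

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in for the air quality features and AQI classes.
X, y = make_classification(
    n_samples=500, n_features=12, n_informative=6, n_classes=3, random_state=0
)
scores = compare_models(X, y)
```

Evaluating every model on the same held-out split keeps the accuracy figures comparable across algorithms.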

## **LIBRARIES NEEDED**

* Numpy
* Pandas
* Matplotlib
* scikit-learn
* xgboost
* seaborn
* missingno
* chart_studio
* cufflinks



## **VISUAL PLOTS**


![](/ML/Projects/Air_Quality_Prediction_Model/IMAGES/most_polluted_cities_pre_covid.jpg)


![](/ML/Projects/Air_Quality_Prediction_Model/IMAGES/most_polluted_cities_post_covid.jpg)



### **Evaluation Result**

| Model | Accuracy (%) |
|----------------------------|----------------------|
|Support Vector Machine (SVM)| 96.12 |
| Random Forest | 99.92 |
| XGBoost | 98.67 |
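
For reference, the accuracy figure in the table is the share of test samples whose predicted class matches the true class. A minimal sketch follows; the function name `accuracy_percent` and the example labels are illustrative, not taken from the project code.

```python
def accuracy_percent(y_true, y_pred):
    """Percentage of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


# Illustrative labels only: three of four predictions match.
example = accuracy_percent(
    ["Good", "Poor", "Poor", "Severe"],
    ["Good", "Poor", "Good", "Severe"],
)
```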



## **CONCLUSION**



We investigated the data by checking for class imbalance, visualizing the features, and examining the relationships between them. We compared the data between the pre-COVID period (2015 to 2019) and the post-COVID period (after 2020). We then evaluated three predictive models.
The dataset turned out to be imbalanced. To address this, we used SMOTE (Synthetic Minority Oversampling Technique), which oversamples the minority class by generating synthetic examples.
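
The core idea of SMOTE can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the implementation used in the project (the `imbalanced-learn` package provides a full `SMOTE` class): it picks base points at random instead of iterating over every minority sample, and the function name `smote_sketch` is hypothetical.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors.

    minority: list of feature vectors (lists of floats) from the minority class.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbors of the base point, excluding itself.
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # New point lies at a random position on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority_points, n_new=6, k=2)
```

Because each synthetic point is an interpolation between two existing minority points, it always falls inside the region already covered by the minority class.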

The SVM, Random Forest, and XGBoost models reached accuracies of 96.15%, 99.89%, and 100%, respectively, on the test dataset.

We also concluded that:
1. Vehicular pollutants are more strongly related to the air quality index.
2. Delhi is the most polluted city in terms of vehicular pollutants.
3. Ahmedabad is the most polluted city in terms of industrial pollutants.
4. After the COVID-19 pandemic, there was a gradual decrease in both vehicular and industrial pollutant levels.
5. The XGBoost (Extreme Gradient Boosting) classifier classifies the target variable with 100% accuracy.

### Contributor :
*Harsh Raj*

*Avdhesh-Varshney* (Mentor)

### Connect with me:
[![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/harsh-raj-58921728b/) [![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/HarshRaj29004)
8 changes: 8 additions & 0 deletions ML/Projects/Air_Quality_Prediction_Model/requirements.txt
@@ -0,0 +1,8 @@
matplotlib==3.5.2
seaborn==0.11.2
numpy==1.19.2
pandas==1.4.3
scikit-learn==1.1.1
lightgbm==3.3.2
plotly==5.9.0
category_encoders==2.5.0
8 changes: 4 additions & 4 deletions ML/README.md
@@ -35,10 +35,10 @@

- A variety of projects to demonstrate real-world applications of AI algorithms. These projects include datasets, code, and step-by-step explanations to help users apply their knowledge and develop practical skills.

| S.No | Project | S.No | Project | S.No | Project |
|-------|---------|-------|---------|------|---------|
| 1 | | 2 | | 3 | |
| 4 | | 5 | | 6 | |
| S.No | Project | S.No | Project | S.No | Project |
|------|----------------------------|------|----------------------------|------|----------------------------|
| 1 |Air_Quality_Prediction_Model| 2 | | 3 | |
| 4 | | 5 | | 6 | |


<div align="center">

This file was deleted.