AIR Quality Prediction Model #45
Closed
5 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
|
||
## DATASET | ||
|
||
https://www.kaggle.com/rohanrao/air-quality-data-in-india | ||
|
||
|
||
- Only the file **city_day.csv** from this dataset is used. | ||
- The file has 16 columns and 29,531 rows. | ||
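
A minimal sketch of loading the file and checking its shape with pandas. The column names (`City`, `Date`, `PM2.5`, `AQI`, `AQI_Bucket`) and the helper `load_city_day` are assumptions made for illustration, and the two sample rows are made-up stand-ins so the snippet runs without the Kaggle download.

```python
import io

import pandas as pd


def load_city_day(source):
    """Read the CSV and parse the date column (assumed to be named 'Date')."""
    df = pd.read_csv(source)
    if "Date" in df.columns:
        df["Date"] = pd.to_datetime(df["Date"])
    return df


# Made-up stand-in rows; replace with the path to the real city_day.csv.
sample_csv = io.StringIO(
    "City,Date,PM2.5,AQI,AQI_Bucket\n"
    "Delhi,2015-01-01,313.2,472,Severe\n"
    "Ahmedabad,2015-01-01,,,\n"
)

df = load_city_day(sample_csv)
print(df.shape)         # with the real file this should match 29531 rows x 16 columns
print(df.isna().sum())  # missing values per column
```

Running the same two `print` calls on the real file is a quick way to confirm the row and column counts stated above.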
|
Binary file added
BIN
+72.7 KB
...rojects/Air_Quality_Prediction_Model/IMAGES/most_polluted_cities_post_covid.jpg
Binary file added
BIN
+74.6 KB
ML/Projects/Air_Quality_Prediction_Model/IMAGES/most_polluted_cities_pre_covid.jpg
1 change: 1 addition & 0 deletions
1
ML/Projects/Air_Quality_Prediction_Model/MODEL/air-quality-prediction.ipynb
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,99 @@ | ||
|
||
|
||
|
||
# **Air Quality Prediction** | ||
|
||
|
||
|
||
## **AIM** | ||
|
||
- Walk through the project from a basic approach up to the highest accuracy achievable. | ||
|
||
|
||
## **DATASET** | ||
|
||
https://www.kaggle.com/rohanrao/air-quality-data-in-india | ||
|
||
|
||
|
||
## **DESCRIPTION** | ||
|
||
|
||
|
||
The main aim of the project is to train models with 3-4 algorithms and compare them, using the accuracy score to find the algorithm that best fits the data. | ||
|
||
|
||
|
||
## **WHAT I DID** | ||
|
||
* Analyzed the data for insights such as correlations and missing values, plotted several charts, and compared the pre-COVID and post-COVID periods. | ||
* Then trained models with the following algorithms: | ||
> * SVM | ||
> * Random Forest | ||
> * XGBoost | ||
* Of these, XGBoost performed best, with 100% accuracy. | ||
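
The analysis step in the list above (missing values, correlation, and a pre/post COVID comparison) could be sketched roughly as follows. The helper names, the column names, the `2020-01-01` cutoff, and the demo values are all assumptions for illustration, not taken from the notebook.

```python
import pandas as pd


def split_pre_post_covid(df, cutoff="2020-01-01", date_col="Date"):
    """Split rows into those before the cutoff date and those on or after it."""
    dates = pd.to_datetime(df[date_col])
    boundary = pd.Timestamp(cutoff)
    return df[dates < boundary], df[dates >= boundary]


def mean_by_city(df, pollutant, city_col="City"):
    """Average pollutant level per city, highest first."""
    return df.groupby(city_col)[pollutant].mean().sort_values(ascending=False)


# Made-up demo rows standing in for city_day.csv.
demo = pd.DataFrame({
    "City": ["Delhi", "Delhi", "Ahmedabad", "Ahmedabad"],
    "Date": ["2019-06-01", "2020-06-01", "2019-06-01", "2020-06-01"],
    "NO2": [60.0, 30.0, 40.0, None],
    "AQI": [300.0, 150.0, 250.0, 120.0],
})

print(demo.isna().sum())             # missing values per column
print(demo[["NO2", "AQI"]].corr())   # pairwise correlation of numeric columns

pre, post = split_pre_post_covid(demo)
print(mean_by_city(pre, "NO2"))      # ranking before the cutoff
print(mean_by_city(post, "NO2"))     # ranking from the cutoff onward
```

Ranking cities separately on each side of the cutoff is one simple way to produce "most polluted cities" charts for the two periods.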
|
||
|
||
## **MODELS USED** | ||
|
||
> 1. RBF SVM : SVMs perform well on classification problems when the dataset is not too large. A support vector machine can also be used as a regression method while keeping the main feature that characterizes the algorithm (the maximal margin). | ||
> 2. Random Forest : An ensemble of decision trees whose predictions are combined by voting. Its out-of-bag samples give a built-in accuracy estimate similar to cross-validation. Depending on the implementation it can tolerate missing values; otherwise they must be imputed first. Averaging over many trees reduces variance, so adding more trees does not by itself cause over-fitting. | ||
> 3. XGBoost : XGBoost is **a library for developing fast and high performance gradient boosting tree models**. It often achieves strong performance on a range of difficult machine learning tasks. | ||
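
A rough sketch of how the three models described above might be compared by accuracy with scikit-learn. The data here is synthetic (generated with `make_classification`), the hyperparameters are arbitrary, and XGBoost is included only if the `xgboost` package is installed; none of this is taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the pollutant features and the class label.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    # SVMs are scale-sensitive, so the features are standardized first.
    "RBF SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
try:
    from xgboost import XGBClassifier  # optional dependency
    models["XGBoost"] = XGBClassifier(n_estimators=200, random_state=0)
except ImportError:
    pass  # compare only the scikit-learn models

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.2%}")
```

Holding out a stratified test split, as done here, keeps the class proportions the same in training and test data, so the accuracy figures of the three models are directly comparable.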
|
||
## **LIBRARIES NEEDED** | ||
|
||
* Numpy | ||
* Pandas | ||
* Matplotlib | ||
* scikit-learn | ||
* xgboost | ||
* seaborn | ||
* missingno | ||
* chart_studio | ||
* cufflinks | ||
|
||
|
||
|
||
## **VISUAL PLOTS** | ||
|
||
|
||
![](/ML/Projects/Air_Quality_Prediction_Model/IMAGES/most_polluted_cities_pre_covid.jpg) | ||
|
||
|
||
![](/ML/Projects/Air_Quality_Prediction_Model/IMAGES/most_polluted_cities_post_covid.jpg) | ||
|
||
|
||
|
||
### **Evaluation Result** | ||
|
||
| Model | Accuracy (%) | | ||
|----------------------------|----------------------| | ||
|Support Vector Machine (SVM)| 96.12 | | ||
| Random Forest | 99.92 | | ||
| XGBoost | 98.67 | | ||
|
||
|
||
|
||
## **CONCLUSION** | ||
|
||
|
||
|
||
We investigated the data: checking for class imbalance, visualizing the features, and examining the relationships between them. We compared the data between pre-COVID times (2015 to 2019) and post-COVID times (2020 onward). We then evaluated three predictive models. | ||
The dataset turned out to be imbalanced, which can bias a classifier toward the majority class. To overcome this problem we used SMOTE (Synthetic Minority Oversampling Technique), which oversamples the minority class by generating synthetic examples. | ||
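
To illustrate the idea behind SMOTE, here is a from-scratch NumPy sketch: each synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest minority-class neighbours. The function `smote_oversample` and the demo points are hypothetical; this is not the implementation used in the notebook, and in practice a library such as imbalanced-learn would normally be used instead.

```python
import numpy as np


def smote_oversample(X_minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample, one of its neighbours, and an interpolation factor.
    base = rng.integers(0, n, size=n_synthetic)
    picked = neighbours[base, rng.integers(0, k, size=n_synthetic)]
    gap = rng.random((n_synthetic, 1))
    return X_minority[base] + gap * (X_minority[picked] - X_minority[base])


# Made-up minority-class points in two dimensions.
minority = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]])
synthetic = smote_oversample(minority, n_synthetic=6, k=2)
print(synthetic.shape)  # (6, 2)
```

Oversampling is usually applied to the training split only; if synthetic points derived from test samples end up in the training data, the reported test accuracy can be inflated.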
|
||
We predicted with SVM, Random Forest and XGBoost and obtained accuracies of 96.15%, 99.89% and 100%, respectively, on the test dataset. | ||
|
||
We also concluded that: | ||
1. Vehicular pollutants are more strongly related to the air quality index. | ||
2. Delhi is the most polluted city in terms of vehicular pollutants. | ||
3. Ahmedabad is the most polluted city in terms of industrial pollutants. | ||
4. After the COVID-19 pandemic began, there was a gradual decrease in both vehicular and industrial pollutants. | ||
5. The XGBoost (Extreme Gradient Boosting) classifier classified the target variable with 100% accuracy. | ||
|
||
### Contributor : | ||
*Harsh Raj* | ||
|
||
*Avdhesh-Varshney* (Mentor) | ||
|
||
### Connect with me: | ||
[![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/harsh-raj-58921728b/) [![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/HarshRaj29004) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
matplotlib==3.5.2 | ||
seaborn==0.11.2 | ||
numpy==1.19.2 | ||
pandas==1.4.3 | ||
scikit-learn==1.1.1 | ||
lightgbm==3.3.2 | ||
plotly==5.9.0 | ||
category_encoders==2.5.0 |
13 changes: 0 additions & 13 deletions
13
...s at Ten’s Place and the Least Significant Digit of the Entered Integer at One’s Place.py
This file was deleted.
Follow each and every heading of readme template.