Weather analysis #571 #583

Merged
merged 25 commits into from
Feb 14, 2024
48 changes: 48 additions & 0 deletions Weather Analysis/Dataset/README.md
@@ -0,0 +1,48 @@
# Weather Analysis Dataset

The dataset used here is taken from Kaggle. You can download it from [Weather Analysis and Prediction](https://www.kaggle.com/datasets/mastmustu/weather-analysis).

## About the dataset

The data contains daily weather attributes from 2009 to July 2020. The CSV file has 22 columns and 3902 entries (rows).
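A quick way to confirm these dimensions after downloading is to count the header fields and data rows. The sketch below uses only the Python standard library; the file name `weather_data.csv` and the sample values are placeholder assumptions, not taken from the dataset.

```python
import csv
import io


def csv_shape(fileobj):
    """Return (data_rows, columns) for a CSV that starts with a header line."""
    reader = csv.reader(fileobj)
    header = next(reader)            # first line holds the column names
    n_rows = sum(1 for _ in reader)  # every remaining line is a data row
    return n_rows, len(header)


# Tiny in-memory stand-in (made-up values) so the sketch runs without the download.
sample = io.StringIO("Date,Average temperature\n2009-01-01,37.8\n2009-01-02,43.2\n")
print(csv_shape(sample))  # (2, 2)

# With the real file (hypothetical name), the figures quoted above
# would correspond to a result of (3902, 22):
# with open("weather_data.csv", newline="") as f:
#     print(csv_shape(f))
```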

**Column descriptions**:

- Date
- Average temperature (°F)
- Average humidity (%)
- Average dewpoint (°F)
- Average barometer (in)
- Average windspeed (mph)
- Average gustspeed (mph)
- Average direction (°deg)
- Rainfall for month (in)
- Rainfall for year (in)
- Maximum rain per minute
- Maximum temperature (°F)
- Minimum temperature (°F)
- Maximum humidity (%)
- Minimum humidity (%)
- Maximum pressure
- Minimum pressure
- Maximum windspeed (mph)
- Maximum gust speed (mph)
- Maximum heat index (°F)
- Month
Binary file added Weather Analysis/Images/distribution plot 1.png
Binary file added Weather Analysis/Images/distribution plot 2.png
Binary file added Weather Analysis/Images/distribution plot 3.png
3,659 changes: 3,659 additions & 0 deletions Weather Analysis/Model/Weather_Analysis.ipynb

Large diffs are not rendered by default.

111 changes: 111 additions & 0 deletions Weather Analysis/README.md
@@ -0,0 +1,111 @@
<h1>Weather Analysis</h1>

**GOAL**

To build a machine learning model that predicts the average rainfall per month from atmospheric conditions such as temperature, humidity, dewpoint, pressure, and windspeed.

**DATASET**

<https://www.kaggle.com/datasets/mastmustu/weather-analysis>

**DESCRIPTION**

To analyze the weather dataset, then build and train models on its features and variables.

The dataset is a single CSV file with 3902 entries (rows) and 22 columns.

**Column descriptions**:

- Date
- Average temperature (°F)
- Average humidity (%)
- Average dewpoint (°F)
- Average barometer (in)
- Average windspeed (mph)
- Average gustspeed (mph)
- Average direction (°deg)
- Rainfall for month (in)
- Rainfall for year (in)
- Maximum rain per minute
- Maximum temperature (°F)
- Minimum temperature (°F)
- Maximum humidity (%)
- Minimum humidity (%)
- Maximum pressure
- Minimum pressure
- Maximum windspeed (mph)
- Maximum gust speed (mph)
- Maximum heat index (°F)
- Month


### Visualization and EDA of different attributes:

<img alt="Distribution" src="./Images/distribution plot 1.png">

<img alt="Distribution" src="./Images/distribution plot 2.png">

<img alt="Regression" src="./Images/avg barometer vs rainfall per mnth.png">

<img alt="Regression" src="./Images/avg dewpoint vs rainfall per mnth.png">

<img alt="Regression" src="./Images/avg humidity vs rainfall per mnth.png">

<img alt="Regression" src="./Images/avg temp vs rainfall per mnth.png">

<img alt="Regression" src="./Images/avg windspeed vs rainfall per mnth.png">

<img alt="Regression" src="./Images/max temp vs rainfall per mnth.png">

<img alt="Regression" src="./Images/month vs rainfall per month.png">


**MODELS USED**

| Model | MSE_train | R2_train | MSE_test | R2_test |
|---------------------------|-----------|----------|-----------|-----------|
|Random Forest Regression | 0.0126 | 0.965291 | 0.082938 | 0.773470 |
|XGBoost Regression | 0.0056 | 0.984504 | 0.089369 | 0.755905 |
|Decision Tree | 0.58e-34 | 1.000000 | 0.144070 | 0.606500 |
|Ridge Regression           | 3.58e-34  | 1.000000 | 0.144070  | 0.606500  |
|Linear Regression | 0.274 | 0.243614 | 0.281541 | 0.231021 |
|Elastic Net Regression | 2.94e-01 | 0.190594 | 0.302724 | 0.173166 |
|Neural Network Regression | 0.358 | 0.076272 | 0.405645 |-0.107945 |
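
For readers unfamiliar with the two metrics in the table, here is a minimal sketch of how MSE and R-square are defined, written in plain Python with made-up values. It is not the notebook's code (which may well use library functions for this); it only illustrates why R-square can drop below zero, as in the Neural Network row.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y = [0.0, 1.0, 2.0, 3.0]            # made-up rainfall values
print(r2(y, y))                     # 1.0  (perfect fit)
print(r2(y, [1.5] * 4))             # 0.0  (always predicting the mean)
print(r2(y, [3.0, 2.0, 1.0, 0.0]))  # -3.0 (worse than predicting the mean)
```

A negative R-square therefore means the model's predictions are further from the targets than a constant prediction of the mean would be.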


**WHAT I HAVE DONE**

* Loaded the dataset, which is in CSV format.
* It has 3902 entries (rows) and 22 columns.
* Checked for missing values and cleaned the data accordingly.
* Analyzed the data and visualized the insights found.
* Explored how individual columns relate to the target variable using plotting libraries.
* Trained different models on the dataset and saved their MSE and R-square scores in a dataframe.
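
The missing-value step in the list above can be sketched as follows. This is a dependency-free illustration, not the notebook's code. The set of tokens treated as missing and the sample rows are assumptions made for the example.

```python
import csv
import io

MISSING = {"", "NA", "NaN"}  # tokens treated as missing (an assumption)


def is_missing(value):
    """True when a cell is absent or holds one of the MISSING tokens."""
    return (value or "").strip() in MISSING


def count_missing(rows):
    """Count missing cells per column in a list of dict rows."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            counts[col] = counts.get(col, 0) + is_missing(value)
    return counts


def drop_incomplete(rows):
    """Keep only the rows in which every cell has a value."""
    return [r for r in rows if not any(is_missing(v) for v in r.values())]


# Made-up sample with one missing humidity reading.
raw = io.StringIO("Date,Average humidity\n2009-01-01,71\n2009-01-02,\n")
rows = list(csv.DictReader(raw))
print(count_missing(rows))          # {'Date': 0, 'Average humidity': 1}
print(len(drop_incomplete(rows)))   # 1
```

Dropping incomplete rows is only one possible cleaning strategy; filling gaps with a column mean or median is a common alternative when rows are scarce.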


**LIBRARIES NEEDED**

1. Pandas
2. Matplotlib
3. scikit-learn
4. NumPy
5. XGBoost
6. TensorFlow
7. Keras
8. SciPy
9. Seaborn



**CONCLUSION**

- Random Forest and XGBoost Regression show the most promising performance, with low MSE and high R-square values on both the training and test sets (test R-square of about 0.77 and 0.76).
- Decision Tree Regression achieved a perfect R-square on the training set, but its test R-square is only about 0.61, indicating overfitting.
- Neural Network Regression has the highest MSE on both sets, with an R-square near zero on the training set (about 0.08) and negative on the test set (about -0.11), indicating poor performance on both.
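
One way to make the overfitting argument concrete is to compute the gap between training and test R-square for each model. The sketch below copies the R-square values from the MODELS USED table; the helper name `generalization_gaps` is hypothetical and does not come from the notebook.

```python
# (model, R2_train, R2_test) copied from the MODELS USED table.
results = [
    ("Random Forest Regression", 0.965291, 0.773470),
    ("XGBoost Regression", 0.984504, 0.755905),
    ("Decision Tree", 1.000000, 0.606500),
    ("Ridge Regression", 1.000000, 0.606500),
    ("Linear Regression", 0.243614, 0.231021),
    ("Elastic Net Regression", 0.190594, 0.173166),
    ("Neural Network Regression", 0.076272, -0.107945),
]


def generalization_gaps(rows):
    """Return (model, gap) pairs, largest train-minus-test R2 gap first."""
    gaps = [(name, round(train - test, 6)) for name, train, test in rows]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


for name, gap in generalization_gaps(results):
    print(f"{name:<26} {gap:.4f}")
```

A large gap together with a high training score points to overfitting. A small gap together with low scores on both sets, as with Linear and Elastic Net Regression, points instead to a model that is too simple for the data.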


**YOUR NAME**

*Ghousiya Begum*

[![LinkedIn](https://img.shields.io/badge/linkedin-%230077B5.svg?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/ghousiya-begum-a9b634258/) [![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/ghousiya47)
9 changes: 9 additions & 0 deletions Weather Analysis/requirements.txt
@@ -0,0 +1,9 @@
numpy==1.19.2
pandas==1.4.3
matplotlib==3.7.1
scikit-learn~=1.0.2
scipy==1.5.0
seaborn==0.10.1
xgboost~=1.5.2
tensorflow==2.4.1
keras==2.4.0