The goal is to identify the most successful model, that is, the one that makes the most comprehensive assessment when predicting next-hour air pollution from the data.
This data set includes hourly air-pollutant data from 12 nationally controlled air-quality monitoring sites. The air-quality data are from the Beijing Municipal Environmental Monitoring Center. The meteorological data at each air-quality site are matched with the nearest weather station of the China Meteorological Administration. The time period is from March 1st, 2013 to February 28th, 2017. Missing data are denoted as NA.
- Link: https://archive.ics.uci.edu/ml/datasets/Beijing+Multi-Site+Air-Quality+Data
- Explanation:
- No: row number
- year: year of data in this row
- month: month of data in this row
- day: day of data in this row
- hour: hour of data in this row
- pm2.5: PM2.5 concentration (pollution)
- DEWP: Dew Point
- TEMP: Temperature
- PRES: Pressure
- cbwd: Combined wind direction
- Iws: Cumulated wind speed
- Is: Cumulated hours of snow
- Ir: Cumulated hours of rain
- Merge the 'year', 'month', 'day', and 'hour' columns into a single 'DateTime' column and convert it to a timestamp.
- Remove the unwanted columns
- Counting null values and filling them with mean values
- Finding and removing outliers
- Checking correlations between the independent variables
- Split the dataset into training and test data
- Feature scaling with MinMaxScaler and ColumnTransformer
- PCA (Principal Component Analysis)
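
The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code. It runs on a small synthetic table that only mimics the column names listed earlier. Several choices here are assumptions that the list does not specify: the 1.5 × IQR rule for outliers, one-hot encoding of `cbwd`, a chronological 80/20 split, and keeping 95% of the variance in PCA.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Synthetic stand-in for the real file (assumption: same column names only).
rng = np.random.default_rng(0)
n = 500
hours = pd.Timestamp("2013-03-01") + pd.to_timedelta(np.arange(n), unit="h")
raw = pd.DataFrame({
    "No": np.arange(1, n + 1),
    "year": np.asarray(hours.year), "month": np.asarray(hours.month),
    "day": np.asarray(hours.day), "hour": np.asarray(hours.hour),
    "pm2.5": rng.gamma(2.0, 40.0, n),
    "DEWP": rng.normal(2, 14, n), "TEMP": rng.normal(12, 12, n),
    "PRES": rng.normal(1016, 10, n),
    "cbwd": rng.choice(["NW", "NE", "SE", "cv"], n),
    "Iws": rng.gamma(1.0, 20.0, n),
    "Is": rng.integers(0, 3, n), "Ir": rng.integers(0, 5, n),
})
raw.loc[rng.choice(n, 25, replace=False), "pm2.5"] = np.nan  # simulate NA values

# 1-2. Merge the date parts into 'DateTime', then drop the unwanted columns.
df = raw.copy()
df["DateTime"] = pd.to_datetime(df[["year", "month", "day", "hour"]])
df = df.drop(columns=["No", "year", "month", "day", "hour"]).set_index("DateTime")

# 3. Count nulls per column and fill numeric gaps with the column mean.
null_counts = df.isna().sum()
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# 4. Remove outliers in pm2.5 with the 1.5 * IQR rule (assumed method).
q1, q3 = df["pm2.5"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["pm2.5"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)].copy()

# 5. Correlations between the independent numeric variables.
corr = df[num_cols.drop("pm2.5")].corr()

# Target: pm2.5 one hour later, looked up by timestamp so that rows removed
# as outliers do not shift the alignment; rows without a next hour are dropped.
df["target"] = df["pm2.5"].reindex(df.index + pd.Timedelta(hours=1)).to_numpy()
df = df.dropna(subset=["target"])

# 6. Chronological split (assumption: no shuffling for time-series data).
X, y = df.drop(columns="target"), df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# 7. Scale numeric columns to [0, 1]; one-hot encode the wind direction.
numeric = [c for c in X.columns if c != "cbwd"]
pre = ColumnTransformer(
    [("num", MinMaxScaler(), numeric),
     ("cat", OneHotEncoder(handle_unknown="ignore"), ["cbwd"])],
    sparse_threshold=0.0,  # always return a dense array
)

# 8. PCA fitted on the training data only, keeping 95% of the variance.
pca = PCA(n_components=0.95, svd_solver="full")
Z_train = pca.fit_transform(pre.fit_transform(X_train))
Z_test = pca.transform(pre.transform(X_test))
```

The scaler and PCA are fitted on the training portion and then applied to the test portion, so that test data does not influence the transformation. `Z_train` and `y_train` can then be passed to any regression model.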