predicting-air-quality-with-lstm

Contains code for a simple lstm model (implemented using keras from tensorflows tf.keras module) to predict hourly Beijing air quality data.

The data is not included in this repository and can be found here.

If you want to make use of this repository then you will have to download the data and put the path to said data in at line 10 in data_utils.py.

The file lstm_model_weights.h5 is the weights of the model trained on somewhere around 60000 batchs, with a batch size of 8 and sequence lengths between 128 and 256 steps. Likely not enough training but I don't have a gpu.

Results

The mean absolute error over time plots for most attributes being predicted can be seen in the plots directory. The error is the average error from 128 sequence predictions. The x axis is in hours and the different lines are for different stations. All of these attributes were normalized with mean 0 and standard deviation 1 as part of the preprocessing so an error of 0.5 is about 0.5 standard deviations away from the actual value.

The model cannot accurately predict most of the measured atrtibutes more than a few hours out and even then it's not great. The average first hour error is about 0.2 standard deviations which increases quite rapidly in the next few hours. However for most attributes, the error does seem to plateau at under 1 standard deviation which suggests the model isn't making wildley outrageous predictions. Also it seems from the error plots that the network is having predictable spikes in error (the TEMP plot is a good example). Since these spikes correlate across stations and in spite of the somewhat large (128) testing sample size it seems likely that it is a failure of the model to predict cycles in the data (for example the day/night temperature cycle in the TEMP data) and not a random artifact from the testing.

On average the error is roughly similar or uncorrelated between stations, however this does not hold for the wind speed error in Gucheng which is significantly higher than the wind speed error for the rest of the stations which is interesting.

I think that the model still has a lot of room to improve with more training. It still seemed to be improving but after 4-5 days I felt my computer had earned a break.

In conclusion, don't try and use this to predict the pollution levels in Beijing if you have asthma.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
data_stats		data_stats
plots		plots
LICENSE		LICENSE
README.md		README.md
data_utils.py		data_utils.py
lstm_model.py		lstm_model.py
lstm_model_weights.h5		lstm_model_weights.h5
testing.py		testing.py
training.py		training.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

predicting-air-quality-with-lstm

Results

About

Releases

Packages

Languages

License

sixChar/predicting-air-quality-with-lstm

Folders and files

Latest commit

History

Repository files navigation

predicting-air-quality-with-lstm

Results

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages