Environmental Insights is a Python package for downloading and visualising air pollution concentration data in the UK and globally. Alongside the downloaded data, a set of functions has also been provided to manipulate the air pollution concentrations and explore air pollution futures. The Python package is a companion to the paper entitled "Environmental Insights: Democratizing Access to Ambient Air Pollution Data and Predictive Analytics with an Open-Source Python Package", with the following abstract:
Ambient air pollution is a pervasive issue with wide-ranging effects on human health, ecosystem vitality, and economic structures. Utilizing data on ambient air pollution concentrations, researchers can perform comprehensive analyses to uncover the multifaceted impacts of air pollution across society. To this end, we introduce Environmental Insights, an open-source Python package designed to democratize access to air pollution concentration data. This tool enables users to easily retrieve historical air pollution data and employ a Machine Learning model for forecasting potential future conditions. Moreover, Environmental Insights includes a suite of tools aimed at facilitating the dissemination of analytical findings and enhancing user engagement through dynamic visualizations. This comprehensive approach ensures that the package caters to the diverse needs of individuals looking to explore and understand air pollution trends and their implications.
This is a continuation of the work started in our recently published paper in Environment and Planning B: Urban Analytics and City Science, "Estimating annual ambient air pollution using structural properties of road networks".
If you have accessed this work from GitHub, please read the associated paper, which describes the purpose of this package and how to use it, available here:
Due to GitHub file size limitations, all of the models and data needed to use this package are hosted on Google Drive. The link to the Google Drive folder is: https://drive.google.com/drive/folders/18ZLO8XqtFp3c4WrUJVfSH0fmAXFmL8il?usp=sharing
The Google Drive folder contains outputs from a range of research studies, and the package is designed to be used in conjunction with these data and models. The easiest way to integrate the data into the package is to place the contents of each Google Drive folder into the corresponding package folder, using the following mapping:
- environmental_insights_data/air_pollution/uk_complete_set : Data for A Framework for Scalable Ambient Air Pollution Concentration Estimation
  - This directory contains the air pollution concentration data at a 1 km x 1 km hourly resolution over England, as described in the paper "A Framework for Scalable Ambient Air Pollution Concentration Estimation".
- environmental_insights_data/air_pollution/global_complete_set : Data for A Data-Driven Supervised Machine Learning Approach to Estimating Global Ambient Air Pollution Concentrations With Associated Prediction Intervals
  - This directory contains the global air pollution concentration data at a 0.25 x 0.25 degree hourly resolution, as described in the paper "A Data-Driven Supervised Machine Learning Approach to Estimating Global Ambient Air Pollution Concentrations With Associated Prediction Intervals".
- environmental_insights_data/feature_vector/uk_typical_day : Data for Environmental Insights
  - This directory contains the feature vectors used to predict air pollution concentrations across England with the typical day framework proposed in this package's companion paper.
- environmental_insights_data/feature_vector/supporting_data : Supporting Data for Environmental Insights
  - This directory contains supporting datasets that were not generated by the research but are needed to run the tutorial and some other functions of the package.
- environmental_insights_models/uk : Models/UK
  - This directory contains the data-driven supervised machine learning models for England.
- environmental_insights_models/global : Models/Global
  - This directory contains the data-driven supervised machine learning models for global predictions.
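Once the Google Drive contents have been copied across, a quick check can confirm that each folder in the mapping above exists and is not empty. The sketch below assumes the folders sit under a single package root directory that you pass in; the helper name missing_data_dirs is hypothetical and is not part of the package's API.

```python
from pathlib import Path

# Folder layout taken from the Google Drive mapping; the package root is an assumption
EXPECTED_DIRS = [
    "environmental_insights_data/air_pollution/uk_complete_set",
    "environmental_insights_data/air_pollution/global_complete_set",
    "environmental_insights_data/feature_vector/uk_typical_day",
    "environmental_insights_data/feature_vector/supporting_data",
    "environmental_insights_models/uk",
    "environmental_insights_models/global",
]


def missing_data_dirs(package_root):
    """Return the expected data/model folders that are absent or empty (hypothetical helper)."""
    root = Path(package_root)
    missing = []
    for rel in EXPECTED_DIRS:
        folder = root / rel
        # A folder counts as missing if it does not exist or contains nothing yet
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


# Example: list anything still to be copied from Google Drive
# print(missing_data_dirs("path/to/package_root"))
```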
The code has been validated with Python 3.9.12. You can create a conda environment with this version using the command: conda create -n environmental_insight_enviro python=3.9.12. This creates a conda environment named environmental_insight_enviro running Python 3.9.12 for use with the package.
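Because the package has only been validated on Python 3.9.12, it can help to confirm which interpreter a notebook kernel is actually using, since JupyterLab may launch a kernel from a different environment. The helper below is a hypothetical sketch that uses only the standard library's sys.version_info.

```python
import sys

# Python version the package was validated with
VALIDATED_VERSION = (3, 9, 12)


def is_validated_python(version_info=None):
    """Return True if the (major, minor, micro) version matches the validated one."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) == VALIDATED_VERSION


if not is_validated_python():
    # Only a warning: other versions may work but have not been tested
    print(f"Warning: running Python {sys.version.split()[0]}, validated on 3.9.12")
```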
- Step 1: Ensure that JupyterLab is installed. Installation instructions are available here.
- Step 2: Run the code in "package_installation.ipynb" to ensure that all of the required packages are available.
- Step 3: Work through the "tutorial.ipynb" file, which explains the basic concepts of the package.
- Step 4: Read the files in the "Documentation" directory, which describe the complete functionality of the package.
The tutorial contains a code snippet that downloads the required packages for the software via the requirements.txt file. The required Python packages are:
- lightgbm (3.3.3)
- geopandas (0.14.1)
- pandas (2.1.3)
- scipy (1.11.4)
- matplotlib (3.8.2)
- overpy (0.6)
- shapely (2.0.2)
- pyarrow (14.0.1)
- pyogrio (0.7.2)
While the code may work with other versions of these packages, testing has only been conducted with the versions listed above.
These packages can be installed via code provided in the "package_installation.ipynb" file.
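To see how an existing environment compares with the tested versions, the installed versions can be read with the standard library's importlib.metadata. This is a hypothetical sketch rather than a function provided by the package; it only reports differences and does not install anything.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions the package has been tested with
TESTED_VERSIONS = {
    "lightgbm": "3.3.3",
    "geopandas": "0.14.1",
    "pandas": "2.1.3",
    "scipy": "1.11.4",
    "matplotlib": "3.8.2",
    "overpy": "0.6",
    "shapely": "2.0.2",
    "pyarrow": "14.0.1",
    "pyogrio": "0.7.2",
}


def compare_with_tested(tested=TESTED_VERSIONS):
    """Map each differing package to (installed_version_or_None, tested_version)."""
    differences = {}
    for name, expected in tested.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:
            differences[name] = (installed, expected)
    return differences


for name, (installed, expected) in compare_with_tested().items():
    print(f"{name}: installed {installed}, tested with {expected}")
```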
The recommended way to use this package is through a Jupyter notebook, which is the format the tutorial is written in. The tutorial is available in the file "tutorial.ipynb". Using Conda and JupyterLab for this is also recommended.
There are three critical components to the package:
- Data: The data component provides access to air pollution concentration data, both in the UK (at a 1 km x 1 km hourly resolution) and globally (at a 0.25-degree hourly resolution). Feature vector data is also included for both the UK and global models, so that users can inspect the environmental conditions the models were developed on and the inputs behind the predictions they made.
- Models: The models component provides access to the trained LightGBM models. Combined with the feature vector data, these can be used to predict air pollution concentrations. The feature vectors can also be modified to explore hypothetical situations, such as "What would happen to air pollution in London on a Friday in June if the average wind speed doubled?"
- Functions: A set of supporting functions has been created to simplify use of the package, covering access to the data and models, visualisation, and making predictions.
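The "what if" workflow described for the models component amounts to copying a feature vector, scaling one feature, and predicting again. The sketch below assumes the feature vector is a pandas DataFrame and that the model exposes a predict method, as LightGBM's Booster and LGBMRegressor do; the column name wind_speed and the ToyModel stand-in are illustrative assumptions, not the package's real schema or trained models.

```python
import pandas as pd


def scenario_prediction(model, features, column, factor):
    """Predict on a copy of `features` with one column scaled by `factor`.

    `model` is any object with a LightGBM/scikit-learn style `predict` method.
    """
    scenario = features.copy()  # leave the original feature vector untouched
    scenario[column] = scenario[column] * factor
    baseline = model.predict(features)
    modified = model.predict(scenario)
    return pd.DataFrame({"baseline": baseline, "scenario": modified}, index=features.index)


class ToyModel:
    """Hypothetical stand-in for a trained model: concentration falls as wind speed rises."""

    def predict(self, X):
        return 40.0 - 2.0 * X["wind_speed"].to_numpy()


features = pd.DataFrame({"wind_speed": [3.0, 5.0]})
print(scenario_prediction(ToyModel(), features, "wind_speed", 2.0))
```

With a real model, the same call would compare predictions for the original feature vector against the modified one, for example after filtering the rows to the time and place of interest.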
All visualisations created with the package's functions are stored in the directory "environmental_insights_visulisations".
The tests for the package are in the "tests" directory and use Python's built-in unittest framework. For convenience, they can also be run from the test_workbook.ipynb Jupyter notebook.
The documentation for the project is included in the "Documentation" directory and provides an overview of the functions included in the package.
Running "python -m build" uses the pyproject.toml file to build the package locally and stores the result in the dist directory. The package can then be installed using "pip install dist/*.whl". The final dist files can be uploaded to PyPI via twine, using "".
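If the dist directory holds wheels from several builds, "pip install dist/*.whl" passes all of them to pip, which can make pip refuse the install because multiple versions of the same package are requested. A hypothetical helper that picks only the most recently built wheel is sketched below; it builds the pip command but leaves running it to you.

```python
import sys
from pathlib import Path


def latest_wheel_install_command(dist_dir="dist"):
    """Return a pip command that installs the most recently modified wheel in `dist_dir`."""
    # Sort wheels by modification time so the last entry is the newest build
    wheels = sorted(Path(dist_dir).glob("*.whl"), key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"No .whl files found in {dist_dir!r}; run 'python -m build' first")
    # Use the current interpreter's pip so the wheel lands in the active environment
    return [sys.executable, "-m", "pip", "install", str(wheels[-1])]
```

For example, subprocess.run(latest_wheel_install_command(), check=True) would install the newest wheel into the environment of the interpreter running the code.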