This Python project evaluates and predicts the health of vegetation in a specific region using satellite data and machine learning. It uses Landsat 8 imagery from the USGS EarthExplorer for the city of Tempe, Arizona, focusing on the years I attended Arizona State University.
- Data Acquisition: Using the `landsatxplore` Python library, we query the USGS EarthExplorer API for Landsat 8 scenes that cover the specified location and date range, then download the corresponding `.tar` archives and extract them.
- Data Processing: For each scene we calculate the Normalized Difference Vegetation Index (NDVI), a common indicator of plant health, from the red (band 4) and near-infrared (NIR, band 5) bands.
- Data Analysis: After processing all scenes, we compute the average NDVI of each scene and plot these averages as a time series to visualize changes in vegetation health over the study period.
- Machine Learning Prediction: We use a Long Short-Term Memory (LSTM) model to forecast future NDVI values. LSTMs are a type of recurrent neural network well suited to sequential data. We assess the model's performance with metrics such as Root Mean Squared Error (RMSE).
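The extraction and NDVI steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `extract_scene`, `ndvi`, and `mean_ndvi` are hypothetical names, each `.tar` archive is assumed to be extracted into its own directory, band files are assumed to follow the Landsat 8 naming convention ending in `_B4.TIF` (red) and `_B5.TIF` (NIR), and the functions take already-loaded band arrays (reading the GeoTIFFs, for example with rasterio, is left out). NDVI is computed as (NIR - Red) / (NIR + Red).

```python
import tarfile
from pathlib import Path

import numpy as np


def extract_scene(tar_path, out_dir):
    """Extract one Landsat 8 .tar archive and return paths to its red and NIR bands.

    Hypothetical helper. Assumes band files end in "_B4.TIF" (red) and
    "_B5.TIF" (NIR); Collection 2 names such as "..._SR_B4.TIF" also match.
    Use a separate out_dir per scene so bands from different scenes don't mix.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        # Use the safe "data" extraction filter where this Python version supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(out_dir, filter="data")
        else:
            tar.extractall(out_dir)
    bands = {}
    for path in out_dir.rglob("*"):
        name = path.name.upper()
        if name.endswith("_B4.TIF"):
            bands["red"] = path
        elif name.endswith("_B5.TIF"):
            bands["nir"] = path
    return bands


def ndvi(red, nir, nodata=0):
    """Per-pixel NDVI; nodata pixels and zero denominators become NaN."""
    red = np.asarray(red, dtype=np.float64)
    nir = np.asarray(nir, dtype=np.float64)
    denom = nir + red
    valid = (red != nodata) & (nir != nodata) & (denom != 0)
    out = np.full(red.shape, np.nan)
    # Only divide where the pixel is valid; everything else stays NaN.
    np.divide(nir - red, denom, out=out, where=valid)
    return out


def mean_ndvi(red, nir, nodata=0):
    """Scene-level average NDVI, ignoring NaN pixels."""
    return float(np.nanmean(ndvi(red, nir, nodata)))
```

If the scenes are Collection 2 Level-2 surface reflectance products, apply the product's scale factor and offset to both bands before calling `ndvi`; because of the offset, NDVI computed from raw digital numbers differs from NDVI computed from reflectance.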
The output is a plot of NDVI values over time, overlaid with the model's forecasts. This time-series analysis lets us track past changes in vegetation health and predict future ones.
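Before an LSTM can be trained on the per-scene NDVI averages, the series has to be framed as supervised samples, and the forecasts then scored with RMSE. The sketch below shows one common way to do this with NumPy; `make_windows`, `chronological_split`, and `rmse` are hypothetical names, and the window length and 80/20 split are illustrative assumptions rather than the project's settings. The LSTM itself is not shown.

```python
import numpy as np


def make_windows(series, window):
    """Frame a 1-D series as (samples, window, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=np.float32)
    if len(series) <= window:
        raise ValueError("series must be longer than the window")
    # Each sample is `window` consecutive values; the target is the value that follows.
    X = np.stack([series[i : i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X[..., np.newaxis], y


def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling so evaluation uses dates later than training."""
    cut = int(len(X) * train_fraction)
    return (X[:cut], y[:cut]), (X[cut:], y[cut:])


def rmse(y_true, y_pred):
    """Root Mean Squared Error between observed and forecast NDVI."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

The `(samples, timesteps, features)` shape matches what Keras's `LSTM` layer expects as input. Note that this framing treats consecutive scenes as evenly spaced; if cloudy scenes were dropped, resampling the series to a regular interval first may give the model a more consistent notion of time.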
This project is a starting point for more comprehensive analyses and more sophisticated methods. Possible extensions include:
- Land Cover Type Filtering: Future analyses could filter for specific land cover types (such as forests or croplands) to give more targeted insight into the health of each type of vegetation.
- Advanced Machine Learning Models: More complex time-series forecasting models may yield more accurate predictions, for example:
  - Prophet: a procedure for forecasting time series data based on an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.
  - VAR (Vector Autoregression): a model for two or more time series that influence each other, in which each variable is predicted from the past values of all the variables.
  - State Space Models: a flexible class of models that can capture a wide range of time series patterns.
- Inclusion of Ancillary Data: Additional environmental data, such as climate (temperature, precipitation) or human activity (land use change, urban development), could be integrated into the model to help explain and predict changes in vegetation health.
- Spatial Analysis: Spatial patterns and changes in NDVI across the study area could be investigated, for example by identifying areas of greatest change or areas of consistently high or low NDVI.
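The spatial analysis idea can be sketched with per-pixel NDVI arrays. This is a hypothetical illustration, not part of the current project: it assumes the NDVI rasters are already co-registered on the same pixel grid (for example, scenes from the same Landsat path/row), and the names `ndvi_change`, `top_change_pixels`, and `persistent_mask` are invented for the example.

```python
import numpy as np


def ndvi_change(ndvi_start, ndvi_end):
    """Per-pixel NDVI change between two co-registered rasters (NaN propagates)."""
    return np.asarray(ndvi_end, dtype=np.float64) - np.asarray(ndvi_start, dtype=np.float64)


def top_change_pixels(change, k=5):
    """(row, col) indices of the k pixels with the largest absolute change, ignoring NaN."""
    magnitude = np.abs(change)
    # Push NaN pixels to the bottom of the ranking.
    flat = np.where(np.isnan(magnitude), -np.inf, magnitude).ravel()
    k = min(k, int(np.count_nonzero(~np.isnan(magnitude))))
    idx = np.argsort(flat)[::-1][:k]
    return [tuple(int(v) for v in np.unravel_index(i, change.shape)) for i in idx]


def persistent_mask(ndvi_stack, threshold, above=True):
    """Pixels above (or below) threshold in every scene of a (time, rows, cols) stack.

    NaN compares as False, so a pixel missing in any scene is not marked persistent.
    """
    stack = np.asarray(ndvi_stack, dtype=np.float64)
    cmp = stack > threshold if above else stack < threshold
    return np.all(cmp, axis=0)
```

Ranking by absolute change treats greening and browning alike; keeping the sign of `ndvi_change` separates the two if that distinction matters.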
Remove the `.example` extension from `config.yml.example` and update the file with your username and password. To change the location or time period, edit `config.yml`. Run `download.py` to retrieve and extract the necessary Landsat 8 scenes. Next, run the Jupyter notebook for data processing, analysis, and machine learning predictions.