Deep neural networks (DNNs) are extraordinarily versatile artificial intelligence models that have achieved widespread use over the last five years. These neural networks excel at automated feature creation and processing of complex data types like images, audio, and free-form text. Common business use cases for DNNs include:
- Determining whether an uploaded video, audio, or text file contains inappropriate content
- Inferring a user's intent from their spoken or typed input
- Identifying objects or persons in a still image
- Translating speech or text between languages or modalities
Unfortunately, DNNs are also among the most time- and resource-intensive machine learning models. Whereas a trained linear regression model can typically score input in negligible time, applying a DNN to a single file of interest may take hundreds or thousands of milliseconds, a processing rate insufficient for some business needs. Fortunately, DNNs can be applied in a parallel and scalable fashion when evaluation is performed on Spark clusters.
This repository demonstrates how trained DNNs produced with two common deep learning frameworks, Microsoft's Cognitive Toolkit (CNTK) and Google's TensorFlow, can be operationalized on Spark to score a large image set. Files stored on Azure Data Lake Store, Microsoft's HDFS-based cloud storage resource, are processed in parallel by workers on the Spark cluster. The guide follows a specific example use case: land use classification from aerial imagery.
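The core idea behind this kind of parallel scoring is to load the model once per partition of files and then score every file in that partition, so the expensive load is amortized. The sketch below illustrates that pattern in plain Python, with a thread pool standing in for Spark workers. It is not the repository's code. `load_model`, `score_partition`, `split_into_partitions`, `score_in_parallel`, and the tile paths are all hypothetical, and the placeholder "model" only inspects the file name instead of running a DNN.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, Tuple


def load_model() -> Callable[[str], str]:
    """Hypothetical stand-in for loading a trained CNTK or TensorFlow model.

    The returned "model" only inspects the file name; a real scorer would
    read the image and run a forward pass through the network.
    """
    def predict(path: str) -> str:
        return "Developed" if "urban" in path else "Forest"
    return predict


def score_partition(paths: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Load the model once, then score every file in this partition."""
    model = load_model()
    for path in paths:
        yield path, model(path)


def split_into_partitions(items: List[str], n: int) -> List[List[str]]:
    """Round-robin split of the file list into n partitions."""
    return [items[i::n] for i in range(n)]


def score_in_parallel(paths: List[str], n_workers: int = 4) -> List[Tuple[str, str]]:
    """Score partitions concurrently (threads stand in for Spark workers)."""
    partitions = split_into_partitions(paths, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_partition = list(pool.map(lambda part: list(score_partition(part)), partitions))
    # Flatten the per-partition results into one list of (path, label) pairs.
    return [pair for part in per_partition for pair in part]


if __name__ == "__main__":
    tiles = ["tiles/urban_001.png", "tiles/rural_002.png", "tiles/urban_003.png"]
    for path, label in score_in_parallel(tiles, n_workers=2):
        print(path, label)
```

On a real cluster, a function shaped like `score_partition` could be passed to PySpark's `RDD.mapPartitions` (for example on an RDD built with `sc.parallelize(paths, numSlices)`), so that each executor pays the model-loading cost once per partition rather than once per image.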
To get started right away,
- Follow the instructions in the Image Set Preparation notebook to generate the training and validation datasets.
  - If you will use our provided image sets, you only need to complete the "Prepare an Azure Data Science Virtual Machine for image extraction" and "Dataset preparation for deep learning" sections.
  - If you seek a CNTK Spark operationalization example that doesn't require image set preparation or VM deployment, you may prefer this walkthrough instead. A brief description of the technique is included in this blog post.
- If you want to retrain an image classification DNN using transfer learning, complete the Model Training notebook.
  - You can skip this step if you choose to use our example DNNs.
- If you want to operationalize trained DNNs on Spark, complete the Scoring on Spark notebook.
- If you want to learn how the retrained DNN can be used to study urban development trends, see the Middlesex County Land Use Prediction page.
- For the motivation and summary of our work, see below.
In this guide, we develop a classifier that can predict how a parcel of land has been used -- e.g., whether it is developed, cultivated, forested, etc. -- from an aerial image. We apply the classifier to track recent land development in Middlesex County, MA: the home of Microsoft's New England Research and Development (NERD) Center. Aerial image classification has many important applications in industry and government, including:
- Enforcing tax codes (cf. identification of home pools in Greece)
- Monitoring agricultural crop performance
- Quantifying the impact of climate change on natural resources
- Property value estimation and feature tracking for marketing purposes
- Geopolitical surveillance
This use case was chosen because sample images and ground-truth labels are available in abundance. We use aerial imagery provided by the U.S. National Agriculture Imagery Program (NAIP) and land use labels from the National Land Cover Database (NLCD). NLCD labels are published roughly every five years, while NAIP data are collected more frequently: we were able to apply our land use classification DNN to images collected five years after the most recent training data available. For more information on dataset creation, please see the Image Set Preparation Jupyter notebook.
We applied transfer learning to retrain the final layers of existing TensorFlow (ResNet) and CNTK (AlexNet) models for classification of 1-meter resolution NAIP aerial images of 224 meter x 224 meter regions selected from across the United States. Retraining was performed on Azure N-Series GPU VMs with the Deep Learning Toolkit pre-installed. We created balanced training and validation sets containing aerial images in six major land use categories (Developed, Cultivated, Forest, Shrub, Barren, and Herbaceous) from non-neighboring counties and collection years. For more information on model creation, please see the Model Training Jupyter notebook.
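The description above does not say how class balance was achieved. One common approach, shown here purely as an illustrative assumption, is to downsample every land use category to the size of the rarest one. `balance_by_downsampling` and the `(image_path, label)` input format are hypothetical, and the sketch does not attempt the county-level separation between training and validation sets.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (image_path, land_use_label); hypothetical format


def balance_by_downsampling(samples: List[Sample], seed: int = 0) -> List[Sample]:
    """Return a class-balanced subset by downsampling every label to the
    size of the rarest label (an assumed strategy, not the project's)."""
    if not samples:
        return []
    by_label: Dict[str, List[Sample]] = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)
    per_class = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    balanced: List[Sample] = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], per_class))
    rng.shuffle(balanced)  # avoid long runs of a single class
    return balanced
```

Downsampling discards data from the common classes; weighting the loss by class frequency is an alternative that keeps every image.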
We used Spark to apply the trained CNTK and TensorFlow models to the 11,760 images in the validation set. Spreading the scoring task across multiple worker nodes allowed us to decrease the total time required to under one minute:
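The scoring-time figure implies a required throughput that is easy to check: 11,760 images in 60 seconds is 196 images per second. The helpers below also estimate how many concurrent scorers that would take for an assumed per-image latency. The 200 ms value in the example is hypothetical (chosen from the "hundreds of milliseconds" range mentioned earlier, not measured in this project), and the estimate ignores model loading and scheduling overhead.

```python
import math


def required_throughput(n_images: int, time_budget_s: float) -> float:
    """Images per second needed to score n_images within time_budget_s."""
    return n_images / time_budget_s


def min_parallel_scorers(n_images: int, time_budget_s: float, latency_s: float) -> int:
    """Smallest number of concurrent scorers that meets the time budget,
    assuming each scorer handles one image every latency_s seconds, work is
    split evenly, and model-loading/scheduling overhead is negligible."""
    return math.ceil(n_images * latency_s / time_budget_s)


print(required_throughput(11_760, 60))        # 196.0 images per second
print(min_parallel_scorers(11_760, 60, 0.2))  # 40, for a hypothetical 200 ms latency
```

At that assumed latency, a single serial scorer would need 2,352 seconds (about 39 minutes) for the same 11,760 images.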
Our retrained models achieved an overall classification accuracy of ~80% on these six categories, with the majority of errors occurring between different types of undeveloped land (see the confusion matrix for the CNTK model's predictions, below):
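A confusion matrix counts how often each true label (row) was predicted as each label (column); overall accuracy is the sum of the diagonal divided by the total count. A minimal stdlib sketch, with hypothetical function names and toy labels rather than the project's validation data:

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Count predictions: rows are true labels, columns are predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    index: Dict[str, int] = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def overall_accuracy(matrix: List[List[int]]) -> float:
    """Fraction of predictions on the diagonal (true label == predicted label)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


labels = ["Developed", "Forest", "Shrub"]
y_true = ["Developed", "Forest", "Forest", "Shrub", "Shrub"]
y_pred = ["Developed", "Forest", "Shrub", "Shrub", "Forest"]
m = confusion_matrix(y_true, y_pred, labels)
print(m)                    # [[1, 0, 0], [0, 1, 1], [0, 1, 1]]
print(overall_accuracy(m))  # 0.6
```

In this toy example, all errors fall in the Forest/Shrub block off the diagonal, the same kind of pattern as confusions between types of undeveloped land.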
For a subsequent application -- identifying and quantifying recently-developed land -- we further grouped these land use labels into "Developed," "Cultivated," and "Undeveloped" classes. Our model's overall accuracy at predicting these higher-level labels was roughly 95% in our validation set. For more information on model validation on Spark, see the Scoring on Spark Jupyter notebook.
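Collapsing the six categories into three classes removes confusions among undeveloped land types, which is why accuracy on the coarser labels can be much higher. The mapping below assumes that Forest, Shrub, Barren, and Herbaceous all fold into "Undeveloped", consistent with the description of the main errors as confusions between types of undeveloped land. `COARSE`, `to_coarse`, and `accuracy` are hypothetical names, and the labels are toy data.

```python
from typing import Dict, List, Sequence

# Assumed mapping from the six fine-grained categories to the three coarse classes.
COARSE: Dict[str, str] = {
    "Developed": "Developed",
    "Cultivated": "Cultivated",
    "Forest": "Undeveloped",
    "Shrub": "Undeveloped",
    "Barren": "Undeveloped",
    "Herbaceous": "Undeveloped",
}


def to_coarse(labels: Sequence[str]) -> List[str]:
    """Map each fine-grained label to its coarse class."""
    return [COARSE[label] for label in labels]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of positions where the predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        return 0.0
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = ["Forest", "Shrub", "Developed", "Cultivated"]
y_pred = ["Shrub", "Forest", "Developed", "Forest"]
print(accuracy(y_true, y_pred))                        # 0.25
print(accuracy(to_coarse(y_true), to_coarse(y_pred)))  # 0.75
```

The Forest/Shrub swaps count as errors at the fine level but as correct "Undeveloped" predictions at the coarse level; only the Cultivated-to-Forest error survives.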
The trained land use models were applied to 2016 aerial images tiling Middlesex County. The predicted 2016 labels were then compared to the ground-truth 2011 labels to identify putative regions of recent development: such an application may be useful for regulatory bodies seeking to automatically identify new structures or cultivated land in remote locations. Example results (with surrounding tiles for context) are included below:
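The comparison step amounts to finding tiles whose predicted 2016 class differs from their 2011 ground-truth class in a direction that suggests new construction or cultivation. This sketch assumes labels are stored per tile in dictionaries keyed by a hypothetical tile ID and use the three coarse classes; the set of transitions treated as worth flagging is also an assumption.

```python
from typing import Dict, Set, Tuple

Transition = Tuple[str, str]  # (2011 ground-truth class, 2016 predicted class)

# Assumed definition of changes worth flagging for review.
TRANSITIONS_OF_INTEREST: Set[Transition] = {
    ("Undeveloped", "Developed"),
    ("Cultivated", "Developed"),
    ("Undeveloped", "Cultivated"),
}


def putative_changes(labels_2011: Dict[str, str],
                     predicted_2016: Dict[str, str]) -> Dict[str, Transition]:
    """Return tiles whose (2011 label, 2016 prediction) pair is a transition of interest.

    Tiles missing from predicted_2016 are skipped.
    """
    changes: Dict[str, Transition] = {}
    for tile_id, old_label in labels_2011.items():
        new_label = predicted_2016.get(tile_id)
        if new_label is not None and (old_label, new_label) in TRANSITIONS_OF_INTEREST:
            changes[tile_id] = (old_label, new_label)
    return changes
```

Because individual predictions can be wrong, tiles flagged this way are candidates for review rather than confirmed changes.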
Development could also be visualized and quantified at the county level. In the figure below, regions classified as developed land are represented by red pixels, cultivated land by white pixels, and undeveloped land by green pixels.
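A county-level map like this can be produced by assigning each tile's predicted class a color and writing the result as an image. This sketch assumes one pixel per tile and picks specific RGB values for red, white, and green. It writes a binary PPM (P6) image using only the standard library; `COLORS`, `label_grid_to_pixels`, and `to_ppm` are hypothetical names.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Assumed color choices matching the red/white/green scheme described above.
COLORS: Dict[str, RGB] = {
    "Developed": (255, 0, 0),        # red
    "Cultivated": (255, 255, 255),   # white
    "Undeveloped": (0, 128, 0),      # green
}


def label_grid_to_pixels(grid: List[List[str]]) -> List[List[RGB]]:
    """Convert a row-major grid of tile labels into a grid of RGB pixels."""
    return [[COLORS[label] for label in row] for row in grid]


def to_ppm(pixels: List[List[RGB]]) -> bytes:
    """Encode equal-length rows of RGB pixels as a binary PPM (P6) image."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    body = bytes(channel for row in pixels for pixel in row for channel in pixel)
    return header + body


grid = [["Developed", "Undeveloped"], ["Cultivated", "Developed"]]
with open("county_map.ppm", "wb") as f:  # hypothetical output path
    f.write(to_ppm(label_grid_to_pixels(grid)))
```

Counting how many tiles map to each color gives the corresponding county-level totals for quantification.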
The predicted land classes largely matched the true 2011 labels. Unfortunately, noisy year-to-year variation (likely reflecting differences in coloration and vegetation) was too large in magnitude for us to quantify general trends in development.
For more information on inferring recent land development with our trained DNNs, please see the Middlesex County Land Use Prediction page.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
The code in this repository is shared under the MIT and Apache licenses included in this directory. Some TensorFlow scripts have been adapted from the TensorFlow Models repository's slim subdirectory (indicated where applicable). Cognitive Toolkit (CNTK) scripts for network definition and training have been adapted from the CIFAR-10 Image Classification example.