Machine learning project developed at Insight Data Science, 2019 AI session. A slide deck is available here.
In August 2018, AB-2681 (Seismic safety: potentially vulnerable buildings) was passed. This bill requires the state of California to identify all potentially vulnerable buildings before January 1, 2021.
One important type of vulnerable building is one with a soft story. A soft story is defined as a level that is less than 70% as stiff as the floor immediately above it.
In this project, I built an application that uses Google Street View images, computer vision, and classical machine learning to determine whether the building at a given address has a soft story.
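The Street View images themselves can be fetched via Google's Street View Static API. The snippet below is a minimal sketch of that download step, assuming a hypothetical `STREETVIEW_API_KEY` environment variable and the Static API's standard `size`/`location`/`key` parameters; it is an illustration, not the repository's actual download code.

```python
# Minimal sketch of fetching a Street View image for a single address via the
# Street View Static API. STREETVIEW_API_KEY is a hypothetical environment
# variable; this is an illustration, not the repository's download script.
import os
import requests

STREETVIEW_URL = "https://maps.googleapis.com/maps/api/streetview"

def download_streetview_image(address, out_path, api_key=None):
    """Save one Street View image of the given address to out_path."""
    api_key = api_key or os.environ["STREETVIEW_API_KEY"]
    params = {
        "size": "640x640",   # maximum size offered by the Static API
        "location": address,
        "key": api_key,
    }
    response = requests.get(STREETVIEW_URL, params=params, timeout=30)
    response.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(response.content)

if __name__ == "__main__":
    download_streetview_image("3200 16th St, San Francisco, CA", "house.jpg")
```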
At a high level, the model training consists of three separate steps:
- Obtain Training Images
  - Download Street View images of all buildings in the San Francisco soft story property list.
- Object Segmentation
  - Detect Houses
  - Isolate Houses
  - Detect Openings
- Classification
  - Identify the number of stories via K-means clustering.
  - Compute the softness score as the quotient of the total width of openings on the second story over the total width of openings on the first story.
Based on the softness score, buildings are classified as either soft or non_soft.
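As a concrete illustration of this scoring step, here is a minimal Python sketch, assuming openings are given as (x_min, y_min, x_max, y_max) boxes already grouped by story; the classification rule and the 0.5 cutoff are illustrative placeholders, not the threshold used in the repository.

```python
# Minimal sketch of the softness-score computation. Openings are assumed to be
# (x_min, y_min, x_max, y_max) boxes already grouped by story; the "soft if
# score < 0.5" rule is an illustrative placeholder, not the repo's threshold.

def total_width(boxes):
    """Sum of the horizontal extents of a list of bounding boxes."""
    return sum(x_max - x_min for x_min, _, x_max, _ in boxes)

def softness_score(first_story_openings, second_story_openings):
    """Total opening width on the second story divided by that on the first."""
    first = total_width(first_story_openings)
    second = total_width(second_story_openings)
    return second / first if first > 0 else float("inf")

def classify(score, threshold=0.5):
    """A low score means much wider openings on the first story (e.g. a garage)."""
    return "soft" if score < threshold else "non_soft"

# Example: one 400-px-wide garage opening below, two 80-px-wide windows above.
first = [(50, 300, 450, 450)]
second = [(80, 120, 160, 220), (240, 120, 320, 220)]
score = softness_score(first, second)   # 160 / 400 = 0.4
print(score, classify(score))           # 0.4 soft
```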
The model uses two supervised deep learning detection approaches (both based on YOLOv3), located in Detector_Training:
- Train House Identifier
  - Manually label houses using VoTT.
  - Use transfer learning to train a YOLOv3 detector.
- Train Opening Identifier
  - Use the CMP facade dataset.
  - Use transfer learning to train a YOLOv3 detector.
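The actual detector training scripts live in 2_Computer_Vision/Detector_Training and build on keras-yolo3 (via ilmonteux/logohunter). As a generic illustration of the underlying transfer-learning pattern only (freeze a pretrained backbone, train a small new head, optionally unfreeze and fine-tune with a lower learning rate), here is a Keras sketch using MobileNetV2 and a two-class head as stand-ins; it is not the YOLOv3 training code.

```python
# Generic transfer-learning sketch (NOT the repo's YOLOv3 training code):
# load a pretrained backbone, freeze it, and train only a small new head.
# MobileNetV2 and the two-class head are stand-ins used purely for illustration.
import tensorflow as tf

def build_finetune_model(num_classes=2, input_shape=(224, 224, 3)):
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights="imagenet"
    )
    backbone.trainable = False  # stage 1: keep the pretrained weights fixed

    model = tf.keras.Sequential([
        backbone,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# A common second stage is to unfreeze the backbone and continue training with
# a much smaller learning rate once the new head has converged.
```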
The model also uses unsupervised K-means clustering in the final classification step.
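For illustration, the sketch below estimates the number of stories by clustering the vertical centers of detected openings with K-means and picking the cluster count with the best silhouette score; the silhouette heuristic and the `max_stories` cap are assumptions, not necessarily the repository's exact criterion.

```python
# Minimal sketch of estimating the number of stories by K-means clustering
# the vertical centers of detected openings. The silhouette-based choice of k
# is an illustrative heuristic, not necessarily the repo's exact criterion.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def estimate_num_stories(opening_boxes, max_stories=4):
    """opening_boxes: list of (x_min, y_min, x_max, y_max) in image coordinates."""
    y_centers = np.array(
        [[(y_min + y_max) / 2.0] for _, y_min, _, y_max in opening_boxes]
    )
    best_k, best_score = 1, -1.0
    for k in range(2, min(max_stories, len(y_centers) - 1) + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(y_centers)
        score = silhouette_score(y_centers, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# Example: four openings on two distinct rows -> two stories.
boxes = [(10, 100, 60, 160), (90, 105, 140, 165),
         (10, 300, 60, 420), (90, 300, 140, 420)]
print(estimate_num_stories(boxes))  # 2
```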
The repository is organized as follows:
- 1_Pre_Processing: All Preprocessing Tasks
- 2_Computer_Vision: Both Image Segmentation Tasks
- 3_Classification: Final Classification Task
- Data: Input Data, Output Data, and Results
The code uses Python 3.6 and Keras with a TensorFlow backend. Training was performed on an AWS p2.xlarge instance (Tesla K80 GPU with 12 GB of memory). Inference is faster on a GPU (~5 images per second on the same setup), but also works fine on a modest CPU setup (~0.3 images per second on an AWS t2.medium with 2 vCPUs and 4 GB of memory). To run this code on AWS, it is recommended to use the Deep Learning AMI, which ensures that all GPU drivers are working.
Clone this repo with:
git clone https://github.com/AntonMu/EQanalytics
cd EQanalytics/
Create a virtual environment (venv needs to be installed on your system):
python3 -m venv env
source env/bin/activate
Next, install all required packages. If you are running EQanalytics on a machine with a GPU, run:
pip3 install -r requirements.txt
Otherwise, run:
pip3 install -r requirements_cpu.txt
To get started with a minimal example on two images located in Data/Minimal_Example, run the Minimal_Example.py script.
python Minimal_Example.py
The outputs of all detections are saved in the Data/Minimal_Example folder. This includes:
- Results of the housing detector
- Cropped housing images
- Results of the opening detector
- Results of the level detector
- Softness scores, located in Softness_Scores.csv
To run a full model, follow the individual instructions located in 1_Pre_Processing, 2_Computer_Vision and 3_Classification, respectively. To retrain detectors navigate to 2_Computer_Vision/Detector_Training.
Note that the Data folder is populated with a small set of sample inputs and outputs for each step, so all scripts can be run independently. For example, it is possible to run scripts in 2_Computer_Vision without having previously run the pre-processing step in 1_Pre_Processing.
Unless explicitly stated at the top of a file, all code is licensed under the MIT license.
This repo makes use of ilmonteux/logohunter, which itself is inspired by qqwweee/keras-yolo3.
If you are having trouble getting cv2 to run, try:
apt-get update
apt-get install -y libsm6 libxext6 libxrender-dev
pip install opencv-python