By Yi Jie WONG & Yin-Loon Khor et al
This code is part of our solution for the 2024 IEEE BigData Cup: Building Extraction Generalization Challenge (IEEE BEGC2024). Specifically, this repository provides the code to extract additional building footprint data from the Microsoft Building Footprint (BF) dataset for Redmond, Washington, and Las Vegas, Nevada. We use the extracted data, together with the training set provided by IEEE BEGC2024, to train our YOLOv8-based instance segmentation model. YOLOv8 trained with the additional data achieves a higher F1-score than training on the BEGC2024 training set alone; for example, adding the Las Vegas dataset raises the private F1-score from 0.66531 to 0.70326. Our approach ranked 1st globally in the IEEE Big Data Cup 2024 - BEGC2024 challenge! 🏅🎉🥳
Conda environment
conda create --name yolo python=3.10.12 -y
conda activate yolo
Clone this repo
# clone this repo
git clone https://github.com/yjwong1999/RSBuildingExtraction.git
cd RSBuildingExtraction
Install dependencies
# Adjust the torch version and the CUDA index URL (cu121 here) to match your OS and GPU driver
pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
# Install Jupyter Notebook
pip install jupyter notebook==7.1.0
# Remaining dependencies (for instance segmentation)
pip install ultralytics==8.1
pip install pycocotools
pip install requests==2.32.3
pip install click==8.1.7
pip install opendatasets==0.1.22
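After installation, it can help to confirm that the pinned versions were actually resolved. The following is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, and the pins are copied from the install commands above (`ultralytics` and `pycocotools` are left out because they are not pinned to a full version).

```python
from importlib import metadata

# Pins copied from the install commands above.
PINNED = {
    "torch": "2.2.1",
    "torchvision": "0.17.1",
    "torchaudio": "2.2.1",
    "notebook": "7.1.0",
    "requests": "2.32.3",
    "click": "8.1.7",
    "opendatasets": "0.1.22",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # Strip local build tags such as "+cu121" before comparing.
        base = found.split("+")[0] if found else None
        if base != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINNED).items():
        print(f"{package}: expected {expected}, found {found}")
```

Running the script inside the `yolo` environment prints nothing when every pin matches.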
Since we use YOLO as our segmentation model, the dataset must follow the YOLO format. The setup_data.py script automatically takes the raw data from Kaggle and converts it into YOLO format. The mydata directory stores the training data for our YOLO model. We also put the additional data (i.e. the Microsoft Building Footprint dataset and the diffusion-augmented images) into mydata.
RSBuildingExtraction/mydata
├── train
│   ├── images
│   └── labels
└── valid
    ├── images
    └── labels
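A quick way to sanity-check this layout is to confirm that each image has a label file with the same stem. The sketch below is illustrative only and is not part of setup_data.py; the function name `unmatched_images` and the list of image suffixes are assumptions. Images without a label file are usually treated as background by YOLO trainers, so a non-empty result is a prompt to inspect the data rather than a definite error.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the actual data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def unmatched_images(split_dir):
    """Return sorted image stems in split_dir/images with no .txt in split_dir/labels."""
    split_dir = Path(split_dir)
    images = {
        p.stem
        for p in (split_dir / "images").glob("*")
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    labels = {p.stem for p in (split_dir / "labels").glob("*.txt")}
    return sorted(images - labels)


if __name__ == "__main__":
    for split in ("train", "valid"):
        split_dir = Path("mydata") / split
        if not split_dir.is_dir():
            print(f"{split}: directory not found")
            continue
        missing = unmatched_images(split_dir)
        print(f"{split}: {len(missing)} image(s) without a label file")
```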
Model | Pretrained Weights | Batch Size | Params (M) | FLOPs (G) | Public F1-Score (Conf = 0.50) | Public F1-Score (Conf = 0.20) |
---|---|---|---|---|---|---|
YOLOv8n-seg | DOTAv1 Aerial Detection | 16 | 3.4 | 12.6 | 0.510 | 0.645 |
YOLOv8s-seg | DOTAv1 Aerial Detection | 16 | 11.8 | 42.6 | 0.535 | 0.654 |
YOLOv8m-seg | DOTAv1 Aerial Detection | 16 | 27.3 | 110.2 | 0.592 | 0.649 |
YOLOv8x-seg | DOTAv1 Aerial Detection | 8 | 71.8 | 344.1 | 0.579 | 0.627 |
YOLOv9c-seg | COCO Segmentation | 4 | 27.9 | 159.4 | 0.476 | 0.577 |
Mask R-CNN (MPViT-Tiny) | COCO Segmentation | 4 | 17 | 196.0 | - | 0.596 |
EfficientNet-b0-YOLO-seg | ImageNet | 4 | 6.4 | 12.5 | - | 0.560 |
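The gap between the Conf = 0.50 and Conf = 0.20 columns comes from the usual precision/recall trade-off: a lower confidence threshold keeps more predictions, which can recover missed buildings at the cost of some false positives. The sketch below illustrates this with toy numbers. It assumes each prediction has already been matched against the ground truth (the challenge's exact matching rule is not described here), and both the function name `f1_at_confidence` and the data are hypothetical.

```python
def f1_at_confidence(predictions, num_ground_truth, conf_threshold):
    """F1 over predictions whose confidence is at least conf_threshold.

    predictions: list of (confidence, is_correct) pairs, where is_correct marks
    a prediction that was already matched to a ground-truth building.
    """
    kept = [correct for conf, correct in predictions if conf >= conf_threshold]
    true_pos = sum(kept)
    false_pos = len(kept) - true_pos
    false_neg = num_ground_truth - true_pos
    denom = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denom if denom else 0.0


# Hypothetical toy predictions, not taken from the challenge data.
toy_predictions = [
    (0.90, True), (0.80, True), (0.60, True), (0.45, True),
    (0.40, False), (0.30, True), (0.25, True), (0.22, False),
]
NUM_BUILDINGS = 8  # hypothetical ground-truth count

for threshold in (0.50, 0.20):
    score = f1_at_confidence(toy_predictions, NUM_BUILDINGS, threshold)
    print(f"conf >= {threshold:.2f}: F1 = {score:.3f}")
```

In this toy case the lower threshold scores higher because the recovered true positives outweigh the added false positives; on real data the best threshold has to be found empirically, as in the table above.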
Solution | FLOPs (G) | Public F1-Score | Private F1-Score |
---|---|---|---|
YOLOv8m-seg + BEGC 2024 | 110.2 | 0.64926 | 0.66531 |
YOLOv8m-seg + BEGC 2024 + Redmond Dataset | 110.2 | 0.65951 | 0.67133 |
YOLOv8m-seg + BEGC 2024 + Las Vegas Dataset | 110.2 | 0.68627 | 0.70326 |
YOLOv8m-seg + BEGC 2024 + Diffusion Augmentation | 110.2 | 0.67189 | 0.68096 |
2nd Place (RTMDet-x + Alabama Buildings Segmentation Dataset) | 141.7 | 0.6813 | 0.68453 |
3rd Place (Custom Mask-RCNN + No extra Dataset) | 124.1 | 0.59314 | 0.60649 |
- We extract our "Redmond dataset" and "Las Vegas dataset" from the Microsoft Building Footprint dataset (please refer to our paper for details). For the diffusion augmentation pipeline, please refer to our segmentation-guided diffusion model.
- Note that the 2nd-place solution uses a bigger model (higher FLOPs) with an additional dataset to reach a high F1-score, whereas our diffusion augmentation pipeline allows our smaller model (lower FLOPs) to achieve a close private F1-score (0.68096 vs. 0.68453) without an additional dataset.
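To make the comparison concrete, the private F1-score gains over the BEGC 2024-only baseline can be computed directly from the table. This is plain arithmetic on the reported numbers; the function name `gains_over_baseline` is a hypothetical helper.

```python
# Private F1-scores copied from the table above (YOLOv8m-seg rows).
private_f1 = {
    "BEGC 2024": 0.66531,
    "BEGC 2024 + Redmond Dataset": 0.67133,
    "BEGC 2024 + Las Vegas Dataset": 0.70326,
    "BEGC 2024 + Diffusion Augmentation": 0.68096,
}
SECOND_PLACE_PRIVATE_F1 = 0.68453


def gains_over_baseline(scores, baseline):
    """Return the absolute F1 gain of every entry relative to the baseline entry."""
    base = scores[baseline]
    return {
        name: round(score - base, 5)
        for name, score in scores.items()
        if name != baseline
    }


for name, gain in gains_over_baseline(private_f1, "BEGC 2024").items():
    print(f"{name}: +{gain:.5f}")

# Gap between the 2nd-place entry and the diffusion-augmented model.
gap = SECOND_PLACE_PRIVATE_F1 - private_f1["BEGC 2024 + Diffusion Augmentation"]
print(f"Gap to 2nd place: {gap:.5f}")
```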
Private F1-score under different NMS IoU thresholds:

Dataset | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 |
---|---|---|---|---|---|---|
BEGC2024 + Redmond Dataset | 0.672 | 0.677 | - | - | 0.748 | 0.866 |
BEGC2024 + Las Vegas Dataset | 0.703 | 0.693 | 0.686 | 0.721 | 0.766 | 0.897 |
BEGC2024 + Diffusion Augmentation | 0.681 | - | 0.694 | 0.711 | 0.751 | 0.887 |
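The NMS IoU threshold controls how much two predictions may overlap before the lower-scoring one is discarded, so raising it keeps more overlapping detections. The sketch below shows greedy box-level NMS in plain Python to illustrate the effect. It is an illustration under simplifying assumptions, not the repository's code: the boxes and scores are hypothetical, and in practice the threshold would normally be passed to the detector's built-in NMS (for example, the `iou` argument of Ultralytics prediction).

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, visiting boxes from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap any kept box above the threshold.
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical toy boxes: the first two overlap heavily (IoU of about 0.82).
toy_boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
toy_scores = [0.9, 0.8, 0.7]

for threshold in (0.70, 0.90):
    kept = greedy_nms(toy_boxes, toy_scores, threshold)
    print(f"NMS IoU = {threshold:.2f}: kept {len(kept)} of {len(toy_boxes)} boxes")
```

With these toy boxes, the second box is suppressed at a threshold of 0.70 but survives at 0.90, which is the mechanism behind the different columns in the table above.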
We thank the following works for the inspiration of our repo!
- 2024 IEEE BigData Cup: Building Extraction Generalization Challenge link
- Ultralytics YOLO code
- MPViT-based Mask RCNN code
- COCO2YOLO format original code, modified code
Our paper has been accepted by IEEE BigData 2024! Please cite our paper if this repo helps your research. The preprint is available here
@InProceedings{Wong2024,
title = {Cross-City Building Instance Segmentation: From More Data to Diffusion-Augmentation},
author = {Yi Jie Wong and Yin-Loon Khor and Mau-Luen Tham and Ban-Hoe Kwan and Anissa Mokraoui and Yoong Choon Chang},
booktitle={2024 IEEE International Conference on Big Data (Big Data)},
year={2024}}