UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web [WWW 2024]

This repo is the implementation of our paper UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web, accepted by the Web Conference (WWW) 2024.

This repository will be kept under active development for better usability. The dataset is still being refined (part of the current data may be pseudo-data intended for testing only). In the meantime, our team has released the UrbanCLIP Dataset Toolkit, a comprehensive toolchain for collecting, processing, and integrating satellite imagery and its associated metadata for urban analysis.

Stay tuned for more updates!

【NEWS!】 Our team has extended this work to a more comprehensive scope. More details can be found in the paper UrbanVLP: A Multi-Granularity Vision-Language Pre-Trained Foundation Model for Urban Indicator Prediction, whose dataset and code base will be released soon.

【NEWS!】 Our team has also published a survey, Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy, Advances, and Outlook. Any feedback is welcome!

Data Directory

data/
├── captions/
│   ├── Beijing_captions.json    # image-text pairs
│   ├── Shanghai_captions.json
│   ├── Guangzhou_captions.json
│   └── Shenzhen_captions.json
├── downstream_task/
│   └── downstream.csv           # downstream task data
└── images/                      # image data
    ├── Beijing/
    │   ├── 16_12672_4745_s.jpg
    │   └── 16_12677_4730_s.jpg
    ├── Shanghai/
    ├── Guangzhou/
    └── Shenzhen/

Data Example

Garbage in, garbage out! Please take the time to double-check, clean, and refine the data before use.

{
    "caption": "The image depicts a large, open field with a train track running through the middle of it",
    "image": "Beijing/16_12677_4730_s.jpg"
}
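
For quick experimentation, the image-caption pairs can be loaded with a standard PyTorch Dataset. The sketch below is illustrative only: it assumes each *_captions.json file is a JSON list of records shaped like the example above, and the class name and transform argument are hypothetical, not part of the released code.

# Minimal loader sketch for the image-caption pairs (assumptions noted above)
import json
import os

from PIL import Image
from torch.utils.data import Dataset

class CaptionPairDataset(Dataset):
    """Pairs each caption with its satellite image tile."""

    def __init__(self, caption_file, image_root, transform=None):
        # e.g. caption_file="data/captions/Beijing_captions.json", image_root="data/images"
        with open(caption_file, "r", encoding="utf-8") as f:
            self.records = json.load(f)  # assumed: list of {"caption": ..., "image": ...}
        self.image_root = image_root
        self.transform = transform

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        record = self.records[idx]
        image = Image.open(os.path.join(self.image_root, record["image"])).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, record["caption"]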

Usage

# Pretraining (example command line)
CUDA_VISIBLE_DEVICES=7 python main.py --pretrained_model mscoco_finetuned_laion2B-s13B-b90k --dataset Beijing_captions --lr XXX --batch_size XXX --epoch_num XXX
# Downstream task 1: indicator prediction (example command line)
CUDA_VISIBLE_DEVICES=7 python mlp.py --indicator carbon --dataset Beijing --test_file ./data/downstream_task/Beijing_test.csv --pretrained_model ./checkpoints/BJ.bin
# Downstream task 2: location description generation (example command line)
CUDA_VISIBLE_DEVICES=3 python caption.py --pretrained_model ./checkpoints/GZ_16/best_model.bin --dataset XXX
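
As its name suggests, pretraining aligns image and text embeddings with a CLIP-style symmetric contrastive (InfoNCE) objective. The snippet below is a generic sketch of that objective, not the repository's actual training code; the function name, embedding arguments, and temperature default are assumptions for illustration.

# Illustrative CLIP-style symmetric contrastive loss (InfoNCE)
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so the dot products below are cosine similarities
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature
    # Matched image-text pairs lie on the diagonal of the similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Average the image-to-text and text-to-image cross-entropy terms
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2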

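For downstream task 1, mlp.py regresses an urban indicator (e.g. carbon) from region representations. The sketch below shows a hypothetical version of such a regression head; the class name, embedding dimension, and layer sizes are illustrative assumptions rather than the repository's configuration.

# Hypothetical MLP head mapping one region embedding to one scalar indicator
import torch.nn as nn

class IndicatorMLP(nn.Module):
    def __init__(self, embed_dim=512, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # one scalar indicator (e.g. carbon) per region
        )

    def forward(self, region_emb):
        # region_emb: (batch, embed_dim) embeddings from the pretrained model
        return self.net(region_emb).squeeze(-1)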