(Update 08/23, 2023)
We have uploaded pretrained checkpoints at this link.
Meta-trained checkpoints for each fold are included in the TRAIN directory, and fine-tuned checkpoints for each task (and each channel) are included in the FINETUNE directory.
(News) Our paper received the Outstanding Paper Award at ICLR 2023!
This repository contains official code for Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching (ICLR 2023 oral).
- Download the Taskonomy dataset (tiny split) from the official GitHub page https://github.com/StanfordVL/taskonomy/tree/master/data.
- You may download data for depth_euclidean, depth_zbuffer, edge_occlusion, keypoints2d, keypoints3d, normal, principal_curvature, reshading, segment_semantic, and rgb.
- (Optional) Resize the images and labels to (256, 256) resolution.
- To reduce the I/O bottleneck of the dataloader, we stored data from all buildings in a single directory. The directory structure looks like:
<root>
|--<task1>
| |--<building1>_<file_name1>
| | ...
| |--<building2>_<file_name1>
| |...
|
|--<task2>
| |--<building1>_<file_name1>
| | ...
| |--<building2>_<file_name1>
| |...
|
|...
- Create a data_paths.yaml file containing the line taskonomy: PATH_TO_YOUR_TASKONOMY, where PATH_TO_YOUR_TASKONOMY is the root directory (<root> in the above structure).
- Install the prerequisites with pip install -r requirements.txt.
- Create a model/pretrained_checkpoints directory and download the BEiT pre-trained checkpoints into it.
- We used the beit_base_patch16_224_pt22k checkpoint in our experiments.
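The preparation steps above can be scripted. The sketch below is a minimal, unofficial example using only the Python standard library; it is not part of this repository. It assumes (hypothetically) that your download is organized per building as <src>/<building>/<task>/<file_name>, and the helper names flatten_taskonomy, write_data_paths, and make_checkpoint_dir are illustrative. It does not resize images or download the BEiT checkpoint.

```python
import json
import shutil
from pathlib import Path

# Task directories named in the download instructions above.
TASKS = [
    "depth_euclidean", "depth_zbuffer", "edge_occlusion", "keypoints2d",
    "keypoints3d", "normal", "principal_curvature", "reshading",
    "segment_semantic", "rgb",
]


def flatten_taskonomy(src: Path, root: Path, tasks=TASKS, move: bool = False) -> int:
    """Copy (or move) <src>/<building>/<task>/<file> to <root>/<task>/<building>_<file>.

    ASSUMPTION: the per-building source layout is hypothetical; adapt the
    loops if your download is organized differently. Returns the file count.
    """
    count = 0
    for building_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        for task in tasks:
            task_dir = building_dir / task
            if not task_dir.is_dir():
                continue  # this building has no data for this task
            out_dir = root / task
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in sorted(task_dir.iterdir()):
                if not f.is_file():
                    continue
                # Prefix with the building name so files from different buildings don't collide.
                dst = out_dir / f"{building_dir.name}_{f.name}"
                if move:
                    shutil.move(str(f), str(dst))
                else:
                    shutil.copy2(f, dst)
                count += 1
    return count


def write_data_paths(root: Path, yaml_path: Path = Path("data_paths.yaml")) -> None:
    """Write the single 'taskonomy: <root>' entry.

    json.dumps produces a double-quoted string, which is also a valid YAML scalar,
    so paths containing ':' or '#' are not misparsed.
    """
    yaml_path.write_text(f"taskonomy: {json.dumps(str(root.resolve()))}\n")


def make_checkpoint_dir(repo: Path = Path(".")) -> Path:
    """Create model/pretrained_checkpoints; the BEiT checkpoint must be downloaded separately."""
    ckpt_dir = repo / "model" / "pretrained_checkpoints"
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    return ckpt_dir
```

For example, call flatten_taskonomy(Path("taskonomy_tiny"), Path("/data/taskonomy")) and then write_data_paths(Path("/data/taskonomy")) from the repository root (both paths are placeholders). Copying is the default so the original download stays intact; pass move=True to save disk space.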
Meta-training (stage 0, one run per task fold):
python main.py --stage 0 --task_fold [0/1/2/3/4]
Fine-tuning on a target task (stage 1):
python main.py --stage 1 --task [segment_semantic/normal/depth_euclidean/depth_zbuffer/edge_texture/edge_occlusion/keypoints2d/keypoints3d/reshading/principal_curvature]
Evaluation on a target task (stage 2):
python main.py --stage 2 --task [segment_semantic/normal/depth_euclidean/depth_zbuffer/edge_texture/edge_occlusion/keypoints2d/keypoints3d/reshading/principal_curvature]
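To run every fold and every task in one go, the commands above can be generated programmatically. This is a hedged sketch, not a script shipped with the repository: build_commands and run_all are hypothetical names, the sketch reproduces only the flags shown above (any other options your setup needs are not covered), and it runs everything sequentially, which is likely slow. By default it only prints the commands.

```python
import shlex
import subprocess
import sys

# Task names and fold indices as listed in the commands above.
TASKS = [
    "segment_semantic", "normal", "depth_euclidean", "depth_zbuffer",
    "edge_texture", "edge_occlusion", "keypoints2d", "keypoints3d",
    "reshading", "principal_curvature",
]
FOLDS = range(5)


def build_commands(tasks=TASKS, folds=FOLDS):
    """Return argv lists: stage 0 per fold, stages 1 and 2 per task, then print_results.py."""
    python = sys.executable
    cmds = [[python, "main.py", "--stage", "0", "--task_fold", str(f)] for f in folds]
    for task in tasks:
        cmds.append([python, "main.py", "--stage", "1", "--task", task])
        cmds.append([python, "main.py", "--stage", "2", "--task", task])
    cmds.append([python, "print_results.py"])
    return cmds


def run_all(dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, also execute it."""
    for cmd in build_commands():
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises on a non-zero exit, so later stages are not
            # started after an earlier one fails.
            subprocess.run(cmd, check=True)
```

Calling run_all() lists the 26 commands; run_all(dry_run=False) executes them from the repository root.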
After the evaluation, you can print the test results by running python print_results.py.
Our code refers to the following repositories:
- Taskonomy
- timm
- BEiT: BERT Pre-Training of Image Transformers
- Vision Transformers for Dense Prediction
- Inverted Pyramid Multi-task Transformer for Dense Scene Understanding
- Hypercorrelation Squeeze for Few-Shot Segmentation
- Cost Aggregation with 4D Convolutional Swin Transformer for Few-Shot Segmentation
If you find this work useful, please consider citing:
@inproceedings{kim2023universal,
title={Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching},
author={Donggyun Kim and Jinwoo Kim and Seongwoong Cho and Chong Luo and Seunghoon Hong},
booktitle={International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=88nT0j5jAn}
}
The development of this open-sourced code was supported in part by the National Research Foundation of Korea (NRF) (No. 2021R1A4A3032834).