
# Prompting Attributes for FGVC

## Framework

### Stage One

### Stage Two

## Prerequisites

- pytorch
- torchvision
- timm
- yacs
- regex
- ftfy
- tqdm

## Dataset

### Download

[Caltech-UCSD Birds-200-2011 (CUB-200-2011)](https://data.caltech.edu/records/20098)

Alternative: CUB-200-2011 | Kaggle

### Directory

```
+---ROOT
|   +---cub2002011/
|   |   +---CUB_200_2011/		# from https://data.caltech.edu/records/20098
|   |   |   +---attributes/
|   |   |   |   +---attributes.txt		# NOTE: attributes.txt has been moved here
|   |   |   |   +--- ...
|   |   |   |   \---image_attribute_labels_clean.txt	# cleaned image_attribute_labels.txt
|   |   |   +---images/
|   |   |   +---parts/
|   |   |   +---bounding_boxes.txt
|   |   |   \--- ...
|   |   +---cvpr2016_cub/	# unused at present
|   |   \---segmentations/	# unused at present
```
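Before training, it can help to confirm the dataset matches the layout above. The following is a minimal sketch (the `check_layout` helper is hypothetical, not part of this repo; the paths mirror the tree shown above):

```python
from pathlib import Path

# Entries expected under ROOT, per the directory tree in this README.
EXPECTED = [
    "cub2002011/CUB_200_2011/attributes/attributes.txt",
    "cub2002011/CUB_200_2011/attributes/image_attribute_labels_clean.txt",
    "cub2002011/CUB_200_2011/images",
    "cub2002011/CUB_200_2011/parts",
    "cub2002011/CUB_200_2011/bounding_boxes.txt",
]

def check_layout(root_dir):
    """Return the expected entries that are missing under root_dir."""
    root = Path(root_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Point it at the directory you pass as `DATA.DATASET.ROOT_DIR`; an empty return value means the layout looks complete.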

## Train

### Run Locally

```shell
torchrun --nproc_per_node=2 train.py -n "test1" -c configs/cub.yml MODEL.PRETRAIN_FILE 'ViT-B-16.pt' MODEL.PRETRAIN_PATH './pretrained'
```

### Run on Virtaicloud

```shell
torchrun --nproc_per_node=2 $GEMINI_RUN/Prompt/train.py \
-n "tokenflow" -i "First try"   \
-c $GEMINI_RUN/Prompt/configs/cub.yml   \
OUTPUT_DIR $GEMINI_DATA_OUT DATA.DATASET.ROOT_DIR $GEMINI_DATA_IN1  \
MODEL.PRETRAIN_PATH $GEMINI_PRETRAIN MODEL.PRETRAIN_FILE 'ViT-B-16.pt'
```

#### Dev

```shell
torchrun --nproc_per_node=2 $GEMINI_RUN/Prompt/train.py \
-n "test1_2" -i "Check stage 1"   \
-c $GEMINI_RUN/Prompt/configs/cub.yml   \
OUTPUT_DIR $GEMINI_DATA_OUT DATA.DATASET.ROOT_DIR $GEMINI_DATA_IN1  \
MODEL.PRETRAIN_PATH $GEMINI_PRETRAIN \
TRAIN.STAGE1.MAX_EPOCHS 5 TRAIN.STAGE2.MAX_EPOCHS 100
```

### Stage Two

```shell
torchrun --nproc_per_node=2 $GEMINI_RUN/Prompt/train_stage_2.py \
-n "s2" -i "Tuning stage 2"   \
-c $GEMINI_RUN/Prompt/configs/cub.yml   \
OUTPUT_DIR $GEMINI_DATA_OUT DATA.DATASET.ROOT_DIR $GEMINI_DATA_IN1  \
MODEL.PRETRAIN_PATH $GEMINI_PRETRAIN/model
```

#### Dev

```shell
torchrun --nproc_per_node=2 $GEMINI_RUN/Prompt/train_stage_2.py \
-n "s2" -i "Tuning lr for stage 2"   \
-c $GEMINI_RUN/Prompt/configs/cub.yml   \
OUTPUT_DIR $GEMINI_DATA_OUT DATA.DATASET.ROOT_DIR $GEMINI_DATA_IN1  \
MODEL.PRETRAIN_PATH $GEMINI_DATA_OUT
```

## Baseline

### Visual Only

```shell
torchrun --nproc_per_node=2 $GEMINI_RUN/Prompt/train_visual.py \
-n "visual" -i "Basic global prompt"   \
-c $GEMINI_RUN/Prompt/configs/cub.yml   \
OUTPUT_DIR $GEMINI_DATA_OUT DATA.DATASET.ROOT_DIR $GEMINI_DATA_IN1  \
MODEL.PRETRAIN_PATH $GEMINI_PRETRAIN MODEL.PRETRAIN_FILE 'ViT-B-16.pt'
```

### Contrastive Learning

```shell
torchrun --nproc_per_node=2 $GEMINI_RUN/Prompt/train_baseline.py \
-n "base" -i "Basic global prompt"   \
-c $GEMINI_RUN/Prompt/configs/cub.yml   \
OUTPUT_DIR $GEMINI_DATA_OUT DATA.DATASET.ROOT_DIR $GEMINI_DATA_IN1  \
MODEL.PRETRAIN_PATH $GEMINI_PRETRAIN MODEL.PRETRAIN_FILE 'ViT-B-16.pt'
```

## To Tune

### 1. Hyper-Params for Prompting

- Dropout rate in the text description: `DATA.DATASET.DROP_RATE`
- Temperature in TokenFlow: `MODEL.LAMB`

### 2. Classifier

#### 2.1. How to utilise features from all tokens?

**Global Tokens Only**

- element-wise multiplication
- sum
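The two candidate fusions of the global tokens can be sketched as follows (illustrative only; the batch size and the shared 512-d embedding dimension are assumptions, and the real features come from the image and text encoders):

```python
import numpy as np

rng = np.random.default_rng(0)
img_global = rng.standard_normal((4, 512))  # global visual token features
txt_global = rng.standard_normal((4, 512))  # global text token features

fused_mul = img_global * txt_global         # element-wise multiplication
fused_sum = img_global + txt_global         # sum
```

Either fused feature then goes to the classifier head; multiplication gates each channel by the text feature, while the sum keeps both signals additively.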

**Blending Patches & Words**

- TODO

#### 2.2. Classifier Structure

- Hidden dim: `MODEL.HIDDEN_DIM`
- Module
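A minimal sketch of one candidate head structure, assuming a two-layer MLP whose hidden width is `MODEL.HIDDEN_DIM` (the class `MLPHead` and all dimensions here are hypothetical, shown with NumPy rather than the repo's PyTorch modules):

```python
import numpy as np

class MLPHead:
    """Hypothetical head: fused feature -> hidden layer -> 200 CUB classes."""

    def __init__(self, in_dim=512, hidden_dim=256, num_classes=200, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((in_dim, hidden_dim)) * 0.02
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.standard_normal((hidden_dim, num_classes)) * 0.02
        self.b2 = np.zeros(num_classes)

    def __call__(self, x):
        h = np.maximum(x @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return h @ self.w2 + self.b2                # class logits
```

`hidden_dim` is the knob exposed as `MODEL.HIDDEN_DIM`; the module choice (MLP vs. a single linear layer) is the other axis to tune.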

#### 2.3. Ablation Study

- Visual Only -> effect of Stage One

### 3. How to align the dimensions of image and text encoders?

Currently, we simply adopt the projection weights that map the global features in the original CLIP. Would it be more effective to build another learnable mapping matrix instead?
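The two options can be sketched as below (a sketch only: the 768-d visual width and 512-d joint space are assumed ViT-B/16 CLIP dimensions, and the "frozen" matrix here is a random stand-in for the pretrained CLIP projection):

```python
import numpy as np

rng = np.random.default_rng(0)
vis_feat = rng.standard_normal((4, 768))      # visual features before projection

# Option 1: reuse CLIP's pretrained projection, kept frozen.
clip_proj = rng.standard_normal((768, 512)) * 0.02   # stand-in for CLIP weights
aligned_fixed = vis_feat @ clip_proj

# Option 2: a freshly initialised mapping matrix of the same shape,
# trained jointly with the prompts instead of kept frozen.
learned_proj = rng.standard_normal((768, 512)) * 0.02
aligned_learned = vis_feat @ learned_proj
```

Both variants land in the same 512-d space as the text features; the question is whether the extra trainable parameters pay off on CUB.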

## Acknowledgement

The codebase builds on CLIP and Swin-Transformer.
