This repository contains the dataset and the TensorFlow training code used in the paper:
Jong-Chyi Su*, Chenyun Wu*, Huaizu Jiang, Subhransu Maji, "Reasoning about Fine-grained Attribute Phrases using Reference Games", International Conference on Computer Vision (ICCV), 2017
    @inproceedings{su2017reasoning,
      Author    = {Jong-Chyi Su and Chenyun Wu and Huaizu Jiang and Subhransu Maji},
      Title     = {Reasoning about Fine-grained Attribute Phrases using Reference Games},
      Booktitle = {International Conference on Computer Vision (ICCV)},
      Year      = {2017}
    }
Each example consists of one pair of images and 5 pairs of corresponding attribute phrases:
- Training set: 4700 pairs
- Val set: 2350 pairs
- Test set: 2350 pairs
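Once the annotation files are in place, a split can be sanity-checked with plain Python. A minimal sketch, assuming the top-level JSON value in each `visdiff_*.json` file is a list with one entry per annotated image pair (the helper name and that structural assumption are ours; inspect the file to confirm):

```python
import json

def count_pairs(path):
    """Load one dataset split and return the number of annotated pairs.

    Assumes the top-level JSON value is a list with one entry per
    image pair (an assumption; check the file to confirm).
    """
    with open(path) as f:
        data = json.load(f)
    return len(data)

# e.g. count_pairs('dataset/visdiff_train.json') should match the
# split sizes listed above (4700 for train).
```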
Requirements:
- Python 2.7
- TensorFlow v1.0+
- User descriptions are included in `dataset/visdiff_SET.json`, where SET = {train, val, test, trainval}
- Download images from the OID dataset (http://www.robots.ox.ac.uk/~vgg/data/oid)
- Move images from `oid-aircraft-beta-1/data/images/aeroplane/*.jpg` to the folder `dataset/images/`
- Add a pretrained model (e.g. `vgg_16.ckpt`) in `models/checkpoints/`
- Go to `utils/` and run:

      python get_feature.py --dataset train

- The extracted features will be saved as a NumPy file in `img_feat/vgg_16/train.npy`
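The saved features can be loaded back with NumPy for inspection. A small sketch (the exact array shape depends on the chosen image model, so the shape here is an assumption):

```python
import numpy as np

def load_features(path):
    # get_feature.py stores the extracted features of one split as a
    # single NumPy array, one row per image; the feature width depends
    # on the image model (e.g. vgg_16).
    return np.load(path)

# e.g. feats = load_features('img_feat/vgg_16/train.npy')
```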
Training proceeds in two steps: Step 1 trains the listener with image features fixed; Step 2 fine-tunes the image model starting from the Step-1 checkpoint.

- SL:

      python train_listener.py --mode train --log_dir result/SL --pairwise 0 --train_img_model 0 --max_steps 2000 --batch_size 128
      python train_listener.py --mode train --log_dir result/SL --pairwise 0 --train_img_model 1 --max_steps 7500 --load_model_path model-fixed-2000 --learn_rate 0.00001

- SLr:

      python train_listener.py --mode train --log_dir result/SLr --pairwise 0 --ran_neg_sample 1 --train_img_model 0 --max_steps 5000 --batch_size 128
      python train_listener.py --mode train --log_dir result/SLr --pairwise 0 --ran_neg_sample 1 --train_img_model 1 --max_steps 10000 --load_model_path model-fixed-5000 --learn_rate 0.00001

- DL:

      python train_listener.py --mode train --log_dir result/DL --pairwise 1 --train_img_model 0 --max_steps 2000 --max_sent_length 17 --batch_size 128
      python train_listener.py --mode train --log_dir result/DL --pairwise 1 --train_img_model 1 --max_steps 7000 --load_model_path model-fixed-2000 --max_sent_length 17 --learn_rate 0.00001
To evaluate the trained listener models on the validation set:

    python train_listener.py --mode eval --log_dir result/SL --pairwise 0 --train_img_model 0 --load_model_path model-fixed-2000 --dataset val
    python train_listener.py --mode eval --log_dir result/SL --pairwise 0 --train_img_model 1 --load_model_path model-finetune-7500 --dataset val
    python train_listener.py --mode eval --log_dir result/SLr --pairwise 0 --train_img_model 0 --load_model_path model-fixed-5000 --dataset val
    python train_listener.py --mode eval --log_dir result/SLr --pairwise 0 --train_img_model 1 --load_model_path model-finetune-10000 --dataset val
    python train_listener.py --mode eval --log_dir result/DL --pairwise 1 --train_img_model 0 --load_model_path model-fixed-2000 --dataset val
    python train_listener.py --mode eval --log_dir result/DL --pairwise 1 --train_img_model 1 --load_model_path model-finetune-7000 --dataset val
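Conceptually, evaluating a listener in the reference game means checking how often it scores the target image above the distractor for a given attribute phrase. A toy illustration of that accuracy computation (the `score` callable stands in for the trained listener and is purely hypothetical; the real evaluation lives in train_listener.py):

```python
def listener_accuracy(examples, score):
    """Fraction of reference games the listener wins.

    `examples` is a list of (target, distractor, phrase) triples and
    `score(image, phrase)` is a stand-in for the trained listener.
    """
    wins = sum(1 for t, d, p in examples if score(t, p) > score(d, p))
    return wins / float(len(examples))

# Toy check with a fake scorer that matches phrases against image labels:
toy = [('red', 'blue', 'red plane'), ('jet', 'prop', 'propeller')]
fake_score = lambda img, phrase: 1.0 if img in phrase else 0.0
print(listener_accuracy(toy, fake_score))  # → 0.5
```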
- Example:

      python train_speaker.py --speaker_mode=S --img_model=vgg_16 --train_img_model=1 --experiment_path=result/speaker/temp
- Options:
- --speaker_mode: S or DS
- --img_model: alexnet, inception_v3, or vgg_16
- --train_img_model: whether to fine-tune the image model (0 = False, 1 = True)
- --experiment_path: where to output and save the trained model
- --load_model_dir: path to the pre-trained model. If not set, train from scratch
- --load_model_name: model name (model-%steps) in load_model_dir
- See more options in train_speaker.py
- Example:

      python inference_pairwise.py --input_path=result/speaker/temp --model_step=model-5000 --dataset_name=val
- Options:
- --input_path: path to the trained speaker model that you want to use
- --model_step: model name (model-%steps) in input_path
- --dataset_name: which sub-dataset to use (train / val / test)
- See more options in inference_pairwise.py
Here we use the listener model to re-rank the attribute phrases generated by a speaker model. To run this step, you need a trained listener model and the phrases generated by a speaker model.
- Example:

      python rerank.py --listener_path=result/SL --listener_model=model-fixed-2000 --speaker_result_path=result/speaker/temp/infer_annotations_val_model-5000_case0_beam10_sent10.json --infer_dataset=val
- Options:
- --listener_path: path to the listener model used for reranking
- --listener_model: model name (model-%steps) in listener_path
- --speaker_result_path: the file that saves the phrases generated by a speaker model
- --infer_dataset: which dataset to work on (train / val / test)
- See more options in rerank.py
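The reranking itself boils down to sorting the speaker's candidate phrases by the listener's score. A minimal sketch of that idea (the `listener_score` callable is a hypothetical stand-in for the trained listener; rerank.py contains the actual logic):

```python
def rerank_phrases(phrases, listener_score):
    # Order speaker-generated candidate phrases so that those the
    # listener judges most discriminative for the image pair come first.
    return sorted(phrases, key=listener_score, reverse=True)

# Toy check using phrase length as a dummy listener score:
print(rerank_phrases(['red', 'bigger wings', 'jet'], len))
# → ['bigger wings', 'red', 'jet']
```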
- In "inference_setwise.py", set "speaker_path" as the path to the trained speaker model you want to use
- run
python inference_setwise.py
Please contact jcsu@cs.umass.edu if you have any questions.
- Jong-Chyi Su (UMass Amherst)
- Chenyun Wu (UMass Amherst)
- Huaizu Jiang (UMass Amherst)