This repository contains the code used in our ACL'20 paper "History for Visual Dialog: Do we really need it?".
It is built upon visdial-challenge-starter-pytorch, and the previous commit history is maintained. We thank the challenge organizers for providing the starter code.
Please see original_README.md
or refer to the original repo to set up the conda environment and download the relevant data.
Alternatively, we provide setup.sh to streamline the process. Run it as:
cd setup_visdial
bash setup.sh
bash setup_glove.sh
We follow this directory structure:
$PROJECT_DIR
|--$DATA_DIR==data
|--$MODEL_DIR==models
|--$CODE_DIR==visdial_conv
|--$CONFIG_DIR==configs
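If it helps, the $-prefixed names above can be thought of as paths rooted at the project directory. The snippet below is only an illustrative sketch of that layout (the variable names simply mirror the shell-style placeholders above):

```python
from pathlib import Path

# Illustrative mapping of the placeholders above to concrete paths.
PROJECT_DIR = Path(".").resolve()          # $PROJECT_DIR
DATA_DIR = PROJECT_DIR / "data"            # $DATA_DIR
MODEL_DIR = PROJECT_DIR / "models"         # $MODEL_DIR
CODE_DIR = PROJECT_DIR / "visdial_conv"    # $CODE_DIR
CONFIG_DIR = PROJECT_DIR / "configs"       # $CONFIG_DIR
```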
We used Python 3 and PyTorch 1.0.0/1.0.1.post2 for our experiments. We often use f-strings and typing (type hints) in our code, so some basic familiarity with both is helpful.
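If you have not seen these before, here is a tiny, self-contained illustration (not taken from the repository):

```python
from typing import List

def describe_ranks(ranks: List[int], split: str = "val") -> str:
    """Summarize a list of ranks using type hints and an f-string."""
    mean_rank = sum(ranks) / len(ranks)
    return f"{split}: {len(ranks)} dialogs, mean rank = {mean_rank:.2f}"

print(describe_ranks([1, 4, 2]))  # val: 3 dialogs, mean rank = 2.33
```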
Installation using Docker can be found here.
Update: v0.1 of the code has been released. We suggest using PyCharm for this project. See this blog post for more details.
We provide shell scripts to run our models. To reproduce the results for different models, follow these scripts:
- MCA-I
- MCA-I-H
- MCA-VGH-I
- MCA-I-HGuidedQ
- MCA-I-H-GT (the Python script in the new_annotations folder shows how we fixed the dense ground-truth annotations)
Example:
# Run as:
cd shell_scripts
bash train_and_evaluate_mcan_img_only.sh
We follow the same directory structure as described above in all the shell scripts.
Some Jupyter notebooks for inspecting data/images/analyses/results can be found in notebooks.
Run conda install jupyter
if you are using conda and want to run these notebooks from the environment. More data analysis is provided in the data_analysis folder.
We have also provided some test cases in the tests folder. If you build on top of this repository, we strongly suggest adding to this folder and testing your new Python scripts; a minimal example is sketched below.
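For instance, a new test could follow this pytest-style pattern (the file and functions below are hypothetical placeholders, not part of the repository):

```python
# tests/test_example.py (hypothetical)
def tokenize(text: str):
    # Stand-in for a helper you might add in your own scripts.
    return text.lower().split()

def test_tokenize_lowercases_and_splits():
    assert tokenize("Do we REALLY need it?") == ["do", "we", "really", "need", "it?"]
```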
Our code follows this structure:
- train.py -- entrypoint for training. Called by all shell scripts
- evaluate.py -- Python script for evaluation. Called by all shell scripts
- data -- dataset reader and vocabulary defined here
- encoders -- all encoders defined here
- decoders -- all decoders defined here
- model.py -- wrapper to call models with different encoders and decoders
- metrics.py -- defines NDCG and other metrics (see the sketch after this list)
- configs -- all configs defined here
- shell_scripts -- all shell scripts here
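To make the main metric concrete, here is a minimal, generic sketch of NDCG over ranked answer candidates. It is only illustrative; see metrics.py for the exact variant used in the challenge and in our evaluation:

```python
import numpy as np

def ndcg_at_k(scores, relevance, k=None):
    """Rank candidates by model score and compare the discounted cumulative
    gain of that ranking against the ideal (relevance-sorted) ranking."""
    scores = np.asarray(scores, dtype=float)
    relevance = np.asarray(relevance, dtype=float)
    k = len(scores) if k is None else k
    ranked = relevance[np.argsort(-scores)][:k]   # relevance in predicted order
    ideal = np.sort(relevance)[::-1][:k]          # relevance in the best possible order
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    dcg, idcg = float(np.sum(ranked * discounts)), float(np.sum(ideal * discounts))
    return dcg / idcg if idcg > 0 else 0.0

# Toy example: the most relevant answer is ranked last by the model.
print(ndcg_at_k(scores=[0.9, 0.1, 0.3, 0.2], relevance=[0.0, 1.0, 0.5, 0.0]))  # ~0.57
```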
Be careful about the different indexing conventions in the data. See notes.txt
We have released two subsets of the Visdial val set (mentioned in our paper) in the folder released_datasets:
- VisdialConv - instances that require dialog history, as verified by crowdsourced human annotations
- Vispro - the intersection of Vispro and the Visdial val set
To evaluate on these subsets, use the shell scripts provided in evaluate_subset_data.
We used the scripts in subset_dialog_data to create these subsets from the Visdial val set.
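At a high level, creating such a subset amounts to filtering the val annotations by a list of image ids. The sketch below is an illustration only; the file names and JSON keys are assumptions based on the standard Visdial v1.0 format, not a verbatim copy of our scripts:

```python
import json

VAL_JSON = "data/visdial_1.0_val.json"                 # assumed path
SUBSET_IDS = "released_datasets/subset_image_ids.txt"  # hypothetical file name

with open(SUBSET_IDS) as f:
    keep_ids = {int(line.strip()) for line in f if line.strip()}

with open(VAL_JSON) as f:
    val = json.load(f)

# Keep only the dialogs whose image_id is in the subset.
val["data"]["dialogs"] = [d for d in val["data"]["dialogs"] if d["image_id"] in keep_ids]

with open("data/visdial_1.0_val_subset.json", "w") as f:
    json.dump(val, f)
```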
If you are interested in our AMT interface, please refer to the repository.
See the README in the visdialconv folder to learn more about the annotations.
If you use this work, please cite it as:
@inproceedings{agarwal2020history,
title={History for Visual Dialog: Do we really need it?},
author={Agarwal, Shubham and Bui, Trung and Lee, Joon-Young and Konstas, Ioannis and Rieser, Verena},
booktitle={58th Annual Meeting of the Association for Computational Linguistics (ACL)},
year={2020}
}
Feel free to fork and contribute to this work. Please raise a PR or open an issue; we will be happy to help. Thanks!
Badges made using shields.io