VizWiz VQA: course project for Multi Modal Machine Learning

Running Instructions

Download data:

Download skill data:

cd data/skill
bash download_data.sh

Download VQA data:

cd data/VQA
bash download_data.sh
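
Both `cd` commands use relative paths, so each presumably has to be run from the repository root. As a minimal sketch, a small helper can run a download script inside a subshell so that the caller's working directory is left unchanged. The name `run_download` is hypothetical and not part of the repository.

```shell
# Hypothetical helper (not part of the repository): run the download_data.sh
# script found in the given directory without changing the caller's directory.
run_download() {
    # The parentheses start a subshell, so the `cd` does not leak out.
    ( cd "$1" && bash download_data.sh )
}
```

From the repository root, `run_download data/skill` followed by `run_download data/VQA` would then fetch both datasets.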

Run model (SkillCLIP) variants:

With everything (skill embeddings, object tags, and scene text):

python -m src.main_model.clip_late_fusion -t -de "cuda:0" -exp skill_aware_clip

Without skill embeddings:

python -m src.main_model.clip_late_fusion -t -de "cuda:0" -exp skill_unaware_clip

Without object tags:

python -m src.main_model.clip_late_fusion -t -de "cuda:0" -exp skill_aware_clip_nobj -nobj

Without scene text:

python -m src.main_model.clip_late_fusion -t -de "cuda:0" -exp skill_aware_clip_nsctxt -nsctxt

With multi-task training:

python -m src.main_model.clip_multitasking -t -de "cuda:0" -exp skill_aware_clip_multitasking -pred_file pred.json
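
The `clip_late_fusion` commands above differ only in the experiment name and an optional ablation flag. Assuming the flags behave exactly as shown in those commands, a small helper can print the command for a given variant so it can be inspected before it is run. `variant_cmd` is a hypothetical name used here for illustration.

```shell
# Hypothetical helper: print the clip_late_fusion command for one experiment.
# $1 = experiment name, $2 = optional ablation flag (-nobj or -nsctxt).
variant_cmd() {
    printf 'python -m src.main_model.clip_late_fusion -t -de "cuda:0" -exp %s' "$1"
    # Append the ablation flag only when one was given.
    if [ -n "${2:-}" ]; then
        printf ' %s' "$2"
    fi
    printf '\n'
}
```

For example, `variant_cmd skill_aware_clip_nobj -nobj` prints the "without object tags" command, and piping that output to `sh` from the repository root would execute it.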

Interesting object detections

Keys of a keyboard are detected as microwaves with relatively high confidence scores:

Path: val_objects_detected/VizWiz_val_00001474_objects.png
Potential reason: the image is zoomed in very closely, which may be an unusual view for the object detector.

Illustrative Examples:

Here are some illustrative examples from our error analysis. FusionCLIP refers to the SkillCLIP model without the skill embeddings.

Comparison between our model (SkillCLIP) and FusionCLIP:

Some more examples:

atharva-naik/MMML-TermProject-VizWiz-VQA-Challenge
