
# 3D Visual Grounding with Transformers

## Introduction

3D visual grounding is the task of localizing a target object in a 3D scene given a natural language description. This work develops a transformer-based architecture that predicts a bounding box around the object referred to by the description.
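Concretely, a grounding model consumes a colored point cloud together with a free-form description and returns a single 3D bounding box. The sketch below illustrates that interface only; the function name, tensor shapes, and the `(boxes, scores)` output convention are assumptions for illustration, not code from this repository.

```python
import torch

# Hypothetical interface of a 3D visual grounding model (shapes are assumptions):
# a colored point cloud plus a tokenized description in, one axis-aligned
# bounding box (center_xyz + size_xyz) out.
def ground(model, point_cloud: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    # point_cloud: (1, num_points, 6) -> xyz + rgb
    # tokens:      (1, num_words)     -> tokenized description
    boxes, scores = model(point_cloud, tokens)  # (1, P, 6), (1, P) for P proposals
    best = scores.argmax(dim=1)                 # highest-confidence proposal
    return boxes[0, best[0]]                    # (6,) = center_xyz + size_xyz
```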

For additional details, please see our paper:
"3D Visual Grounding with Transformers"
by Stefan Frisch and Florian Stilz from the Technical University of Munich.

## Setup + Dataset

For setup and dataset preparation, please follow the instructions on the ScanRefer GitHub page.

## Architecture

Our architecture follows ScanRefer but replaces the VoteNet detection backbone with 3DETR-m and adds a vanilla transformer encoder to the fusion module, where the detected object proposals are combined with the GRU-encoded language features. A rough sketch of this fusion step is given below.
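This is a minimal sketch under assumed dimensions and module names, not the repository's actual implementation: a GRU summarizes the description, and a vanilla transformer encoder lets the object proposals attend to the language feature before a confidence score is predicted per proposal.

```python
import torch
import torch.nn as nn

class TransformerFusionModule(nn.Module):
    """Illustrative fusion module; names and dimensions are assumptions."""

    def __init__(self, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        # GRU summarizes the word embeddings of the description into one feature.
        self.lang_gru = nn.GRU(input_size=300, hidden_size=d_model, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        # Vanilla transformer encoder over [language token; object proposals].
        self.fusion = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.confidence_head = nn.Linear(d_model, 1)

    def forward(self, proposal_feats, word_embeddings):
        # proposal_feats:  (B, num_proposals, d_model) from the 3DETR-m detector
        # word_embeddings: (B, num_words, 300), e.g. GloVe vectors
        _, lang_feat = self.lang_gru(word_embeddings)       # (1, B, d_model)
        lang_feat = lang_feat.permute(1, 0, 2)              # (B, 1, d_model)
        tokens = torch.cat([lang_feat, proposal_feats], 1)  # language token first
        fused = self.fusion(tokens)[:, 1:, :]               # drop language token
        return self.confidence_head(fused).squeeze(-1)      # (B, num_proposals)
```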

## Results

To reproduce our results, we provide the commands below along with the numbers they produce. The pretrained models are located in the outputs folder. We also implemented a chunking mechanism that significantly reduces training time compared to the original ScanRefer code (a sketch follows the results table). Training the baseline model takes around 4 hours on an NVIDIA Tesla T4 GPU.

| Name | Command | Overall Acc@0.25IoU | Overall Acc@0.5IoU | Comments |
|------|---------|---------------------|--------------------|----------|
| ScanRefer (Baseline) | `python scripts/train.py --use_color --lr 1e-3 --batch_size 14` | 37.05 | 23.93 | xyz + color + height |
| ScanRefer with pretrained VoteNet (optimized Baseline) | `python scripts/train.py --use_color --use_chunking --use_pretrained "pretrained_VoteNet" --lr 1e-3 --batch_size 14` | 37.11 | 25.21 | xyz + color + height |
| Ours (pretrained 3DETR-m + GRU + vTransformer) | `python scripts/train.py --use_color --use_chunking --detection_module 3detr --match_module transformer --use_pretrained "pretrained_3DETR" --no_detection` | 37.08 | 26.56 | xyz + color + height |
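The chunking mechanism mentioned above can be sketched as follows: instead of treating every (scene, description) pair as its own sample, several descriptions of the same scene are grouped into one chunk, so the expensive point-cloud processing runs once per chunk rather than once per description. The dataset layout, field names, and chunk size below are assumptions for illustration, not the repository's actual code.

```python
from collections import defaultdict

def build_chunks(samples, chunk_size=8):
    """Group descriptions by scene so each scene is processed once per chunk.

    samples: list of dicts with "scene_id" and "description" keys (assumed layout).
    """
    by_scene = defaultdict(list)
    for s in samples:
        by_scene[s["scene_id"]].append(s["description"])

    chunks = []
    for scene_id, descriptions in by_scene.items():
        for i in range(0, len(descriptions), chunk_size):
            chunks.append({
                "scene_id": scene_id,  # the scene's point cloud is loaded once
                "descriptions": descriptions[i:i + chunk_size],
            })
    return chunks

# 32 descriptions of one scene become 4 chunks instead of 32 separate samples.
samples = [{"scene_id": "scene0000_00", "description": f"desc {i}"} for i in range(32)]
print(len(build_chunks(samples)))  # -> 4
```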
