This repository contains the implementation of the ICCV 2023 paper: End2End Multi-View Feature Matching with Differentiable Pose Optimization.
arXiv | Video | Project Page
The multi-view matching model is implemented in this fork, which is included as a submodule; therefore, please use --recursive to clone the repository:
git clone https://github.com/barbararoessle/e2e_multi_view_matching --recursive
Required Python packages are listed in requirements.txt.
Extract the ScanNet dataset, e.g., using SensReader, and place the files scannetv2_test.txt, scannetv2_train.txt, scannetv2_val.txt from the ScanNet Benchmark and the preprocessed image overlap (range [0.4, 0.8]) into the same directory <data_dir>/scannet.
As a result, we have:
<data_dir>/scannet
└───scans
| └───scene0000_00
| | | color
| | | depth
| | | intrinsic
| | | pose
| ...
└───scans_test
| └───scene0707_00
| | | color
| | | depth
| | | intrinsic
| | | pose
| ...
└───overlap
| └───scans
| | | scene0000_00.json
| | | ...
| └───scans_test
| | | scene0707_00.json
| | | ...
| scannetv2_train.txt
| scannetv2_val.txt
| scannetv2_test.txt
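To sanity-check this layout after extraction, a small helper can list anything that is missing. This is a sketch, not part of the repository; the data_dir argument stands for your dataset root:

```python
from pathlib import Path

def missing_scannet_entries(data_dir):
    """Return expected ScanNet entries that are missing under data_dir/scannet."""
    root = Path(data_dir) / "scannet"
    expected = [
        root / "scans",
        root / "scans_test",
        root / "overlap" / "scans",
        root / "overlap" / "scans_test",
    ]
    expected += [root / f"scannetv2_{split}.txt" for split in ("test", "train", "val")]
    return [p for p in expected if not p.exists()]
```

Run it once after extraction; an empty result means the directory matches the structure above.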
We follow the preprocessing done by LoFTR: the depth maps are used from the original MegaDepth dataset; download and extract MegaDepth_v1 to <data_dir>/megadepth. The undistorted images and camera parameters follow the preprocessing of D2-Net; download and extract them to <data_dir>/megadepth. As a result, we have:
<data_dir>/megadepth
| MegaDepth_v1
| Undistorted_SfM
| scene_info
| megadepth_train.txt
| megadepth_val.txt
| megadepth_test.txt
| megadepth_valid_list.json
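The same kind of sanity check works for the flat MegaDepth layout; again, this helper is a sketch and not part of the repository:

```python
from pathlib import Path

def missing_megadepth_entries(data_dir):
    """Return expected MegaDepth entries that are missing under data_dir/megadepth."""
    root = Path(data_dir) / "megadepth"
    names = [
        "MegaDepth_v1",
        "Undistorted_SfM",
        "scene_info",
        "megadepth_train.txt",
        "megadepth_val.txt",
        "megadepth_test.txt",
        "megadepth_valid_list.json",
    ]
    return [root / n for n in names if not (root / n).exists()]
```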
Pretrained models are available here.
Download the test pair descriptions scannet_test_1500 and megadepth_test_1500_scene_info from LoFTR into assets/.
The option eval_mode specifies the relative pose estimation method, e.g., weighted eight-point with bundle adjustment (w8pt_ba) or RANSAC (ransac).
python3 eval_pairs.py --eval_mode w8pt_ba --dataset scannet --exp_name two_view_scannet --data_dir <path to datasets> --checkpoint_dir <path to pretrained models>
python3 eval_pairs.py --eval_mode w8pt_ba --dataset megadepth --exp_name two_view_megadepth --data_dir <path to datasets> --checkpoint_dir <path to pretrained models>
To run the multi-view evaluation, the bundle adjustment based on the Ceres solver needs to be built:
cd pose_optimization/multi_view/bundle_adjustment
mkdir build
cd build
cmake ..
make -j
It has the following dependencies:
- Ceres Solver, http://ceres-solver.org/installation.html (tested with version 2.0.0)
- Theia Vision Library, http://theia-sfm.org/building.html (tested with version 0.7.0)
- Eigen https://eigen.tuxfamily.org/dox/GettingStarted.html (tested with version 3.3.7)
- Boost https://www.boost.org/ (tested with version 1.71.0)
- GoogleTest https://github.com/google/googletest (tested with version 1.10.0)
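On Ubuntu-like systems, most of these dependencies can typically be installed via the package manager; the package names below are assumptions and vary by distribution, and Theia usually has to be built from source as described at the link above:

```shell
# Assumed Ubuntu package names; versions may differ from the tested ones above.
sudo apt-get install libceres-dev libeigen3-dev libboost-all-dev libgtest-dev
```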
python3 eval_multi_view.py --dataset scannet --exp_name multi_view_scannet --data_dir <path to datasets> --checkpoint_dir <path to pretrained models>
To simplify internal processing, we convert the MegaDepth data to the same data format as ScanNet. It will be written to <path to datasets>/megadepth_640:
python3 convert_megadepth_to_scannet_format.py --dataset_dir <path to datasets>/megadepth --image_size 640
python3 eval_multi_view.py --dataset megadepth_640 --exp_name multi_view_megadepth --data_dir <path to datasets> --checkpoint_dir <path to pretrained models>
Training stage 1 trains without pose loss, stage 2 with pose loss. Checkpoints are written into a subdirectory of the provided checkpoint directory. The subdirectory is named after the training start time of stage 1 or 2 in the format yyyymmdd_hhmmss; this is the experiment name. The experiment name can be specified to resume a training, to initialize stage 2, or to run evaluation.
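The experiment name is just a timestamp (year, month, day, hour, minute, second). A minimal sketch of how such a name can be generated and validated follows; the exact code in train.py may differ:

```python
import re
from datetime import datetime

# Generate an experiment name in the timestamp format described above,
# e.g. "20231001_120000".
exp_name = datetime.now().strftime("%Y%m%d_%H%M%S")

# Pattern such experiment names match, e.g. for locating checkpoint subdirectories.
EXP_NAME_PATTERN = re.compile(r"^\d{8}_\d{6}$")
assert EXP_NAME_PATTERN.match(exp_name)
```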
Stage 1
python3 -u -m torch.distributed.launch --nproc_per_node=2 --rdzv_endpoint=127.0.0.1:29109 train.py --tuple_size 2 --dataset scannet --batch_size 32 --n_workers 12 --data_dir <path to datasets> --checkpoint_dir <path to write checkpoints>
Training stage 1 runs until the validation matching loss has converged.
Stage 2
Training stage 2 trains with pose loss and loads the checkpoint from stage 1; therefore, the following options are added to the stage 1 command:
--init_exp_name <experiment name from stage 1> --pose_loss
Training stage 2 runs until the validation rotation and translation losses have converged.
To simplify internal processing, we convert the MegaDepth data to the same data format as ScanNet. Note that image_size=720 is used for image pairs (following SuperGlue), whereas image_size=640 is used for multi-view for computational reasons (following LoFTR). The converted data will be written to <path to datasets>/megadepth_720:
python3 convert_megadepth_to_scannet_format.py --dataset_dir <path to datasets>/megadepth --image_size 720
Stage 1
Training is initialized with the provided stage 1 weights pretrained on ScanNet.
python3 -u -m torch.distributed.launch --nproc_per_node=1 --rdzv_endpoint=127.0.0.1:29110 train.py --tuple_size 2 --dataset megadepth_720 --batch_size 16 --n_workers 6 --data_dir <path to datasets> --checkpoint_dir <path to write checkpoints> --init_exp_name pretrained_on_scannet_two_view_stage_1
Training stage 1 runs until the validation matching loss has converged.
Stage 2
Training stage 2 trains with pose loss and loads the checkpoint from stage 1; therefore, the option pose_loss is added and init_exp_name is adjusted as follows:
--init_exp_name <experiment name from stage 1> --pose_loss
Stage 1
python3 -u -m torch.distributed.launch --nproc_per_node=3 --rdzv_endpoint=127.0.0.1:29111 train.py --tuple_size 5 --dataset scannet --batch_size 8 --n_workers 5 --data_dir <path to datasets> --checkpoint_dir <path to write checkpoints>
Training stage 1 runs until the validation matching loss has converged.
Stage 2
Training stage 2 trains with pose loss and loads the checkpoint from stage 1; therefore, the following options are added to the stage 1 command:
--init_exp_name <experiment name from stage 1> --pose_loss
To simplify internal processing, we convert the MegaDepth data to the same data format as ScanNet. Note that image_size=720 is used for image pairs (following SuperGlue), whereas image_size=640 is used for multi-view for computational reasons (following LoFTR). The converted data will be written to <path to datasets>/megadepth_640:
python3 convert_megadepth_to_scannet_format.py --dataset_dir <path to datasets>/megadepth --image_size 640
Stage 1
Training is initialized with the provided stage 1 weights pretrained on ScanNet.
python3 -u -m torch.distributed.launch --nproc_per_node=2 --rdzv_endpoint=127.0.0.1:29112 train.py --tuple_size 5 --dataset megadepth_640 --batch_size 2 --n_workers 4 --data_dir <path to datasets> --checkpoint_dir <path to write checkpoints> --init_exp_name pretrained_on_scannet_multi_view_stage_1
Training stage 1 runs until the validation matching loss has converged.
Stage 2
Training stage 2 trains with pose loss and loads the checkpoint from stage 1; therefore, the option pose_loss is added and init_exp_name is adjusted as follows:
--init_exp_name <experiment name from stage 1> --pose_loss
If you find this repository useful, please cite:
@inproceedings{roessle2023e2emultiviewmatching,
title={End2End Multi-View Feature Matching with Differentiable Pose Optimization},
author={Barbara Roessle and Matthias Nie{\ss}ner},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month={October},
year={2023}
}
We thank SuperGluePretrainedNetwork, kornia, ceres-solver and NeuralRecon, from which this repository borrows code.