RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching

This repository contains the source code for our paper:

RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching
3DV 2021, Best Student Paper Award
Lahav Lipson, Zachary Teed and Jia Deng

@inproceedings{lipson2021raft,
  title={RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching},
  author={Lipson, Lahav and Teed, Zachary and Deng, Jia},
  booktitle={International Conference on 3D Vision (3DV)},
  year={2021}
}

Requirements

The code has been tested with PyTorch 1.7 and CUDA 10.2.

conda env create -f environment.yaml
conda activate raftstereo

Required Data

To evaluate or train RAFT-Stereo, you will need to download the required datasets.

Sceneflow (includes FlyingThings3D, Driving & Monkaa)
Middlebury
ETH3D
KITTI

To download the ETH3D and Middlebury test datasets for the demos, run

chmod ug+x download_datasets.sh && ./download_datasets.sh

By default, stereo_datasets.py searches for the datasets in the locations shown below. If the datasets were downloaded elsewhere, create symbolic links to them inside the datasets folder.

├── datasets
    ├── FlyingThings3D
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Monkaa
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── Driving
        ├── frames_cleanpass
        ├── frames_finalpass
        ├── disparity
    ├── KITTI
        ├── testing
        ├── training
        ├── devkit
    ├── Middlebury
        ├── MiddEval3
    ├── ETH3D
        ├── two_view_testing
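Before training or evaluating, it can help to confirm that the folders (or symbolic links) are where the tree above expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_dataset_dirs` and the `EXPECTED_LAYOUT` table are hypothetical, and the directory names are simply copied from the tree above.

```python
from pathlib import Path

# Expected layout, copied from the directory tree in this README.
EXPECTED_LAYOUT = {
    "FlyingThings3D": ["frames_cleanpass", "frames_finalpass", "disparity"],
    "Monkaa": ["frames_cleanpass", "frames_finalpass", "disparity"],
    "Driving": ["frames_cleanpass", "frames_finalpass", "disparity"],
    "KITTI": ["testing", "training", "devkit"],
    "Middlebury": ["MiddEval3"],
    "ETH3D": ["two_view_testing"],
}


def missing_dataset_dirs(root="datasets"):
    """Return the expected sub-directories that are absent under `root`.

    Path.is_dir() follows symbolic links, so linked datasets count as present.
    """
    root = Path(root)
    return [
        (Path(name) / sub).as_posix()
        for name, subs in EXPECTED_LAYOUT.items()
        for sub in subs
        if not (root / name / sub).is_dir()
    ]


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root.
    for rel_path in missing_dataset_dirs():
        print(f"missing: datasets/{rel_path}")
```

Only the datasets you actually use need to be present, so a non-empty result is not necessarily an error.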

Demos

Pretrained models can be downloaded by running

chmod ug+x download_models.sh && ./download_models.sh

or downloaded from Google Drive. We recommend our Middlebury model for in-the-wild images.

You can demo a trained model on pairs of images. To predict stereo for Middlebury, run

python demo.py --restore_ckpt models/raftstereo-middlebury.pth --corr_implementation alt --mixed_precision -l=datasets/Middlebury/MiddEval3/testF/*/im0.png -r=datasets/Middlebury/MiddEval3/testF/*/im1.png

Or for ETH3D:

python demo.py --restore_ckpt models/raftstereo-eth3d.pth -l=datasets/ETH3D/two_view_testing/*/im0.png -r=datasets/ETH3D/two_view_testing/*/im1.png

Our fastest model (uses the faster implementation):

python demo.py --restore_ckpt models/raftstereo-realtime.pth --shared_backbone --n_downsample 3 --n_gru_layers 2 --slow_fast_gru --valid_iters 7 --corr_implementation reg_cuda --mixed_precision

To save the disparity values as .npy files, run any of the demos with the --save_numpy flag.
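A saved prediction can then be inspected with NumPy. This is a small sketch under stated assumptions: the helper name `summarize_disparity` is hypothetical, and the path you pass in depends on where the demo wrote its output, which is not specified here. The sign convention of the saved values is also not stated in this README, so check the minimum and maximum before using them.

```python
import numpy as np


def summarize_disparity(npy_path):
    """Load a saved disparity map and report basic statistics."""
    disp = np.load(npy_path)
    return {
        "shape": disp.shape,
        "dtype": str(disp.dtype),
        "min": float(disp.min()),
        "max": float(disp.max()),
    }
```

For example, `summarize_disparity("path/to/prediction.npy")`, where the path is a placeholder for one of the files written by the demo.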

Converting Disparity to Depth

If the camera intrinsics and camera baseline are known, disparity predictions can be converted to depth values using the equation shown in depth_eq.png.

Note that the focal length is in pixels, not millimeters, and (cx1-cx0) is the x-difference of the two cameras' principal points.
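As a minimal sketch of this conversion, the code below assumes the standard rectified-stereo relation depth = focal_length * baseline / (disparity + (cx1 - cx0)), with disparity given as a positive number of pixels. That sign convention is an assumption; verify it against the repository's equation and the sign of your saved predictions. The function name and parameters are illustrative and not part of the repository.

```python
import numpy as np


def disparity_to_depth(disparity, focal_length_px, baseline, cx0=0.0, cx1=0.0):
    """Convert disparity (pixels) to depth, in the same units as `baseline`.

    Assumes depth = f * B / (d + (cx1 - cx0)) with positive disparities.
    Pixels whose denominator is not positive are returned as infinity.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    denom = disparity + (cx1 - cx0)
    # Suppress divide-by-zero warnings; those pixels are masked to inf below.
    with np.errstate(divide="ignore", invalid="ignore"):
        depth = np.where(denom > 0, focal_length_px * baseline / denom, np.inf)
    return depth
```

With a baseline in meters the result is in meters; with a baseline in millimeters it is in millimeters. The focal length must be in pixels in either case.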

Evaluation

To evaluate a trained model on a validation set (e.g. Middlebury), run

python evaluate_stereo.py --restore_ckpt models/raftstereo-middlebury.pth --dataset middlebury_H

Training

Our model is trained on two RTX-6000 GPUs using the following command. Training logs are written to runs/ and can be visualized using TensorBoard.

python train_stereo.py --batch_size 8 --train_iters 22 --valid_iters 32 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 2 --num_steps 200000 --mixed_precision

To train using significantly less memory, change --n_downsample 2 to --n_downsample 3. This will slightly reduce accuracy.
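A rough way to see where the saving comes from, as a back-of-the-envelope sketch: assume features are computed at 1/2^n_downsample of the input resolution and that matching costs are stored as an all-pairs volume along each image row, of shape height x width x width at that reduced resolution (as described in the paper). The function name below is hypothetical, and the estimate ignores the correlation pyramid, activations, and everything else the network keeps in memory.

```python
def corr_volume_elements(height, width, n_downsample):
    """Element count of an (h, w, w) correlation volume at 1/2**n_downsample."""
    scale = 2 ** n_downsample
    h, w = height // scale, width // scale
    return h * w * w


if __name__ == "__main__":
    # 384 x 1000 is the --image_size used in the finetuning command of this README.
    for k in (2, 3):
        n = corr_volume_elements(384, 1000, k)
        print(f"--n_downsample {k}: {n:,} elements per image pair")
```

Under these assumptions, each additional level of downsampling shrinks the volume by about a factor of eight, since all three of its dimensions are halved.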

To finetune the Sceneflow model on the 23 scenes from the Middlebury 2014 stereo dataset, download the data using

chmod ug+x download_middlebury_2014.sh && ./download_middlebury_2014.sh

and run

python train_stereo.py --train_datasets middlebury_2014 --num_steps 4000 --image_size 384 1000 --restore_ckpt models/raftstereo-sceneflow.pth --batch_size 2 --train_iters 22 --valid_iters 32 --spatial_scale -0.2 0.4 --saturation_range 0 1.4 --n_downsample 2 --mixed_precision

(Optional) Faster Implementation

We provide a faster CUDA implementation of the correlation sampler which works with mixed precision feature maps.

cd sampler && python setup.py install && cd ..

Running demo.py, train_stereo.py or evaluate_stereo.py with --corr_implementation reg_cuda together with --mixed_precision will speed up the model without impacting performance.

To significantly decrease memory consumption on high resolution images, use --corr_implementation alt. This implementation is slower than the default, however.
