
Transferring Relative Monocular Depth to Surgical Vision with Temporal Consistency (MICCAI 2024)

Example monocular depth inference

This is the official repository for our state-of-the-art approach to monocular depth in surgical vision as presented in our paper...

    Transferring Relative Monocular Depth to Surgical Vision with Temporal Consistency
    Charlie Budd, Tom Vercauteren.
    [ MICCAI, arXiv ]

Using Our Models

First, install our package...

pip install git+https://github.com/charliebudd/transferring-relative-monocular-depth-to-surgical-vision

Then download one of our model weights from the releases tab of this repo. We recommend our best performer, depthanything-sup-temp.pt. The model can then be used as follows...

import torch
from torchvision.io import read_image
from torchvision.transforms.functional import resize
import matplotlib.pyplot as plt

from trmdsv import load_model

# Load the model architecture with the downloaded weights on the GPU.
model, resize_for_model, normalise_for_model = load_model("depthanything", "weights/path.pt", "cuda")
model.eval()

# Read the image, move it to the GPU, and scale it to [0, 1].
image = read_image("surgical_image.png").cuda() / 255.0
original_size = image.shape[-2:]

# Add a batch dimension, then resize and normalise the image for the model.
image_for_model = normalise_for_model(resize_for_model(image.unsqueeze(0)))

# Predict relative depth and resize the prediction back to the original resolution.
with torch.no_grad():
    depth = model(image_for_model)
depth = resize(depth, original_size)

# Display the input image and the predicted depth side by side.
plt.subplot(121).axis("off")
plt.imshow(image.cpu().permute(1, 2, 0))
plt.subplot(122).axis("off")
plt.imshow(depth.squeeze().cpu())
plt.show()
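
If you want to save the prediction rather than just display it, you can write both the raw values and a visualisation to disk. A minimal sketch continuing the example above (the file names are assumptions); note that the model predicts relative depth, so the values are only defined up to scale and shift...

import numpy as np

# Drop the batch/channel dimensions and move the prediction to the CPU.
depth_map = depth.squeeze().cpu().numpy()

# Save the raw relative depth values for later processing.
np.save("surgical_image_depth.npy", depth_map)

# Normalise to [0, 1] and save a colour-mapped visualisation.
depth_vis = (depth_map - depth_map.min()) / (depth_map.max() - depth_map.min() + 1e-8)
plt.imsave("surgical_image_depth.png", depth_vis, cmap="viridis")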

Recreating Our Results

To recreate our results, first clone this repository and install the requirements...

git clone https://github.com/charliebudd/transferring-relative-monocular-depth-to-surgical-vision
cd transferring-relative-monocular-depth-to-surgical-vision
pip install -r requirements.txt

As the meta-dataset created for this project uses data with a mix of licenses, we are not able to redistribute it. To recreate the dataset, you will first need to download all the input datasets. These datasets have different access procedures; the starting point for each can be found at the following links...

Self-supervision data...
Normal supervision data...
Evaluation data...
bash download_hamlyn.sh

Download each dataset and place it in a shared directory. The directory should look like this...

Datasets
|
|____Cholec80
| |____videos
|
|____EndoVis2017
| |____test
| |____train
|
|____EndoVis2018
| |____test
| |____train
|
|____Hamlyn
| |____dataset4
| ⋮
|
|____KidneyBoundary
| |____kidney_dataset_1
| ⋮
|
|____ROBUST-MIS
| |____Raw data
|
|____SCARED
| |____dataset_1
| ⋮
|
|____SERV-CT
| |____Experiment_1
| |____Experiment_2
|
|____StereoMIS
| |____P1
| ⋮
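
Before generating the meta-dataset, you may want to sanity-check this layout. A minimal sketch, assuming the top-level folder names shown in the tree above...

from pathlib import Path

# Top-level dataset folders, as named in the directory tree above.
expected = [
    "Cholec80", "EndoVis2017", "EndoVis2018", "Hamlyn", "KidneyBoundary",
    "ROBUST-MIS", "SCARED", "SERV-CT", "StereoMIS",
]

root = Path("Datasets")
missing = [name for name in expected if not (root / name).is_dir()]
print("All datasets found." if not missing else f"Missing datasets: {missing}")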

Now run the provided data preprocessing script to generate our meta-dataset...

python data_preprocess.py --input-directory Datasets

You should now be ready to run the training script...

python train.py --experiment-name my-trained-model --model depthanything --train-mode sup temp

This will fine-tune depthanything using a combination of normal supervision (sup) and temporal consistency self-supervision (temp). You can then compare your fine-tuned model against the original depthanything model using the evaluation script...

python evaluate.py --model depthanything
python evaluate.py --model depthanything --weights outputs/my-trained-model/best_weights_validation.pt
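
If you train several variants, you may want to evaluate the baseline and every checkpoint in one go. A minimal sketch using the flags shown above (the experiment names are hypothetical)...

import subprocess

# Hypothetical experiment names; each matches an --experiment-name used during training.
experiments = ["my-trained-model", "my-other-model"]

# Evaluate the original depthanything model as a baseline.
subprocess.run(["python", "evaluate.py", "--model", "depthanything"], check=True)

# Evaluate each fine-tuned model's best validation checkpoint.
for name in experiments:
    weights = f"outputs/{name}/best_weights_validation.pt"
    subprocess.run(["python", "evaluate.py", "--model", "depthanything", "--weights", weights], check=True)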
