Efficient and comprehensive PyTorch implementation of NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (Mildenhall et al., 2020).
This implementation has been tested on Ubuntu 20.04 with Python 3.8 and torch 1.9.
Install the required packages first with pip3 install -r requirements.txt.
You may use pyenv or conda to avoid conflicts with your environment.
Download the Blender Scenes Dataset. Rename it and place it in the repo as data/blender (ignored by default).
data/
└── blender
    ├── chair
    ├── drums
    ├── ficus
    ├── hotdog
    ├── lego
    ├── materials
    ├── mic
    └── ship
Command Line
Action | Command |
---|---|
Train | python3 -m nerf.train |
Inference | python3 -m nerf.infer |
Distillation | python3 -m nerf.distill |
Benchmark | python3 -m nerf.bench |
Reproduction
Action | Command |
---|---|
Train | make train |
Distillation | make distill |
Hybrid | make hybrid |
Benchmark | make bench_all |
Manual
# ==== Imports
import nerf.infer # Enables inference features (NeRF.infer)
import nerf.train # Enables training features (NeRF.fit)
from nerf.core import BoundedVolumeRaymarcher as BVR, NeRF
from nerf.core import PositionalEncoding as PE
from nerf.core import NeRFScheduler
from nerf.data import BlenderDataset
DEVICE = "cuda:0"
# ==== Setup
dataset = BlenderDataset("./data/blender", scene="hotdog", split="train")
phi_x = PE(3, 6)
phi_d = PE(3, 6)
nerf = NeRF(phi_x, phi_d, width=256, depth=4).to(DEVICE)
raymarcher = BVR(tn=2., tf=6., samples_c=64, samples_f=64)
# ==== Train
# optim, scheduler, criterion, and scaler must be created beforehand
history = nerf.fit(
    nerf,        # NeRF Module
    raymarcher,  # Raymarcher (BVR)
    optim,       # Optimizer (Adam, AdamW, ...)
    scheduler,   # NeRFScheduler
    criterion,   # Criterion (MSELoss, L1Loss, ...)
    scaler,      # GradScaler (torch.cuda.amp, can be disabled)
    dataset,     # Dataset (BlenderDataset)
)                # More options available (epochs, batch_size, ...)
# ==== Infer
frame = nerf.infer(
    coarse,      # coarse NeRF Module
    fine,        # fine NeRF Module
    raymarcher,  # Raymarcher (BVR)
    ro,          # Rays Origin (Tensor of size (B, 3))
    rd,          # Rays Direction (Tensor of size (B, 3))
    W,           # Frame Width
    H,           # Frame Height
)                # More options available (epochs, batch_size, ...)
NeRF draws on advances in both Computer Graphics and Deep Learning research.
The method encodes a 3D scene as a continuous volume described by density and color at any point within a given bounded volume. During raymarching, the rays query the volume representation model to obtain intersection data. The model is trained end to end and uses only the ground truth images as the supervision signal. A first network, the coarse model, is trained using voxel grid sampling to increase sample efficiency. This first pass is then used to train a second network, the fine network, using importance sampling of the volume.
The networks are tied to one unique scene. Caching and acceleration structures can be used to decrease rendering time during inference. The same models can be used to generate a depth map and a 3D mesh of the scene.
Positional Encoding. In their original work, Mildenhall et al. introduced positional encoding to let the network learn high-frequency functions. A classical multilayer perceptron without positional encoding cannot learn them and only recovers a low-frequency reconstruction.
v = xy | xyz # normalized to [-1; 1]
rgb = lambda v: mlp(v) # wo/ pe-encoding
rgb = lambda v: mlp(phi(v)) # w/ pe-encoding
phi = lambda v: [
    cos(2 ** 0 * PI * v),
    sin(2 ** 0 * PI * v),
    cos(2 ** 1 * PI * v),
    sin(2 ** 1 * PI * v),
    ...
].T
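The pseudocode above can be sketched in NumPy as follows. This is a minimal illustration, not the code of nerf.core.PositionalEncoding: the function name positional_encoding and its num_freqs parameter are assumptions made for this example. With 3 input dimensions and 10 frequencies it yields the 60 features used for phi(x) in the original architecture.

```python
import numpy as np

def positional_encoding(v: np.ndarray, num_freqs: int) -> np.ndarray:
    """Encode v of shape (B, D), normalized to [-1, 1], into (B, 2 * num_freqs * D)."""
    # Frequencies 2^0 * pi, 2^1 * pi, ..., 2^(num_freqs - 1) * pi
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi           # (L,)
    angles = v[:, None, :] * freqs[None, :, None]            # (B, L, D)
    # For each frequency: a cos block followed by a sin block, as in the listing
    features = np.stack([np.cos(angles), np.sin(angles)], axis=2)  # (B, L, 2, D)
    return features.reshape(v.shape[0], -1)

xyz = np.array([[0.0, 0.5, -1.0]])
encoded = positional_encoding(xyz, num_freqs=10)
print(encoded.shape)  # (1, 60)
```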
Fourier Features. In "Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains" (Tancik et al., 2020), the NeRF authors showed that encoding positions with a Fourier feature mapping enables a multilayer perceptron to learn high-frequency functions in low-dimensional problem domains.
v = xy | xyz # normalized to [-1; 1]
rgb = lambda v: mlp(v) # wo/ ff-encoding
rgb = lambda v: mlp(phi(v)) # w/ ff-encoding
phi = lambda v: [
    a_0 * cos(2 * PI * b_0.T * v),
    a_0 * sin(2 * PI * b_0.T * v),
    a_1 * cos(2 * PI * b_1.T * v),
    a_1 * sin(2 * PI * b_1.T * v),
    ...
].T
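A possible NumPy version of this mapping is sketched below. It assumes the Gaussian variant, where the rows b_j are drawn from a normal distribution; the function name fourier_features, the 128 frequencies, and the scale of 10 are placeholder choices for illustration, not values taken from this repository.

```python
import numpy as np

def fourier_features(v, B, a=None):
    """Map v (N, D) through frequencies B (M, D) with amplitudes a (M,) to (N, 2 * M)."""
    if a is None:
        a = np.ones(B.shape[0])          # unit amplitudes by default
    proj = 2.0 * np.pi * (v @ B.T)       # (N, M), one projection b_j^T v per frequency
    # Interleave as a_0 * cos, a_0 * sin, a_1 * cos, a_1 * sin, ...
    feats = np.stack([a * np.cos(proj), a * np.sin(proj)], axis=-1)  # (N, M, 2)
    return feats.reshape(v.shape[0], -1)

rng = np.random.default_rng(0)
B = rng.normal(0.0, 10.0, size=(128, 3))   # assumed scale, a tunable hyperparameter
v = rng.uniform(-1.0, 1.0, size=(4, 3))    # positions normalized to [-1, 1]
features = fourier_features(v, B)
print(features.shape)  # (4, 256)
```

The scale of the Gaussian controls which frequencies the network can fit, so it is usually tuned per task.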
The scene is encoded by fitting a simple multilayer perceptron architecture on density sigma and color RGB given position x and direction d queries.
Original Architecture
                    n = 4
         ReLU      ReLU
phi(x) --> 256 --> 256 --> ReLU(sigma)
  60   |    n   ^   n   |
       |        |       |          ReLU
       -- cat --        --> 256 --> 128 --> Sigmoid(RGB)
                                 ^
                                 |
                                cat
                                 |
                               phi(d)
                                 24
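To make the data flow of the diagram concrete, here is a NumPy forward pass with random, untrained weights. It is only one reading of the diagram and rests on several assumptions: two blocks of n = 4 ReLU layers of width 256, phi(x) concatenated again before the second block, a linear 256-wide feature layer before phi(d) is concatenated, and a 128-wide ReLU layer before the Sigmoid. The helper names (linear, forward) are hypothetical and unrelated to nerf.core.NeRF.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Random weights and zero bias for one dense layer (untrained placeholder)."""
    return rng.normal(0.0, np.sqrt(2.0 / in_dim), size=(in_dim, out_dim)), np.zeros(out_dim)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

DIM_X, DIM_D, WIDTH, N = 60, 24, 256, 4   # phi(x), phi(d), layer width, layers per block

block1 = [linear(DIM_X if i == 0 else WIDTH, WIDTH) for i in range(N)]
block2 = [linear(WIDTH + DIM_X if i == 0 else WIDTH, WIDTH) for i in range(N)]
sigma_head = linear(WIDTH, 1)
feature = linear(WIDTH, WIDTH)
hidden = linear(WIDTH + DIM_D, 128)
rgb_head = linear(128, 3)

def forward(phi_x, phi_d):
    h = phi_x
    for w, b in block1:
        h = relu(h @ w + b)
    h = np.concatenate([h, phi_x], axis=-1)      # skip connection: cat with phi(x)
    for w, b in block2:
        h = relu(h @ w + b)
    sigma = relu(h @ sigma_head[0] + sigma_head[1])   # density depends on position only
    f = h @ feature[0] + feature[1]
    f = np.concatenate([f, phi_d], axis=-1)      # view dependence: cat with phi(d)
    f = relu(f @ hidden[0] + hidden[1])
    rgb = sigmoid(f @ rgb_head[0] + rgb_head[1])
    return sigma, rgb

phi_x = rng.uniform(-1.0, 1.0, size=(5, DIM_X))
phi_d = rng.uniform(-1.0, 1.0, size=(5, DIM_D))
sigma, rgb = forward(phi_x, phi_d)
print(sigma.shape, rgb.shape)  # (5, 1) (5, 3)
```

Because sigma is computed before phi(d) enters the network, density stays independent of the viewing direction while color can vary with it.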
Volume raymarching is used to produce the final rendering.
A ray is cast from the camera origin through each pixel and sampled N_c times for the coarse model and N_f times for the fine model within a bounded volume delimited by the near t_n and far t_f camera frustum parameters.
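As an illustration of sampling a ray between t_n and t_f, the sketch below splits the interval into N equal bins and draws one depth per bin (stratified sampling, as in the original paper). Whether BoundedVolumeRaymarcher samples exactly this way is an assumption; sample_along_rays is a hypothetical helper written for this example.

```python
import numpy as np

def sample_along_rays(ro, rd, tn, tf, n_samples, rng=None):
    """ro, rd: (B, 3) ray origins and directions. Returns t (B, N) and points (B, N, 3)."""
    edges = np.linspace(tn, tf, n_samples + 1)          # N equal bins in [tn, tf]
    lower, upper = edges[:-1], edges[1:]
    if rng is None:
        u = np.full((ro.shape[0], n_samples), 0.5)      # deterministic: bin centers
    else:
        u = rng.uniform(size=(ro.shape[0], n_samples))  # random offset inside each bin
    t = lower + (upper - lower) * u                     # (B, N), increasing along the ray
    points = ro[:, None, :] + t[..., None] * rd[:, None, :]
    return t, points

ro = np.zeros((2, 3))
rd = np.tile([0.0, 0.0, -1.0], (2, 1))
t, points = sample_along_rays(ro, rd, tn=2.0, tf=6.0, n_samples=64,
                              rng=np.random.default_rng(0))
print(t.shape, points.shape)  # (2, 64) (2, 64, 3)
```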
Rendering Equation
N_c, N_f = 64, 128
alpha_i = (1 - exp(-sigma_i * delta_i))
T_i = cumprod(1 - alpha_j) for j < i # exclusive cumulative product, the first T is 1
w_i = T_i * alpha_i
C_c = sum(w_i * c_i)
In this equation, w_i represents, once normalized, a piecewise-constant PDF along the ray, T_i the accumulated transmittance (the fraction of light that reaches segment t_i without being blocked), delta_i the segment length dist(t_i-1, t_i), and c_i the color of the ray intersection at t_i.
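The compositing step can be sketched in NumPy as follows. This is an illustrative version, not the repository's raymarcher: render_rays is a hypothetical name, the near bound tn is assumed to stand in for the sample before the first one when computing delta, and the depth output is assumed to be the weighted mean of the sample depths.

```python
import numpy as np

def render_rays(sigma, rgb, t, tn):
    """Composite samples along rays.

    sigma: (B, N) densities, rgb: (B, N, 3) colors,
    t: (B, N) sorted sample depths, tn: near bound.
    """
    # delta_i = dist(t_{i-1}, t_i), with the near bound used before the first sample
    t_prev = np.concatenate([np.full_like(t[:, :1], tn), t[:, :-1]], axis=-1)
    delta = t - t_prev
    alpha = 1.0 - np.exp(-sigma * delta)
    # Exclusive cumulative product: T_i = prod over j < i of (1 - alpha_j), first T is 1
    trans = np.cumprod(1.0 - alpha, axis=-1)
    trans = np.concatenate([np.ones_like(trans[:, :1]), trans[:, :-1]], axis=-1)
    w = trans * alpha
    color = (w[..., None] * rgb).sum(axis=1)   # C = sum(w_i * c_i)
    depth = (w * t).sum(axis=1)                # weighted mean depth (assumed definition)
    return color, depth, w

# One ray crossing empty space and then hitting an opaque surface at the third sample
t = np.array([[2.5, 3.5, 4.5, 5.5]])
sigma = np.array([[0.0, 0.0, 1e3, 0.0]])
rgb = np.array([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.2, 0.4, 0.6], [0.0, 0.0, 1.0]]])
color, depth, w = render_rays(sigma, rgb, t, tn=2.0)
print(color, depth)  # color of the third sample, depth 4.5
```

In this toy case all the weight falls on the opaque sample, so samples behind it contribute nothing to the pixel.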
The weights w_i are reused for inverse transform sampling in the fine pass.
A total of N_c + N_f samples is finally used to generate the last render, this time querying the fine model instead.
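Inverse transform sampling from the coarse weights can be sketched for a single ray as below. The helper name sample_pdf, the small epsilon added to the weights, and the use of bin edges as the support are assumptions for this example rather than details of this repository.

```python
import numpy as np

def sample_pdf(bin_edges, weights, n_samples, rng):
    """bin_edges: (N + 1,), weights: (N,) coarse weights. Returns (n_samples,) sorted depths."""
    # Normalize the weights into a piecewise-constant PDF (epsilon avoids empty bins)
    pdf = (weights + 1e-5) / np.sum(weights + 1e-5)
    cdf = np.concatenate([[0.0], np.cumsum(pdf)])            # (N + 1,)
    u = rng.uniform(size=n_samples)
    # Find the bin whose CDF interval contains each u
    idx = np.clip(np.searchsorted(cdf, u, side="right") - 1, 0, len(weights) - 1)
    # Invert the CDF linearly inside the selected bin
    frac = (u - cdf[idx]) / (cdf[idx + 1] - cdf[idx])
    t = bin_edges[idx] + frac * (bin_edges[idx + 1] - bin_edges[idx])
    return np.sort(t)

edges = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
weights = np.array([0.0, 0.0, 1.0, 0.0])      # coarse pass found a surface in [4, 5]
t_fine = sample_pdf(edges, weights, n_samples=1000, rng=np.random.default_rng(0))
inside = np.mean((t_fine >= 4.0) & (t_fine <= 5.0))
print(t_fine.shape, inside)  # nearly all samples fall inside [4, 5]
```

The fine depths would then be merged with the coarse ones and sorted, for example with np.sort(np.concatenate([t_coarse, t_fine])), to form the N_c + N_f samples of the last render.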
Details
Feature | Reference |
---|---|
Fourier Feature Encoding | |
Positional Encoding | |
Neural Radiance Field Model | |
Bounded Volume Raymarcher | |
Noise for Continuous Representation | |
Camera Paths (Turnaround, ...) | |
Interactive Notebook | |
Reptile Meta-Learning | Tancik et al., Nichol et al. |
Shifted Softplus for Sigma | Barron et al. |
Widened Sigmoid for RGB | Barron et al. |
Fine Network (Differs from Original, No second Network) | |
Training Optimizations | Nvidia's PyTorch Performance Tuning Guide |
Safe Softplus, Sigmoid | Blog Article by Jia Fu Low |
Gradient Clipping | |
NeRF/JAX-NeRF Warmup Decay Learning Rate Scheduler | Barron et al. |
Log Decay Learning Rate Scheduler | |
Results
Scene | Ground Truth | NeRF RGB Map | NeRF Depth Map |
---|---|---|---|
Chair | |||
Lego | |||
HotDog | |||
Drums | |||
Mic | |||
Materials | |||
Ficus | |||
Ship | |||
Using 64 Coarse Samples, 64 Fine Samples at 400x400 Resolution
Coarse | Fine | Seconds | FPS | RGB Map | Depth Map |
---|---|---|---|---|---|
NeRF | NeRF | 1.91 | 0.52 | ||
DistillNeRF | NeRF | 1.37 | 0.73 | ||
DistillNeRF | DistillNeRF | 0.30 | 3.36 | ||
Original Work
@inproceedings{mildenhall2020nerf,
title={NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis},
author={Ben Mildenhall and Pratul P. Srinivasan and Matthew Tancik and Jonathan T. Barron and Ravi Ramamoorthi and Ren Ng},
year={2020},
booktitle={ECCV},
}