Awesome Video Prediction

A curated list of awesome video prediction papers with brief summaries.

Blogs

...

Surveys

★ A Review on Deep Learning Techniques for Video Prediction | TPAMI 2020
Deep Learning for Vision-based Prediction: A Survey | Arxiv 2020

Papers

Baseline Video Language Modeling (BVLM) | Video (language) modeling: a baseline for generative models of natural videos | Arxiv 2014 FAIR NYU
- first video prediction | patch-level language model, CNN+RNN | no inductive bias, raw pixels
LSTM Encoder-Decoder (LSTM-ED) | Unsupervised Learning of Video Representations using LSTMs | ICML 2015
- unsupervised learning representation | LSTM encoder into representation and LSTM decoder to reconstruct, FC-LSTM | no inductive bias, raw pixels
★ Convolutional LSTM (ConvLSTM) | Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting | NeurIPS 2015 HKUST
- model spatial correlations well | LSTM-ED with FC-LSTM replaced by convLSTM, convLSTM | no inductive bias, raw pixels
Predictive Generative Network (PGN) | Unsupervised learning of visual structure using predictive generative networks | Arxiv 2015 Harvard
- unsupervised learning representation | CNN-LSTM-deCNN and mse+adversarial loss, CNN+LSTM+GAN | no inductive bias, raw pixels
Predictive Coding Network (PredNet) | Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning | Arxiv 2016 Harvard
- unsupervised learning representation | stacked multi-level encode representation and decode reconstruction variant, convLSTM | no inductive bias, raw pixels
Predictive Recurrent Neural Network (PredRNN) | PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning | NeurIPS 2017 TPAMI 2022 Tsinghua (Yunbo Wang)
- solve several problems in design of convLSTM for spatiotemporal predictive learning | spatiotemporal memory flow + spatiotemporal LSTM + reverse scheduled sampling curriculum learning, convLSTM | no inductive bias, raw pixels
Improved Predictive Recurrent Neural Network (PredRNN++) | PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning | ICML 2018 Tsinghua (Yunbo Wang)
- deal with the dilemma between going deeper in time and vanishing gradients in deep-in-time RNNs | causal LSTM + gradient highway unit, convLSTM | no inductive bias, raw pixels
★ Convolutional Dynamic Neural Advection (CDNA) | Unsupervised Learning for Physical Interaction through Video Prediction | NeurIPS 2016 UCBerkeley (Chelsea Finn, Ian Goodfellow, Sergey Levine)
- first long-range prediction on real-world video | explicitly model pixel motion, then apply it to the previous frame and merge, convLSTM | kernel-based transformation
Object-centric Transformation (ObjectTransformation) | Learning Object-Centric Transformation for Video Prediction | ACM-MM 2017 PKU
- model the different motions of different objects | attention to object patches and predict transformation kernels, CNN+RNN | kernel-based transformation
Spatially-Displaced Convolution Network (SDC-Net) | SDC-Net: Video prediction using spatially-displaced convolution | ECCV 2018 Nvidia
- high-resolution video prediction | combine vector-based and kernel-based transformation, 3D CNN | vector-based transformation + kernel-based transformation
★ Motion-Content Network (MCnet) | Decomposing Motion and Content for Natural Video Sequence Prediction | ICLR 2017
- first to decompose motion and content | motion encoder + content encoder + combination decoder, CNN+convLSTM | motion and content separation
Decompositional Disentangled Predictive Auto-Encoder (DDPAE) | Learning to Decompose and Disentangle Representations for Video Prediction | NeurIPS 2018 Stanford (Li Fei-Fei)
- deal with high dimensionality | decompose the whole frame into different components and disentangle each component into time-invariant content and low-dimensional pose, CNN+RNN+VAE | vector-based transformation + motion and content separation
★ Spatial-Temporal Multi-Frequency Analysis Network (STMFANet) | Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction | CVPR 2020 CAS
- deal with image distortion and temporal inconsistency | merge multi-level both spatial and temporal wavelet analysis into prediction, CNN+LSTM+wavelet | add traditional CV, raw pixels
★ Stochastic Variational Video Prediction (SV2P) | Stochastic Variational Video Prediction | ICLR 2018 UIUC (Chelsea Finn, Sergey Levine)
- first to introduce stochasticity | VAE noise as stochastic condition for CDNA, 3D CNN+convLSTM+VAE | kernel-based transformation + VAE stochastic
Stochastic Video Generation with a Learned Prior (SVG-LP) | Stochastic Video Generation with a Learned Prior | ICML 2018 NYU
- "learned prior as uncertainty predictive model" | learned prior for VAE, convLSTM+VAE | VAE stochastic
Stochastic Adversarial Video Prediction (SAVP) | Stochastic Adversarial Video Prediction | ICLR 2019 UCBerkeley (Chelsea Finn, Sergey Levine)
- bring together stochastic and realistic prediction | VAE-GAN for SV2P, 3D CNN+convLSTM+VAE+GAN | kernel-based transformation + VAE stochastic
Hierarchical VRNN (Hierarchical-VRNN) | Improved Conditional VRNNs for Video Prediction | ICCV 2019
- "still blurry and due to underfitting" | hierarchical levels of latents to increase expressiveness, CNN+RNN+VAE | VAE hierarchical stochastic
Greedy Hierarchical Variational Auto-Encoders (GHVAE) | Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction | CVPR 2021 Stanford (Li Fei-Fei, Chelsea Finn)
- deal with memory constraints and optimization instability problems for hierarchical VAE | greedy and modular optimization, CNN+RNN+VAE | VAE hierarchical stochastic
Beyond Mean Square Error (BeyondMSE) | Deep multi-scale video prediction beyond mean square error | ICLR 2016 FAIR NYU (Yann LeCun)
- deal with blur | adversarial loss + gradient difference loss, CNN+GAN | no inductive bias, raw pixels
Eidetic 3D LSTM (E3D-LSTM) | Eidetic 3D LSTM: A Model for Video Prediction and Beyond | ICLR 2019 Tsinghua (Yunbo Wang, Li Fei-Fei)
- learn good representations for both short-term and long-term dependencies | 3D CNN for local dynamics and recurrent modeling for temporal dependencies, 3D CNN+LSTM | no inductive bias, raw pixels
★ Simple Video Prediction (SimVP) | SimVP: Simpler yet Better Video Prediction | CVPR 2022
- investigate simple techniques for CNN in video prediction | pure 2D CNN and only MSE loss, CNN | no inductive bias, raw pixels
Video Diffusion Models (VDM) | Video Diffusion Models | NeurIPS 2022 Google (Jonathan Ho)
- first video diffusion model for primarily unconditional video generation | diffusion model with 3D U-Net, 3D CNN+diffusion | no inductive bias, raw pixels
★ Masked Conditional Video Diffusion (MCVD) | MCVD: Masked Conditional Video Diffusion for Prediction, Generation, and Interpolation | NeurIPS 2022
- general-purpose model for prediction/generation/interpolation | U-Net conditioned on masked past or future frames, CNN+diffusion | no inductive bias, raw pixels
Residual Video Diffusion (RVD) | Diffusion Probabilistic Modeling for Video Generation | Arxiv 2022
- "residual errors are easier to model than future observations" | MAF for average + diffusion for residual, CNN+RNN+diffusion | no inductive bias, raw pixels
Flexible Diffusion Model (FDM) | Flexible Diffusion Modeling of Long Videos | Arxiv 2022
- deal with coherent long-duration prediction | train by randomly sampling which frames to condition on and which to generate, 3D CNN+diffusion | no inductive bias, raw pixels
Video Transformer (VideoTransformer) | Scaling Autoregressive Video Models | ICLR 2020 Google
- first Transformer in video prediction | block-local self-attention and spatiotemporal subscaling for reducing memory, Transformer | no inductive bias, raw pixels
★ Latent Video Transformer (LVT) | Latent Video Transformer | Arxiv 2020
- solve computation requirement problem | VQ-VAE encodes pixels into discrete latent space and VideoTransformer operates in the discrete latent space, Transformer | discrete latent space
Convolutional Transformer (ConvTransformer) | ConvTransformer: A Convolutional Transformer Network for Video Frame Synthesis | Arxiv 2021
- combine CNN and Transformer in video prediction | multi-head convolutional self-attention, Transformer+CNN | no inductive bias, raw pixels
Video Generative Pre-Training (VideoGPT) | VideoGPT: Video Generation using VQ-VAE and Transformers | Arxiv 2021 UCBerkeley (Pieter Abbeel)
- combine VQ-VAE and GPT-like Transformer in video prediction | VQ-VAE encodes pixels into discrete latent space and a GPT-like Transformer operates in the discrete latent space, Transformer | discrete latent space
Video Prediction Transformer (VPTR) | Video Prediction by Efficient Transformers | ICPR 2022 IVC 2022
- solve computation requirement problem and extensive experiments on Transformer autoregressive formats | Pix2Pix autoencoder and VidHRFormer attention, Transformer | latent space
Masked Video Transformer (MaskViT) | MaskViT: Masked Visual Pre-Training for Video Prediction | ICLR 2023 Stanford (Jiajun Wu, Li Fei-Fei)
- masked visual modeling pre-training for video | VQ-GAN quantizes each frame and the Transformer is trained with masked visual modeling, Transformer | discrete latent space
MAsked Generative VIdeo Transformer (MAGVIT) | MAGVIT: Masked Generative Video Transformer | CVPR 2023 CMU Google
- single model for multiple video synthesis tasks | 3D-VQ quantizes the video and the Transformer is trained with multi-task masked token modeling, Transformer | discrete latent space
MOtion Scene and Object (MOSO) | MOSO: Decomposing MOtion, Scene and Object for Video Prediction | CVPR 2023 CAS
- decompose motion, scene and object | separate VQVAE quantizing and Transformer prediction, Transformer | discrete latent space + motion and content separation
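
Several entries above (CDNA, ObjectTransformation, SDC-Net, SV2P, SAVP) are tagged "kernel-based transformation". As a rough illustration of that idea, the sketch below filters the previous frame with a kernel and blends the candidate frames with per-pixel masks. The function names `apply_kernel` and `composite`, the hand-written kernel, and the constant masks are all assumptions made for this example. In the listed models the kernels and masks are predicted by a network, and none of the papers' actual code is reproduced here.

```python
from typing import List, Sequence

Image = List[List[float]]


def apply_kernel(frame: Sequence[Sequence[float]],
                 kernel: Sequence[Sequence[float]]) -> Image:
    """Filter `frame` with a k x k kernel (odd k); out-of-frame pixels count as zero."""
    height, width = len(frame), len(frame[0])
    radius = len(kernel) // 2
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            for di, kernel_row in enumerate(kernel):
                for dj, weight in enumerate(kernel_row):
                    y, x = i + di - radius, j + dj - radius
                    if 0 <= y < height and 0 <= x < width:
                        out[i][j] += weight * frame[y][x]
    return out


def composite(candidates: Sequence[Image], masks: Sequence[Image]) -> Image:
    """Blend candidate frames with per-pixel masks that sum to 1 at each pixel."""
    height, width = len(candidates[0]), len(candidates[0][0])
    return [
        [sum(m[i][j] * c[i][j] for c, m in zip(candidates, masks))
         for j in range(width)]
        for i in range(height)
    ]


# Previous frame: one bright pixel in the centre.
frame = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]

# Hand-written kernel (hypothetical): the single weight sits left of centre,
# so every output pixel copies its left neighbour and content moves right.
shift_right = [[0.0, 0.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]

moved = apply_kernel(frame, shift_right)

# Constant 50/50 masks (hypothetical) blend "stay" and "move right".
half = [[0.5] * 3 for _ in range(3)]
next_frame = composite([frame, moved], [half, half])
```

Because the kernel weights sum to 1 and the masks sum to 1 at every pixel, the predicted frame only redistributes intensity that already exists in the previous frame, which is the inductive bias this tag refers to.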
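
The BeyondMSE entry above lists a gradient difference loss as a remedy for blur. A minimal single-channel sketch of such a loss follows, assuming frames are plain nested lists and using the hypothetical function name `gradient_difference_loss`. It compares the absolute differences between neighbouring pixels in the target with those in the prediction. The paper combines this term with other losses, which are not shown here.

```python
from typing import Sequence

Frame = Sequence[Sequence[float]]


def gradient_difference_loss(target: Frame, pred: Frame, alpha: float = 1.0) -> float:
    """Sum over pixels of |(|target gradient| - |predicted gradient|)| ** alpha,
    taken over vertical and horizontal neighbour differences."""
    height, width = len(target), len(target[0])
    loss = 0.0
    for i in range(height):
        for j in range(width):
            if i > 0:  # difference to the pixel above
                grad_t = abs(target[i][j] - target[i - 1][j])
                grad_p = abs(pred[i][j] - pred[i - 1][j])
                loss += abs(grad_t - grad_p) ** alpha
            if j > 0:  # difference to the pixel on the left
                grad_t = abs(target[i][j] - target[i][j - 1])
                grad_p = abs(pred[i][j] - pred[i][j - 1])
                loss += abs(grad_t - grad_p) ** alpha
    return loss


# A sharp vertical edge versus a uniformly grey (blurry) prediction.
sharp = [[0.0, 1.0],
         [0.0, 1.0]]
blurry = [[0.5, 0.5],
          [0.5, 0.5]]

print(gradient_difference_loss(sharp, blurry))  # 2.0: both horizontal edges are missed
print(gradient_difference_loss(sharp, sharp))   # 0.0: identical gradients
```

The grey prediction is the per-pixel average of the two possible edge positions, so a plain MSE objective is drawn toward it. The gradient term penalizes it directly because it has no edges at all.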
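
The entries tagged "discrete latent space" (LVT, VideoGPT, MaskViT, MAGVIT, MOSO) share one step: continuous encoder outputs are replaced by indices into a codebook, and the Transformer then models those indices. The sketch below shows only that nearest-neighbour lookup, under the assumption of a tiny hand-written codebook. The names `quantize` and `dequantize` are hypothetical, and the learned encoder, decoder, and codebook training used by the listed models are omitted.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each continuous latent vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(tokens: Sequence[int], codebook: Sequence[Vector]) -> List[List[float]]:
    """Look the token indices back up in the codebook."""
    return [list(codebook[t]) for t in tokens]


# Toy 2-D codebook with four entries (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

# Three latent vectors, standing in for encoder outputs at three positions.
latents = [(0.1, -0.2), (0.9, 0.8), (0.2, 0.7)]

tokens = quantize(latents, codebook)
print(tokens)                        # [0, 3, 2]
print(dequantize(tokens, codebook))  # [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
```

Once a video is a sequence of such integer tokens, it can be predicted autoregressively or with masked token modeling in the same way as text, which is what reduces the computation compared with operating on raw pixels.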

dadadadawjb/awesome-video-prediction