
"Make-A-Video", a new SOTA text-to-video model by Meta-FAIR - TensorFlow

soran-ghaderi/make-a-video

MAKE-A-VIDEO

TEXT-TO-VIDEO GENERATION WITHOUT TEXT-VIDEO DATA

Main components

Make-A-Video consists of three main components:

  1. A base text-to-image (T2I) model trained on text-image pairs.
  2. Spatiotemporal convolution and attention layers that extend the networks’ building blocks to the temporal dimension.
  3. Spatiotemporal networks that combine these spatiotemporal layers with another crucial element needed for text-to-video (T2V) generation: a frame interpolation network for high frame rate generation.

Spatiotemporal layers

1. Pseudo-3D convolutional layers

A pseudo-3D convolutional layer stacks a 1D convolution after each 2D convolutional (conv) layer. This shares information between the spatial and temporal axes at a lower computational cost than a full 3D conv layer.
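
The idea can be illustrated with a minimal TensorFlow sketch. This is not necessarily how this repository implements the layer: the class name `Pseudo3DConv`, its arguments, and the `(batch, frames, height, width, channels)` input layout are assumptions made for illustration. The 2D conv runs on every frame independently, then the 1D conv runs along the frame axis at every pixel location.

```python
import tensorflow as tf


class Pseudo3DConv(tf.keras.layers.Layer):
    """Hypothetical sketch: a 2D spatial conv followed by a 1D temporal conv."""

    def __init__(self, filters, spatial_kernel=3, temporal_kernel=3, **kwargs):
        super().__init__(**kwargs)
        self.filters = filters
        self.spatial = tf.keras.layers.Conv2D(
            filters, spatial_kernel, padding="same")
        self.temporal = tf.keras.layers.Conv1D(
            filters, temporal_kernel, padding="same")

    def call(self, video):
        # Assumed layout: (batch, frames, height, width, channels).
        shape = tf.shape(video)
        b, f, h, w = shape[0], shape[1], shape[2], shape[3]
        c = video.shape[-1]

        # Spatial step: fold frames into the batch axis so the 2D conv
        # sees each frame as an independent image.
        x = tf.reshape(video, [b * f, h, w, c])
        x = self.spatial(x)  # (b*f, h, w, filters)

        # Temporal step: fold pixel locations into the batch axis so the
        # 1D conv slides along the frame axis only.
        x = tf.reshape(x, [b, f, h, w, self.filters])
        x = tf.transpose(x, [0, 2, 3, 1, 4])  # (b, h, w, f, filters)
        x = tf.reshape(x, [b * h * w, f, self.filters])
        x = self.temporal(x)  # (b*h*w, f, filters)

        # Restore the (batch, frames, height, width, filters) layout.
        x = tf.reshape(x, [b, h, w, f, self.filters])
        return tf.transpose(x, [0, 3, 1, 2, 4])


video = tf.random.normal([2, 4, 8, 8, 3])  # 2 clips, 4 frames of 8x8 RGB
out = Pseudo3DConv(filters=16)(video)
print(out.shape)  # (2, 4, 8, 8, 16)
```

As a rough sense of the saving: with kernel size 3 and C input and output channels, the stacked pair has 9C² + 3C² = 12C² weights (ignoring biases), whereas a full 3×3×3 3D conv has 27C².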

2. Pseudo-3D attention layers

3. Frame interpolation network
