Training Results for ElucidatedImagen on DAVIS #368
gauenk started this conversation in Show and tell
Training Details: I trained on the DAVIS train-val dataset (90 videos of roughly 80 frames each) for 400k iterations per UNet, 800k iterations in total. I used ElucidatedImagen with Unet3D and no text prompt. Training ran on two Titan RTX GPUs with 24 GB of memory each. Both UNets use an embedding dimension of 64. The low- and "high"-resolution UNets operate at 64x64 and 128x128 and are trained on 12 frames and 3 frames, respectively, with a temporal downsampling factor of two for the first UNet. The batch sizes are 4 and 2, respectively.
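For anyone who wants to try a similar setup, here is a rough sketch of how this configuration might look with imagen-pytorch. Only the values I listed above (embedding dim 64, 64x64 / 128x128 resolutions, 2x temporal downsampling for the first UNet, no text conditioning) come from my run; everything else (dim_mults, the mock batch shape, sampling defaults) is a placeholder and not necessarily what I used.

```python
import torch
from imagen_pytorch import Unet3D, ElucidatedImagen, ImagenTrainer

# two 3D UNets with embedding dimension 64 (dim_mults are placeholders)
unet1 = Unet3D(dim = 64, dim_mults = (1, 2, 4, 8))
unet2 = Unet3D(dim = 64, dim_mults = (1, 2, 4, 8))

imagen = ElucidatedImagen(
    condition_on_text = False,              # no text prompt (assumed flag, as in the Imagen class)
    unets = (unet1, unet2),
    image_sizes = (64, 128),                # low- and "high"-resolution stages
    temporal_downsample_factor = (2, 1),    # first UNet sees the video temporally downsampled by 2x
).cuda()

trainer = ImagenTrainer(imagen)

# mock batch for the first UNet: (batch, channels, frames, height, width)
# frame count and spatial size should match your own data pipeline
videos = torch.randn(4, 3, 12, 128, 128).cuda()

loss = trainer(videos, unet_number = 1)
trainer.update(unet_number = 1)
```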
Results: I am not sure what to think of the outcome. I am happy something happened 😄 but it's not an impressive result. I suspect the small embedding size may be to blame. The final videos look somewhat memorized, the temporal consistency is poor, and there is seemingly limited diversity in the samples. Example videos from 200k, 300k, and 400k iterations of training the second UNet are included below:
[embedded videos: 200k, 300k, and 400k iterations]
Checkpoints: I have checkpoint files, but I am not sure how best to share them. Each file is fairly large (1.5 GB), and I can't upload them to Google Drive. If someone is interested and can recommend a way of sharing the weights, I will do so.
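For reference, once the weights are available, loading them should follow the usual imagen-pytorch trainer pattern: rebuild the same configuration as above and restore the file (the filename below is hypothetical).

```python
# recreate the same Unet3D / ElucidatedImagen configuration as above,
# then restore the shared weights (filename is hypothetical)
trainer = ImagenTrainer(imagen)
trainer.load('./elucidated-imagen-davis.pt')

# sample a short unconditional clip from the trained cascade
videos = trainer.sample(video_frames = 12)
```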