
Interactive Immersive Multimedia Generation

Overview

This is the project repository for our final project in CMU 10-615: Art and Machine Learning. Given a speech recording as input, we generate a film consisting of music, lyrics, and images: lyrics are generated with HuggingArtists models [3], music with Jukebox [4], and images with FuseDream's CLIP+GAN optimization [1, 2].

Our final outputs can be found under the /video directory.
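At a high level, the pipeline chains the models cited under References: transcribe the speech, generate lyrics from the transcript, then condition music and images on those lyrics. The sketch below is a minimal, hypothetical outline of that flow, not this repository's actual code; every function name and return value in it is a placeholder introduced for illustration.

```python
# Hypothetical pipeline sketch. All names below are illustrative
# placeholders, not the repository's actual API.

def transcribe_speech(audio_path: str) -> str:
    """Speech-to-text front end; any off-the-shelf ASR would fit here."""
    return "placeholder transcript of the input speech"

def generate_lyrics(prompt: str) -> list[str]:
    """Lyrics conditioned on the transcript, e.g. a HuggingArtists model [3]."""
    return ["placeholder lyric line one", "placeholder lyric line two"]

def generate_music(lyrics: list[str]) -> bytes:
    """A soundtrack conditioned on the lyrics, e.g. Jukebox [4]."""
    return b"placeholder-audio-bytes"

def generate_image(line: str) -> bytes:
    """One image per lyric line, e.g. FuseDream's CLIP+GAN search [1, 2]."""
    return b"placeholder-image-bytes"

def generate_film(audio_path: str) -> tuple[list[bytes], bytes, list[str]]:
    """Chain the stages: speech -> lyrics -> music, plus an image per line."""
    transcript = transcribe_speech(audio_path)
    lyrics = generate_lyrics(transcript)
    music = generate_music(lyrics)
    images = [generate_image(line) for line in lyrics]
    return images, music, lyrics  # frames, soundtrack, and captions to mux

# Example: images, music, lyrics = generate_film("speech.wav")
```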

Our Team

  • Zhouyao Xie: School of Computer Science, Language Technologies Institute, Master of Computational Data Science
  • Nikhil Yadala: School of Computer Science, Language Technologies Institute, Master of Computational Data Science
  • Yifan He: College of Fine Arts, School of Music, Music and Technology
  • Guannan Tang: College of Engineering, Materials Science Department

Report & Presentation

Our report is included in this repository (see report.pdf), and our presentation slides are available as presentation.pdf.

References

[1] Liu, Xingchao, Chengyue Gong, Lemeng Wu, Shujian Zhang, Hao Su, and Qiang Liu. "FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization." arXiv preprint arXiv:2112.01573 (2021).
[2] Radford, Alec, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry et al. "Learning transferable visual models from natural language supervision." In International Conference on Machine Learning, pp. 8748-8763. PMLR, 2021.
[3] HuggingArtists models: https://huggingface.co/huggingartists
[4] Dhariwal, Prafulla, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, and Ilya Sutskever. "Jukebox: A generative model for music." arXiv preprint arXiv:2005.00341 (2020).
