https://arxiv.org/abs/2111.12417
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion (Chenfei Wu, Jian Liang, Lei Ji, Fan Yang, Yuejian Fang, Daxin Jiang, Nan Duan)
Text-to-image/video performance has suddenly jumped in a single step. Seeing the results gave me goosebumps.
#multimodal_generation