TL;DR: We introduce Comp4D, compositional 4D scene synthesis from text input. Compared with previous object-centric 4D generation pipelines, our Compositional 4D Generation (Comp4D) framework integrates GPT-4 to decompose the scene and design proper trajectories, resulting in larger-scale movements and more realistic object interactions.
Compositional 4D Scene Generation

Previous work concentrates on object-centric 4D objects. In comparison, our work extends the boundaries to the demanding task of constructing compositional 4D scenes. We integrate GPT-4 to decompose the scene and design proper trajectories, resulting in larger-scale movements and more realistic object interactions.
Here's an overview of our proposed Comp4D method. Given an input text, we first use an LLM for scene decomposition to obtain multiple individual 3D objects. Subsequently, we adopt the LLM to design object trajectories, which guide the displacements of the objects during optimization of the compositional 4D scene. Thanks to the compositional 4D representation implemented with 3D Gaussians, in each iteration of the compositional score distillation process we can flexibly switch between object-centric rendering and trajectory-guided rendering.
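To make the alternating optimization concrete, here is a minimal, hypothetical sketch of the training loop. The Gaussian renderer and the diffusion-based score distillation losses are replaced by toy stand-ins so the control flow runs end to end; the function and variable names below are illustrative and do not come from a released Comp4D implementation.

```python
import torch

def toy_render(points: torch.Tensor) -> torch.Tensor:
    """Stand-in for a differentiable 3D Gaussian renderer (returns a fake 'image')."""
    return points.mean(dim=0)

def toy_sds_loss(rendering: torch.Tensor, prompt: str) -> torch.Tensor:
    """Stand-in for a score distillation (SDS) loss from a diffusion prior."""
    return (rendering ** 2).sum()

# One deformable point set (stand-in for per-object 3D Gaussians) per entity.
entities = {
    "object_a": torch.randn(100, 3, requires_grad=True),
    "object_b": torch.randn(100, 3, requires_grad=True),
}
# LLM-designed trajectories: normalized timestep in [0, 1] -> world-space offset.
trajectories = {
    "object_a": lambda t: torch.tensor([2.0 * t, 0.0, 0.5 * t]),
    "object_b": lambda t: torch.tensor([0.0, 0.0, 0.0]),
}
optimizer = torch.optim.Adam(entities.values(), lr=1e-2)

for step in range(1000):
    t = torch.rand(()).item()              # sample a timestep of the 4D scene
    optimizer.zero_grad()
    if step % 2 == 0:
        # Object-centric rendering: supervise each entity alone, at the origin,
        # with its object-level prompt.
        loss = sum(toy_sds_loss(toy_render(pts), name)
                   for name, pts in entities.items())
    else:
        # Trajectory-guided rendering: displace each entity along its trajectory
        # and supervise the composed scene with the scene-level prompt.
        placed = torch.cat([pts + trajectories[name](t)
                            for name, pts in entities.items()])
        loss = toy_sds_loss(toy_render(placed), "scene-level prompt")
    loss.backward()
    optimizer.step()
```

In the actual pipeline, the renderer would be a differentiable 3D Gaussian rasterizer and the losses would come from score distillation against diffusion priors; the sketch only illustrates how the two rendering modes alternate within one optimization loop.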
Here's the overall pipeline for object trajectory generation. First, a scene description provided by a human user is given as a prompt to an LLM, which yields the relative object scales required for rendering. Subsequently, the language model is prompted with this information, along with environmental constraints, and tasked with returning a function that takes a timestep as input and returns the corresponding object's 3D position, governed by kinematic equations. After a set of positions is collected, collision checking is performed and the trajectory is clipped where the first collision occurs. Optionally, premature collisions can be mitigated by re-querying the LLM for an improved function.
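For illustration, the sketch below shows the kind of kinematics-based position function the LLM is asked to write, together with a simple bounding-sphere collision check that clips the trajectory at the first collision. The launch parameters, radii, and function names are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def object_position(t: float) -> np.ndarray:
    """Hypothetical LLM-written function: position (metres) at time t (seconds),
    with constant horizontal velocity plus a vertical arc under gravity."""
    p0 = np.array([0.0, 0.0, 1.0])          # initial position (assumed)
    v = np.array([1.5, 0.0, 2.0])           # initial velocity (assumed)
    g = np.array([0.0, 0.0, -9.8])          # gravitational acceleration
    return p0 + v * t + 0.5 * g * t ** 2

def clip_at_first_collision(positions, target_center, r_obj=0.2, r_target=0.3):
    """Keep the trajectory only up to (and including) the first sample where the
    moving object's bounding sphere touches the target's bounding sphere."""
    for i, p in enumerate(positions):
        if np.linalg.norm(p - target_center) <= r_obj + r_target:
            return positions[: i + 1]        # clip where the collision occurs
    return positions                         # no collision: keep everything

# Sample the LLM-written function at discrete timesteps, then clip.
timesteps = np.linspace(0.0, 1.0, 32)
trajectory = [object_position(t) for t in timesteps]
trajectory = clip_at_first_collision(trajectory, target_center=np.array([1.2, 0.0, 0.8]))
```

Bounding spheres are used here only to keep the check simple; any collision primitive would serve the same purpose.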
Our method is evaluated against per-frame baselines and the concurrent method Consistent4D. We also provide image-to-4D results by using Stable Video Diffusion to generate a video from an input image.
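For the image-to-4D setting, the driving video can be obtained from a single image with Stable Video Diffusion. Below is a minimal sketch using the diffusers library; the checkpoint name, file paths, and sampling settings are common defaults and may differ from the exact configuration used in our experiments.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Publicly released image-to-video checkpoint (an assumption; other SVD
# variants are used the same way).
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Condition on a single input image resized to the model's native resolution.
image = load_image("input_object.png").resize((1024, 576))
generator = torch.manual_seed(42)

# Generate a short video; its frames then serve as input to the 4D pipeline.
frames = pipe(image, decode_chunk_size=8, generator=generator).frames[0]
export_to_video(frames, "generated_video.mp4", fps=7)
```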
If you want to cite our work, please kindly use:
@article{comp4d,
  title={Comp4D: LLM-Guided Compositional 4D Scene Generation},
  author={},
  journal={arXiv preprint},
  year={2024}
}
4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency
We introduce grounded 4D content generation. We identify static 3D assets and monocular video sequences as key components in constructing 4D content.

Taming Mode Collapse in Score Distillation for Text-to-3D Generation
We derive an entropy-maximizing score distillation rule that fosters view diversity and addresses the multi-face problem in text-to-3D generation.

SteinDreamer: Variance Reduction for Text-to-3D Score Distillation via Stein Identity
We adopt a control variate constructed via Stein's identity to reduce the variance of Monte Carlo estimation in text-to-3D score distillation.