From af8e16296cc8b08d33880de5905033a6ef69c1eb Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Wed, 4 Dec 2024 00:51:27 +0000 Subject: [PATCH] =?UTF-8?q?=E8=87=AA=E5=8A=A8=E6=9B=B4=E6=96=B0:=202024-12?= =?UTF-8?q?-04?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- README.md | 42 +- data/papers_2024-12-04.json | 25332 ++++++++++++++++++++++++++++++++++ 2 files changed, 25353 insertions(+), 21 deletions(-) create mode 100644 data/papers_2024-12-04.json diff --git a/README.md b/README.md index 4e74267..b35449d 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ 这个仓库收集了与 Gaussian Splatting 相关的最新研究论文、项目和资源。内容每日自动更新。 -> 最后更新时间: 2024-12-03 00:51:40 +> 最后更新时间: 2024-12-04 00:51:27 ## 目录 @@ -16,9 +16,9 @@ > 🔄 每日更新 ### 2024年11月 -- **[GuardSplat: Robust and Efficient Watermarking for 3D Gaussian Splatting](https://arxiv.org/abs/2411.19895v1)** +- **[GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting](https://arxiv.org/abs/2411.19895v2)** 作者: Zixuan Chen, Guangcong Wang, Jiahao Zhu, 等 - 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2411.19895v1.pdf) + 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2411.19895v2.pdf) 摘要: 3D Gaussian Splatting (3DGS) has recently created impressive assets for various applications. However, the copyright of these assets is not well protected as existing watermarking methods are not suit... 关键词: gaussian splatting, 3d gaussian - **[DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering](https://arxiv.org/abs/2411.19756v1)** @@ -51,9 +51,9 @@ 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2411.19548v1.pdf) 摘要: Closed-loop simulation is crucial for end-to-end autonomous driving. Existing sensor simulation methods (e.g., NeRF and 3DGS) reconstruct driving scenes based on conditions that closely mirror trainin... 关键词: nerf -- **[GausSurf: Geometry-Guided 3D Gaussian Splatting for Surface Reconstruction](https://arxiv.org/abs/2411.19454v1)** +- **[GausSurf: Geometry-Guided 3D Gaussian Splatting for Surface Reconstruction](https://arxiv.org/abs/2411.19454v2)** 作者: Jiepeng Wang, Yuan Liu, Peng Wang, 等 - 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2411.19454v1.pdf) + 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2411.19454v2.pdf) 摘要: 3D Gaussian Splatting has achieved impressive performance in novel view synthesis with real-time rendering capabilities. However, reconstructing high-quality surfaces with fine details using 3D Gaussi... 关键词: gaussian splatting, 3d gaussian, real-time rendering - **[RF-3DGS: Wireless Channel Modeling with Radio Radiance Field and 3D Gaussian Splatting](https://arxiv.org/abs/2411.19420v1)** @@ -367,9 +367,9 @@ 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2411.15193v1.pdf) 摘要: We introduce a training-free method for feature field rendering in Gaussian splatting. Our approach back-projects 2D features into pre-trained 3D Gaussians, using a weighted sum based on each Gaussian... 
关键词: gaussian splatting, 3d gaussian -- **[Beyond Gaussians: Fast and High-Fidelity 3D Splatting with Linear Kernels](https://arxiv.org/abs/2411.12440v2)** +- **[Beyond Gaussians: Fast and High-Fidelity 3D Splatting with Linear Kernels](https://arxiv.org/abs/2411.12440v3)** 作者: Haodong Chen, Runnan Chen, Qiang Qu, 等 - 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2411.12440v2.pdf) + 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2411.12440v3.pdf) 摘要: Recent advancements in 3D Gaussian Splatting (3DGS) have substantially improved novel view synthesis, enabling high-quality reconstruction and real-time rendering. However, blurring artifacts, such as... 关键词: gaussian splatting, 3d gaussian, real-time rendering - **[Mini-Splatting2: Building 360 Scenes within Minutes via Aggressive Gaussian Densification](https://arxiv.org/abs/2411.12788v1)** @@ -1085,9 +1085,9 @@ 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2410.06475v1.pdf) 摘要: The field of 3D representation has experienced significant advancements, driven by the increasing demand for high-fidelity 3D models in various applications such as computer graphics, virtual reality,... 关键词: gaussian splatting, 3d gaussian, nerf -- **[Spiking GS: Towards High-Accuracy and Low-Cost Surface Reconstruction via Spiking Neuron-based Gaussian Splatting](https://arxiv.org/abs/2410.07266v3)** +- **[Spiking GS: Towards High-Accuracy and Low-Cost Surface Reconstruction via Spiking Neuron-based Gaussian Splatting](https://arxiv.org/abs/2410.07266v4)** 作者: Weixing Zhang, Zongrui Li, De Ma, 等 - 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2410.07266v3.pdf) | [![Stars](https://img.shields.io/github/stars/zju-bmi-lab/SpikingGS?style=social)](https://github.com/zju-bmi-lab/SpikingGS) + 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2410.07266v4.pdf) | [![Stars](https://img.shields.io/github/stars/zju-bmi-lab/SpikingGS?style=social)](https://github.com/zju-bmi-lab/SpikingGS) 摘要: 3D Gaussian Splatting is capable of reconstructing 3D scenes in minutes. Despite recent advances in improving surface reconstruction accuracy, the reconstructed results still exhibit bias and suffer f... 关键词: gaussian splatting, 3d gaussian - **[HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction](https://arxiv.org/abs/2410.06245v1)** @@ -1758,9 +1758,9 @@ 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2408.16982v1.pdf) 摘要: 2D Gaussian Splatting has recently emerged as a significant method in 3D reconstruction, enabling novel view synthesis and geometry reconstruction simultaneously. While the well-known Gaussian kernel ... 关键词: gaussian splatting, 3d reconstruction -- **[ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model](https://arxiv.org/abs/2408.16767v1)** +- **[ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model](https://arxiv.org/abs/2408.16767v2)** 作者: Fangfu Liu, Wenqiang Sun, Hanyang Wang, 等 - 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2408.16767v1.pdf) + 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2408.16767v2.pdf) 摘要: Advancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic 3D results from hundreds of input photos. 
Despite great success in dense-view... 关键词: gaussian splatting, 3d gaussian - **[OmniRe: Omni Urban Scene Reconstruction](https://arxiv.org/abs/2408.16760v1)** @@ -1907,10 +1907,10 @@ 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2408.10906v1.pdf) 摘要: 3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for the 3D understanding directly in this representation space. To facilitate the rese... 关键词: gaussian splatting, 3d gaussian -- **[Learning Part-aware 3D Representations by Fusing 2D Gaussians and Superquadrics](https://arxiv.org/abs/2408.10789v1)** +- **[PartGS:Learning Part-aware 3D Representations by Fusing 2D Gaussians and Superquadrics](https://arxiv.org/abs/2408.10789v2)** 作者: Zhirui Gao, Renjiao Yi, Yuhang Huang, 等 - 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2408.10789v1.pdf) - 摘要: Low-level 3D representations, such as point clouds, meshes, NeRFs, and 3D Gaussians, are commonly used to represent 3D objects or scenes. However, humans usually perceive 3D objects or scenes at a hig... + 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2408.10789v2.pdf) + 摘要: Low-level 3D representations, such as point clouds, meshes, NeRFs, and 3D Gaussians, are commonly used to represent 3D objects or scenes. However, human perception typically understands 3D objects at ... 关键词: 3d gaussian, 3d reconstruction, nerf - **[DEGAS: Detailed Expressions on Full-Body Gaussian Avatars](https://arxiv.org/abs/2408.10588v1)** 作者: Zhijing Shao, Duotun Wang, Qing-Yao Tian, 等 @@ -3357,11 +3357,11 @@ 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2404.16510v1.pdf) 摘要: 3D object generation has undergone significant advancements, yielding high-quality results. However, fall short of achieving precise user control, often yielding results that do not align with user ex... 关键词: gaussian splatting -- **[DIG3D: Marrying Gaussian Splatting with Deformable Transformer for Single Image 3D Reconstruction](https://arxiv.org/abs/2404.16323v1)** +- **[LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians](https://arxiv.org/abs/2404.16323v2)** 作者: Jiamin Wu, Kenkun Liu, Han Gao, 等 - 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2404.16323v1.pdf) - 摘要: In this paper, we study the problem of 3D reconstruction from a single-view RGB image and propose a novel approach called DIG3D for 3D object reconstruction and novel view synthesis. Our method utiliz... - 关键词: 3d gaussian, 3d reconstruction + 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2404.16323v2.pdf) | [![Stars](https://img.shields.io/github/stars/jwubz123/DIG3D?style=social)](https://github.com/jwubz123/DIG3D) + 摘要: Rencently, Gaussian splatting has demonstrated significant success in novel view synthesis. Current methods often regress Gaussians with pixel or point cloud correspondence, linking each Gaussian with... 
+ 关键词: gaussian splatting, 3d gaussian, 3d reconstruction - **[GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting](https://arxiv.org/abs/2404.16012v2)** 作者: Kyusun Cho, Joungbin Lee, Heeji Yoon, 等 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2404.16012v2.pdf) | [![Stars](https://img.shields.io/github/stars/KU-CVLAB/GaussianTalker?style=social)](https://github.com/KU-CVLAB/GaussianTalker) @@ -4187,10 +4187,10 @@ 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2402.07181v2.pdf) 摘要: 3D Gaussian Splatting (3D-GS) has emerged as a significant advancement in the field of Computer Graphics, offering explicit scene representation and novel view synthesis without the reliance on neural... 关键词: gaussian splatting, 3d gaussian, nerf -- **[ImplicitDeepfake: Plausible Face-Swapping through Implicit Deepfake Generation using NeRF and Gaussian Splatting](https://arxiv.org/abs/2402.06390v1)** +- **[Deepfake for the Good: Generating Avatars through Face-Swapping with Implicit Deepfake Generation](https://arxiv.org/abs/2402.06390v2)** 作者: Georgii Stanishevskii, Jakub Steczkiewicz, Tomasz Szczepanik, 等 - 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2402.06390v1.pdf) - 摘要: Numerous emerging deep-learning techniques have had a substantial impact on computer graphics. Among the most promising breakthroughs are the recent rise of Neural Radiance Fields (NeRFs) and Gaussian... + 链接: [![PDF](https://img.shields.io/badge/PDF-arXiv-b31b1b.svg)](https://arxiv.org/pdf/2402.06390v2.pdf) + 摘要: Numerous emerging deep-learning techniques have had a substantial impact on computer graphics. Among the most promising breakthroughs are the rise of Neural Radiance Fields (NeRFs) and Gaussian Splatt... 关键词: gaussian splatting, nerf - **[GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data](https://arxiv.org/abs/2402.06198v2)** 作者: Haoyuan Li, Yanpeng Zhou, Yihan Zeng, 等 diff --git a/data/papers_2024-12-04.json b/data/papers_2024-12-04.json new file mode 100644 index 0000000..ec17484 --- /dev/null +++ b/data/papers_2024-12-04.json @@ -0,0 +1,25332 @@ +[ + { + "title": "GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting", + "authors": [ + "Zixuan Chen", + "Guangcong Wang", + "Jiahao Zhu", + "Jianhuang Lai", + "Xiaohua Xie" + ], + "abstract": "3D Gaussian Splatting (3DGS) has recently created impressive assets for various applications. However, the copyright of these assets is not well protected as existing watermarking methods are not suited for 3DGS considering security, capacity, and invisibility. Besides, these methods often require hours or even days for optimization, limiting the application scenarios. In this paper, we propose GuardSplat, an innovative and efficient framework that effectively protects the copyright of 3DGS assets. Specifically, 1) We first propose a CLIP-guided Message Decoupling Optimization module for training the message decoder, leveraging CLIP's aligning capability and rich representations to achieve a high extraction accuracy with minimal optimization costs, presenting exceptional capability and efficiency. 2) Then, we propose a Spherical-harmonic-aware (SH-aware) Message Embedding module tailored for 3DGS, which employs a set of SH offsets to seamlessly embed the message into the SH features of each 3D Gaussian while maintaining the original 3D structure. 
It enables the 3DGS assets to be watermarked with minimal fidelity trade-offs and prevents malicious users from removing the messages from the model files, meeting the demands for invisibility and security. 3) We further propose an Anti-distortion Message Extraction module to improve robustness against various visual distortions. Extensive experiments demonstrate that GuardSplat outperforms the state-of-the-art methods and achieves fast optimization speed.", + "arxiv_url": "http://arxiv.org/abs/2411.19895v2", + "pdf_url": "http://arxiv.org/pdf/2411.19895v2", + "published_date": "2024-11-29", + "categories": [ + "cs.CV", + "cs.CR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering", + "authors": [ + "Yihao Wang", + "Marcus Klasson", + "Matias Turkulainen", + "Shuzhe Wang", + "Juho Kannala", + "Arno Solin" + ], + "abstract": "Gaussian splatting enables fast novel view synthesis in static 3D environments. However, reconstructing real-world environments remains challenging as distractors or occluders break the multi-view consistency assumption required for accurate 3D reconstruction. Most existing methods rely on external semantic information from pre-trained models, introducing additional computational overhead as pre-processing steps or during optimization. In this work, we propose a novel method, DeSplat, that directly separates distractors and static scene elements purely based on volume rendering of Gaussian primitives. We initialize Gaussians within each camera view for reconstructing the view-specific distractors to separately model the static 3D scene and distractors in the alpha compositing stages. DeSplat yields an explicit scene separation of static elements and distractors, achieving comparable results to prior distractor-free approaches without sacrificing rendering speed. We demonstrate DeSplat's effectiveness on three benchmark data sets for distractor-free novel view synthesis. See the project website at https://aaltoml.github.io/desplat/.", + "arxiv_url": "http://arxiv.org/abs/2411.19756v1", + "pdf_url": "http://arxiv.org/pdf/2411.19756v1", + "published_date": "2024-11-29", + "categories": [ + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting", + "authors": [ + "Bojun Xiong", + "Jialun Liu", + "Jiakui Hu", + "Chenming Wu", + "Jinbo Wu", + "Xing Liu", + "Chen Zhao", + "Errui Ding", + "Zhouhui Lian" + ], + "abstract": "Physically Based Rendering (PBR) materials play a crucial role in modern graphics, enabling photorealistic rendering across diverse environment maps. Developing an effective and efficient algorithm that is capable of automatically generating high-quality PBR materials rather than RGB texture for 3D meshes can significantly streamline the 3D content creation. Most existing methods leverage pre-trained 2D diffusion models for multi-view image synthesis, which often leads to severe inconsistency between the generated textures and input 3D meshes. This paper presents TexGaussian, a novel method that uses octant-aligned 3D Gaussian Splatting for rapid PBR material generation. 
Specifically, we place each 3D Gaussian on the finest leaf node of the octree built from the input 3D mesh to render the multiview images not only for the albedo map but also for roughness and metallic. Moreover, our model is trained in a regression manner instead of diffusion denoising, capable of generating the PBR material for a 3D mesh in a single feed-forward process. Extensive experiments on publicly available benchmarks demonstrate that our method synthesizes more visually pleasing PBR materials and runs faster than previous methods in both unconditional and text-conditional scenarios, which exhibit better consistency with the given geometry. Our code and trained models are available at https://3d-aigc.github.io/TexGaussian.", + "arxiv_url": "http://arxiv.org/abs/2411.19654v1", + "pdf_url": "http://arxiv.org/pdf/2411.19654v1", + "published_date": "2024-11-29", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Tortho-Gaussian: Splatting True Digital Orthophoto Maps", + "authors": [ + "Xin Wang", + "Wendi Zhang", + "Hong Xie", + "Haibin Ai", + "Qiangqiang Yuan", + "Zongqian Zhan" + ], + "abstract": "True Digital Orthophoto Maps (TDOMs) are essential products for digital twins and Geographic Information Systems (GIS). Traditionally, TDOM generation involves a complex set of traditional photogrammetric process, which may deteriorate due to various challenges, including inaccurate Digital Surface Model (DSM), degenerated occlusion detections, and visual artifacts in weak texture regions and reflective surfaces, etc. To address these challenges, we introduce TOrtho-Gaussian, a novel method inspired by 3D Gaussian Splatting (3DGS) that generates TDOMs through orthogonal splatting of optimized anisotropic Gaussian kernel. More specifically, we first simplify the orthophoto generation by orthographically splatting the Gaussian kernels onto 2D image planes, formulating a geometrically elegant solution that avoids the need for explicit DSM and occlusion detection. Second, to produce TDOM of large-scale area, a divide-and-conquer strategy is adopted to optimize memory usage and time efficiency of training and rendering for 3DGS. Lastly, we design a fully anisotropic Gaussian kernel that adapts to the varying characteristics of different regions, particularly improving the rendering quality of reflective surfaces and slender structures. Extensive experimental evaluations demonstrate that our method outperforms existing commercial software in several aspects, including the accuracy of building boundaries, the visual quality of low-texture regions and building facades. These results underscore the potential of our approach for large-scale urban scene reconstruction, offering a robust alternative for enhancing TDOM quality and scalability.", + "arxiv_url": "http://arxiv.org/abs/2411.19594v1", + "pdf_url": "http://arxiv.org/pdf/2411.19594v1", + "published_date": "2024-11-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Splashing: Direct Volumetric Rendering Underwater", + "authors": [ + "Nir Mualem", + "Roy Amoyal", + "Oren Freifeld", + "Derya Akkaynak" + ], + "abstract": "In underwater images, most useful features are occluded by water. 
The extent of the occlusion depends on imaging geometry and can vary even across a sequence of burst images. As a result, 3D reconstruction methods robust on in-air scenes, like Neural Radiance Field methods (NeRFs) or 3D Gaussian Splatting (3DGS), fail on underwater scenes. While a recent underwater adaptation of NeRFs achieved state-of-the-art results, it is impractically slow: reconstruction takes hours and its rendering rate, in frames per second (FPS), is less than 1. Here, we present a new method that takes only a few minutes for reconstruction and renders novel underwater scenes at 140 FPS. Named Gaussian Splashing, our method unifies the strengths and speed of 3DGS with an image formation model for capturing scattering, introducing innovations in the rendering and depth estimation procedures and in the 3DGS loss function. Despite the complexities of underwater adaptation, our method produces images at unparalleled speeds with superior details. Moreover, it reveals distant scene details with far greater clarity than other methods, dramatically improving reconstructed and rendered images. We demonstrate results on existing datasets and a new dataset we have collected. Additional visual results are available at: https://bgu-cs-vil.github.io/gaussiansplashingUW.github.io/ .", + "arxiv_url": "http://arxiv.org/abs/2411.19588v1", + "pdf_url": "http://arxiv.org/pdf/2411.19588v1", + "published_date": "2024-11-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding", + "authors": [ + "Wenbo Zhang", + "Lu Zhang", + "Ping Hu", + "Liqian Ma", + "Yunzhi Zhuge", + "Huchuan Lu" + ], + "abstract": "Injecting semantics into 3D Gaussian Splatting (3DGS) has recently garnered significant attention. While current approaches typically distill 3D semantic features from 2D foundational models (e.g., CLIP and SAM) to facilitate novel view segmentation and semantic understanding, their heavy reliance on 2D supervision can undermine cross-view semantic consistency and necessitate complex data preparation processes, therefore hindering view-consistent scene understanding. In this work, we present FreeGS, an unsupervised semantic-embedded 3DGS framework that achieves view-consistent 3D scene understanding without the need for 2D labels. Instead of directly learning semantic features, we introduce the IDentity-coupled Semantic Field (IDSF) into 3DGS, which captures both semantic representations and view-consistent instance indices for each Gaussian. We optimize IDSF with a two-step alternating strategy: semantics help to extract coherent instances in 3D space, while the resulting instances regularize the injection of stable semantics from 2D space. Additionally, we adopt a 2D-3D joint contrastive loss to enhance the complementarity between view-consistent 3D geometry and rich semantics during the bootstrapping process, enabling FreeGS to uniformly perform tasks such as novel-view semantic segmentation, object selection, and 3D object detection. 
Extensive experiments on LERF-Mask, 3D-OVS, and ScanNet datasets demonstrate that FreeGS performs comparably to state-of-the-art methods while avoiding the complex data preprocessing workload.", + "arxiv_url": "http://arxiv.org/abs/2411.19551v1", + "pdf_url": "http://arxiv.org/pdf/2411.19551v1", + "published_date": "2024-11-29", + "categories": [ + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration", + "authors": [ + "Chaojun Ni", + "Guosheng Zhao", + "Xiaofeng Wang", + "Zheng Zhu", + "Wenkang Qin", + "Guan Huang", + "Chen Liu", + "Yuyin Chen", + "Yida Wang", + "Xueyang Zhang", + "Yifei Zhan", + "Kun Zhan", + "Peng Jia", + "Xianpeng Lang", + "Xingang Wang", + "Wenjun Mei" + ], + "abstract": "Closed-loop simulation is crucial for end-to-end autonomous driving. Existing sensor simulation methods (e.g., NeRF and 3DGS) reconstruct driving scenes based on conditions that closely mirror training data distributions. However, these methods struggle with rendering novel trajectories, such as lane changes. Recent works have demonstrated that integrating world model knowledge alleviates these issues. Despite their efficiency, these approaches still encounter difficulties in the accurate representation of more complex maneuvers, with multi-lane shifts being a notable example. Therefore, we introduce ReconDreamer, which enhances driving scene reconstruction through incremental integration of world model knowledge. Specifically, DriveRestorer is proposed to mitigate artifacts via online restoration. This is complemented by a progressive data update strategy designed to ensure high-quality rendering for more complex maneuvers. To the best of our knowledge, ReconDreamer is the first method to effectively render in large maneuvers. Experimental results demonstrate that ReconDreamer outperforms Street Gaussians in the NTA-IoU, NTL-IoU, and FID, with relative improvements by 24.87%, 6.72%, and 29.97%. Furthermore, ReconDreamer surpasses DriveDreamer4D with PVG during large maneuver rendering, as verified by a relative improvement of 195.87% in the NTA-IoU metric and a comprehensive user study.", + "arxiv_url": "http://arxiv.org/abs/2411.19548v1", + "pdf_url": "http://arxiv.org/pdf/2411.19548v1", + "published_date": "2024-11-29", + "categories": [ + "cs.CV", + "cs.AI", + "cs.LG", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GausSurf: Geometry-Guided 3D Gaussian Splatting for Surface Reconstruction", + "authors": [ + "Jiepeng Wang", + "Yuan Liu", + "Peng Wang", + "Cheng Lin", + "Junhui Hou", + "Xin Li", + "Taku Komura", + "Wenping Wang" + ], + "abstract": "3D Gaussian Splatting has achieved impressive performance in novel view synthesis with real-time rendering capabilities. However, reconstructing high-quality surfaces with fine details using 3D Gaussians remains a challenging task. In this work, we introduce GausSurf, a novel approach to high-quality surface reconstruction by employing geometry guidance from multi-view consistency in texture-rich areas and normal priors in texture-less areas of a scene. We observe that a scene can be mainly divided into two primary regions: 1) texture-rich and 2) texture-less areas. 
To enforce multi-view consistency at texture-rich areas, we enhance the reconstruction quality by incorporating a traditional patch-match based Multi-View Stereo (MVS) approach to guide the geometry optimization in an iterative scheme. This scheme allows for mutual reinforcement between the optimization of Gaussians and patch-match refinement, which significantly improves the reconstruction results and accelerates the training process. Meanwhile, for the texture-less areas, we leverage normal priors from a pre-trained normal estimation model to guide optimization. Extensive experiments on the DTU and Tanks and Temples datasets demonstrate that our method surpasses state-of-the-art methods in terms of reconstruction quality and computation time.", + "arxiv_url": "http://arxiv.org/abs/2411.19454v2", + "pdf_url": "http://arxiv.org/pdf/2411.19454v2", + "published_date": "2024-11-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RF-3DGS: Wireless Channel Modeling with Radio Radiance Field and 3D Gaussian Splatting", + "authors": [ + "Lihao Zhang", + "Haijian Sun", + "Samuel Berweger", + "Camillo Gentile", + "Rose Qingyang Hu" + ], + "abstract": "Precisely modeling radio propagation in complex environments has been a significant challenge, especially with the advent of 5G and beyond networks, where managing massive antenna arrays demands more detailed information. Traditional methods, such as empirical models and ray tracing, often fall short, either due to insufficient details or with challenges for real-time applications. Inspired by the newly proposed 3D Gaussian Splatting method in computer vision domain, which outperforms in reconstructing optical radiance fields, we propose RF-3DGS, a novel approach that enables precise site-specific reconstruction of radio radiance fields from sparse samples. RF-3DGS can render spatial spectra at arbitrary positions within 2 ms following a brief 3-minute training period, effectively identifying dominant propagation paths at these locations. Furthermore, RF-3DGS can provide fine-grained Channel State Information (CSI) of these paths, including the angle of departure and delay. Our experiments, calibrated through real-world measurements, demonstrate that RF-3DGS not only significantly improves rendering quality, training speed, and rendering speed compared to state-of-the-art methods but also holds great potential for supporting wireless communication and advanced applications such as Integrated Sensing and Communication (ISAC).", + "arxiv_url": "http://arxiv.org/abs/2411.19420v1", + "pdf_url": "http://arxiv.org/pdf/2411.19420v1", + "published_date": "2024-11-29", + "categories": [ + "cs.NI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SAMa: Material-aware 3D Selection and Segmentation", + "authors": [ + "Michael Fischer", + "Iliyan Georgiev", + "Thibault Groueix", + "Vladimir G. Kim", + "Tobias Ritschel", + "Valentin Deschaintre" + ], + "abstract": "Decomposing 3D assets into material parts is a common task for artists and creators, yet remains a highly manual process. In this work, we introduce Select Any Material (SAMa), a material selection approach for various 3D representations. Building on the recently introduced SAM2 video selection model, we extend its capabilities to the material domain. 
We leverage the model's cross-view consistency to create a 3D-consistent intermediate material-similarity representation in the form of a point cloud from a sparse set of views. Nearest-neighbour lookups in this similarity cloud allow us to efficiently reconstruct accurate continuous selection masks over objects' surfaces that can be inspected from any view. Our method is multiview-consistent by design, alleviating the need for contrastive learning or feature-field pre-processing, and performs optimization-free selection in seconds. Our approach works on arbitrary 3D representations and outperforms several strong baselines in terms of selection accuracy and multiview consistency. It enables several compelling applications, such as replacing the diffuse-textured materials on a text-to-3D output, or selecting and editing materials on NeRFs and 3D-Gaussians.", + "arxiv_url": "http://arxiv.org/abs/2411.19322v1", + "pdf_url": "http://arxiv.org/pdf/2411.19322v1", + "published_date": "2024-11-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation", + "authors": [ + "Yichong Lu", + "Yichi Cai", + "Shangzhan Zhang", + "Hongyu Zhou", + "Haoji Hu", + "Huimin Yu", + "Andreas Geiger", + "Yiyi Liao" + ], + "abstract": "Photorealistic 3D vehicle models with high controllability are essential for autonomous driving simulation and data augmentation. While handcrafted CAD models provide flexible controllability, free CAD libraries often lack the high-quality materials necessary for photorealistic rendering. Conversely, reconstructed 3D models offer high-fidelity rendering but lack controllability. In this work, we introduce UrbanCAD, a framework that pushes the frontier of the photorealism-controllability trade-off by generating highly controllable and photorealistic 3D vehicle digital twins from a single urban image and a collection of free 3D CAD models and handcrafted materials. These digital twins enable realistic 360-degree rendering, vehicle insertion, material transfer, relighting, and component manipulation such as opening doors and rolling down windows, supporting the construction of long-tail scenarios. To achieve this, we propose a novel pipeline that operates in a retrieval-optimization manner, adapting to observational data while preserving flexible controllability and fine-grained handcrafted details. Furthermore, given multi-view background perspective and fisheye images, we approximate environment lighting using fisheye images and reconstruct the background with 3DGS, enabling the photorealistic insertion of optimized CAD models into rendered novel view backgrounds. Experimental results demonstrate that UrbanCAD outperforms baselines based on reconstruction and retrieval in terms of photorealism. Additionally, we show that various perception models maintain their accuracy when evaluated on UrbanCAD with in-distribution configurations but degrade when applied to realistic out-of-distribution data generated by our method. 
This suggests that UrbanCAD is a significant advancement in creating photorealistic, safety-critical driving scenarios for downstream applications.", + "arxiv_url": "http://arxiv.org/abs/2411.19292v1", + "pdf_url": "http://arxiv.org/pdf/2411.19292v1", + "published_date": "2024-11-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SADG: Segment Any Dynamic Gaussian Without Object Trackers", + "authors": [ + "Yun-Jin Li", + "Mariia Gladkova", + "Yan Xia", + "Daniel Cremers" + ], + "abstract": "Understanding dynamic 3D scenes is fundamental for various applications, including extended reality (XR) and autonomous driving. Effectively integrating semantic information into 3D reconstruction enables holistic representation that opens opportunities for immersive and interactive applications. We introduce SADG, Segment Any Dynamic Gaussian Without Object Trackers, a novel approach that combines dynamic Gaussian Splatting representation and semantic information without reliance on object IDs. In contrast to existing works, we do not rely on supervision based on object identities to enable consistent segmentation of dynamic 3D objects. To this end, we propose to learn semantically-aware features by leveraging masks generated from the Segment Anything Model (SAM) and utilizing our novel contrastive learning objective based on hard pixel mining. The learned Gaussian features can be effectively clustered without further post-processing. This enables fast computation for further object-level editing, such as object removal, composition, and style transfer by manipulating the Gaussians in the scene. We further extend several dynamic novel-view datasets with segmentation benchmarks to enable testing of learned feature fields from unseen viewpoints. We evaluate SADG on proposed benchmarks and demonstrate the superior performance of our approach in segmenting objects within dynamic scenes along with its effectiveness for further downstream editing tasks.", + "arxiv_url": "http://arxiv.org/abs/2411.19290v1", + "pdf_url": "http://arxiv.org/pdf/2411.19290v1", + "published_date": "2024-11-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "AGS-Mesh: Adaptive Gaussian Splatting and Meshing with Geometric Priors for Indoor Room Reconstruction Using Smartphones", + "authors": [ + "Xuqian Ren", + "Matias Turkulainen", + "Jiepeng Wang", + "Otto Seiskari", + "Iaroslav Melekhov", + "Juho Kannala", + "Esa Rahtu" + ], + "abstract": "Geometric priors are often used to enhance 3D reconstruction. With many smartphones featuring low-resolution depth sensors and the prevalence of off-the-shelf monocular geometry estimators, incorporating geometric priors as regularization signals has become common in 3D vision tasks. However, the accuracy of depth estimates from mobile devices is typically poor for highly detailed geometry, and monocular estimators often suffer from poor multi-view consistency and precision. In this work, we propose an approach for joint surface depth and normal refinement of Gaussian Splatting methods for accurate 3D reconstruction of indoor scenes. We develop supervision strategies that adaptively filters low-quality depth and normal estimates by comparing the consistency of the priors during optimization. 
We mitigate regularization in regions where prior estimates have high uncertainty or ambiguities. Our filtering strategy and optimization design demonstrate significant improvements in both mesh estimation and novel-view synthesis for both 3D and 2D Gaussian Splatting-based methods on challenging indoor room datasets. Furthermore, we explore the use of alternative meshing strategies for finer geometry extraction. We develop a scale-aware meshing strategy inspired by TSDF and octree-based isosurface extraction, which recovers finer details from Gaussian models compared to other commonly used open-source meshing tools. Our code is released in https://xuqianren.github.io/ags_mesh_website/.", + "arxiv_url": "http://arxiv.org/abs/2411.19271v1", + "pdf_url": "http://arxiv.org/pdf/2411.19271v1", + "published_date": "2024-11-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception", + "authors": [ + "Haijie Li", + "Yanmin Wu", + "Jiarui Meng", + "Qiankun Gao", + "Zhiyao Zhang", + "Ronggang Wang", + "Jian Zhang" + ], + "abstract": "3D scene understanding has become an essential area of research with applications in autonomous driving, robotics, and augmented reality. Recently, 3D Gaussian Splatting (3DGS) has emerged as a powerful approach, combining explicit modeling with neural adaptability to provide efficient and detailed scene representations. However, three major challenges remain in leveraging 3DGS for scene understanding: 1) an imbalance between appearance and semantics, where dense Gaussian usage for fine-grained texture modeling does not align with the minimal requirements for semantic attributes; 2) inconsistencies between appearance and semantics, as purely appearance-based Gaussians often misrepresent object boundaries; and 3) reliance on top-down instance segmentation methods, which struggle with uneven category distributions, leading to over- or under-segmentation. In this work, we propose InstanceGaussian, a method that jointly learns appearance and semantic features while adaptively aggregating instances. Our contributions include: i) a novel Semantic-Scaffold-GS representation balancing appearance and semantics to improve feature representations and boundary delineation; ii) a progressive appearance-semantic joint training strategy to enhance stability and segmentation accuracy; and iii) a bottom-up, category-agnostic instance aggregation approach that addresses segmentation challenges through farthest point sampling and connected component analysis. Our approach achieves state-of-the-art performance in category-agnostic, open-vocabulary 3D point-level segmentation, highlighting the effectiveness of the proposed representation and training strategies. 
Project page: https://lhj-git.github.io/InstanceGaussian/", + "arxiv_url": "http://arxiv.org/abs/2411.19235v1", + "pdf_url": "http://arxiv.org/pdf/2411.19235v1", + "published_date": "2024-11-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes", + "authors": [ + "Thomas Wimmer", + "Michael Oechsle", + "Michael Niemeyer", + "Federico Tombari" + ], + "abstract": "State-of-the-art novel view synthesis methods achieve impressive results for multi-view captures of static 3D scenes. However, the reconstructed scenes still lack \"liveliness,\" a key component for creating engaging 3D experiences. Recently, novel video diffusion models generate realistic videos with complex motion and enable animations of 2D images, however they cannot naively be used to animate 3D scenes as they lack multi-view consistency. To breathe life into the static world, we propose Gaussians2Life, a method for animating parts of high-quality 3D scenes in a Gaussian Splatting representation. Our key idea is to leverage powerful video diffusion models as the generative component of our model and to combine these with a robust technique to lift 2D videos into meaningful 3D motion. We find that, in contrast to prior work, this enables realistic animations of complex, pre-existing 3D scenes and further enables the animation of a large variety of object classes, while related work is mostly focused on prior-based character animation, or single 3D objects. Our model enables the creation of consistent, immersive 3D experiences for arbitrary scenes.", + "arxiv_url": "http://arxiv.org/abs/2411.19233v1", + "pdf_url": "http://arxiv.org/pdf/2411.19233v1", + "published_date": "2024-11-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SuperGaussians: Enhancing Gaussian Splatting Using Primitives with Spatially Varying Colors", + "authors": [ + "Rui Xu", + "Wenyue Chen", + "Jiepeng Wang", + "Yuan Liu", + "Peng Wang", + "Lin Gao", + "Shiqing Xin", + "Taku Komura", + "Xin Li", + "Wenping Wang" + ], + "abstract": "Gaussian Splattings demonstrate impressive results in multi-view reconstruction based on Gaussian explicit representations. However, the current Gaussian primitives only have a single view-dependent color and an opacity to represent the appearance and geometry of the scene, resulting in a non-compact representation. In this paper, we introduce a new method called SuperGaussians that utilizes spatially varying colors and opacity in a single Gaussian primitive to improve its representation ability. We have implemented bilinear interpolation, movable kernels, and even tiny neural networks as spatially varying functions. 
Quantitative and qualitative experimental results demonstrate that all three functions outperform the baseline, with the best movable kernels achieving superior novel view synthesis performance on multiple datasets, highlighting the strong potential of spatially varying functions.", + "arxiv_url": "http://arxiv.org/abs/2411.18966v1", + "pdf_url": "http://arxiv.org/pdf/2411.18966v1", + "published_date": "2024-11-28", + "categories": [ + "cs.CV", + "cs.GR", + "cs.MM" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RIGI: Rectifying Image-to-3D Generation Inconsistency via Uncertainty-aware Learning", + "authors": [ + "Jiacheng Wang", + "Zhedong Zheng", + "Wei Xu", + "Ping Liu" + ], + "abstract": "Given a single image of a target object, image-to-3D generation aims to reconstruct its texture and geometric shape. Recent methods often utilize intermediate media, such as multi-view images or videos, to bridge the gap between input image and the 3D target, thereby guiding the generation of both shape and texture. However, inconsistencies in the generated multi-view snapshots frequently introduce noise and artifacts along object boundaries, undermining the 3D reconstruction process. To address this challenge, we leverage 3D Gaussian Splatting (3DGS) for 3D reconstruction, and explicitly integrate uncertainty-aware learning into the reconstruction process. By capturing the stochasticity between two Gaussian models, we estimate an uncertainty map, which is subsequently used for uncertainty-aware regularization to rectify the impact of inconsistencies. Specifically, we optimize both Gaussian models simultaneously, calculating the uncertainty map by evaluating the discrepancies between rendered images from identical viewpoints. Based on the uncertainty map, we apply adaptive pixel-wise loss weighting to regularize the models, reducing reconstruction intensity in high-uncertainty regions. This approach dynamically detects and mitigates conflicts in multi-view labels, leading to smoother results and effectively reducing artifacts. Extensive experiments show the effectiveness of our method in improving 3D generation quality by reducing inconsistencies and artifacts.", + "arxiv_url": "http://arxiv.org/abs/2411.18866v1", + "pdf_url": "http://arxiv.org/pdf/2411.18866v1", + "published_date": "2024-11-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Textured Gaussians for Enhanced 3D Scene Appearance Modeling", + "authors": [ + "Brian Chao", + "Hung-Yu Tseng", + "Lorenzo Porzi", + "Chen Gao", + "Tuotuo Li", + "Qinbo Li", + "Ayush Saraf", + "Jia-Bin Huang", + "Johannes Kopf", + "Gordon Wetzstein", + "Changil Kim" + ], + "abstract": "3D Gaussian Splatting (3DGS) has recently emerged as a state-of-the-art 3D reconstruction and rendering technique due to its high-quality results and fast training and rendering time. However, pixels covered by the same Gaussian are always shaded in the same color up to a Gaussian falloff scaling factor. Furthermore, the finest geometric detail any individual Gaussian can represent is a simple ellipsoid. These properties of 3DGS greatly limit the expressivity of individual Gaussian primitives. To address these issues, we draw inspiration from texture and alpha mapping in traditional graphics and integrate it with 3DGS. 
Specifically, we propose a new generalized Gaussian appearance representation that augments each Gaussian with alpha~(A), RGB, or RGBA texture maps to model spatially varying color and opacity across the extent of each Gaussian. As such, each Gaussian can represent a richer set of texture patterns and geometric structures, instead of just a single color and ellipsoid as in naive Gaussian Splatting. Surprisingly, we found that the expressivity of Gaussians can be greatly improved by using alpha-only texture maps, and further augmenting Gaussians with RGB texture maps achieves the highest expressivity. We validate our method on a wide variety of standard benchmark datasets and our own custom captures at both the object and scene levels. We demonstrate image quality improvements over existing methods while using a similar or lower number of Gaussians.", + "arxiv_url": "http://arxiv.org/abs/2411.18625v1", + "pdf_url": "http://arxiv.org/pdf/2411.18625v1", + "published_date": "2024-11-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models", + "authors": [ + "Rundi Wu", + "Ruiqi Gao", + "Ben Poole", + "Alex Trevithick", + "Changxi Zheng", + "Jonathan T. Barron", + "Aleksander Holynski" + ], + "abstract": "We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video. CAT4D leverages a multi-view video diffusion model trained on a diverse combination of datasets to enable novel view synthesis at any specified camera poses and timestamps. Combined with a novel sampling approach, this model can transform a single monocular video into a multi-view video, enabling robust 4D reconstruction via optimization of a deformable 3D Gaussian representation. We demonstrate competitive performance on novel view synthesis and dynamic scene reconstruction benchmarks, and highlight the creative capabilities for 4D scene generation from real or generated videos. See our project page for results and interactive demos: \\url{cat-4d.github.io}.", + "arxiv_url": "http://arxiv.org/abs/2411.18613v1", + "pdf_url": "http://arxiv.org/pdf/2411.18613v1", + "published_date": "2024-11-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianSpeech: Audio-Driven Gaussian Avatars", + "authors": [ + "Shivangi Aneja", + "Artem Sevastopolsky", + "Tobias Kirschstein", + "Justus Thies", + "Angela Dai", + "Matthias Nießner" + ], + "abstract": "We introduce GaussianSpeech, a novel approach that synthesizes high-fidelity animation sequences of photo-realistic, personalized 3D human head avatars from spoken audio. To capture the expressive, detailed nature of human heads, including skin furrowing and finer-scale facial movements, we propose to couple speech signal with 3D Gaussian splatting to create realistic, temporally coherent motion sequences. We propose a compact and efficient 3DGS-based avatar representation that generates expression-dependent color and leverages wrinkle- and perceptually-based losses to synthesize facial details, including wrinkles that occur with different expressions. To enable sequence modeling of 3D Gaussian splats with audio, we devise an audio-conditioned transformer model capable of extracting lip and expression features directly from audio input. 
Due to the absence of high-quality datasets of talking humans in correspondence with audio, we captured a new large-scale multi-view dataset of audio-visual sequences of talking humans with native English accents and diverse facial geometry. GaussianSpeech consistently achieves state-of-the-art performance with visually natural motion at real time rendering rates, while encompassing diverse facial expressions and styles.", + "arxiv_url": "http://arxiv.org/abs/2411.18675v1", + "pdf_url": "http://arxiv.org/pdf/2411.18675v1", + "published_date": "2024-11-27", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR", + "cs.SD", + "eess.AS" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PhyCAGE: Physically Plausible Compositional 3D Asset Generation from a Single Image", + "authors": [ + "Han Yan", + "Mingrui Zhang", + "Yang Li", + "Chao Ma", + "Pan Ji" + ], + "abstract": "We present PhyCAGE, the first approach for physically plausible compositional 3D asset generation from a single image. Given an input image, we first generate consistent multi-view images for components of the assets. These images are then fitted with 3D Gaussian Splatting representations. To ensure that the Gaussians representing objects are physically compatible with each other, we introduce a Physical Simulation-Enhanced Score Distillation Sampling (PSE-SDS) technique to further optimize the positions of the Gaussians. It is achieved by setting the gradient of the SDS loss as the initial velocity of the physical simulation, allowing the simulator to act as a physics-guided optimizer that progressively corrects the Gaussians' positions to a physically compatible state. Experimental results demonstrate that the proposed method can generate physically plausible compositional 3D assets given a single image.", + "arxiv_url": "http://arxiv.org/abs/2411.18548v1", + "pdf_url": "http://arxiv.org/pdf/2411.18548v1", + "published_date": "2024-11-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Point Cloud Unsupervised Pre-training via 3D Gaussian Splatting", + "authors": [ + "Hao Liu", + "Minglin Chen", + "Yanni Ma", + "Haihong Xiao", + "Ying He" + ], + "abstract": "Pre-training on large-scale unlabeled datasets contribute to the model achieving powerful performance on 3D vision tasks, especially when annotations are limited. However, existing rendering-based self-supervised frameworks are computationally demanding and memory-intensive during pre-training due to the inherent nature of volume rendering. In this paper, we propose an efficient framework named GS$^3$ to learn point cloud representation, which seamlessly integrates fast 3D Gaussian Splatting into the rendering-based framework. The core idea behind our framework is to pre-train the point cloud encoder by comparing rendered RGB images with real RGB images, as only Gaussian points enriched with learned rich geometric and appearance information can produce high-quality renderings. Specifically, we back-project the input RGB-D images into 3D space and use a point cloud encoder to extract point-wise features. Then, we predict 3D Gaussian points of the scene from the learned point cloud features and uses a tile-based rasterizer for image rendering. 
Finally, the pre-trained point cloud encoder can be fine-tuned to adapt to various downstream 3D tasks, including high-level perception tasks such as 3D segmentation and detection, as well as low-level tasks such as 3D scene reconstruction. Extensive experiments on downstream tasks demonstrate the strong transferability of the pre-trained point cloud encoder and the effectiveness of our self-supervised learning framework. In addition, our GS$^3$ framework is highly efficient, achieving approximately 9$\\times$ pre-training speedup and less than 0.25$\\times$ memory cost compared to the previous rendering-based framework Ponder.", + "arxiv_url": "http://arxiv.org/abs/2411.18667v1", + "pdf_url": "http://arxiv.org/pdf/2411.18667v1", + "published_date": "2024-11-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HEMGS: A Hybrid Entropy Model for 3D Gaussian Splatting Data Compression", + "authors": [ + "Lei Liu", + "Zhenghao Chen", + "Dong Xu" + ], + "abstract": "Fast progress in 3D Gaussian Splatting (3DGS) has made 3D Gaussians popular for 3D modeling and image rendering, but this creates big challenges in data storage and transmission. To obtain a highly compact 3DGS representation, we propose a hybrid entropy model for Gaussian Splatting (HEMGS) data compression, which comprises two primary components, a hyperprior network and an autoregressive network. To effectively reduce structural redundancy across attributes, we apply a progressive coding algorithm to generate hyperprior features, in which we use previously compressed attributes and location as prior information. In particular, to better extract the location features from these compressed attributes, we adopt a domain-aware and instance-aware architecture to respectively capture domain-aware structural relations without additional storage costs and reveal scene-specific features through MLPs. Additionally, to reduce redundancy within each attribute, we leverage relationships between neighboring compressed elements within the attributes through an autoregressive network. Given its unique structure, we propose an adaptive context coding algorithm with flexible receptive fields to effectively capture adjacent compressed elements. Overall, we integrate our HEMGS into an end-to-end optimized 3DGS compression framework and the extensive experimental results on four benchmarks indicate that our method achieves about 40\\% average reduction in size while maintaining the rendering quality over our baseline method and achieving state-of-the-art compression results.", + "arxiv_url": "http://arxiv.org/abs/2411.18473v1", + "pdf_url": "http://arxiv.org/pdf/2411.18473v1", + "published_date": "2024-11-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Neural Surface Priors for Editable Gaussian Splatting", + "authors": [ + "Jakub Szymkowiak", + "Weronika Jakubowska", + "Dawid Malarz", + "Weronika Smolak-Dyżewska", + "Maciej Zięba", + "Przemysław Musialski", + "Wojtek Pałubicki", + "Przemysław Spurek" + ], + "abstract": "In computer graphics, there is a need to recover easily modifiable representations of 3D geometry and appearance from image data. We introduce a novel method for this task using 3D Gaussian Splatting, which enables intuitive scene editing through mesh adjustments. 
Starting with input images and camera poses, we reconstruct the underlying geometry using a neural Signed Distance Field and extract a high-quality mesh. Our model then estimates a set of Gaussians, where each component is flat, and the opacity is conditioned on the recovered neural surface. To facilitate editing, we produce a proxy representation that encodes information about the Gaussians' shape and position. Unlike other methods, our pipeline allows modifications applied to the extracted mesh to be propagated to the proxy representation, from which we recover the updated parameters of the Gaussians. This effectively transfers the mesh edits back to the recovered appearance representation. By leveraging mesh-guided transformations, our approach simplifies 3D scene editing and offers improvements over existing methods in terms of usability and visual fidelity of edits. The complete source code for this project can be accessed at \\url{https://github.com/WJakubowska/NeuralSurfacePriors}", + "arxiv_url": "http://arxiv.org/abs/2411.18311v1", + "pdf_url": "http://arxiv.org/pdf/2411.18311v1", + "published_date": "2024-11-27", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/WJakubowska/NeuralSurfacePriors", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters", + "authors": [ + "Zhiyang Guo", + "Jinxu Xiang", + "Kai Ma", + "Wengang Zhou", + "Houqiang Li", + "Ran Zhang" + ], + "abstract": "3D characters are essential to modern creative industries, but making them animatable often demands extensive manual work in tasks like rigging and skinning. Existing automatic rigging tools face several limitations, including the necessity for manual annotations, rigid skeleton topologies, and limited generalization across diverse shapes and poses. An alternative approach is to generate animatable avatars pre-bound to a rigged template mesh. However, this method often lacks flexibility and is typically limited to realistic human shapes. To address these issues, we present Make-It-Animatable, a novel data-driven method to make any 3D humanoid model ready for character animation in less than one second, regardless of its shapes and poses. Our unified framework generates high-quality blend weights, bones, and pose transformations. By incorporating a particle-based shape autoencoder, our approach supports various 3D representations, including meshes and 3D Gaussian splats. Additionally, we employ a coarse-to-fine representation and a structure-aware modeling strategy to ensure both accuracy and robustness, even for characters with non-standard skeleton structures. We conducted extensive experiments to validate our framework's effectiveness. 
Compared to existing methods, our approach demonstrates significant improvements in both quality and speed.", +    "arxiv_url": "http://arxiv.org/abs/2411.18197v1", +    "pdf_url": "http://arxiv.org/pdf/2411.18197v1", +    "published_date": "2024-11-27", +    "categories": [ +      "cs.GR", +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "SmileSplat: Generalizable Gaussian Splats for Unconstrained Sparse Images", +    "authors": [ +      "Yanyan Li", +      "Yixin Fang", +      "Federico Tombari", +      "Gim Hee Lee" +    ], +    "abstract": "Sparse Multi-view Images can be Learned to predict explicit radiance fields via Generalizable Gaussian Splatting approaches, which can achieve wider application prospects in real life when ground-truth camera parameters are not required as inputs. In this paper, a novel generalizable Gaussian Splatting method, SmileSplat, is proposed to reconstruct pixel-aligned Gaussian surfels for diverse scenarios, requiring only unconstrained sparse multi-view images. First, Gaussian surfels are predicted based on the multi-head Gaussian regression decoder; they are represented with fewer degrees of freedom but have better multi-view consistency. Furthermore, the normal vectors of the Gaussian surfels are enhanced based on high-quality normal priors. Second, the Gaussians and camera parameters (both extrinsic and intrinsic) are optimized to obtain high-quality Gaussian radiance fields for novel view synthesis tasks based on the proposed Bundle-Adjusting Gaussian Splatting module. Extensive experiments on novel view rendering and depth map prediction tasks are conducted on public datasets, demonstrating that the proposed method achieves state-of-the-art performance in various 3D vision tasks. More information can be found on our project page (https://yanyan-li.github.io/project/gs/smilesplat)", +    "arxiv_url": "http://arxiv.org/abs/2411.18072v1", +    "pdf_url": "http://arxiv.org/pdf/2411.18072v1", +    "published_date": "2024-11-27", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "GLS: Geometry-aware 3D Language Gaussian Splatting", +    "authors": [ +      "Jiaxiong Qiu", +      "Liu Liu", +      "Zhizhong Su", +      "Tianwei Lin" +    ], +    "abstract": "Recently, 3D Gaussian Splatting (3DGS) has achieved significant performance on indoor surface reconstruction and open-vocabulary segmentation. This paper presents GLS, a unified framework of surface reconstruction and open-vocabulary segmentation based on 3DGS. GLS extends two fields by exploring the correlation between them. For indoor surface reconstruction, we introduce surface normal prior as a geometric cue to guide the rendered normal, and use the normal error to optimize the rendered depth. For open-vocabulary segmentation, we employ 2D CLIP features to guide instance features and utilize DEVA masks to enhance their view consistency. Extensive experiments demonstrate the effectiveness of jointly optimizing surface reconstruction and open-vocabulary segmentation, where GLS surpasses state-of-the-art approaches of each task on MuSHRoom, ScanNet++, and LERF-OVS datasets.
Code will be available at https://github.com/JiaxiongQ/GLS.", +    "arxiv_url": "http://arxiv.org/abs/2411.18066v1", +    "pdf_url": "http://arxiv.org/pdf/2411.18066v1", +    "published_date": "2024-11-27", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "https://github.com/JiaxiongQ/GLS", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "HI-SLAM2: Geometry-Aware Gaussian SLAM for Fast Monocular Scene Reconstruction", +    "authors": [ +      "Wei Zhang", +      "Qing Cheng", +      "David Skuddis", +      "Niclas Zeller", +      "Daniel Cremers", +      "Norbert Haala" +    ], +    "abstract": "We present HI-SLAM2, a geometry-aware Gaussian SLAM system that achieves fast and accurate monocular scene reconstruction using only RGB input. Existing Neural SLAM or 3DGS-based SLAM methods often trade off between rendering quality and geometry accuracy; our research demonstrates that both can be achieved simultaneously with RGB input alone. The key idea of our approach is to enhance the ability for geometry estimation by combining easy-to-obtain monocular priors with learning-based dense SLAM, and then using 3D Gaussian splatting as our core map representation to efficiently model the scene. Upon loop closure, our method ensures on-the-fly global consistency through efficient pose graph bundle adjustment and instant map updates by explicitly deforming the 3D Gaussian units based on anchored keyframe updates. Furthermore, we introduce a grid-based scale alignment strategy to maintain improved scale consistency in prior depths for finer depth details. Through extensive experiments on Replica, ScanNet, and ScanNet++, we demonstrate significant improvements over existing Neural SLAM methods and even surpass RGB-D-based methods in both reconstruction and rendering quality. The project page and source code will be made available at https://hi-slam2.github.io/.", +    "arxiv_url": "http://arxiv.org/abs/2411.17982v1", +    "pdf_url": "http://arxiv.org/pdf/2411.17982v1", +    "published_date": "2024-11-27", +    "categories": [ +      "cs.RO", +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "DROID-Splat: Combining end-to-end SLAM with 3D Gaussian Splatting", +    "authors": [ +      "Christian Homeyer", +      "Leon Begiristain", +      "Christoph Schnörr" +    ], +    "abstract": "Recent progress in scene synthesis makes standalone SLAM systems purely based on optimizing hyperprimitives with a Rendering objective possible. However, the tracking performance still lags behind traditional and end-to-end SLAM systems. An optimal trade-off between robustness, speed and accuracy has not yet been reached, especially for monocular video. In this paper, we introduce a SLAM system based on an end-to-end Tracker and extend it with a Renderer based on recent 3D Gaussian Splatting techniques. Our framework \\textbf{DroidSplat} achieves both SotA tracking and rendering results on common SLAM benchmarks. We implemented multiple building blocks of modern SLAM systems to run in parallel, allowing for fast inference on common consumer GPUs. Recent progress in monocular depth prediction and camera calibration allows our system to achieve strong results even on in-the-wild data without known camera intrinsics.
Code will be available at \\url{https://github.com/ChenHoy/DROID-Splat}.", + "arxiv_url": "http://arxiv.org/abs/2411.17660v2", + "pdf_url": "http://arxiv.org/pdf/2411.17660v2", + "published_date": "2024-11-26", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/ChenHoy/DROID-Splat", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Distractor-free Generalizable 3D Gaussian Splatting", + "authors": [ + "Yanqi Bao", + "Jing Liao", + "Jing Huo", + "Yang Gao" + ], + "abstract": "We present DGGS, a novel framework addressing the previously unexplored challenge of Distractor-free Generalizable 3D Gaussian Splatting (3DGS). It accomplishes two key objectives: fortifying generalizable 3DGS against distractor-laden data during both training and inference phases, while successfully extending cross-scene adaptation capabilities to conventional distractor-free approaches. To achieve these objectives, DGGS introduces a scene-agnostic reference-based mask prediction and refinement methodology during training phase, coupled with a training view selection strategy, effectively improving distractor prediction accuracy and training stability. Moreover, to address distractor-induced voids and artifacts during inference stage, we propose a two-stage inference framework for better reference selection based on the predicted distractor masks, complemented by a distractor pruning module to eliminate residual distractor effects. Extensive generalization experiments demonstrate DGGS's advantages under distractor-laden conditions. Additionally, experimental results show that our scene-agnostic mask inference achieves accuracy comparable to scene-specific trained methods. Homepage is \\url{https://github.com/bbbbby-99/DGGS}.", + "arxiv_url": "http://arxiv.org/abs/2411.17605v1", + "pdf_url": "http://arxiv.org/pdf/2411.17605v1", + "published_date": "2024-11-26", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/bbbbby-99/DGGS", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters", + "authors": [ + "Mingze Sun", + "Junhao Chen", + "Junting Dong", + "Yurun Chen", + "Xinyu Jiang", + "Shiwei Mao", + "Puhua Jiang", + "Jingbo Wang", + "Bo Dai", + "Ruqi Huang" + ], + "abstract": "Recent advances in generative models have enabled high-quality 3D character reconstruction from multi-modal. However, animating these generated characters remains a challenging task, especially for complex elements like garments and hair, due to the lack of large-scale datasets and effective rigging methods. To address this gap, we curate AnimeRig, a large-scale dataset with detailed skeleton and skinning annotations. Building upon this, we propose DRiVE, a novel framework for generating and rigging 3D human characters with intricate structures. Unlike existing methods, DRiVE utilizes a 3D Gaussian representation, facilitating efficient animation and high-quality rendering. We further introduce GSDiff, a 3D Gaussian-based diffusion module that predicts joint positions as spatial distributions, overcoming the limitations of regression-based approaches. Extensive experiments demonstrate that DRiVE achieves precise rigging results, enabling realistic dynamics for clothing and hair, and surpassing previous methods in both quality and versatility. 
The code and dataset will be made public for academic use upon acceptance.", + "arxiv_url": "http://arxiv.org/abs/2411.17423v1", + "pdf_url": "http://arxiv.org/pdf/2411.17423v1", + "published_date": "2024-11-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting", + "authors": [ + "Gyeongjin Kang", + "Jisang Yoo", + "Jihyeon Park", + "Seungtae Nam", + "Hyeonsoo Im", + "Sangheon Shin", + "Sangpil Kim", + "Eunbyung Park" + ], + "abstract": "We propose SelfSplat, a novel 3D Gaussian Splatting model designed to perform pose-free and 3D prior-free generalizable 3D reconstruction from unposed multi-view images. These settings are inherently ill-posed due to the lack of ground-truth data, learned geometric information, and the need to achieve accurate 3D reconstruction without finetuning, making it difficult for conventional methods to achieve high-quality results. Our model addresses these challenges by effectively integrating explicit 3D representations with self-supervised depth and pose estimation techniques, resulting in reciprocal improvements in both pose accuracy and 3D reconstruction quality. Furthermore, we incorporate a matching-aware pose estimation network and a depth refinement module to enhance geometry consistency across views, ensuring more accurate and stable 3D reconstructions. To present the performance of our method, we evaluated it on large-scale real-world datasets, including RealEstate10K, ACID, and DL3DV. SelfSplat achieves superior results over previous state-of-the-art methods in both appearance and geometry quality, also demonstrates strong cross-dataset generalization capabilities. Extensive ablation studies and analysis also validate the effectiveness of our proposed methods. Code and pretrained models are available at https://gynjn.github.io/selfsplat/", + "arxiv_url": "http://arxiv.org/abs/2411.17190v3", + "pdf_url": "http://arxiv.org/pdf/2411.17190v3", + "published_date": "2024-11-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PhysMotion: Physics-Grounded Dynamics From a Single Image", + "authors": [ + "Xiyang Tan", + "Ying Jiang", + "Xuan Li", + "Zeshun Zong", + "Tianyi Xie", + "Yin Yang", + "Chenfanfu Jiang" + ], + "abstract": "We introduce PhysMotion, a novel framework that leverages principled physics-based simulations to guide intermediate 3D representations generated from a single image and input conditions (e.g., applied force and torque), producing high-quality, physically plausible video generation. By utilizing continuum mechanics-based simulations as a prior knowledge, our approach addresses the limitations of traditional data-driven generative models and result in more consistent physically plausible motions. Our framework begins by reconstructing a feed-forward 3D Gaussian from a single image through geometry optimization. This representation is then time-stepped using a differentiable Material Point Method (MPM) with continuum mechanics-based elastoplasticity models, which provides a strong foundation for realistic dynamics, albeit at a coarse level of detail. 
To enhance the geometry, appearance and ensure spatiotemporal consistency, we refine the initial simulation using a text-to-image (T2I) diffusion model with cross-frame attention, resulting in a physically plausible video that retains intricate details comparable to the input image. We conduct comprehensive qualitative and quantitative evaluations to validate the efficacy of our method. Our project page is available at: https://supertan0204.github.io/physmotion_website/.", + "arxiv_url": "http://arxiv.org/abs/2411.17189v2", + "pdf_url": "http://arxiv.org/pdf/2411.17189v2", + "published_date": "2024-11-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "4D Scaffold Gaussian Splatting for Memory Efficient Dynamic Scene Reconstruction", + "authors": [ + "Woong Oh Cho", + "In Cho", + "Seoha Kim", + "Jeongmin Bae", + "Youngjung Uh", + "Seon Joo Kim" + ], + "abstract": "Existing 4D Gaussian methods for dynamic scene reconstruction offer high visual fidelity and fast rendering. However, these methods suffer from excessive memory and storage demands, which limits their practical deployment. This paper proposes a 4D anchor-based framework that retains visual quality and rendering speed of 4D Gaussians while significantly reducing storage costs. Our method extends 3D scaffolding to 4D space, and leverages sparse 4D grid-aligned anchors with compressed feature vectors. Each anchor models a set of neural 4D Gaussians, each of which represent a local spatiotemporal region. In addition, we introduce a temporal coverage-aware anchor growing strategy to effectively assign additional anchors to under-reconstructed dynamic regions. Our method adjusts the accumulated gradients based on Gaussians' temporal coverage, improving reconstruction quality in dynamic regions. To reduce the number of anchors, we further present enhanced formulations of neural 4D Gaussians. These include the neural velocity, and the temporal opacity derived from a generalized Gaussian distribution. Experimental results demonstrate that our method achieves state-of-the-art visual quality and 97.8% storage reduction over 4DGS.", + "arxiv_url": "http://arxiv.org/abs/2411.17044v1", + "pdf_url": "http://arxiv.org/pdf/2411.17044v1", + "published_date": "2024-11-26", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "G2SDF: Surface Reconstruction from Explicit Gaussians with Implicit SDFs", + "authors": [ + "Kunyi Li", + "Michael Niemeyer", + "Zeyu Chen", + "Nassir Navab", + "Federico Tombari" + ], + "abstract": "State-of-the-art novel view synthesis methods such as 3D Gaussian Splatting (3DGS) achieve remarkable visual quality. While 3DGS and its variants can be rendered efficiently using rasterization, many tasks require access to the underlying 3D surface, which remains challenging to extract due to the sparse and explicit nature of this representation. In this paper, we introduce G2SDF, a novel approach that addresses this limitation by integrating a neural implicit Signed Distance Field (SDF) into the Gaussian Splatting framework. Our method links the opacity values of Gaussians with their distances to the surface, ensuring a closer alignment of Gaussians with the scene surface. To extend this approach to unbounded scenes at varying scales, we propose a normalization function that maps any range to a fixed interval. 
To further enhance reconstruction quality, we leverage an off-the-shelf depth estimator as pseudo ground truth during Gaussian Splatting optimization. By establishing a differentiable connection between the explicit Gaussians and the implicit SDF, our approach enables high-quality surface reconstruction and rendering. Experimental results on several real-world datasets demonstrate that G2SDF achieves superior reconstruction quality than prior works while maintaining the efficiency of 3DGS.", + "arxiv_url": "http://arxiv.org/abs/2411.16898v1", + "pdf_url": "http://arxiv.org/pdf/2411.16898v1", + "published_date": "2024-11-25", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PreF3R: Pose-Free Feed-Forward 3D Gaussian Splatting from Variable-length Image Sequence", + "authors": [ + "Zequn Chen", + "Jiezhi Yang", + "Heng Yang" + ], + "abstract": "We present PreF3R, Pose-Free Feed-forward 3D Reconstruction from an image sequence of variable length. Unlike previous approaches, PreF3R removes the need for camera calibration and reconstructs the 3D Gaussian field within a canonical coordinate frame directly from a sequence of unposed images, enabling efficient novel-view rendering. We leverage DUSt3R's ability for pair-wise 3D structure reconstruction, and extend it to sequential multi-view input via a spatial memory network, eliminating the need for optimization-based global alignment. Additionally, PreF3R incorporates a dense Gaussian parameter prediction head, which enables subsequent novel-view synthesis with differentiable rasterization. This allows supervising our model with the combination of photometric loss and pointmap regression loss, enhancing both photorealism and structural accuracy. Given a sequence of ordered images, PreF3R incrementally reconstructs the 3D Gaussian field at 20 FPS, therefore enabling real-time novel-view rendering. Empirical experiments demonstrate that PreF3R is an effective solution for the challenging task of pose-free feed-forward novel-view synthesis, while also exhibiting robust generalization to unseen scenes.", + "arxiv_url": "http://arxiv.org/abs/2411.16877v1", + "pdf_url": "http://arxiv.org/pdf/2411.16877v1", + "published_date": "2024-11-25", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving", + "authors": [ + "Georg Hess", + "Carl Lindström", + "Maryam Fatemi", + "Christoffer Petersson", + "Lennart Svensson" + ], + "abstract": "Ensuring the safety of autonomous robots, such as self-driving vehicles, requires extensive testing across diverse driving scenarios. Simulation is a key ingredient for conducting such testing in a cost-effective and scalable way. Neural rendering methods have gained popularity, as they can build simulation environments from collected logs in a data-driven manner. However, existing neural radiance field (NeRF) methods for sensor-realistic rendering of camera and lidar data suffer from low rendering speeds, limiting their applicability for large-scale testing. While 3D Gaussian Splatting (3DGS) enables real-time rendering, current methods are limited to camera data and are unable to render lidar data essential for autonomous driving. 
To address these limitations, we propose SplatAD, the first 3DGS-based method for realistic, real-time rendering of dynamic scenes for both camera and lidar data. SplatAD accurately models key sensor-specific phenomena such as rolling shutter effects, lidar intensity, and lidar ray dropouts, using purpose-built algorithms to optimize rendering efficiency. Evaluation across three autonomous driving datasets demonstrates that SplatAD achieves state-of-the-art rendering quality with up to +2 PSNR for NVS and +3 PSNR for reconstruction while increasing rendering speed over NeRF-based methods by an order of magnitude. See https://research.zenseact.com/publications/splatad/ for our project page.", + "arxiv_url": "http://arxiv.org/abs/2411.16816v2", + "pdf_url": "http://arxiv.org/pdf/2411.16816v2", + "published_date": "2024-11-25", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis", + "authors": [ + "Hyojun Go", + "Byeongjun Park", + "Jiho Jang", + "Jin-Young Kim", + "Soonwoo Kwon", + "Changick Kim" + ], + "abstract": "Text-based generation and editing of 3D scenes hold significant potential for streamlining content creation through intuitive user interactions. While recent advances leverage 3D Gaussian Splatting (3DGS) for high-fidelity and real-time rendering, existing methods are often specialized and task-focused, lacking a unified framework for both generation and editing. In this paper, we introduce SplatFlow, a comprehensive framework that addresses this gap by enabling direct 3DGS generation and editing. SplatFlow comprises two main components: a multi-view rectified flow (RF) model and a Gaussian Splatting Decoder (GSDecoder). The multi-view RF model operates in latent space, generating multi-view images, depths, and camera poses simultaneously, conditioned on text prompts, thus addressing challenges like diverse scene scales and complex camera trajectories in real-world settings. Then, the GSDecoder efficiently translates these latent outputs into 3DGS representations through a feed-forward 3DGS method. Leveraging training-free inversion and inpainting techniques, SplatFlow enables seamless 3DGS editing and supports a broad range of 3D tasks-including object editing, novel view synthesis, and camera pose estimation-within a unified framework without requiring additional complex pipelines. We validate SplatFlow's capabilities on the MVImgNet and DL3DV-7K datasets, demonstrating its versatility and effectiveness in various 3D generation, editing, and inpainting-based tasks.", + "arxiv_url": "http://arxiv.org/abs/2411.16443v1", + "pdf_url": "http://arxiv.org/pdf/2411.16443v1", + "published_date": "2024-11-25", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Quadratic Gaussian Splatting for Efficient and Detailed Surface Reconstruction", + "authors": [ + "Ziyu Zhang", + "Binbin Huang", + "Hanqing Jiang", + "Liyang Zhou", + "Xiaojun Xiang", + "Shunhan Shen" + ], + "abstract": "Recently, 3D Gaussian Splatting (3DGS) has attracted attention for its superior rendering quality and speed over Neural Radiance Fields (NeRF). 
To address 3DGS's limitations in surface representation, 2D Gaussian Splatting (2DGS) introduced disks as scene primitives to model and reconstruct geometries from multi-view images, offering view-consistent geometry. However, the disk's first-order linear approximation often leads to over-smoothed results. We propose Quadratic Gaussian Splatting (QGS), a novel method that replaces disks with quadric surfaces, enhancing geometric fitting, whose code will be open-sourced. QGS defines Gaussian distributions in non-Euclidean space, allowing primitives to capture more complex textures. As a second-order surface approximation, QGS also renders spatial curvature to guide the normal consistency term, to effectively reduce over-smoothing. Moreover, QGS is a generalized version of 2DGS that achieves more accurate and detailed reconstructions, as verified by experiments on DTU and TNT, demonstrating its effectiveness in surpassing current state-of-the-art methods in geometry reconstruction. Our code will be released as open source.", +    "arxiv_url": "http://arxiv.org/abs/2411.16392v1", +    "pdf_url": "http://arxiv.org/pdf/2411.16392v1", +    "published_date": "2024-11-25", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian", +      "nerf" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM", +    "authors": [ +      "Vladimir Yugay", +      "Theo Gevers", +      "Martin R. Oswald" +    ], +    "abstract": "Simultaneous localization and mapping (SLAM) systems with novel view synthesis capabilities are widely used in computer vision, with applications in augmented reality, robotics, and autonomous driving. However, existing approaches are limited to single-agent operation. Recent work has addressed this problem using a distributed neural scene representation. Unfortunately, existing methods are slow, cannot accurately render real-world data, are restricted to two agents, and have limited tracking accuracy. In contrast, we propose a rigidly deformable 3D Gaussian-based scene representation that dramatically speeds up the system. However, improving tracking accuracy and reconstructing a globally consistent map from multiple agents remains challenging due to trajectory drift and discrepancies across agents' observations. Therefore, we propose new tracking and map-merging mechanisms and integrate loop closure in the Gaussian-based SLAM pipeline. We evaluate MAGiC-SLAM on synthetic and real-world datasets and find it more accurate and faster than the state of the art.", +    "arxiv_url": "http://arxiv.org/abs/2411.16785v1", +    "pdf_url": "http://arxiv.org/pdf/2411.16785v1", +    "published_date": "2024-11-25", +    "categories": [ +      "cs.CV", +      "cs.AI", +      "cs.RO" +    ], +    "github_url": "", +    "keywords": [ +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "Event-boosted Deformable 3D Gaussians for Fast Dynamic Scene Reconstruction", +    "authors": [ +      "Wenhao Xu", +      "Wenming Weng", +      "Yueyi Zhang", +      "Ruikang Xu", +      "Zhiwei Xiong" +    ], +    "abstract": "3D Gaussian Splatting (3D-GS) enables real-time rendering but struggles with fast motion due to low temporal resolution of RGB cameras. To address this, we introduce the first approach combining event cameras, which capture high-temporal-resolution, continuous motion data, with deformable 3D-GS for fast dynamic scene reconstruction. We observe that threshold modeling for events plays a crucial role in achieving high-quality reconstruction.
Therefore, we propose a GS-Threshold Joint Modeling (GTJM) strategy, creating a mutually reinforcing process that greatly improves both 3D reconstruction and threshold modeling. Moreover, we introduce a Dynamic-Static Decomposition (DSD) strategy that first identifies dynamic areas by exploiting the inability of static Gaussians to represent motions, then applies a buffer-based soft decomposition to separate dynamic and static areas. This strategy accelerates rendering by avoiding unnecessary deformation in static areas, and focuses on dynamic areas to enhance fidelity. Our approach achieves high-fidelity dynamic reconstruction at 156 FPS with a 400$\\times$400 resolution on an RTX 3090 GPU.", + "arxiv_url": "http://arxiv.org/abs/2411.16180v1", + "pdf_url": "http://arxiv.org/pdf/2411.16180v1", + "published_date": "2024-11-25", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model", + "authors": [ + "Jinpeng Liu", + "Jiale Xu", + "Weihao Cheng", + "Yiming Gao", + "Xintao Wang", + "Ying Shan", + "Yansong Tang" + ], + "abstract": "We introduce NovelGS, a diffusion model for Gaussian Splatting (GS) given sparse-view images. Recent works leverage feed-forward networks to generate pixel-aligned Gaussians, which could be fast rendered. Unfortunately, the method was unable to produce satisfactory results for areas not covered by the input images due to the formulation of these methods. In contrast, we leverage the novel view denoising through a transformer-based network to generate 3D Gaussians. Specifically, by incorporating both conditional views and noisy target views, the network predicts pixel-aligned Gaussians for each view. During training, the rendered target and some additional views of the Gaussians are supervised. During inference, the target views are iteratively rendered and denoised from pure noise. Our approach demonstrates state-of-the-art performance in addressing the multi-view image reconstruction challenge. Due to generative modeling of unseen regions, NovelGS effectively reconstructs 3D objects with consistent and sharp textures. Experimental results on publicly available datasets indicate that NovelGS substantially surpasses existing image-to-3D frameworks, both qualitatively and quantitatively. We also demonstrate the potential of NovelGS in generative tasks, such as text-to-3D and image-to-3D, by integrating it with existing multiview diffusion models. We will make the code publicly accessible.", + "arxiv_url": "http://arxiv.org/abs/2411.16779v1", + "pdf_url": "http://arxiv.org/pdf/2411.16779v1", + "published_date": "2024-11-25", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GAST: Sequential Gaussian Avatars with Hierarchical Spatio-temporal Context", + "authors": [ + "Wangze Xu", + "Yifan Zhan", + "Zhihang Zhong", + "Xiao Sun" + ], + "abstract": "3D human avatars, through the use of canonical radiance fields and per-frame observed warping, enable high-fidelity rendering and animating. However, existing methods, which rely on either spatial SMPL(-X) poses or temporal embeddings, respectively suffer from coarse rendering quality or limited animation flexibility. 
To address these challenges, we propose GAST, a framework that unifies 3D human modeling with 3DGS by hierarchically integrating both spatial and temporal information. Specifically, we design a sequential conditioning framework for the non-rigid warping of the human body, under whose guidance more accurate 3D Gaussians can be obtained in the observation space. Moreover, the explicit properties of Gaussians allow us to embed richer sequential information, encompassing both the coarse sequence of human poses and finer per-vertex motion details. These sequence conditions are further sampled across different temporal scales, in a coarse-to-fine manner, ensuring unbiased inputs for non-rigid warping. Experimental results demonstrate that our method combined with hierarchical spatio-temporal modeling surpasses concurrent baselines, delivering both high-quality rendering and flexible animating capabilities.", + "arxiv_url": "http://arxiv.org/abs/2411.16768v1", + "pdf_url": "http://arxiv.org/pdf/2411.16768v1", + "published_date": "2024-11-25", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation", + "authors": [ + "Guangzhao Dai", + "Jian Zhao", + "Yuantao Chen", + "Yusen Qin", + "Hao Zhao", + "Guosen Xie", + "Yazhou Yao", + "Xiangbo Shu", + "Xuelong Li" + ], + "abstract": "Vision-and-Language Navigation (VLN), where an agent follows instructions to reach a target destination, has recently seen significant advancements. In contrast to navigation in discrete environments with predefined trajectories, VLN in Continuous Environments (VLN-CE) presents greater challenges, as the agent is free to navigate any unobstructed location and is more vulnerable to visual occlusions or blind spots. Recent approaches have attempted to address this by imagining future environments, either through predicted future visual images or semantic features, rather than relying solely on current observations. However, these RGB-based and feature-based methods lack intuitive appearance-level information or high-level semantic complexity crucial for effective navigation. To overcome these limitations, we introduce a novel, generalizable 3DGS-based pre-training paradigm, called UnitedVLN, which enables agents to better explore future environments by unitedly rendering high-fidelity 360 visual images and semantic features. UnitedVLN employs two key schemes: search-then-query sampling and separate-then-united rendering, which facilitate efficient exploitation of neural primitives, helping to integrate both appearance and semantic information for more robust navigation. Extensive experiments demonstrate that UnitedVLN outperforms state-of-the-art methods on existing VLN-CE benchmarks.", + "arxiv_url": "http://arxiv.org/abs/2411.16053v1", + "pdf_url": "http://arxiv.org/pdf/2411.16053v1", + "published_date": "2024-11-25", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PG-SLAM: Photo-realistic and Geometry-aware RGB-D SLAM in Dynamic Environments", + "authors": [ + "Haoang Li", + "Xiangqi Meng", + "Xingxing Zuo", + "Zhe Liu", + "Hesheng Wang", + "Daniel Cremers" + ], + "abstract": "Simultaneous localization and mapping (SLAM) has achieved impressive performance in static environments. However, SLAM in dynamic environments remains an open question. 
Many methods directly filter out dynamic objects, resulting in incomplete scene reconstruction and limited accuracy of camera localization. The other works express dynamic objects by point clouds, sparse joints, or coarse meshes, which fails to provide a photo-realistic representation. To overcome the above limitations, we propose a photo-realistic and geometry-aware RGB-D SLAM method by extending Gaussian splatting. Our method is composed of three main modules to 1) map the dynamic foreground including non-rigid humans and rigid items, 2) reconstruct the static background, and 3) localize the camera. To map the foreground, we focus on modeling the deformations and/or motions. We consider the shape priors of humans and exploit geometric and appearance constraints of humans and items. For background mapping, we design an optimization strategy between neighboring local maps by integrating appearance constraint into geometric alignment. As to camera localization, we leverage both static background and dynamic foreground to increase the observations for noise compensation. We explore the geometric and appearance constraints by associating 3D Gaussians with 2D optical flows and pixel patches. Experiments on various real-world datasets demonstrate that our method outperforms state-of-the-art approaches in terms of camera localization and scene representation. Source codes will be publicly available upon paper acceptance.", + "arxiv_url": "http://arxiv.org/abs/2411.15800v1", + "pdf_url": "http://arxiv.org/pdf/2411.15800v1", + "published_date": "2024-11-24", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ZeroGS: Training 3D Gaussian Splatting from Unposed Images", + "authors": [ + "Yu Chen", + "Rolandos Alexandros Potamias", + "Evangelos Ververas", + "Jifei Song", + "Jiankang Deng", + "Gim Hee Lee" + ], + "abstract": "Neural radiance fields (NeRF) and 3D Gaussian Splatting (3DGS) are popular techniques to reconstruct and render photo-realistic images. However, the pre-requisite of running Structure-from-Motion (SfM) to get camera poses limits their completeness. While previous methods can reconstruct from a few unposed images, they are not applicable when images are unordered or densely captured. In this work, we propose ZeroGS to train 3DGS from hundreds of unposed and unordered images. Our method leverages a pretrained foundation model as the neural scene representation. Since the accuracy of the predicted pointmaps does not suffice for accurate image registration and high-fidelity image rendering, we propose to mitigate the issue by initializing and finetuning the pretrained model from a seed image. Images are then progressively registered and added to the training buffer, which is further used to train the model. We also propose to refine the camera poses and pointmaps by minimizing a point-to-camera ray consistency loss across multiple views. Experiments on the LLFF dataset, the MipNeRF360 dataset, and the Tanks-and-Temples dataset show that our method recovers more accurate camera poses than state-of-the-art pose-free NeRF/3DGS methods, and even renders higher quality images than 3DGS with COLMAP poses. 
Our project page is available at https://aibluefisher.github.io/ZeroGS.", + "arxiv_url": "http://arxiv.org/abs/2411.15779v1", + "pdf_url": "http://arxiv.org/pdf/2411.15779v1", + "published_date": "2024-11-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Bundle Adjusted Gaussian Avatars Deblurring", + "authors": [ + "Muyao Niu", + "Yifan Zhan", + "Qingtian Zhu", + "Zhuoxiao Li", + "Wei Wang", + "Zhihang Zhong", + "Xiao Sun", + "Yinqiang Zheng" + ], + "abstract": "The development of 3D human avatars from multi-view videos represents a significant yet challenging task in the field. Recent advancements, including 3D Gaussian Splattings (3DGS), have markedly progressed this domain. Nonetheless, existing techniques necessitate the use of high-quality sharp images, which are often impractical to obtain in real-world settings due to variations in human motion speed and intensity. In this study, we attempt to explore deriving sharp intrinsic 3D human Gaussian avatars from blurry video footage in an end-to-end manner. Our approach encompasses a 3D-aware, physics-oriented model of blur formation attributable to human movement, coupled with a 3D human motion model to clarify ambiguities found in motion-induced blurry images. This methodology facilitates the concurrent learning of avatar model parameters and the refinement of sub-frame motion parameters from a coarse initialization. We have established benchmarks for this task through a synthetic dataset derived from existing multi-view captures, alongside a real-captured dataset acquired through a 360-degree synchronous hybrid-exposure camera system. Comprehensive evaluations demonstrate that our model surpasses existing baselines.", + "arxiv_url": "http://arxiv.org/abs/2411.16758v1", + "pdf_url": "http://arxiv.org/pdf/2411.16758v1", + "published_date": "2024-11-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DynamicAvatars: Accurate Dynamic Facial Avatars Reconstruction and Precise Editing with Diffusion Models", + "authors": [ + "Yangyang Qian", + "Yuan Sun", + "Yu Guo" + ], + "abstract": "Generating and editing dynamic 3D head avatars are crucial tasks in virtual reality and film production. However, existing methods often suffer from facial distortions, inaccurate head movements, and limited fine-grained editing capabilities. To address these challenges, we present DynamicAvatars, a dynamic model that generates photorealistic, moving 3D head avatars from video clips and parameters associated with facial positions and expressions. Our approach enables precise editing through a novel prompt-based editing model, which integrates user-provided prompts with guiding parameters derived from large language models (LLMs). To achieve this, we propose a dual-tracking framework based on Gaussian Splatting and introduce a prompt preprocessing module to enhance editing stability. By incorporating a specialized GAN algorithm and connecting it to our control module, which generates precise guiding parameters from LLMs, we successfully address the limitations of existing methods. 
Additionally, we develop a dynamic editing strategy that selectively utilizes specific training datasets to improve the efficiency and adaptability of the model for dynamic editing tasks.", + "arxiv_url": "http://arxiv.org/abs/2411.15732v1", + "pdf_url": "http://arxiv.org/pdf/2411.15732v1", + "published_date": "2024-11-24", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GSurf: 3D Reconstruction via Signed Distance Fields with Direct Gaussian Supervision", + "authors": [ + "Baixin Xu", + "Jiangbei Hu", + "Jiaze Li", + "Ying He" + ], + "abstract": "Surface reconstruction from multi-view images is a core challenge in 3D vision. Recent studies have explored signed distance fields (SDF) within Neural Radiance Fields (NeRF) to achieve high-fidelity surface reconstructions. However, these approaches often suffer from slow training and rendering speeds compared to 3D Gaussian splatting (3DGS). Current state-of-the-art techniques attempt to fuse depth information to extract geometry from 3DGS, but frequently result in incomplete reconstructions and fragmented surfaces. In this paper, we introduce GSurf, a novel end-to-end method for learning a signed distance field directly from Gaussian primitives. The continuous and smooth nature of SDF addresses common issues in the 3DGS family, such as holes resulting from noisy or missing depth data. By using Gaussian splatting for rendering, GSurf avoids the redundant volume rendering typically required in other GS and SDF integrations. Consequently, GSurf achieves faster training and rendering speeds while delivering 3D reconstruction quality comparable to neural implicit surface methods, such as VolSDF and NeuS. Experimental results across various benchmark datasets demonstrate the effectiveness of our method in producing high-fidelity 3D reconstructions.", + "arxiv_url": "http://arxiv.org/abs/2411.15723v2", + "pdf_url": "http://arxiv.org/pdf/2411.15723v2", + "published_date": "2024-11-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting", + "authors": [ + "Xiaobao Wei", + "Qingpo Wuwu", + "Zhongyu Zhao", + "Zhuangzhe Wu", + "Nan Huang", + "Ming Lu", + "Ningning MA", + "Shanghang Zhang" + ], + "abstract": "Photorealistic reconstruction of street scenes is essential for developing real-world simulators in autonomous driving. While recent methods based on 3D/4D Gaussian Splatting (GS) have demonstrated promising results, they still encounter challenges in complex street scenes due to the unpredictable motion of dynamic objects. Current methods typically decompose street scenes into static and dynamic objects, learning the Gaussians in either a supervised manner (e.g., w/ 3D bounding-box) or a self-supervised manner (e.g., w/o 3D bounding-box). However, these approaches do not effectively model the motions of dynamic objects (e.g., the motion speed of pedestrians is clearly different from that of vehicles), resulting in suboptimal scene decomposition. To address this, we propose Explicit Motion Decomposition (EMD), which models the motions of dynamic objects by introducing learnable motion embeddings to the Gaussians, enhancing the decomposition in street scenes. 
The proposed EMD is a plug-and-play approach applicable to various baseline methods. We also propose tailored training strategies to apply EMD to both supervised and self-supervised baselines. Through comprehensive experimentation, we illustrate the effectiveness of our approach with various established baselines. The code will be released at: https://qingpowuwu.github.io/emdgaussian.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2411.15582v1", + "pdf_url": "http://arxiv.org/pdf/2411.15582v1", + "published_date": "2024-11-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving", + "authors": [ + "Su Sun", + "Cheng Zhao", + "Zhuoyang Sun", + "Yingjie Victor Chen", + "Mei Chen" + ], + "abstract": "Most existing Dynamic Gaussian Splatting methods for complex dynamic urban scenarios rely on accurate object-level supervision from expensive manual labeling, limiting their scalability in real-world applications. In this paper, we introduce SplatFlow, a Self-Supervised Dynamic Gaussian Splatting within Neural Motion Flow Fields (NMFF) to learn 4D space-time representations without requiring tracked 3D bounding boxes, enabling accurate dynamic scene reconstruction and novel view RGB, depth and flow synthesis. SplatFlow designs a unified framework to seamlessly integrate time-dependent 4D Gaussian representation within NMFF, where NMFF is a set of implicit functions to model temporal motions of both LiDAR points and Gaussians as continuous motion flow fields. Leveraging NMFF, SplatFlow effectively decomposes static background and dynamic objects, representing them with 3D and 4D Gaussian primitives, respectively. NMFF also models the status correspondences of each 4D Gaussian across time, which aggregates temporal features to enhance cross-view consistency of dynamic components. SplatFlow further improves dynamic scene identification by distilling features from 2D foundational models into 4D space-time representation. Comprehensive evaluations conducted on the Waymo Open Dataset and KITTI Dataset validate SplatFlow's state-of-the-art (SOTA) performance for both image reconstruction and novel view synthesis in dynamic urban scenarios.", + "arxiv_url": "http://arxiv.org/abs/2411.15482v1", + "pdf_url": "http://arxiv.org/pdf/2411.15482v1", + "published_date": "2024-11-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gassidy: Gaussian Splatting SLAM in Dynamic Environments", + "authors": [ + "Long Wen", + "Shixin Li", + "Yu Zhang", + "Yuhong Huang", + "Jianjie Lin", + "Fengjunjie Pan", + "Zhenshan Bing", + "Alois Knoll" + ], + "abstract": "3D Gaussian Splatting (3DGS) allows flexible adjustments to scene representation, enabling continuous optimization of scene quality during dense visual simultaneous localization and mapping (SLAM) in static environments. However, 3DGS faces challenges in handling environmental disturbances from dynamic objects with irregular movement, leading to degradation in both camera tracking accuracy and map reconstruction quality. To address this challenge, we develop an RGB-D dense SLAM which is called Gaussian Splatting SLAM in Dynamic Environments (Gassidy). 
This approach calculates Gaussians to generate rendering loss flows for each environmental component based on a designed photometric-geometric loss function. To distinguish and filter environmental disturbances, we iteratively analyze rendering loss flows to detect features characterized by changes in loss values between dynamic objects and static components. This process ensures a clean environment for accurate scene reconstruction. Compared to state-of-the-art SLAM methods, experimental results on open datasets show that Gassidy improves camera tracking precision by up to 97.9% and enhances map quality by up to 6%.", +    "arxiv_url": "http://arxiv.org/abs/2411.15476v1", +    "pdf_url": "http://arxiv.org/pdf/2411.15476v1", +    "published_date": "2024-11-23", +    "categories": [ +      "cs.RO" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "SplatSDF: Boosting Neural Implicit SDF via Gaussian Splatting Fusion", +    "authors": [ +      "Runfa Blark Li", +      "Keito Suzuki", +      "Bang Du", +      "Ki Myung Brian Le", +      "Nikolay Atanasov", +      "Truong Nguyen" +    ], +    "abstract": "A signed distance function (SDF) is a useful representation for continuous-space geometry and many related operations, including rendering, collision checking, and mesh generation. Hence, reconstructing SDF from image observations accurately and efficiently is a fundamental problem. Recently, neural implicit SDF (SDF-NeRF) techniques, trained using volumetric rendering, have gained a lot of attention. Compared to earlier truncated SDF (TSDF) fusion algorithms that rely on depth maps and voxelize continuous space, SDF-NeRF enables continuous-space SDF reconstruction with better geometric and photometric accuracy. However, the accuracy and convergence speed of scene-level SDF reconstruction require further improvements for many applications. With the advent of 3D Gaussian Splatting (3DGS) as an explicit representation with excellent rendering quality and speed, several works have focused on improving SDF-NeRF by introducing consistency losses on depth and surface normals between 3DGS and SDF-NeRF. However, loss-level connections alone lead to incremental improvements. We propose a novel neural implicit SDF called \"SplatSDF\" to fuse 3DGS and SDF-NeRF at an architecture level with significant boosts to geometric and photometric accuracy and convergence speed. Our SplatSDF relies on 3DGS as input only during training, and keeps the same complexity and efficiency as the original SDF-NeRF during inference. Our method outperforms state-of-the-art SDF-NeRF models on geometric and photometric evaluation by the time of submission.", +    "arxiv_url": "http://arxiv.org/abs/2411.15468v1", +    "pdf_url": "http://arxiv.org/pdf/2411.15468v1", +    "published_date": "2024-11-23", +    "categories": [ +      "cs.CV", +      "cs.GR", +      "cs.RO" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian", +      "nerf" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "UniGaussian: Driving Scene Reconstruction from Multiple Camera Models via Unified Gaussian Representations", +    "authors": [ +      "Yuan Ren", +      "Guile Wu", +      "Runhao Li", +      "Zheyuan Yang", +      "Yibo Liu", +      "Xingxin Chen", +      "Tongtong Cao", +      "Bingbing Liu" +    ], +    "abstract": "Urban scene reconstruction is crucial for real-world autonomous driving simulators. Although existing methods have achieved photorealistic reconstruction, they mostly focus on pinhole cameras and neglect fisheye cameras.
In fact, how to effectively simulate fisheye cameras in driving scene remains an unsolved problem. In this work, we propose UniGaussian, a novel approach that learns a unified 3D Gaussian representation from multiple camera models for urban scene reconstruction in autonomous driving. Our contributions are two-fold. First, we propose a new differentiable rendering method that distorts 3D Gaussians using a series of affine transformations tailored to fisheye camera models. This addresses the compatibility issue of 3D Gaussian splatting with fisheye cameras, which is hindered by light ray distortion caused by lenses or mirrors. Besides, our method maintains real-time rendering while ensuring differentiability. Second, built on the differentiable rendering method, we design a new framework that learns a unified Gaussian representation from multiple camera models. By applying affine transformations to adapt different camera models and regularizing the shared Gaussians with supervision from different modalities, our framework learns a unified 3D Gaussian representation with input data from multiple sources and achieves holistic driving scene understanding. As a result, our approach models multiple sensors (pinhole and fisheye cameras) and modalities (depth, semantic, normal and LiDAR point clouds). Our experiments show that our method achieves superior rendering quality and fast rendering speed for driving scene simulation.", + "arxiv_url": "http://arxiv.org/abs/2411.15355v1", + "pdf_url": "http://arxiv.org/pdf/2411.15355v1", + "published_date": "2024-11-22", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Neural 4D Evolution under Large Topological Changes from 2D Images", + "authors": [ + "AmirHossein Naghi Razlighi", + "Tiago Novello", + "Asen Nachkov", + "Thomas Probst", + "Danda Paudel" + ], + "abstract": "In the literature, it has been shown that the evolution of the known explicit 3D surface to the target one can be learned from 2D images using the instantaneous flow field, where the known and target 3D surfaces may largely differ in topology. We are interested in capturing 4D shapes whose topology changes largely over time. We encounter that the straightforward extension of the existing 3D-based method to the desired 4D case performs poorly. In this work, we address the challenges in extending 3D neural evolution to 4D under large topological changes by proposing two novel modifications. More precisely, we introduce (i) a new architecture to discretize and encode the deformation and learn the SDF and (ii) a technique to impose the temporal consistency. (iii) Also, we propose a rendering scheme for color prediction based on Gaussian splatting. Furthermore, to facilitate learning directly from 2D images, we propose a learning framework that can disentangle the geometry and appearance from RGB images. This method of disentanglement, while also useful for the 4D evolution problem that we are concentrating on, is also novel and valid for static scenes. Our extensive experiments on various data provide awesome results and, most importantly, open a new approach toward reconstructing challenging scenes with significant topological changes and deformations. 
Our source code and the dataset are publicly available at https://github.com/insait-institute/N4DE.", + "arxiv_url": "http://arxiv.org/abs/2411.15018v1", + "pdf_url": "http://arxiv.org/pdf/2411.15018v1", + "published_date": "2024-11-22", + "categories": [ + "cs.CV", + "I.4.5; I.3.5" + ], + "github_url": "https://github.com/insait-institute/N4DE", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes", + "authors": [ + "Jan Held", + "Renaud Vandeghen", + "Abdullah Hamdi", + "Adrien Deliege", + "Anthony Cioppa", + "Silvio Giancola", + "Andrea Vedaldi", + "Bernard Ghanem", + "Marc Van Droogenbroeck" + ], + "abstract": "Recent advances in radiance field reconstruction, such as 3D Gaussian Splatting (3DGS), have achieved high-quality novel view synthesis and fast rendering by representing scenes with compositions of Gaussian primitives. However, 3D Gaussians present several limitations for scene reconstruction. Accurately capturing hard edges is challenging without significantly increasing the number of Gaussians, creating a large memory footprint. Moreover, they struggle to represent flat surfaces, as they are diffused in space. Without hand-crafted regularizers, they tend to disperse irregularly around the actual surface. To circumvent these issues, we introduce a novel method, named 3D Convex Splatting (3DCS), which leverages 3D smooth convexes as primitives for modeling geometrically-meaningful radiance fields from multi-view images. Smooth convex shapes offer greater flexibility than Gaussians, allowing for a better representation of 3D scenes with hard edges and dense volumes using fewer primitives. Powered by our efficient CUDA-based rasterizer, 3DCS achieves superior performance over 3DGS on benchmarks such as Mip-NeRF360, Tanks and Temples, and Deep Blending. Specifically, our method attains an improvement of up to 0.81 in PSNR and 0.026 in LPIPS compared to 3DGS while maintaining high rendering speeds and reducing the number of required primitives. Our results highlight the potential of 3D Convex Splatting to become the new standard for high-quality scene reconstruction and novel view synthesis. Project page: convexsplatting.github.io.", + "arxiv_url": "http://arxiv.org/abs/2411.14974v2", + "pdf_url": "http://arxiv.org/pdf/2411.14974v2", + "published_date": "2024-11-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Dynamics-Aware Gaussian Splatting Streaming Towards Fast On-the-Fly Training for 4D Reconstruction", + "authors": [ + "Zhening Liu", + "Yingdong Hu", + "Xinjie Zhang", + "Jiawei Shao", + "Zehong Lin", + "Jun Zhang" + ], + "abstract": "The recent development of 3D Gaussian Splatting (3DGS) has led to great interest in 4D dynamic spatial reconstruction from multi-view visual inputs. While existing approaches mainly rely on processing full-length multi-view videos for 4D reconstruction, there has been limited exploration of iterative online reconstruction methods that enable on-the-fly training and per-frame streaming. Current 3DGS-based streaming methods treat the Gaussian primitives uniformly and constantly renew the densified Gaussians, thereby overlooking the difference between dynamic and static features and also neglecting the temporal continuity in the scene. 
To address these limitations, we propose a novel three-stage pipeline for iterative streamable 4D dynamic spatial reconstruction. Our pipeline comprises a selective inheritance stage to preserve temporal continuity, a dynamics-aware shift stage for distinguishing dynamic and static primitives and optimizing their movements, and an error-guided densification stage to accommodate emerging objects. Our method achieves state-of-the-art performance in online 4D reconstruction, demonstrating a 20% improvement in on-the-fly training speed, superior representation quality, and real-time rendering capability. Project page: https://www.liuzhening.top/DASS", + "arxiv_url": "http://arxiv.org/abs/2411.14847v1", + "pdf_url": "http://arxiv.org/pdf/2411.14847v1", + "published_date": "2024-11-22", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving", + "authors": [ + "Haiming Zhang", + "Wending Zhou", + "Yiyao Zhu", + "Xu Yan", + "Jiantao Gao", + "Dongfeng Bai", + "Yingjie Cai", + "Bingbing Liu", + "Shuguang Cui", + "Zhen Li" + ], + "abstract": "This paper introduces VisionPAD, a novel self-supervised pre-training paradigm designed for vision-centric algorithms in autonomous driving. In contrast to previous approaches that employ neural rendering with explicit depth supervision, VisionPAD utilizes more efficient 3D Gaussian Splatting to reconstruct multi-view representations using only images as supervision. Specifically, we introduce a self-supervised method for voxel velocity estimation. By warping voxels to adjacent frames and supervising the rendered outputs, the model effectively learns motion cues in the sequential data. Furthermore, we adopt a multi-frame photometric consistency approach to enhance geometric perception. It projects adjacent frames to the current frame based on rendered depths and relative poses, boosting the 3D geometric representation through pure image supervision. Extensive experiments on autonomous driving datasets demonstrate that VisionPAD significantly improves performance in 3D object detection, occupancy prediction and map segmentation, surpassing state-of-the-art pre-training strategies by a considerable margin.", + "arxiv_url": "http://arxiv.org/abs/2411.14716v1", + "pdf_url": "http://arxiv.org/pdf/2411.14716v1", + "published_date": "2024-11-22", + "categories": [ + "cs.CV", + "cs.LG", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation", + "authors": [ + "Zhuoman Liu", + "Weicai Ye", + "Yan Luximon", + "Pengfei Wan", + "Di Zhang" + ], + "abstract": "Realistic simulation of dynamic scenes requires accurately capturing diverse material properties and modeling complex object interactions grounded in physical principles. However, existing methods are constrained to basic material types with limited predictable parameters, making them insufficient to represent the complexity of real-world materials. We introduce a novel approach that leverages multi-modal foundation models and video diffusion to achieve enhanced 4D dynamic scene simulation. 
Our method utilizes multi-modal models to identify material types and initialize material parameters through image queries, while simultaneously inferring 3D Gaussian splats for detailed scene representation. We further refine these material parameters using video diffusion with a differentiable Material Point Method (MPM) and optical flow guidance rather than render loss or Score Distillation Sampling (SDS) loss. This integrated framework enables accurate prediction and realistic simulation of dynamic interactions in real-world scenarios, advancing both accuracy and flexibility in physics-based simulations.", +    "arxiv_url": "http://arxiv.org/abs/2411.14423v1", +    "pdf_url": "http://arxiv.org/pdf/2411.14423v1", +    "published_date": "2024-11-21", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation", +    "authors": [ +      "Yuanhao Cai", +      "He Zhang", +      "Kai Zhang", +      "Yixun Liang", +      "Mengwei Ren", +      "Fujun Luan", +      "Qing Liu", +      "Soo Ye Kim", +      "Jianming Zhang", +      "Zhifei Zhang", +      "Yuqian Zhou", +      "Zhe Lin", +      "Alan Yuille" +    ], +    "abstract": "Existing feed-forward image-to-3D methods mainly rely on 2D multi-view diffusion models that cannot guarantee 3D consistency. These methods easily collapse when changing the prompt view direction and mainly handle object-centric prompt images. In this paper, we propose a novel single-stage 3D diffusion model, DiffusionGS, for object and scene generation from a single view. DiffusionGS directly outputs 3D Gaussian point clouds at each timestep to enforce view consistency and allow the model to generate robustly given prompt views of any directions, beyond object-centric inputs. Plus, to improve the capability and generalization ability of DiffusionGS, we scale up 3D training data by developing a scene-object mixed training strategy. Experiments show that our method enjoys better generation quality (2.20 dB higher in PSNR and 23.25 lower in FID) and over 5x faster speed (~6s on an A100 GPU) than SOTA methods. The user study and text-to-3D applications also reveal the practical value of our method. Our Project page at https://caiyuanhao1998.github.io/project/DiffusionGS/ shows the video and interactive generation results.", +    "arxiv_url": "http://arxiv.org/abs/2411.14384v2", +    "pdf_url": "http://arxiv.org/pdf/2411.14384v2", +    "published_date": "2024-11-21", +    "categories": [ +      "cs.CV", +      "cs.GR" +    ], +    "github_url": "", +    "keywords": [ +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "SplatR : Experience Goal Visual Rearrangement with 3D Gaussian Splatting and Dense Feature Matching", +    "authors": [ +      "Arjun P S", +      "Andrew Melnik", +      "Gora Chand Nandi" +    ], +    "abstract": "The Experience Goal Visual Rearrangement task stands as a foundational challenge within Embodied AI, requiring an agent to construct a robust world model that accurately captures the goal state. The agent uses this world model to restore a shuffled scene to its original configuration, making an accurate representation of the world essential for successfully completing the task. In this work, we present a novel framework that leverages 3D Gaussian Splatting as a 3D scene representation for the experience goal visual rearrangement task. 
Recent advances in volumetric scene representation like 3D Gaussian Splatting, offer fast rendering of high quality and photo-realistic novel views. Our approach enables the agent to have consistent views of the current and the goal setting of the rearrangement task, which enables the agent to directly compare the goal state and the shuffled state of the world in image space. To compare these views, we propose to use a dense feature matching method with visual features extracted from a foundation model, leveraging its advantages of a more universal feature representation, which facilitates robustness, and generalization. We validate our approach on the AI2-THOR rearrangement challenge benchmark and demonstrate improvements over the current state of the art methods", + "arxiv_url": "http://arxiv.org/abs/2411.14322v1", + "pdf_url": "http://arxiv.org/pdf/2411.14322v1", + "published_date": "2024-11-21", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "NexusSplats: Efficient 3D Gaussian Splatting in the Wild", + "authors": [ + "Yuzhou Tang", + "Dejun Xu", + "Yongjie Hou", + "Zhenzhong Wang", + "Min Jiang" + ], + "abstract": "While 3D Gaussian Splatting (3DGS) has recently demonstrated remarkable rendering quality and efficiency in 3D scene reconstruction, it struggles with varying lighting conditions and incidental occlusions in real-world scenarios. To accommodate varying lighting conditions, existing 3DGS extensions apply color mapping to the massive Gaussian primitives with individually optimized appearance embeddings. To handle occlusions, they predict pixel-wise uncertainties via 2D image features for occlusion capture. Nevertheless, such massive color mapping and pixel-wise uncertainty prediction strategies suffer from not only additional computational costs but also coarse-grained lighting and occlusion handling. In this work, we propose a nexus kernel-driven approach, termed NexusSplats, for efficient and finer 3D scene reconstruction under complex lighting and occlusion conditions. In particular, NexusSplats leverages a novel light decoupling strategy where appearance embeddings are optimized based on nexus kernels instead of massive Gaussian primitives, thus accelerating reconstruction speeds while ensuring local color consistency for finer textures. Additionally, a Gaussian-wise uncertainty mechanism is developed, aligning 3D structures with 2D image features for fine-grained occlusion handling. Experimental results demonstrate that NexusSplats achieves state-of-the-art rendering quality while reducing reconstruction time by up to 70.4% compared to the current best in quality.", + "arxiv_url": "http://arxiv.org/abs/2411.14514v4", + "pdf_url": "http://arxiv.org/pdf/2411.14514v4", + "published_date": "2024-11-21", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FAST-Splat: Fast, Ambiguity-Free Semantics Transfer in Gaussian Splatting", + "authors": [ + "Ola Shorinwa", + "Jiankai Sun", + "Mac Schwager" + ], + "abstract": "We present FAST-Splat for fast, ambiguity-free semantic Gaussian Splatting, which seeks to address the main limitations of existing semantic Gaussian Splatting methods, namely: slow training and rendering speeds; high memory usage; and ambiguous semantic object localization. 
In deriving FAST-Splat, we formulate open-vocabulary semantic Gaussian Splatting as the problem of extending closed-set semantic distillation to the open-set (open-vocabulary) setting, enabling FAST-Splat to provide precise semantic object localization results, even when prompted with ambiguous user-provided natural-language queries. Further, by exploiting the explicit form of the Gaussian Splatting scene representation to the fullest extent, FAST-Splat retains the remarkable training and rendering speeds of Gaussian Splatting. Specifically, while existing semantic Gaussian Splatting methods distill semantics into a separate neural field or utilize neural models for dimensionality reduction, FAST-Splat directly augments each Gaussian with specific semantic codes, preserving the training, rendering, and memory-usage advantages of Gaussian Splatting over neural field methods. These Gaussian-specific semantic codes, together with a hash-table, enable semantic similarity to be measured with open-vocabulary user prompts and further enable FAST-Splat to respond with unambiguous semantic object labels and 3D masks, unlike prior methods. In experiments, we demonstrate that FAST-Splat is 4x to 6x faster to train with a 13x faster data pre-processing step, achieves between 18x and 75x faster rendering speeds, and requires about 3x smaller GPU memory, compared to the best-competing semantic Gaussian Splatting methods. Further, FAST-Splat achieves relatively similar or better semantic segmentation performance compared to existing methods. After the review period, we will provide links to the project website and the codebase.", +    "arxiv_url": "http://arxiv.org/abs/2411.13753v1", +    "pdf_url": "http://arxiv.org/pdf/2411.13753v1", +    "published_date": "2024-11-20", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "Generating 3D-Consistent Videos from Unposed Internet Photos", +    "authors": [ +      "Gene Chou", +      "Kai Zhang", +      "Sai Bi", +      "Hao Tan", +      "Zexiang Xu", +      "Fujun Luan", +      "Bharath Hariharan", +      "Noah Snavely" +    ], +    "abstract": "We address the problem of generating videos from unposed internet photos. A handful of input images serve as keyframes, and our model interpolates between them to simulate a path moving between the cameras. Given random images, a model's ability to capture underlying geometry, recognize scene identity, and relate frames in terms of camera position and orientation reflects a fundamental understanding of 3D structure and scene layout. However, existing video models such as Luma Dream Machine fail at this task. We design a self-supervised method that takes advantage of the consistency of videos and variability of multiview internet photos to train a scalable, 3D-aware video model without any 3D annotations such as camera parameters. We validate that our method outperforms all baselines in terms of geometric and appearance consistency. We also show our model benefits applications that enable camera control, such as 3D Gaussian Splatting. 
Our results suggest that we can scale up scene-level 3D learning using only 2D data such as videos and multiview internet photos.", + "arxiv_url": "http://arxiv.org/abs/2411.13549v1", + "pdf_url": "http://arxiv.org/pdf/2411.13549v1", + "published_date": "2024-11-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting", + "authors": [ + "Xiaobao Wei", + "Peng Chen", + "Guangyu Li", + "Ming Lu", + "Hui Chen", + "Feng Tian" + ], + "abstract": "Gaze estimation encounters generalization challenges when dealing with out-of-distribution data. To address this problem, recent methods use neural radiance fields (NeRF) to generate augmented data. However, existing methods based on NeRF are computationally expensive and lack facial details. 3D Gaussian Splatting (3DGS) has become the prevailing representation of neural fields. While 3DGS has been extensively examined in head avatars, it faces challenges with accurate gaze control and generalization across different subjects. In this work, we propose GazeGaussian, a high-fidelity gaze redirection method that uses a two-stream 3DGS model to represent the face and eye regions separately. By leveraging the unstructured nature of 3DGS, we develop a novel eye representation for rigid eye rotation based on the target gaze direction. To enhance synthesis generalization across various subjects, we integrate an expression-conditional module to guide the neural renderer. Comprehensive experiments show that GazeGaussian outperforms existing methods in rendering speed, gaze redirection accuracy, and facial synthesis across multiple datasets. We also demonstrate that existing gaze estimation methods can leverage GazeGaussian to improve their generalization performance. The code will be available at: https://ucwxb.github.io/GazeGaussian/.", + "arxiv_url": "http://arxiv.org/abs/2411.12981v1", + "pdf_url": "http://arxiv.org/pdf/2411.12981v1", + "published_date": "2024-11-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization", + "authors": [ + "Hao Ju", + "Zhedong Zheng" + ], + "abstract": "Existing approaches to drone visual geo-localization predominantly adopt the image-based setting, where a single drone-view snapshot is matched with images from other platforms. Such task formulation, however, underutilizes the inherent video output of the drone and is sensitive to occlusions and environmental constraints. To address these limitations, we formulate a new video-based drone geo-localization task and propose the Video2BEV paradigm. This paradigm transforms the video into a Bird's Eye View (BEV), simplifying the subsequent matching process. In particular, we employ Gaussian Splatting to reconstruct a 3D scene and obtain the BEV projection. Different from the existing transform methods, \\eg, polar transform, our BEVs preserve more fine-grained details without significant distortion. To further improve model scalability toward diverse BEVs and satellite figures, our Video2BEV paradigm also incorporates a diffusion-based module for generating hard negative samples, which facilitates discriminative feature learning. 
To validate our approach, we introduce UniV, a new video-based geo-localization dataset that extends the image-based University-1652 dataset. UniV features flight paths at $30^\\circ$ and $45^\\circ$ elevation angles with increased frame rates of up to 10 frames per second (FPS). Extensive experiments on the UniV dataset show that our Video2BEV paradigm achieves competitive recall rates and outperforms conventional video-based methods. Compared to other methods, our proposed approach exhibits robustness at lower elevations with more occlusions.", + "arxiv_url": "http://arxiv.org/abs/2411.13610v1", + "pdf_url": "http://arxiv.org/pdf/2411.13610v1", + "published_date": "2024-11-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PR-ENDO: Physically Based Relightable Gaussian Splatting for Endoscopy", + "authors": [ + "Joanna Kaleta", + "Weronika Smolak-Dyżewska", + "Dawid Malarz", + "Diego Dall'Alba", + "Przemysław Korzeniowski", + "Przemysław Spurek" + ], + "abstract": "Endoscopic procedures are crucial for colorectal cancer diagnosis, and three-dimensional reconstruction of the environment for real-time novel-view synthesis can significantly enhance diagnosis. We present PR-ENDO, a framework that leverages 3D Gaussian Splatting within a physically based, relightable model tailored for the complex acquisition conditions in endoscopy, such as restricted camera rotations and strong view-dependent illumination. By exploiting the connection between the camera and light source, our approach introduces a relighting model to capture the intricate interactions between light and tissue using physically based rendering and MLP. Existing methods often produce artifacts and inconsistencies under these conditions, which PR-ENDO overcomes by incorporating a specialized diffuse MLP that utilizes light angles and normal vectors, achieving stable reconstructions even with limited training camera rotations. We benchmarked our framework using a publicly available dataset and a newly introduced dataset with wider camera rotations. Our methods demonstrated superior image quality compared to baseline approaches.", + "arxiv_url": "http://arxiv.org/abs/2411.12510v1", + "pdf_url": "http://arxiv.org/pdf/2411.12510v1", + "published_date": "2024-11-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SCIGS: 3D Gaussians Splatting from a Snapshot Compressive Image", + "authors": [ + "Zixu Wang", + "Hao Yang", + "Yu Guo", + "Fei Wang" + ], + "abstract": "Snapshot Compressive Imaging (SCI) offers a possibility for capturing information in high-speed dynamic scenes, requiring efficient reconstruction method to recover scene information. Despite promising results, current deep learning-based and NeRF-based reconstruction methods face challenges: 1) deep learning-based reconstruction methods struggle to maintain 3D structural consistency within scenes, and 2) NeRF-based reconstruction methods still face limitations in handling dynamic scenes. To address these challenges, we propose SCIGS, a variant of 3DGS, and develop a primitive-level transformation network that utilizes camera pose stamps and Gaussian primitive coordinates as embedding vectors. 
This approach resolves the necessity of camera pose in vanilla 3DGS and enhances multi-view 3D structural consistency in dynamic scenes by utilizing transformed primitives. Additionally, a high-frequency filter is introduced to eliminate the artifacts generated during the transformation. The proposed SCIGS is the first to reconstruct a 3D explicit scene from a single compressed image, extending its application to dynamic 3D scenes. Experiments on both static and dynamic scenes demonstrate that SCIGS not only enhances SCI decoding but also outperforms current state-of-the-art methods in reconstructing dynamic 3D scenes from a single compressed image. The code will be made available upon publication.", + "arxiv_url": "http://arxiv.org/abs/2411.12471v2", + "pdf_url": "http://arxiv.org/pdf/2411.12471v2", + "published_date": "2024-11-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting", + "authors": [ + "Haoyu Zhao", + "Hao Wang", + "Xingyue Zhao", + "Hongqiu Wang", + "Zhiyu Wu", + "Chengjiang Long", + "Hua Zou" + ], + "abstract": "Recent advancements in 3D generation models have opened new possibilities for simulating dynamic 3D object movements and customizing behaviors, yet creating this content remains challenging. Current methods often require manual assignment of precise physical properties for simulations or rely on video generation models to predict them, which is computationally intensive. In this paper, we rethink the usage of multi-modal large language model (MLLM) in physics-based simulation, and present Sim Anything, a physics-based approach that endows static 3D objects with interactive dynamics. We begin with detailed scene reconstruction and object-level 3D open-vocabulary segmentation, progressing to multi-view image in-painting. Inspired by human visual reasoning, we propose MLLM-based Physical Property Perception (MLLM-P3) to predict mean physical properties of objects in a zero-shot manner. Based on the mean values and the object's geometry, the Material Property Distribution Prediction model (MPDP) model then estimates the full distribution, reformulating the problem as probability distribution estimation to reduce computational costs. Finally, we simulate objects in an open-world scene with particles sampled via the Physical-Geometric Adaptive Sampling (PGAS) strategy, efficiently capturing complex deformations and significantly reducing computational costs. Extensive experiments and user studies demonstrate our Sim Anything achieves more realistic motion than state-of-the-art methods within 2 minutes on a single GPU.", + "arxiv_url": "http://arxiv.org/abs/2411.12789v1", + "pdf_url": "http://arxiv.org/pdf/2411.12789v1", + "published_date": "2024-11-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving", + "authors": [ + "Shaoqing Xu", + "Fang Li", + "Shengyin Jiang", + "Ziying Song", + "Li Liu", + "Zhi-xin Yang" + ], + "abstract": "Self-supervised learning has made substantial strides in image processing, while visual pre-training for autonomous driving is still in its infancy. 
Existing methods often focus on learning geometric scene information while neglecting texture or treating both aspects separately, hindering comprehensive scene understanding. In this context, we are excited to introduce GaussianPretrain, a novel pre-training paradigm that achieves a holistic understanding of the scene by uniformly integrating geometric and texture representations. Conceptualizing 3D Gaussian anchors as volumetric LiDAR points, our method learns a deepened understanding of scenes to enhance pre-training performance with detailed spatial structure and texture, achieving pre-training that is 40.6% faster than the NeRF-based method UniPAD while using only 70% of the GPU memory. We demonstrate the effectiveness of GaussianPretrain across multiple 3D perception tasks, showing significant performance improvements, such as a 7.05% increase in NDS for 3D object detection, a 1.9% boost in mAP for HD map construction, and a 0.8% improvement in occupancy prediction. These significant gains highlight GaussianPretrain's theoretical innovation and strong practical potential, promoting visual pre-training development for autonomous driving. Source code will be available at https://github.com/Public-BOTs/GaussianPretrain", +    "arxiv_url": "http://arxiv.org/abs/2411.12452v1", +    "pdf_url": "http://arxiv.org/pdf/2411.12452v1", +    "published_date": "2024-11-19", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "https://github.com/Public-BOTs/GaussianPretrain", +    "keywords": [ +      "3d gaussian", +      "nerf" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "Gradient-Weighted Feature Back-Projection: A Fast Alternative to Feature Distillation in 3D Gaussian Splatting", +    "authors": [ +      "Joji Joseph", +      "Bharadwaj Amrutur", +      "Shalabh Bhatnagar" +    ], +    "abstract": "We introduce a training-free method for feature field rendering in Gaussian splatting. Our approach back-projects 2D features into pre-trained 3D Gaussians, using a weighted sum based on each Gaussian's influence in the final rendering. Most training-based feature field rendering methods excel at 2D segmentation but perform poorly at 3D segmentation without post-processing, whereas our method achieves high-quality results in both 2D and 3D segmentation. Experimental results demonstrate that our approach is fast, scalable, and offers performance comparable to training-based methods.", +    "arxiv_url": "http://arxiv.org/abs/2411.15193v1", +    "pdf_url": "http://arxiv.org/pdf/2411.15193v1", +    "published_date": "2024-11-19", +    "categories": [ +      "cs.CV", +      "cs.AI" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "Beyond Gaussians: Fast and High-Fidelity 3D Splatting with Linear Kernels", +    "authors": [ +      "Haodong Chen", +      "Runnan Chen", +      "Qiang Qu", +      "Zhaoqing Wang", +      "Tongliang Liu", +      "Xiaoming Chen", +      "Yuk Ying Chung" +    ], +    "abstract": "Recent advancements in 3D Gaussian Splatting (3DGS) have substantially improved novel view synthesis, enabling high-quality reconstruction and real-time rendering. However, blurring artifacts, such as floating primitives and over-reconstruction, remain challenging. Current methods address these issues by refining scene structure, enhancing geometric representations, addressing blur in training images, improving rendering consistency, and optimizing density control, yet the role of kernel design remains underexplored. 
We identify the soft boundaries of Gaussian ellipsoids as one of the causes of these artifacts, limiting detail capture in high-frequency regions. To bridge this gap, we introduce 3D Linear Splatting (3DLS), which replaces Gaussian kernels with linear kernels to achieve sharper and more precise results, particularly in high-frequency regions. Through evaluations on three datasets, 3DLS demonstrates state-of-the-art fidelity and accuracy, along with a 30% FPS improvement over baseline 3DGS. The implementation will be made publicly available upon acceptance.", +    "arxiv_url": "http://arxiv.org/abs/2411.12440v3", +    "pdf_url": "http://arxiv.org/pdf/2411.12440v3", +    "published_date": "2024-11-19", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian", +      "real-time rendering" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "Mini-Splatting2: Building 360 Scenes within Minutes via Aggressive Gaussian Densification", +    "authors": [ +      "Guangchi Fang", +      "Bing Wang" +    ], +    "abstract": "In this study, we explore the essential challenge of fast scene optimization for Gaussian Splatting. Through a thorough analysis of the geometry modeling process, we reveal that dense point clouds can be effectively reconstructed early in optimization through Gaussian representations. This insight leads to our approach of aggressive Gaussian densification, which provides a more efficient alternative to conventional progressive densification methods. By significantly increasing the number of critical Gaussians, we enhance the model capacity to capture dense scene geometry at the early stage of optimization. This strategy is seamlessly integrated into the Mini-Splatting densification and simplification framework, enabling rapid convergence without compromising quality. Additionally, we introduce visibility culling within Gaussian Splatting, leveraging per-view Gaussian importance as precomputed visibility to accelerate the optimization process. Our Mini-Splatting2 achieves a balanced trade-off among optimization time, the number of Gaussians, and rendering quality, establishing a strong baseline for future Gaussian-Splatting-based works. Our work sets the stage for more efficient, high-quality 3D scene modeling in real-world applications, and the code will be made available regardless of acceptance.", +    "arxiv_url": "http://arxiv.org/abs/2411.12788v1", +    "pdf_url": "http://arxiv.org/pdf/2411.12788v1", +    "published_date": "2024-11-19", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "LiV-GS: LiDAR-Vision Integration for 3D Gaussian Splatting SLAM in Outdoor Environments", +    "authors": [ +      "Renxiang Xiao", +      "Wei Liu", +      "Yushuai Chen", +      "Liang Hu" +    ], +    "abstract": "We present LiV-GS, a LiDAR-visual SLAM system in outdoor environments that leverages 3D Gaussians as a differentiable spatial representation. Notably, LiV-GS is the first method that directly aligns discrete and sparse LiDAR data with continuous differentiable Gaussian maps in large-scale outdoor scenes, overcoming the limitation of fixed resolution in traditional LiDAR mapping. The system aligns point clouds with Gaussian maps using shared covariance attributes for front-end tracking and integrates the normal orientation into the loss function to refine the Gaussian map. 
To reliably and stably update Gaussians outside the LiDAR field of view, we introduce a novel conditional Gaussian constraint that aligns these Gaussians closely with the nearest reliable ones. The targeted adjustment enables LiV-GS to achieve fast and accurate mapping with novel view synthesis at a rate of 7.98 FPS. Extensive comparative experiments demonstrate LiV-GS's superior performance in SLAM, image rendering and mapping. The successful cross-modal radar-LiDAR localization highlights the potential of LiV-GS for applications in cross-modal semantic positioning and object segmentation with Gaussian maps.", + "arxiv_url": "http://arxiv.org/abs/2411.12185v1", + "pdf_url": "http://arxiv.org/pdf/2411.12185v1", + "published_date": "2024-11-19", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Sketch-guided Cage-based 3D Gaussian Splatting Deformation", + "authors": [ + "Tianhao Xie", + "Noam Aigerman", + "Eugene Belilovsky", + "Tiberiu Popa" + ], + "abstract": "3D Gaussian Splatting (GS) is one of the most promising novel 3D representations that has received great interest in computer graphics and computer vision. While various systems have introduced editing capabilities for 3D GS, such as those guided by text prompts, fine-grained control over deformation remains an open challenge. In this work, we present a novel sketch-guided 3D GS deformation system that allows users to intuitively modify the geometry of a 3D GS model by drawing a silhouette sketch from a single viewpoint. Our approach introduces a new deformation method that combines cage-based deformations with a variant of Neural Jacobian Fields, enabling precise, fine-grained control. Additionally, it leverages large-scale 2D diffusion priors and ControlNet to ensure the generated deformations are semantically plausible. Through a series of experiments, we demonstrate the effectiveness of our method and showcase its ability to animate static 3D GS models as one of its key applications.", + "arxiv_url": "http://arxiv.org/abs/2411.12168v2", + "pdf_url": "http://arxiv.org/pdf/2411.12168v2", + "published_date": "2024-11-19", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting", + "authors": [ + "Fangyu Wu", + "Yuhao Chen" + ], + "abstract": "In the real world, objects reveal internal textures when sliced or cut, yet this behavior is not well-studied in 3D generation tasks today. For example, slicing a virtual 3D watermelon should reveal flesh and seeds. Given that no available dataset captures an object's full internal structure and collecting data from all slices is impractical, generative methods become the obvious approach. However, current 3D generation and inpainting methods often focus on visible appearance and overlook internal textures. To bridge this gap, we introduce FruitNinja, the first method to generate internal textures for 3D objects undergoing geometric and topological changes. Our approach produces objects via 3D Gaussian Splatting (3DGS) with both surface and interior textures synthesized, enabling real-time slicing and rendering without additional optimization. 
FruitNinja leverages a pre-trained diffusion model to progressively inpaint cross-sectional views and applies voxel-grid-based smoothing to achieve cohesive textures throughout the object. Our OpaqueAtom GS strategy overcomes 3DGS limitations by employing densely distributed opaque Gaussians, avoiding biases toward larger particles that destabilize training and sharp color transitions for fine-grained textures. Experimental results show that FruitNinja substantially outperforms existing approaches, showcasing unmatched visual quality in real-time rendered internal views across arbitrary geometry manipulations.", + "arxiv_url": "http://arxiv.org/abs/2411.12089v2", + "pdf_url": "http://arxiv.org/pdf/2411.12089v2", + "published_date": "2024-11-18", + "categories": [ + "cs.CV", + "cs.GR", + "cs.HC" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RoboGSim: A Real2Sim2Real Robotic Gaussian Splatting Simulator", + "authors": [ + "Xinhai Li", + "Jialin Li", + "Ziheng Zhang", + "Rui Zhang", + "Fan Jia", + "Tiancai Wang", + "Haoqiang Fan", + "Kuo-Kun Tseng", + "Ruiping Wang" + ], + "abstract": "Efficient acquisition of real-world embodied data has been increasingly critical. However, large-scale demonstrations captured by remote operation tend to take extremely high costs and fail to scale up the data size in an efficient manner. Sampling the episodes under a simulated environment is a promising way for large-scale collection while existing simulators fail to high-fidelity modeling on texture and physics. To address these limitations, we introduce the RoboGSim, a real2sim2real robotic simulator, powered by 3D Gaussian Splatting and the physics engine. RoboGSim mainly includes four parts: Gaussian Reconstructor, Digital Twins Builder, Scene Composer, and Interactive Engine. It can synthesize the simulated data with novel views, objects, trajectories, and scenes. RoboGSim also provides an online, reproducible, and safe evaluation for different manipulation policies. The real2sim and sim2real transfer experiments show a high consistency in the texture and physics. Moreover, the effectiveness of synthetic data is validated under the real-world manipulated tasks. We hope RoboGSim serves as a closed-loop simulator for fair comparison on policy learning. More information can be found on our project page https://robogsim.github.io/ .", + "arxiv_url": "http://arxiv.org/abs/2411.11839v1", + "pdf_url": "http://arxiv.org/pdf/2411.11839v1", + "published_date": "2024-11-18", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction", + "authors": [ + "DaDong Jiang", + "Zhihui Ke", + "Xiaobo Zhou", + "Zhi Hou", + "Xianghui Yang", + "Wenbo Hu", + "Tie Qiu", + "Chunchao Guo" + ], + "abstract": "Dynamic scene reconstruction is a long-term challenge in 3D vision. Recent methods extend 3D Gaussian Splatting to dynamic scenes via additional deformation fields and apply explicit constraints like motion flow to guide the deformation. However, they learn motion changes from individual timestamps independently, making it challenging to reconstruct complex scenes, particularly when dealing with violent movement, extreme-shaped geometries, or reflective surfaces. 
To address the above issue, we design a plug-and-play module called TimeFormer to enable existing deformable 3D Gaussians reconstruction methods with the ability to implicitly model motion patterns from a learning perspective. Specifically, TimeFormer includes a Cross-Temporal Transformer Encoder, which adaptively learns the temporal relationships of deformable 3D Gaussians. Furthermore, we propose a two-stream optimization strategy that transfers the motion knowledge learned from TimeFormer to the base stream during the training phase. This allows us to remove TimeFormer during inference, thereby preserving the original rendering speed. Extensive experiments in the multi-view and monocular dynamic scenes validate qualitative and quantitative improvement brought by TimeFormer. Project Page: https://patrickddj.github.io/TimeFormer/", + "arxiv_url": "http://arxiv.org/abs/2411.11941v1", + "pdf_url": "http://arxiv.org/pdf/2411.11941v1", + "published_date": "2024-11-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GPS-Gaussian+: Generalizable Pixel-wise 3D Gaussian Splatting for Real-Time Human-Scene Rendering from Sparse Views", + "authors": [ + "Boyao Zhou", + "Shunyuan Zheng", + "Hanzhang Tu", + "Ruizhi Shao", + "Boning Liu", + "Shengping Zhang", + "Liqiang Nie", + "Yebin Liu" + ], + "abstract": "Differentiable rendering techniques have recently shown promising results for free-viewpoint video synthesis of characters. However, such methods, either Gaussian Splatting or neural implicit rendering, typically necessitate per-subject optimization which does not meet the requirement of real-time rendering in an interactive application. We propose a generalizable Gaussian Splatting approach for high-resolution image rendering under a sparse-view camera setting. To this end, we introduce Gaussian parameter maps defined on the source views and directly regress Gaussian properties for instant novel view synthesis without any fine-tuning or optimization. We train our Gaussian parameter regression module on human-only data or human-scene data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. The proposed framework is fully differentiable with both depth and rendering supervision or with only rendering supervision. We further introduce a regularization term and an epipolar attention mechanism to preserve geometry consistency between two source views, especially when neglecting depth supervision. 
Experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving a superior rendering speed.", +    "arxiv_url": "http://arxiv.org/abs/2411.11363v1", +    "pdf_url": "http://arxiv.org/pdf/2411.11363v1", +    "published_date": "2024-11-18", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "real-time rendering" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes", +    "authors": [ +      "Chensheng Peng", +      "Chengwei Zhang", +      "Yixiao Wang", +      "Chenfeng Xu", +      "Yichen Xie", +      "Wenzhao Zheng", +      "Kurt Keutzer", +      "Masayoshi Tomizuka", +      "Wei Zhan" +    ], +    "abstract": "We present DeSiRe-GS, a self-supervised Gaussian splatting representation, enabling effective static-dynamic decomposition and high-fidelity surface reconstruction in complex driving scenarios. Our approach employs a two-stage optimization pipeline of dynamic street Gaussians. In the first stage, we extract 2D motion masks based on the observation that 3D Gaussian Splatting inherently can reconstruct only the static regions in dynamic environments. These extracted 2D motion priors are then mapped into the Gaussian space in a differentiable manner, leveraging an efficient formulation of dynamic Gaussians in the second stage. Combined with the introduced geometric regularizations, our method is able to address the over-fitting issues caused by data sparsity in autonomous driving, reconstructing physically plausible Gaussians that align with object surfaces rather than floating in air. Furthermore, we introduce temporal cross-view consistency to ensure coherence across time and viewpoints, resulting in high-quality surface reconstruction. Comprehensive experiments demonstrate the efficiency and effectiveness of DeSiRe-GS, surpassing prior self-supervised methods and achieving accuracy comparable to methods relying on external 3D bounding box annotations. Code is available at \\url{https://github.com/chengweialan/DeSiRe-GS}", +    "arxiv_url": "http://arxiv.org/abs/2411.11921v1", +    "pdf_url": "http://arxiv.org/pdf/2411.11921v1", +    "published_date": "2024-11-18", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "https://github.com/chengweialan/DeSiRe-GS", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "VeGaS: Video Gaussian Splatting", +    "authors": [ +      "Weronika Smolak-Dyżewska", +      "Dawid Malarz", +      "Kornel Howil", +      "Jan Kaczmarczyk", +      "Marcin Mazur", +      "Przemysław Spurek" +    ], +    "abstract": "Implicit Neural Representations (INRs) employ neural networks to approximate discrete data as continuous functions. In the context of video data, such models can be utilized to transform the coordinates of pixel locations along with frame occurrence times (or indices) into RGB color values. Although INRs facilitate effective compression, they are unsuitable for editing purposes. One potential solution is to use a 3D Gaussian Splatting (3DGS) based model, such as the Video Gaussian Representation (VGR), which is capable of encoding video as a multitude of 3D Gaussians and is applicable for numerous video processing operations, including editing. Nevertheless, in this case, the capacity for modification is constrained to a limited set of basic transformations. 
To address this issue, we introduce the Video Gaussian Splatting (VeGaS) model, which enables realistic modifications of video data. To construct VeGaS, we propose a novel family of Folded-Gaussian distributions designed to capture nonlinear dynamics in a video stream and model consecutive frames by 2D Gaussians obtained as respective conditional distributions. Our experiments demonstrate that VeGaS outperforms state-of-the-art solutions in frame reconstruction tasks and allows realistic modifications of video data. The code is available at: https://github.com/gmum/VeGaS.", + "arxiv_url": "http://arxiv.org/abs/2411.11024v1", + "pdf_url": "http://arxiv.org/pdf/2411.11024v1", + "published_date": "2024-11-17", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/gmum/VeGaS", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Direct and Explicit 3D Generation from a Single Image", + "authors": [ + "Haoyu Wu", + "Meher Gitika Karumuri", + "Chuhang Zou", + "Seungbae Bang", + "Yuelong Li", + "Dimitris Samaras", + "Sunil Hadap" + ], + "abstract": "Current image-to-3D approaches suffer from high computational costs and lack scalability for high-resolution outputs. In contrast, we introduce a novel framework to directly generate explicit surface geometry and texture using multi-view 2D depth and RGB images along with 3D Gaussian features using a repurposed Stable Diffusion model. We introduce a depth branch into U-Net for efficient and high quality multi-view, cross-domain generation and incorporate epipolar attention into the latent-to-pixel decoder for pixel-level multi-view consistency. By back-projecting the generated depth pixels into 3D space, we create a structured 3D representation that can be either rendered via Gaussian splatting or extracted to high-quality meshes, thereby leveraging additional novel view synthesis loss to further improve our performance. Extensive experiments demonstrate that our method surpasses existing baselines in geometry and texture quality while achieving significantly faster generation time.", + "arxiv_url": "http://arxiv.org/abs/2411.10947v1", + "pdf_url": "http://arxiv.org/pdf/2411.10947v1", + "published_date": "2024-11-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DGS-SLAM: Gaussian Splatting SLAM in Dynamic Environment", + "authors": [ + "Mangyu Kong", + "Jaewon Lee", + "Seongwon Lee", + "Euntai Kim" + ], + "abstract": "We introduce Dynamic Gaussian Splatting SLAM (DGS-SLAM), the first dynamic SLAM framework built on the foundation of Gaussian Splatting. While recent advancements in dense SLAM have leveraged Gaussian Splatting to enhance scene representation, most approaches assume a static environment, making them vulnerable to photometric and geometric inconsistencies caused by dynamic objects. To address these challenges, we integrate Gaussian Splatting SLAM with a robust filtering process to handle dynamic objects throughout the entire pipeline, including Gaussian insertion and keyframe selection. Within this framework, to further improve the accuracy of dynamic object removal, we introduce a robust mask generation method that enforces photometric consistency across keyframes, reducing noise from inaccurate segmentation and artifacts such as shadows. 
Additionally, we propose the loop-aware window selection mechanism, which utilizes unique keyframe IDs of 3D Gaussians to detect loops between the current and past frames, facilitating joint optimization of the current camera poses and the Gaussian map. DGS-SLAM achieves state-of-the-art performance in both camera tracking and novel view synthesis on various dynamic SLAM benchmarks, proving its effectiveness in handling real-world dynamic scenes.", + "arxiv_url": "http://arxiv.org/abs/2411.10722v1", + "pdf_url": "http://arxiv.org/pdf/2411.10722v1", + "published_date": "2024-11-16", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction", + "authors": [ + "Yutao Tang", + "Yuxiang Guo", + "Deming Li", + "Cheng Peng" + ], + "abstract": "Recent efforts in Gaussian-Splat-based Novel View Synthesis can achieve photorealistic rendering; however, such capability is limited in sparse-view scenarios due to sparse initialization and over-fitting floaters. Recent progress in depth estimation and alignment can provide dense point cloud with few views; however, the resulting pose accuracy is suboptimal. In this work, we present SPARS3R, which combines the advantages of accurate pose estimation from Structure-from-Motion and dense point cloud from depth estimation. To this end, SPARS3R first performs a Global Fusion Alignment process that maps a prior dense point cloud to a sparse point cloud from Structure-from-Motion based on triangulated correspondences. RANSAC is applied during this process to distinguish inliers and outliers. SPARS3R then performs a second, Semantic Outlier Alignment step, which extracts semantically coherent regions around the outliers and performs local alignment in these regions. Along with several improvements in the evaluation process, we demonstrate that SPARS3R can achieve photorealistic rendering with sparse images and significantly outperforms existing approaches.", + "arxiv_url": "http://arxiv.org/abs/2411.12592v1", + "pdf_url": "http://arxiv.org/pdf/2411.12592v1", + "published_date": "2024-11-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods", + "authors": [ + "Yifu Tao", + "Miguel Ángel Muñoz-Bañón", + "Lintong Zhang", + "Jiahao Wang", + "Lanke Frank Tarimo Fu", + "Maurice Fallon" + ], + "abstract": "This paper introduces a large-scale multi-modal dataset captured in and around well-known landmarks in Oxford using a custom-built multi-sensor perception unit as well as a millimetre-accurate map from a Terrestrial LiDAR Scanner (TLS). The perception unit includes three synchronised global shutter colour cameras, an automotive 3D LiDAR scanner, and an inertial sensor - all precisely calibrated. We also establish benchmarks for tasks involving localisation, reconstruction, and novel-view synthesis, which enable the evaluation of Simultaneous Localisation and Mapping (SLAM) methods, Structure-from-Motion (SfM) and Multi-view Stereo (MVS) methods as well as radiance field methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting. To evaluate 3D reconstruction the TLS 3D models are used as ground truth. 
Localisation ground truth is computed by registering the mobile LiDAR scans to the TLS 3D models. Radiance field methods are evaluated not only with poses sampled from the input trajectory, but also from viewpoints on trajectories that are distant from the training poses. Our evaluation demonstrates a key limitation of state-of-the-art radiance field methods: we show that they tend to overfit to the training poses/images and do not generalise well to out-of-sequence poses. They also underperform in 3D reconstruction compared to MVS systems using the same visual inputs. Our dataset and benchmarks are intended to facilitate better integration of radiance field methods and SLAM systems. The raw and processed data, along with software for parsing and evaluation, can be accessed at https://dynamic.robots.ox.ac.uk/datasets/oxford-spires/.", +    "arxiv_url": "http://arxiv.org/abs/2411.10546v1", +    "pdf_url": "http://arxiv.org/pdf/2411.10546v1", +    "published_date": "2024-11-15", +    "categories": [ +      "cs.CV", +      "cs.RO" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian", +      "3d reconstruction", +      "nerf" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting", +    "authors": [ +      "Kang Chen", +      "Jiyuan Zhang", +      "Zecheng Hao", +      "Yajing Zheng", +      "Tiejun Huang", +      "Zhaofei Yu" +    ], +    "abstract": "Spike cameras, as innovative neuromorphic cameras that capture scenes with a 0-1 bit stream at 40 kHz, are increasingly employed for the 3D reconstruction task via Neural Radiance Fields (NeRF) or 3D Gaussian Splatting (3DGS). Previous spike-based 3D reconstruction approaches often employ a cascaded pipeline: starting with high-quality image reconstruction from spike streams based on established spike-to-image reconstruction algorithms, then progressing to camera pose estimation and 3D reconstruction. However, this cascaded approach suffers from substantial cumulative errors, where quality limitations of initial image reconstructions negatively impact pose estimation, ultimately degrading the fidelity of the 3D reconstruction. To address these issues, we propose a synergistic optimization framework, \\textbf{USP-Gaussian}, that unifies spike-based image reconstruction, pose correction, and Gaussian splatting into an end-to-end framework. Leveraging the multi-view consistency afforded by 3DGS and the motion capture capability of the spike camera, our framework enables a joint iterative optimization that seamlessly integrates information between the spike-to-image network and 3DGS. Experiments on synthetic datasets with accurate poses demonstrate that our method surpasses previous approaches by effectively eliminating cascading errors. Moreover, we integrate pose optimization to achieve robust 3D reconstruction in real-world scenarios with inaccurate initial poses, outperforming alternative methods by effectively reducing noise and preserving fine texture details. 
Our code, data and trained models will be available at \\url{https://github.com/chenkang455/USP-Gaussian}.", + "arxiv_url": "http://arxiv.org/abs/2411.10504v1", + "pdf_url": "http://arxiv.org/pdf/2411.10504v1", + "published_date": "2024-11-15", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "https://github.com/chenkang455/USP-Gaussian", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Efficient Density Control for 3D Gaussian Splatting", + "authors": [ + "Xiaobin Deng", + "Changyu Diao", + "Min Li", + "Ruohan Yu", + "Duanqing Xu" + ], + "abstract": "3D Gaussian Splatting (3DGS) excels in novel view synthesis, balancing advanced rendering quality with real-time performance. However, in trained scenes, a large number of Gaussians with low opacity significantly increase rendering costs. This issue arises due to flaws in the split and clone operations during the densification process, which lead to extensive Gaussian overlap and subsequent opacity reduction. To enhance the efficiency of Gaussian utilization, we improve the adaptive density control of 3DGS. First, we introduce a more efficient long-axis split operation to replace the original clone and split, which mitigates Gaussian overlap and improves densification efficiency. Second, we propose a simple adaptive pruning technique to reduce the number of low-opacity Gaussians. Finally, by dynamically lowering the splitting threshold and applying importance weighting, the efficiency of Gaussian utilization is further improved. We evaluate our proposed method on various challenging real-world datasets. Experimental results show that our Efficient Density Control (EDC) can enhance both the rendering speed and quality.", + "arxiv_url": "http://arxiv.org/abs/2411.10133v1", + "pdf_url": "http://arxiv.org/pdf/2411.10133v1", + "published_date": "2024-11-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GSEditPro: 3D Gaussian Splatting Editing with Attention-based Progressive Localization", + "authors": [ + "Yanhao Sun", + "RunZe Tian", + "Xiao Han", + "XinYao Liu", + "Yan Zhang", + "Kai Xu" + ], + "abstract": "With the emergence of large-scale Text-to-Image (T2I) models and implicit 3D representations like Neural Radiance Fields (NeRF), many text-driven generative editing methods based on NeRF have appeared. However, the implicit encoding of geometric and textural information poses challenges in accurately locating and controlling objects during editing. Recently, significant advancements have been made in the editing methods of 3D Gaussian Splatting, a real-time rendering technology that relies on explicit representation. However, these methods still suffer from issues including inaccurate localization and limited manipulation over editing. To tackle these challenges, we propose GSEditPro, a novel 3D scene editing framework which allows users to perform various creative and precise editing using text prompts only. Leveraging the explicit nature of the 3D Gaussian distribution, we introduce an attention-based progressive localization module to add semantic labels to each Gaussian during rendering. This enables precise localization on editing areas by classifying Gaussians based on their relevance to the editing prompts derived from cross-attention layers of the T2I model.
Furthermore, we present an innovative editing optimization method based on 3D Gaussian Splatting, obtaining stable and refined editing results through the guidance of Score Distillation Sampling and pseudo ground truth. We prove the efficacy of our method through extensive experiments.", + "arxiv_url": "http://arxiv.org/abs/2411.10033v1", + "pdf_url": "http://arxiv.org/pdf/2411.10033v1", + "published_date": "2024-11-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GGAvatar: Reconstructing Garment-Separated 3D Gaussian Splatting Avatars from Monocular Video", + "authors": [ + "Jingxuan Chen" + ], + "abstract": "Avatar modelling has broad applications in human animation and virtual try-ons. Recent advancements in this field have focused on high-quality and comprehensive human reconstruction but often overlook the separation of clothing from the body. To bridge this gap, this paper introduces GGAvatar (Garment-separated 3D Gaussian Splatting Avatar), which relies on monocular videos. Through advanced parameterized templates and unique phased training, this model effectively achieves decoupled, editable, and realistic reconstruction of clothed humans. Comparative evaluations with other costly models confirm GGAvatar's superior quality and efficiency in modelling both clothed humans and separable garments. The paper also showcases applications in clothing editing, as illustrated in Figure 1, highlighting the model's benefits and the advantages of effective disentanglement. The code is available at https://github.com/J-X-Chen/GGAvatar/.", + "arxiv_url": "http://arxiv.org/abs/2411.09952v1", + "pdf_url": "http://arxiv.org/pdf/2411.09952v1", + "published_date": "2024-11-15", + "categories": [ + "cs.CV", + "cs.AI", + "cs.MM" + ], + "github_url": "https://github.com/J-X-Chen/GGAvatar/", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Adversarial Attacks Using Differentiable Rendering: A Survey", + "authors": [ + "Matthew Hull", + "Chao Zhang", + "Zsolt Kira", + "Duen Horng Chau" + ], + "abstract": "Differentiable rendering methods have emerged as a promising means for generating photo-realistic and physically plausible adversarial attacks by manipulating 3D objects and scenes that can deceive deep neural networks (DNNs). Recently, differentiable rendering capabilities have evolved significantly into a diverse landscape of libraries, such as Mitsuba, PyTorch3D, and methods like Neural Radiance Fields and 3D Gaussian Splatting for solving inverse rendering problems that share conceptually similar properties commonly used to attack DNNs, such as back-propagation and optimization. However, the adversarial machine learning research community has not yet fully explored or understood such capabilities for generating attacks. Some key reasons are that researchers often have different attack goals, such as misclassification or misdetection, and use different tasks to accomplish these goals by manipulating different representation in a scene, such as the mesh or texture of an object. This survey adopts a task-oriented unifying framework that systematically summarizes common tasks, such as manipulating textures, altering illumination, and modifying 3D meshes to exploit vulnerabilities in DNNs. 
Our framework enables easy comparison of existing works, reveals research gaps and spotlights exciting future research directions in this rapidly evolving field. Through focusing on how these tasks enable attacks on various DNNs such as image classification, facial recognition, object detection, optical flow and depth estimation, our survey helps researchers and practitioners better understand the vulnerabilities of computer vision systems against photorealistic adversarial attacks that could threaten real-world applications.", + "arxiv_url": "http://arxiv.org/abs/2411.09749v1", + "pdf_url": "http://arxiv.org/pdf/2411.09749v1", + "published_date": "2024-11-14", + "categories": [ + "cs.LG", + "cs.CR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DyGASR: Dynamic Generalized Exponential Splatting with Surface Alignment for Accelerated 3D Mesh Reconstruction", + "authors": [ + "Shengchao Zhao", + "Yundong Li" + ], + "abstract": "Recent advancements in 3D Gaussian Splatting (3DGS), which lead to high-quality novel view synthesis and accelerated rendering, have remarkably improved the quality of radiance field reconstruction. However, the extraction of mesh from a massive number of minute 3D Gaussian points remains a great challenge due to the large volume of Gaussians and the difficulty of representing sharp signals caused by their inherent low-pass characteristics. To address this issue, we propose DyGASR, which utilizes a generalized exponential function instead of the traditional 3D Gaussian to decrease the number of particles and dynamically optimize the representation of the captured signal. In addition, it is observed that reconstructing mesh with Generalized Exponential Splatting (GES) without modifications frequently leads to failures since the generalized exponential distribution centroids may not precisely align with the scene surface. To overcome this, we adopt Sugar's approach and introduce Generalized Surface Regularization (GSR), which reduces the smallest scaling vector of each point cloud to zero and ensures normal alignment perpendicular to the surface, facilitating subsequent Poisson surface mesh reconstruction. Additionally, we propose a dynamic resolution adjustment strategy that utilizes a cosine schedule to gradually increase image resolution from low to high during the training stage, thus avoiding constant full resolution, which significantly boosts the reconstruction speed. Our approach surpasses existing 3DGS-based mesh reconstruction methods, as evidenced by extensive evaluations on various scene datasets, demonstrating a 25\\% increase in speed, and a 30\\% reduction in memory usage.", + "arxiv_url": "http://arxiv.org/abs/2411.09156v2", + "pdf_url": "http://arxiv.org/pdf/2411.09156v2", + "published_date": "2024-11-14", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization", + "authors": [ + "Mijeong Kim", + "Jongwoo Lim", + "Bohyung Han" + ], + "abstract": "Novel view synthesis of dynamic scenes is becoming important in various applications, including augmented and virtual reality. We propose a novel 4D Gaussian Splatting (4DGS) algorithm for dynamic scenes from casually recorded monocular videos.
To overcome the overfitting problem of existing work for these real-world videos, we introduce an uncertainty-aware regularization that identifies uncertain regions with few observations and selectively imposes additional priors based on diffusion models and depth smoothness on such regions. This approach improves both the performance of novel view synthesis and the quality of training image reconstruction. We also identify the initialization problem of 4DGS in fast-moving dynamic regions, where the Structure from Motion (SfM) algorithm fails to provide reliable 3D landmarks. To initialize Gaussian primitives in such regions, we present a dynamic region densification method using the estimated depth maps and scene flow. Our experiments show that the proposed method improves the performance of 4DGS reconstruction from a video captured by a handheld monocular camera and also exhibits promising results in few-shot static scene reconstruction.", + "arxiv_url": "http://arxiv.org/abs/2411.08879v1", + "pdf_url": "http://arxiv.org/pdf/2411.08879v1", + "published_date": "2024-11-13", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Towards More Accurate Fake Detection on Images Generated from Advanced Generative and Neural Rendering Models", + "authors": [ + "Chengdong Dong", + "Vijayakumar Bhagavatula", + "Zhenyu Zhou", + "Ajay Kumar" + ], + "abstract": "The remarkable progress in neural-network-driven visual data generation, especially with neural rendering techniques like Neural Radiance Fields and 3D Gaussian splatting, offers a powerful alternative to GANs and diffusion models. These methods can produce high-fidelity images and lifelike avatars, highlighting the need for robust detection methods. In response, an unsupervised training technique is proposed that enables the model to extract comprehensive features from the Fourier spectrum magnitude, thereby overcoming the challenges of reconstructing the spectrum due to its centrosymmetric properties. By leveraging the spectral domain and dynamically combining it with spatial domain information, we create a robust multimodal detector that demonstrates superior generalization capabilities in identifying challenging synthetic images generated by the latest image synthesis techniques. To address the absence of a 3D neural rendering-based fake image database, we develop a comprehensive database that includes images generated by diverse neural rendering techniques, providing a robust foundation for evaluating and advancing detection methods.", + "arxiv_url": "http://arxiv.org/abs/2411.08642v1", + "pdf_url": "http://arxiv.org/pdf/2411.08642v1", + "published_date": "2024-11-13", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis", + "authors": [ + "David Svitov", + "Pietro Morerio", + "Lourdes Agapito", + "Alessio Del Bue" + ], + "abstract": "We present billboard Splatting (BBSplat) - a novel approach for 3D scene representation based on textured geometric primitives. BBSplat represents the scene as a set of optimizable textured planar primitives with learnable RGB textures and alpha-maps to control their shape. 
BBSplat primitives can be used in any Gaussian Splatting pipeline as drop-in replacements for Gaussians. Our method's qualitative and quantitative improvements over 3D and 2D Gaussians are most noticeable when fewer primitives are used, where BBSplat achieves over 1200 FPS. Our novel regularization term encourages textures to have a sparser structure, unlocking an efficient compression that leads to a reduction in storage space of the model. Our experiments show the efficiency of BBSplat on standard datasets of real indoor and outdoor scenes such as Tanks&Temples, DTU, and Mip-NeRF-360. We demonstrate improvements on PSNR, SSIM, and LPIPS metrics compared to the state-of-the-art, especially for the case when fewer primitives are used, which, on the other hand, leads to up to 2 times inference speed improvement for the same rendering quality.", + "arxiv_url": "http://arxiv.org/abs/2411.08508v2", + "pdf_url": "http://arxiv.org/pdf/2411.08508v2", + "published_date": "2024-11-13", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Biomass phenotyping of oilseed rape through UAV multi-view oblique imaging with 3DGS and SAM model", + "authors": [ + "Yutao Shen", + "Hongyu Zhou", + "Xin Yang", + "Xuqi Lu", + "Ziyue Guo", + "Lixi Jiang", + "Yong He", + "Haiyan Cen" + ], + "abstract": "Biomass estimation of oilseed rape is crucial for optimizing crop productivity and breeding strategies. While UAV-based imaging has advanced high-throughput phenotyping, current methods often rely on orthophoto images, which struggle with overlapping leaves and incomplete structural information in complex field environments. This study integrates 3D Gaussian Splatting (3DGS) with the Segment Anything Model (SAM) for precise 3D reconstruction and biomass estimation of oilseed rape. UAV multi-view oblique images from 36 angles were used to perform 3D reconstruction, with the SAM module enhancing point cloud segmentation. The segmented point clouds were then converted into point cloud volumes, which were fitted to ground-measured biomass using linear regression. The results showed that 3DGS (7k and 30k iterations) provided high accuracy, with peak signal-to-noise ratios (PSNR) of 27.43 and 29.53 and training times of 7 and 49 minutes, respectively. This performance exceeded that of structure from motion (SfM) and mipmap Neural Radiance Fields (Mip-NeRF), demonstrating superior efficiency. The SAM module achieved high segmentation accuracy, with a mean intersection over union (mIoU) of 0.961 and an F1-score of 0.980. Additionally, a comparison of biomass extraction models found the point cloud volume model to be the most accurate, with a determination coefficient (R2) of 0.976, root mean square error (RMSE) of 2.92 g/plant, and mean absolute percentage error (MAPE) of 6.81%, outperforming both the plot crop volume and individual crop volume models.
This study highlights the potential of combining 3DGS with multi-view UAV imaging for improved biomass phenotyping.", + "arxiv_url": "http://arxiv.org/abs/2411.08453v1", + "pdf_url": "http://arxiv.org/pdf/2411.08453v1", + "published_date": "2024-11-13", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DG-SLAM: Robust Dynamic Gaussian Splatting SLAM with Hybrid Pose Optimization", + "authors": [ + "Yueming Xu", + "Haochen Jiang", + "Zhongyang Xiao", + "Jianfeng Feng", + "Li Zhang" + ], + "abstract": "Achieving robust and precise pose estimation in dynamic scenes is a significant research challenge in Visual Simultaneous Localization and Mapping (SLAM). Recent advancements integrating Gaussian Splatting into SLAM systems have proven effective in creating high-quality renderings using explicit 3D Gaussian models, significantly improving environmental reconstruction fidelity. However, these approaches depend on a static environment assumption and face challenges in dynamic environments due to inconsistent observations of geometry and photometry. To address this problem, we propose DG-SLAM, the first robust dynamic visual SLAM system grounded in 3D Gaussians, which provides precise camera pose estimation alongside high-fidelity reconstructions. Specifically, we propose effective strategies, including motion mask generation, adaptive Gaussian point management, and a hybrid camera tracking algorithm to improve the accuracy and robustness of pose estimation. Extensive experiments demonstrate that DG-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, and novel-view synthesis in dynamic scenes, outperforming existing methods while preserving real-time rendering ability.", + "arxiv_url": "http://arxiv.org/abs/2411.08373v1", + "pdf_url": "http://arxiv.org/pdf/2411.08373v1", + "published_date": "2024-11-13", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MBA-SLAM: Motion Blur Aware Dense Visual SLAM with Radiance Fields Representation", + "authors": [ + "Peng Wang", + "Lingzhe Zhao", + "Yin Zhang", + "Shiyu Zhao", + "Peidong Liu" + ], + "abstract": "Emerging 3D scene representations, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have demonstrated their effectiveness in Simultaneous Localization and Mapping (SLAM) for photo-realistic rendering, particularly when using high-quality video sequences as input. However, existing methods struggle with motion-blurred frames, which are common in real-world scenarios like low-light or long-exposure conditions. This often results in a significant reduction in both camera localization accuracy and map reconstruction quality. To address this challenge, we propose a dense visual SLAM pipeline (i.e. MBA-SLAM) to handle severe motion-blurred inputs. Our approach integrates an efficient motion blur-aware tracker with either neural radiance fields or Gaussian Splatting based mapper. By accurately modeling the physical image formation process of motion-blurred images, our method simultaneously learns 3D scene representation and estimates the cameras' local trajectory during exposure time, enabling proactive compensation for motion blur caused by camera movement.
In our experiments, we demonstrate that MBA-SLAM surpasses previous state-of-the-art methods in both camera localization and map reconstruction, showcasing superior performance across a range of datasets, including synthetic and real datasets featuring sharp images as well as those affected by motion blur, highlighting the versatility and robustness of our approach. Code is available at https://github.com/WU-CVGL/MBA-SLAM.", + "arxiv_url": "http://arxiv.org/abs/2411.08279v1", + "pdf_url": "http://arxiv.org/pdf/2411.08279v1", + "published_date": "2024-11-13", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "https://github.com/WU-CVGL/MBA-SLAM", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Projecting Gaussian Ellipsoids While Avoiding Affine Projection Approximation", + "authors": [ + "Han Qi", + "Tao Cai", + "Xiyue Han" + ], + "abstract": "Recently, 3D Gaussian Splatting has dominated novel-view synthesis with its real-time rendering speed and state-of-the-art rendering quality. However, during the rendering process, the use of the Jacobian of the affine approximation of the projection transformation leads to inevitable errors, resulting in blurriness, artifacts and a lack of scene consistency in the final rendered images. To address this issue, we introduce an ellipsoid-based projection method to calculate the projection of Gaussian ellipsoid onto the image plane, which is the primitive of 3D Gaussian Splatting. As our proposed ellipsoid-based projection method cannot handle Gaussian ellipsoids with camera origins inside them or parts lying below $z=0$ plane in the camera space, we designed a pre-filtering strategy. Experiments over multiple widely adopted benchmark datasets show that our ellipsoid-based projection method can enhance the rendering quality of 3D Gaussian Splatting and its extensions.", + "arxiv_url": "http://arxiv.org/abs/2411.07579v3", + "pdf_url": "http://arxiv.org/pdf/2411.07579v3", + "published_date": "2024-11-12", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting", + "authors": [ + "Umangi Jain", + "Ashkan Mirzaei", + "Igor Gilitschenski" + ], + "abstract": "We introduce GaussianCut, a new method for interactive multiview segmentation of scenes represented as 3D Gaussians. Our approach allows for selecting the objects to be segmented by interacting with a single view. It accepts intuitive user input, such as point clicks, coarse scribbles, or text. Using 3D Gaussian Splatting (3DGS) as the underlying scene representation simplifies the extraction of objects of interest which are considered to be a subset of the scene's Gaussians. Our key idea is to represent the scene as a graph and use the graph-cut algorithm to minimize an energy function to effectively partition the Gaussians into foreground and background. To achieve this, we construct a graph based on scene Gaussians and devise a segmentation-aligned energy function on the graph to combine user inputs with scene properties. To obtain an initial coarse segmentation, we leverage 2D image/video segmentation models and further refine these coarse estimates using our graph construction. Our empirical evaluations show the adaptability of GaussianCut across a diverse set of scenes. 
GaussianCut achieves competitive performance with state-of-the-art approaches for 3D segmentation without requiring any additional segmentation-aware training.", + "arxiv_url": "http://arxiv.org/abs/2411.07555v1", + "pdf_url": "http://arxiv.org/pdf/2411.07555v1", + "published_date": "2024-11-12", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HiCoM: Hierarchical Coherent Motion for Streamable Dynamic Scene with 3D Gaussian Splatting", + "authors": [ + "Qiankun Gao", + "Jiarui Meng", + "Chengxiang Wen", + "Jie Chen", + "Jian Zhang" + ], + "abstract": "The online reconstruction of dynamic scenes from multi-view streaming videos faces significant challenges in training, rendering and storage efficiency. Harnessing superior learning speed and real-time rendering capabilities, 3D Gaussian Splatting (3DGS) has recently demonstrated considerable potential in this field. However, 3DGS can be inefficient in terms of storage and prone to overfitting by excessively growing Gaussians, particularly with limited views. This paper proposes an efficient framework, dubbed HiCoM, with three key components. First, we construct a compact and robust initial 3DGS representation using a perturbation smoothing strategy. Next, we introduce a Hierarchical Coherent Motion mechanism that leverages the inherent non-uniform distribution and local consistency of 3D Gaussians to swiftly and accurately learn motions across frames. Finally, we continually refine the 3DGS with additional Gaussians, which are later merged into the initial 3DGS to maintain consistency with the evolving scene. To preserve a compact representation, an equivalent number of low-opacity Gaussians that minimally impact the representation are removed before processing subsequent frames. Extensive experiments conducted on two widely used datasets show that our framework improves learning efficiency of the state-of-the-art methods by about $20\\%$ and reduces the data storage by $85\\%$, achieving competitive free-viewpoint video synthesis quality but with higher robustness and stability. Moreover, by learning multiple frames in parallel, our HiCoM decreases the average training wall time to $<2$ seconds per frame with negligible performance degradation, substantially boosting real-world applicability and responsiveness.", + "arxiv_url": "http://arxiv.org/abs/2411.07541v1", + "pdf_url": "http://arxiv.org/pdf/2411.07541v1", + "published_date": "2024-11-12", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GUS-IR: Gaussian Splatting with Unified Shading for Inverse Rendering", + "authors": [ + "Zhihao Liang", + "Hongdong Li", + "Kui Jia", + "Kailing Guo", + "Qi Zhang" + ], + "abstract": "Recovering the intrinsic physical attributes of a scene from images, generally termed as the inverse rendering problem, has been a central and challenging task in computer vision and computer graphics. In this paper, we present GUS-IR, a novel framework designed to address the inverse rendering problem for complicated scenes featuring rough and glossy surfaces. This paper starts by analyzing and comparing the effectiveness of two prominent shading techniques popularly used for inverse rendering, forward shading and deferred shading, in handling complex materials.
More importantly, we propose a unified shading solution that combines the advantages of both techniques for better decomposition. In addition, we analyze the normal modeling in 3D Gaussian Splatting (3DGS) and utilize the shortest axis as normal for each particle in GUS-IR, along with a depth-related regularization, resulting in improved geometric representation and better shape reconstruction. Furthermore, we enhance the probe-based baking scheme proposed by GS-IR to achieve more accurate ambient occlusion modeling to better handle indirect illumination. Extensive experiments have demonstrated the superior performance of GUS-IR in achieving precise intrinsic decomposition and geometric representation, supporting many downstream tasks (such as relighting, retouching) in computer vision, graphics, and extended reality.", + "arxiv_url": "http://arxiv.org/abs/2411.07478v1", + "pdf_url": "http://arxiv.org/pdf/2411.07478v1", + "published_date": "2024-11-12", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "A Hierarchical Compression Technique for 3D Gaussian Splatting Compression", + "authors": [ + "He Huang", + "Wenjie Huang", + "Qi Yang", + "Yiling Xu", + "Zhu li" + ], + "abstract": "3D Gaussian Splatting (GS) demonstrates excellent rendering quality and generation speed in novel view synthesis. However, substantial data size poses challenges for storage and transmission, making 3D GS compression an essential technology. Current 3D GS compression research primarily focuses on developing more compact scene representations, such as converting explicit 3D GS data into implicit forms. In contrast, compression of the GS data itself has hardly been explored. To address this gap, we propose a Hierarchical GS Compression (HGSC) technique. Initially, we prune unimportant Gaussians based on importance scores derived from both global and local significance, effectively reducing redundancy while maintaining visual quality. An Octree structure is used to compress 3D positions. Based on the 3D GS Octree, we implement a hierarchical attribute compression strategy by employing a KD-tree to partition the 3D GS into multiple blocks. We apply farthest point sampling to select anchor primitives within each block and others as non-anchor primitives with varying Levels of Details (LoDs). Anchor primitives serve as reference points for predicting non-anchor primitives across different LoDs to reduce spatial redundancy. For anchor primitives, we use the region adaptive hierarchical transform to achieve near-lossless compression of various attributes. For non-anchor primitives, each is predicted based on the k-nearest anchor primitives. To further minimize prediction errors, the reconstructed LoD and anchor primitives are combined to form new anchor primitives to predict the next LoD. 
Our method notably achieves superior compression quality and a significant data size reduction of over 4.5 times compared to the state-of-the-art compression method on small scenes datasets.", + "arxiv_url": "http://arxiv.org/abs/2411.06976v1", + "pdf_url": "http://arxiv.org/pdf/2411.06976v1", + "published_date": "2024-11-11", + "categories": [ + "cs.CV", + "cs.MM" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Adaptive and Temporally Consistent Gaussian Surfels for Multi-view Dynamic Reconstruction", + "authors": [ + "Decai Chen", + "Brianne Oberson", + "Ingo Feldmann", + "Oliver Schreer", + "Anna Hilsmann", + "Peter Eisert" + ], + "abstract": "3D Gaussian Splatting has recently achieved notable success in novel view synthesis for dynamic scenes and geometry reconstruction in static scenes. Building on these advancements, early methods have been developed for dynamic surface reconstruction by globally optimizing entire sequences. However, reconstructing dynamic scenes with significant topology changes, emerging or disappearing objects, and rapid movements remains a substantial challenge, particularly for long sequences. To address these issues, we propose AT-GS, a novel method for reconstructing high-quality dynamic surfaces from multi-view videos through per-frame incremental optimization. To avoid local minima across frames, we introduce a unified and adaptive gradient-aware densification strategy that integrates the strengths of conventional cloning and splitting techniques. Additionally, we reduce temporal jittering in dynamic surfaces by ensuring consistency in curvature maps across consecutive frames. Our method achieves superior accuracy and temporal coherence in dynamic surface reconstruction, delivering high-fidelity space-time novel view synthesis, even in complex and challenging scenes. Extensive experiments on diverse multi-view video datasets demonstrate the effectiveness of our approach, showing clear advantages over baseline methods. Project page: \\url{https://fraunhoferhhi.github.io/AT-GS}", + "arxiv_url": "http://arxiv.org/abs/2411.06602v1", + "pdf_url": "http://arxiv.org/pdf/2411.06602v1", + "published_date": "2024-11-10", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SplatFormer: Point Transformer for Robust 3D Gaussian Splatting", + "authors": [ + "Yutong Chen", + "Marko Mihajlovic", + "Xiyi Chen", + "Yiming Wang", + "Sergey Prokudin", + "Siyu Tang" + ], + "abstract": "3D Gaussian Splatting (3DGS) has recently transformed photorealistic reconstruction, achieving high visual fidelity and real-time performance. However, rendering quality significantly deteriorates when test views deviate from the camera angles used during training, posing a major challenge for applications in immersive free-viewpoint rendering and navigation. In this work, we conduct a comprehensive evaluation of 3DGS and related novel view synthesis methods under out-of-distribution (OOD) test camera scenarios. By creating diverse test cases with synthetic and real-world datasets, we demonstrate that most existing methods, including those incorporating various regularization techniques and data-driven priors, struggle to generalize effectively to OOD views. 
To address this limitation, we introduce SplatFormer, the first point transformer model specifically designed to operate on Gaussian splats. SplatFormer takes as input an initial 3DGS set optimized under limited training views and refines it in a single forward pass, effectively removing potential artifacts in OOD test views. To our knowledge, this is the first successful application of point transformers directly on 3DGS sets, surpassing the limitations of previous multi-scene training methods, which could handle only a restricted number of input views during inference. Our model significantly improves rendering quality under extreme novel views, achieving state-of-the-art performance in these challenging scenarios and outperforming various 3DGS regularization techniques, multi-scene models tailored for sparse view synthesis, and diffusion-based frameworks.", + "arxiv_url": "http://arxiv.org/abs/2411.06390v2", + "pdf_url": "http://arxiv.org/pdf/2411.06390v2", + "published_date": "2024-11-10", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Through the Curved Cover: Synthesizing Cover Aberrated Scenes with Refractive Field", + "authors": [ + "Liuyue Xie", + "Jiancong Guo", + "Laszlo A. Jeni", + "Zhiheng Jia", + "Mingyang Li", + "Yunwen Zhou", + "Chao Guo" + ], + "abstract": "Recent extended reality headsets and field robots have adopted covers to protect the front-facing cameras from environmental hazards and falls. The surface irregularities on the cover can lead to optical aberrations like blurring and non-parametric distortions. Novel view synthesis methods like NeRF and 3D Gaussian Splatting are ill-equipped to synthesize from sequences with optical aberrations. To address this challenge, we introduce SynthCover to enable novel view synthesis through protective covers for downstream extended reality applications. SynthCover employs a Refractive Field that estimates the cover's geometry, enabling precise analytical calculation of refracted rays. Experiments on synthetic and real-world scenes demonstrate our method's ability to accurately model scenes viewed through protective covers, achieving a significant improvement in rendering quality compared to prior methods. We also show that the model can adjust well to various cover geometries with synthetic sequences captured with covers of different surface curvatures. To motivate further studies on this problem, we provide the benchmarked dataset containing real and synthetic walkable scenes captured with protective cover optical aberrations.", + "arxiv_url": "http://arxiv.org/abs/2411.06365v1", + "pdf_url": "http://arxiv.org/pdf/2411.06365v1", + "published_date": "2024-11-10", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "AI-Driven Stylization of 3D Environments", + "authors": [ + "Yuanbo Chen", + "Yixiao Kang", + "Yukun Song", + "Cyrus Vachha", + "Sining Huang" + ], + "abstract": "In this system, we discuss methods to stylize a scene of 3D primitive objects into a higher fidelity 3D scene using novel 3D representations like NeRFs and 3D Gaussian Splatting. Our approach leverages existing image stylization systems and image-to-3D generative models to create a pipeline that iteratively stylizes and composites 3D objects into scenes. 
We show our results on adding generated objects into a scene and discuss limitations.", + "arxiv_url": "http://arxiv.org/abs/2411.06067v1", + "pdf_url": "http://arxiv.org/pdf/2411.06067v1", + "published_date": "2024-11-09", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianSpa: An \"Optimizing-Sparsifying\" Simplification Framework for Compact and High-Quality 3D Gaussian Splatting", + "authors": [ + "Yangming Zhang", + "Wenqi Jia", + "Wei Niu", + "Miao Yin" + ], + "abstract": "3D Gaussian Splatting (3DGS) has emerged as a mainstream for novel view synthesis, leveraging continuous aggregations of Gaussian functions to model scene geometry. However, 3DGS suffers from substantial memory requirements to store the multitude of Gaussians, hindering its practicality. To address this challenge, we introduce GaussianSpa, an optimization-based simplification framework for compact and high-quality 3DGS. Specifically, we formulate the simplification as an optimization problem associated with the 3DGS training. Correspondingly, we propose an efficient \"optimizing-sparsifying\" solution that alternately solves two independent sub-problems, gradually imposing strong sparsity onto the Gaussians in the training process. Our comprehensive evaluations on various datasets show the superiority of GaussianSpa over existing state-of-the-art approaches. Notably, GaussianSpa achieves an average PSNR improvement of 0.9 dB on the real-world Deep Blending dataset with 10$\\times$ fewer Gaussians compared to the vanilla 3DGS. Our project page is available at https://gaussianspa.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2411.06019v1", + "pdf_url": "http://arxiv.org/pdf/2411.06019v1", + "published_date": "2024-11-09", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PEP-GS: Perceptually-Enhanced Precise Structured 3D Gaussians for View-Adaptive Rendering", + "authors": [ + "Junxi Jin", + "Xiulai Li", + "Haiping Huang", + "Lianjun Liu", + "Yujie Sun" + ], + "abstract": "Recent advances in structured 3D Gaussians for view-adaptive rendering, particularly through methods like Scaffold-GS, have demonstrated promising results in neural scene representation. However, existing approaches still face challenges in perceptual consistency and precise view-dependent effects. We present PEP-GS, a novel framework that enhances structured 3D Gaussians through three key innovations: (1) a Local-Enhanced Multi-head Self-Attention (LEMSA) mechanism that replaces spherical harmonics for more accurate view-dependent color decoding, and (2) Kolmogorov-Arnold Networks (KAN) that optimize Gaussian opacity and covariance functions for enhanced interpretability and splatting precision. (3) a Neural Laplacian Pyramid Decomposition (NLPD) that improves perceptual similarity across views. 
Our comprehensive evaluation across multiple datasets indicates that, compared to the current state-of-the-art methods, these improvements are particularly evident in challenging scenarios such as view-dependent effects, specular reflections, fine-scale details and false geometry generation.", + "arxiv_url": "http://arxiv.org/abs/2411.05731v1", + "pdf_url": "http://arxiv.org/pdf/2411.05731v1", + "published_date": "2024-11-08", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ProEdit: Simple Progression is All You Need for High-Quality 3D Scene Editing", + "authors": [ + "Jun-Kun Chen", + "Yu-Xiong Wang" + ], + "abstract": "This paper proposes ProEdit - a simple yet effective framework for high-quality 3D scene editing guided by diffusion distillation in a novel progressive manner. Inspired by the crucial observation that multi-view inconsistency in scene editing is rooted in the diffusion model's large feasible output space (FOS), our framework controls the size of FOS and reduces inconsistency by decomposing the overall editing task into several subtasks, which are then executed progressively on the scene. Within this framework, we design a difficulty-aware subtask decomposition scheduler and an adaptive 3D Gaussian splatting (3DGS) training strategy, ensuring high quality and efficiency in performing each subtask. Extensive evaluation shows that our ProEdit achieves state-of-the-art results in various scenes and challenging editing tasks, all through a simple framework without any expensive or sophisticated add-ons like distillation losses, components, or training procedures. Notably, ProEdit also provides a new way to control, preview, and select the \"aggressivity\" of editing operation during the editing process.", + "arxiv_url": "http://arxiv.org/abs/2411.05006v1", + "pdf_url": "http://arxiv.org/pdf/2411.05006v1", + "published_date": "2024-11-07", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MVSplat360: Feed-Forward 360 Scene Synthesis from Sparse Views", + "authors": [ + "Yuedong Chen", + "Chuanxia Zheng", + "Haofei Xu", + "Bohan Zhuang", + "Andrea Vedaldi", + "Tat-Jen Cham", + "Jianfei Cai" + ], + "abstract": "We introduce MVSplat360, a feed-forward approach for 360{\\deg} novel view synthesis (NVS) of diverse real-world scenes, using only sparse observations. This setting is inherently ill-posed due to minimal overlap among input views and insufficient visual information provided, making it challenging for conventional methods to achieve high-quality results. Our MVSplat360 addresses this by effectively combining geometry-aware 3D reconstruction with temporally consistent video generation. Specifically, it refactors a feed-forward 3D Gaussian Splatting (3DGS) model to render features directly into the latent space of a pre-trained Stable Video Diffusion (SVD) model, where these features then act as pose and visual cues to guide the denoising process and produce photorealistic 3D-consistent views. Our model is end-to-end trainable and supports rendering arbitrary views with as few as 5 sparse input views. To evaluate MVSplat360's performance, we introduce a new benchmark using the challenging DL3DV-10K dataset, where MVSplat360 achieves superior visual quality compared to state-of-the-art methods on wide-sweeping or even 360{\\deg} NVS tasks. 
Experiments on the existing benchmark RealEstate10K also confirm the effectiveness of our model. The video results are available on our project page: https://donydchen.github.io/mvsplat360.", + "arxiv_url": "http://arxiv.org/abs/2411.04924v1", + "pdf_url": "http://arxiv.org/pdf/2411.04924v1", + "published_date": "2024-11-07", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting", + "authors": [ + "Jilan Mei", + "Junbo Li", + "Cai Meng" + ], + "abstract": "This paper proposes a new method for accurate and robust 6D pose estimation of novel objects, named GS2Pose. By introducing 3D Gaussian splatting, GS2Pose can utilize the reconstruction results without requiring a high-quality CAD model, which means it only requires segmented RGBD images as input. Specifically, GS2Pose employs a two-stage structure consisting of coarse estimation followed by refined estimation. In the coarse stage, a lightweight U-Net network with a polarization attention mechanism, called Pose-Net, is designed. By using the 3DGS model for supervised training, Pose-Net can generate NOCS images to compute a coarse pose. In the refinement stage, GS2Pose formulates a pose regression algorithm following the idea of reprojection or Bundle Adjustment (BA), referred to as GS-Refiner. By leveraging Lie algebra to extend 3DGS, GS-Refiner obtains a pose-differentiable rendering pipeline that refines the coarse pose by comparing the input images with the rendered images. GS-Refiner also selectively updates parameters in the 3DGS model to achieve environmental adaptation, thereby enhancing the algorithm's robustness and flexibility to illuminative variation, occlusion, and other challenging disruptive factors. GS2Pose was evaluated through experiments conducted on the LineMod dataset, where it was compared with similar algorithms, yielding highly competitive results. The code for GS2Pose will soon be released on GitHub.", + "arxiv_url": "http://arxiv.org/abs/2411.03807v3", + "pdf_url": "http://arxiv.org/pdf/2411.03807v3", + "published_date": "2024-11-06", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement", + "authors": [ + "Ziqi Lu", + "Jianbo Ye", + "John Leonard" + ], + "abstract": "We present 3DGS-CD, the first 3D Gaussian Splatting (3DGS)-based method for detecting physical object rearrangements in 3D scenes. Our approach estimates 3D object-level changes by comparing two sets of unaligned images taken at different times. Leveraging 3DGS's novel view rendering and EfficientSAM's zero-shot segmentation capabilities, we detect 2D object-level changes, which are then associated and fused across views to estimate 3D changes. Our method can detect changes in cluttered environments using sparse post-change images within as little as 18s, using as few as a single new image. It does not rely on depth input, user instructions, object classes, or object models -- An object is recognized simply if it has been re-arranged. 
Our approach is evaluated on both public and self-collected real-world datasets, achieving up to 14% higher accuracy and three orders of magnitude faster performance compared to the state-of-the-art radiance-field-based change detection method. This significant performance boost enables a broad range of downstream applications, where we highlight three key use cases: object reconstruction, robot workspace reset, and 3DGS model update. Our code and data will be made available at https://github.com/520xyxyzq/3DGS-CD.", + "arxiv_url": "http://arxiv.org/abs/2411.03706v1", + "pdf_url": "http://arxiv.org/pdf/2411.03706v1", + "published_date": "2024-11-06", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "https://github.com/520xyxyzq/3DGS-CD", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Structure Consistent Gaussian Splatting with Matching Prior for Few-shot Novel View Synthesis", + "authors": [ + "Rui Peng", + "Wangze Xu", + "Luyang Tang", + "Liwei Liao", + "Jianbo Jiao", + "Ronggang Wang" + ], + "abstract": "Despite the substantial progress of novel view synthesis, existing methods, either based on the Neural Radiance Fields (NeRF) or more recently 3D Gaussian Splatting (3DGS), suffer significant degradation when the input becomes sparse. Numerous efforts have been introduced to alleviate this problem, but they still struggle to synthesize satisfactory results efficiently, especially in the large scene. In this paper, we propose SCGaussian, a Structure Consistent Gaussian Splatting method using matching priors to learn 3D consistent scene structure. Considering the high interdependence of Gaussian attributes, we optimize the scene structure in two folds: rendering geometry and, more importantly, the position of Gaussian primitives, which is hard to be directly constrained in the vanilla 3DGS due to the non-structure property. To achieve this, we present a hybrid Gaussian representation. Besides the ordinary non-structure Gaussian primitives, our model also consists of ray-based Gaussian primitives that are bound to matching rays and whose optimization of their positions is restricted along the ray. Thus, we can utilize the matching correspondence to directly enforce the position of these Gaussian primitives to converge to the surface points where rays intersect. Extensive experiments on forward-facing, surrounding, and complex large scenes show the effectiveness of our approach with state-of-the-art performance and high efficiency. Code is available at https://github.com/prstrive/SCGaussian.", + "arxiv_url": "http://arxiv.org/abs/2411.03637v1", + "pdf_url": "http://arxiv.org/pdf/2411.03637v1", + "published_date": "2024-11-06", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/prstrive/SCGaussian", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Object and Contact Point Tracking in Demonstrations Using 3D Gaussian Splatting", + "authors": [ + "Michael Büttner", + "Jonathan Francis", + "Helge Rhodin", + "Andrew Melnik" + ], + "abstract": "This paper introduces a method to enhance Interactive Imitation Learning (IIL) by extracting touch interaction points and tracking object movement from video demonstrations. The approach extends current IIL systems by providing robots with detailed knowledge of both where and how to interact with objects, particularly complex articulated ones like doors and drawers. 
By leveraging cutting-edge techniques such as 3D Gaussian Splatting and FoundationPose for tracking, this method allows robots to better understand and manipulate objects in dynamic environments. The research lays the foundation for more effective task learning and execution in autonomous robotic systems.", + "arxiv_url": "http://arxiv.org/abs/2411.03555v1", + "pdf_url": "http://arxiv.org/pdf/2411.03555v1", + "published_date": "2024-11-05", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features", + "authors": [ + "Arnab Dey", + "Cheng-You Lu", + "Andrew I. Comport", + "Srinath Sridhar", + "Chin-Teng Lin", + "Jean Martinet" + ], + "abstract": "Recent advancements in radiance field rendering show promising results in 3D scene representation, where Gaussian splatting-based techniques emerge as state-of-the-art due to their quality and efficiency. Gaussian splatting is widely used for various applications, including 3D human representation. However, previous 3D Gaussian splatting methods either use parametric body models as additional information or fail to provide any underlying structure, like human biomechanical features, which are essential for different applications. In this paper, we present a novel approach called HFGaussian that can estimate novel views and human features, such as the 3D skeleton, 3D key points, and dense pose, from sparse input images in real time at 25 FPS. The proposed method leverages generalizable Gaussian splatting technique to represent the human subject and its associated features, enabling efficient and generalizable reconstruction. By incorporating a pose regression network and the feature splatting technique with Gaussian splatting, HFGaussian demonstrates improved capabilities over existing 3D human methods, showcasing the potential of 3D human representations with integrated biomechanics. We thoroughly evaluate our HFGaussian method against the latest state-of-the-art techniques in human Gaussian splatting and pose estimation, demonstrating its real-time, state-of-the-art performance.", + "arxiv_url": "http://arxiv.org/abs/2411.03086v1", + "pdf_url": "http://arxiv.org/pdf/2411.03086v1", + "published_date": "2024-11-05", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LVI-GS: Tightly-coupled LiDAR-Visual-Inertial SLAM using 3D Gaussian Splatting", + "authors": [ + "Huibin Zhao", + "Weipeng Guan", + "Peng Lu" + ], + "abstract": "3D Gaussian Splatting (3DGS) has shown its ability in rapid rendering and high-fidelity mapping. In this paper, we introduce LVI-GS, a tightly-coupled LiDAR-Visual-Inertial mapping framework with 3DGS, which leverages the complementary characteristics of LiDAR and image sensors to capture both geometric structures and visual details of 3D scenes. To this end, the 3D Gaussians are initialized from colourized LiDAR points and optimized using differentiable rendering. In order to achieve high-fidelity mapping, we introduce a pyramid-based training approach to effectively learn multi-level features and incorporate depth loss derived from LiDAR measurements to improve geometric feature perception. 
Through well-designed strategies for Gaussian-Map expansion, keyframe selection, thread management, and custom CUDA acceleration, our framework achieves real-time photo-realistic mapping. Numerical experiments are performed to evaluate the superior performance of our method compared to state-of-the-art 3D reconstruction systems.", + "arxiv_url": "http://arxiv.org/abs/2411.02703v1", + "pdf_url": "http://arxiv.org/pdf/2411.02703v1", + "published_date": "2024-11-05", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Modeling Uncertainty in 3D Gaussian Splatting through Continuous Semantic Splatting", + "authors": [ + "Joey Wilson", + "Marcelino Almeida", + "Min Sun", + "Sachit Mahajan", + "Maani Ghaffari", + "Parker Ewen", + "Omid Ghasemalizadeh", + "Cheng-Hao Kuo", + "Arnie Sen" + ], + "abstract": "In this paper, we present a novel algorithm for probabilistically updating and rasterizing semantic maps within 3D Gaussian Splatting (3D-GS). Although previous methods have introduced algorithms which learn to rasterize features in 3D-GS for enhanced scene understanding, 3D-GS can fail without warning which presents a challenge for safety-critical robotic applications. To address this gap, we propose a method which advances the literature of continuous semantic mapping from voxels to ellipsoids, combining the precise structure of 3D-GS with the ability to quantify uncertainty of probabilistic robotic maps. Given a set of images, our algorithm performs a probabilistic semantic update directly on the 3D ellipsoids to obtain an expectation and variance through the use of conjugate priors. We also propose a probabilistic rasterization which returns per-pixel segmentation predictions with quantifiable uncertainty. We compare our method with similar probabilistic voxel-based methods to verify our extension to 3D ellipsoids, and perform ablation studies on uncertainty quantification and temporal smoothing.", + "arxiv_url": "http://arxiv.org/abs/2411.02547v1", + "pdf_url": "http://arxiv.org/pdf/2411.02547v1", + "published_date": "2024-11-04", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SplatOverflow: Asynchronous Hardware Troubleshooting", + "authors": [ + "Amritansh Kwatra", + "Tobias Wienberg", + "Ilan Mandel", + "Ritik Batra", + "Peter He", + "Francois Guimbretiere", + "Thijs Roumen" + ], + "abstract": "As tools for designing and manufacturing hardware become more accessible, smaller producers can develop and distribute novel hardware. However, there aren't established tools to support end-user hardware troubleshooting or routine maintenance. As a result, technical support for hardware remains ad-hoc and challenging to scale. Inspired by software troubleshooting workflows like StackOverflow, we propose a workflow for asynchronous hardware troubleshooting: SplatOverflow. SplatOverflow creates a novel boundary object, the SplatOverflow scene, that users reference to communicate about hardware. The scene comprises a 3D Gaussian Splat of the user's hardware registered onto the hardware's CAD model. The splat captures the current state of the hardware, and the registered CAD model acts as a referential anchor for troubleshooting instructions. 
With SplatOverflow, maintainers can directly address issues and author instructions in the user's workspace. The instructions define workflows that can easily be shared between users and recontextualized in new environments. In this paper, we describe the design of SplatOverflow, detail the workflows it enables, and illustrate its utility to different kinds of users. We also validate that non-experts can use SplatOverflow to troubleshoot common problems with a 3D printer in a user study.", + "arxiv_url": "http://arxiv.org/abs/2411.02332v2", + "pdf_url": "http://arxiv.org/pdf/2411.02332v2", + "published_date": "2024-11-04", + "categories": [ + "cs.HC" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FewViewGS: Gaussian Splatting with Few View Matching and Multi-stage Training", + "authors": [ + "Ruihong Yin", + "Vladimir Yugay", + "Yue Li", + "Sezer Karaoglu", + "Theo Gevers" + ], + "abstract": "The field of novel view synthesis from images has seen rapid advancements with the introduction of Neural Radiance Fields (NeRF) and more recently with 3D Gaussian Splatting. Gaussian Splatting became widely adopted due to its efficiency and ability to render novel views accurately. While Gaussian Splatting performs well when a sufficient amount of training images are available, its unstructured explicit representation tends to overfit in scenarios with sparse input images, resulting in poor rendering performance. To address this, we present a 3D Gaussian-based novel view synthesis method using sparse input images that can accurately render the scene from the viewpoints not covered by the training images. We propose a multi-stage training scheme with matching-based consistency constraints imposed on the novel views without relying on pre-trained depth estimation or diffusion models. This is achieved by using the matches of the available training images to supervise the generation of the novel views sampled between the training frames with color, geometry, and semantic losses. In addition, we introduce a locality preserving regularization for 3D Gaussians which removes rendering artifacts by preserving the local color structure of the scene. Evaluation on synthetic and real-world datasets demonstrates competitive or superior performance of our method in few-shot novel view synthesis compared to existing state-of-the-art methods.", + "arxiv_url": "http://arxiv.org/abs/2411.02229v2", + "pdf_url": "http://arxiv.org/pdf/2411.02229v2", + "published_date": "2024-11-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GVKF: Gaussian Voxel Kernel Functions for Highly Efficient Surface Reconstruction in Open Scenes", + "authors": [ + "Gaochao Song", + "Chong Cheng", + "Hao Wang" + ], + "abstract": "In this paper we present a novel method for efficient and effective 3D surface reconstruction in open scenes. Existing Neural Radiance Fields (NeRF) based works typically require extensive training and rendering time due to the adopted implicit representations. In contrast, 3D Gaussian splatting (3DGS) uses an explicit and discrete representation, hence the reconstructed surface is built by the huge number of Gaussian primitives, which leads to excessive memory consumption and rough surface details in sparse Gaussian areas. 
To address these issues, we propose Gaussian Voxel Kernel Functions (GVKF), which establish a continuous scene representation based on discrete 3DGS through kernel regression. The GVKF integrates fast 3DGS rasterization and highly effective scene implicit representations, achieving high-fidelity open scene surface reconstruction. Experiments on challenging scene datasets demonstrate the efficiency and effectiveness of our proposed GVKF, featuring with high reconstruction quality, real-time rendering speed, significant savings in storage and training memory consumption.", + "arxiv_url": "http://arxiv.org/abs/2411.01853v2", + "pdf_url": "http://arxiv.org/pdf/2411.01853v2", + "published_date": "2024-11-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Real-Time Spatio-Temporal Reconstruction of Dynamic Endoscopic Scenes with 4D Gaussian Splatting", + "authors": [ + "Fengze Li", + "Jishuai He", + "Jieming Ma", + "Zhijing Wu" + ], + "abstract": "Dynamic scene reconstruction is essential in robotic minimally invasive surgery, providing crucial spatial information that enhances surgical precision and outcomes. However, existing methods struggle to address the complex, temporally dynamic nature of endoscopic scenes. This paper presents ST-Endo4DGS, a novel framework that models the spatio-temporal volume of dynamic endoscopic scenes using unbiased 4D Gaussian Splatting (4DGS) primitives, parameterized by anisotropic ellipses with flexible 4D rotations. This approach enables precise representation of deformable tissue dynamics, capturing intricate spatial and temporal correlations in real time. Additionally, we extend spherindrical harmonics to represent time-evolving appearance, achieving realistic adaptations to lighting and view changes. A new endoscopic normal alignment constraint (ENAC) further enhances geometric fidelity by aligning rendered normals with depth-derived geometry. Extensive evaluations show that ST-Endo4DGS outperforms existing methods in both visual quality and real-time performance, establishing a new state-of-the-art in dynamic scene reconstruction for endoscopic surgery.", + "arxiv_url": "http://arxiv.org/abs/2411.01218v1", + "pdf_url": "http://arxiv.org/pdf/2411.01218v1", + "published_date": "2024-11-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CityGaussianV2: Efficient and Geometrically Accurate Reconstruction for Large-Scale Scenes", + "authors": [ + "Yang Liu", + "Chuanchen Luo", + "Zhongkai Mao", + "Junran Peng", + "Zhaoxiang Zhang" + ], + "abstract": "Recently, 3D Gaussian Splatting (3DGS) has revolutionized radiance field reconstruction, manifesting efficient and high-fidelity novel view synthesis. However, accurately representing surfaces, especially in large and complex scenarios, remains a significant challenge due to the unstructured nature of 3DGS. In this paper, we present CityGaussianV2, a novel approach for large-scale scene reconstruction that addresses critical challenges related to geometric accuracy and efficiency. Building on the favorable generalization capabilities of 2D Gaussian Splatting (2DGS), we address its convergence and scalability issues. 
Specifically, we implement a decomposed-gradient-based densification and depth regression technique to eliminate blurry artifacts and accelerate convergence. To scale up, we introduce an elongation filter that mitigates Gaussian count explosion caused by 2DGS degeneration. Furthermore, we optimize the CityGaussian pipeline for parallel training, achieving up to 10$\\times$ compression, at least 25% savings in training time, and a 50% decrease in memory usage. We also established standard geometry benchmarks under large-scale scenes. Experimental results demonstrate that our method strikes a promising balance between visual quality, geometric accuracy, as well as storage and training costs. The project page is available at https://dekuliutesla.github.io/CityGaussianV2/.", + "arxiv_url": "http://arxiv.org/abs/2411.00771v1", + "pdf_url": "http://arxiv.org/pdf/2411.00771v1", + "published_date": "2024-11-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PCoTTA: Continual Test-Time Adaptation for Multi-Task Point Cloud Understanding", + "authors": [ + "Jincen Jiang", + "Qianyu Zhou", + "Yuhang Li", + "Xinkui Zhao", + "Meili Wang", + "Lizhuang Ma", + "Jian Chang", + "Jian Jun Zhang", + "Xuequan Lu" + ], + "abstract": "In this paper, we present PCoTTA, an innovative, pioneering framework for Continual Test-Time Adaptation (CoTTA) in multi-task point cloud understanding, enhancing the model's transferability towards the continually changing target domain. We introduce a multi-task setting for PCoTTA, which is practical and realistic, handling multiple tasks within one unified model during the continual adaptation. Our PCoTTA involves three key components: automatic prototype mixture (APM), Gaussian Splatted feature shifting (GSFS), and contrastive prototype repulsion (CPR). Firstly, APM is designed to automatically mix the source prototypes with the learnable prototypes with a similarity balancing factor, avoiding catastrophic forgetting. Then, GSFS dynamically shifts the testing sample toward the source domain, mitigating error accumulation in an online manner. In addition, CPR is proposed to pull the nearest learnable prototype close to the testing feature and push it away from other prototypes, making each prototype distinguishable during the adaptation. Experimental comparisons lead to a new benchmark, demonstrating PCoTTA's superiority in boosting the model's transferability towards the continually changing target domain.", + "arxiv_url": "http://arxiv.org/abs/2411.00632v1", + "pdf_url": "http://arxiv.org/pdf/2411.00632v1", + "published_date": "2024-11-01", + "categories": [ + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Aquatic-GS: A Hybrid 3D Representation for Underwater Scenes", + "authors": [ + "Shaohua Liu", + "Junzhe Lu", + "Zuoya Gu", + "Jiajun Li", + "Yue Deng" + ], + "abstract": "Representing underwater 3D scenes is a valuable yet complex task, as attenuation and scattering effects during underwater imaging significantly couple the information of the objects and the water. This coupling presents a significant challenge for existing methods in effectively representing both the objects and the water medium simultaneously. 
To address this challenge, we propose Aquatic-GS, a hybrid 3D representation approach for underwater scenes that effectively represents both the objects and the water medium. Specifically, we construct a Neural Water Field (NWF) to implicitly model the water parameters, while extending the latest 3D Gaussian Splatting (3DGS) to model the objects explicitly. Both components are integrated through a physics-based underwater image formation model to represent complex underwater scenes. Moreover, to construct more precise scene geometry and details, we design a Depth-Guided Optimization (DGO) mechanism that uses a pseudo-depth map as auxiliary guidance. After optimization, Aquatic-GS enables the rendering of novel underwater viewpoints and supports restoring the true appearance of underwater scenes, as if the water medium were absent. Extensive experiments on both simulated and real-world datasets demonstrate that Aquatic-GS surpasses state-of-the-art underwater 3D representation methods, achieving better rendering quality and real-time rendering performance with a 410x increase in speed. Furthermore, regarding underwater image restoration, Aquatic-GS outperforms representative dewatering methods in color correction, detail recovery, and stability. Our models, code, and datasets can be accessed at https://aquaticgs.github.io.", + "arxiv_url": "http://arxiv.org/abs/2411.00239v1", + "pdf_url": "http://arxiv.org/pdf/2411.00239v1", + "published_date": "2024-10-31", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis", + "authors": [ + "Chen Zhao", + "Xuan Wang", + "Tong Zhang", + "Saqib Javed", + "Mathieu Salzmann" + ], + "abstract": "3D Gaussian Splatting (3DGS) has demonstrated remarkable effectiveness for novel view synthesis (NVS). However, the 3DGS model tends to overfit when trained with sparse posed views, limiting its generalization ability to novel views. In this paper, we alleviate the overfitting problem, presenting a Self-Ensembling Gaussian Splatting (SE-GS) approach. Our method encompasses a $\\mathbf{\\Sigma}$-model and a $\\mathbf{\\Delta}$-model. The $\\mathbf{\\Sigma}$-model serves as an ensemble of 3DGS models that generates novel-view images during inference. We achieve the self-ensembling by introducing an uncertainty-aware perturbation strategy at the training state. We complement the $\\mathbf{\\Sigma}$-model with the $\\mathbf{\\Delta}$-model, which is dynamically perturbed based on the uncertainties of novel-view renderings across different training steps. The perturbation yields diverse temporal samples in the Gaussian parameter space without additional training costs. The geometry of the $\\mathbf{\\Sigma}$-model is regularized by penalizing discrepancies between the $\\mathbf{\\Sigma}$-model and these temporal samples. Therefore, our SE-GS conducts an effective and efficient regularization across a large number of 3DGS models, resulting in a robust ensemble, the $\\mathbf{\\Sigma}$-model. Our experimental results on the LLFF, Mip-NeRF360, DTU, and MVImgNet datasets show that our approach improves NVS quality with few-shot training views, outperforming existing state-of-the-art methods. 
The code is released at: https://sailor-z.github.io/projects/SEGS.html.", + "arxiv_url": "http://arxiv.org/abs/2411.00144v2", + "pdf_url": "http://arxiv.org/pdf/2411.00144v2", + "published_date": "2024-10-31", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "URAvatar: Universal Relightable Gaussian Codec Avatars", + "authors": [ + "Junxuan Li", + "Chen Cao", + "Gabriel Schwartz", + "Rawal Khirodkar", + "Christian Richardt", + "Tomas Simon", + "Yaser Sheikh", + "Shunsuke Saito" + ], + "abstract": "We present a new approach to creating photorealistic and relightable head avatars from a phone scan with unknown illumination. The reconstructed avatars can be animated and relit in real time with the global illumination of diverse environments. Unlike existing approaches that estimate parametric reflectance parameters via inverse rendering, our approach directly models learnable radiance transfer that incorporates global light transport in an efficient manner for real-time rendering. However, learning such a complex light transport that can generalize across identities is non-trivial. A phone scan in a single environment lacks sufficient information to infer how the head would appear in general environments. To address this, we build a universal relightable avatar model represented by 3D Gaussians. We train on hundreds of high-quality multi-view human scans with controllable point lights. High-resolution geometric guidance further enhances the reconstruction accuracy and generalization. Once trained, we finetune the pretrained model on a phone scan using inverse rendering to obtain a personalized relightable avatar. Our experiments establish the efficacy of our design, outperforming existing approaches while retaining real-time rendering capability.", + "arxiv_url": "http://arxiv.org/abs/2410.24223v1", + "pdf_url": "http://arxiv.org/pdf/2410.24223v1", + "published_date": "2024-10-31", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images", + "authors": [ + "Botao Ye", + "Sifei Liu", + "Haofei Xu", + "Xueting Li", + "Marc Pollefeys", + "Ming-Hsuan Yang", + "Songyou Peng" + ], + "abstract": "We introduce NoPoSplat, a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from \\textit{unposed} sparse multi-view images. Our model, trained exclusively with photometric loss, achieves real-time 3D Gaussian reconstruction during inference. To eliminate the need for accurate pose input during reconstruction, we anchor one input view's local camera coordinates as the canonical space and train the network to predict Gaussian primitives for all views within this space. This approach obviates the need to transform Gaussian primitives from local coordinates into a global coordinate system, thus avoiding errors associated with per-frame Gaussians and pose estimation. To resolve scale ambiguity, we design and compare various intrinsic embedding methods, ultimately opting to convert camera intrinsics into a token embedding and concatenate it with image tokens as input to the model, enabling accurate scene scale prediction. 
We utilize the reconstructed 3D Gaussians for novel view synthesis and pose estimation tasks and propose a two-stage coarse-to-fine pipeline for accurate pose estimation. Experimental results demonstrate that our pose-free approach can achieve superior novel view synthesis quality compared to pose-required methods, particularly in scenarios with limited input image overlap. For pose estimation, our method, trained without ground truth depth or explicit matching loss, significantly outperforms the state-of-the-art methods with substantial improvements. This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios. Code and trained models are available at https://noposplat.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2410.24207v1", + "pdf_url": "http://arxiv.org/pdf/2410.24207v1", + "published_date": "2024-10-31", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GeoSplatting: Towards Geometry Guided Gaussian Splatting for Physically-based Inverse Rendering", + "authors": [ + "Kai Ye", + "Chong Gao", + "Guanbin Li", + "Wenzheng Chen", + "Baoquan Chen" + ], + "abstract": "We consider the problem of physically-based inverse rendering using 3D Gaussian Splatting (3DGS) representations. While recent 3DGS methods have achieved remarkable results in novel view synthesis (NVS), accurately capturing high-fidelity geometry, physically interpretable materials and lighting remains challenging, as it requires precise geometry modeling to provide accurate surface normals, along with physically-based rendering (PBR) techniques to ensure correct material and lighting disentanglement. Previous 3DGS methods resort to approximating surface normals, but often struggle with noisy local geometry, leading to inaccurate normal estimation and suboptimal material-lighting decomposition. In this paper, we introduce GeoSplatting, a novel hybrid representation that augments 3DGS with explicit geometric guidance and differentiable PBR equations. Specifically, we bridge isosurface and 3DGS together, where we first extract isosurface mesh from a scalar field, then convert it into 3DGS points and formulate PBR equations for them in a fully differentiable manner. In GeoSplatting, 3DGS is grounded on the mesh geometry, enabling precise surface normal modeling, which facilitates the use of PBR frameworks for material decomposition. This approach further maintains the efficiency and quality of NVS from 3DGS while ensuring accurate geometry from the isosurface. Comprehensive evaluations across diverse datasets demonstrate the superiority of GeoSplatting, consistently outperforming existing methods both quantitatively and qualitatively.", + "arxiv_url": "http://arxiv.org/abs/2410.24204v2", + "pdf_url": "http://arxiv.org/pdf/2410.24204v2", + "published_date": "2024-10-31", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianMarker: Uncertainty-Aware Copyright Protection of 3D Gaussian Splatting", + "authors": [ + "Xiufeng Huang", + "Ruiqi Li", + "Yiu-ming Cheung", + "Ka Chun Cheung", + "Simon See", + "Renjie Wan" + ], + "abstract": "3D Gaussian Splatting (3DGS) has become a crucial method for acquiring 3D assets. 
To protect the copyright of these assets, digital watermarking techniques can be applied to embed ownership information discreetly within 3DGS models. However, existing watermarking methods for meshes, point clouds, and implicit radiance fields cannot be directly applied to 3DGS models, as 3DGS models use explicit 3D Gaussians with distinct structures and do not rely on neural networks. Naively embedding the watermark on a pre-trained 3DGS can cause obvious distortion in rendered images. In our work, we propose an uncertainty-based method that constrains the perturbation of model parameters to achieve invisible watermarking for 3DGS. At the message decoding stage, the copyright messages can be reliably extracted from both 3D Gaussians and 2D rendered images even under various forms of 3D and 2D distortions. We conduct extensive experiments on the Blender, LLFF and MipNeRF-360 datasets to validate the effectiveness of our proposed method, demonstrating state-of-the-art performance on both message decoding accuracy and view synthesis quality.", + "arxiv_url": "http://arxiv.org/abs/2410.23718v1", + "pdf_url": "http://arxiv.org/pdf/2410.23718v1", + "published_date": "2024-10-31", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "XRDSLAM: A Flexible and Modular Framework for Deep Learning based SLAM", + "authors": [ + "Xiaomeng Wang", + "Nan Wang", + "Guofeng Zhang" + ], + "abstract": "In this paper, we propose a flexible SLAM framework, XRDSLAM. It adopts a modular code design and a multi-process running mechanism, providing highly reusable foundational modules such as unified dataset management, 3d visualization, algorithm configuration, and metrics evaluation. It can help developers quickly build a complete SLAM system, flexibly combine different algorithm modules, and conduct standardized benchmarking for accuracy and efficiency comparison. Within this framework, we integrate several state-of-the-art SLAM algorithms with different types, including NeRF and 3DGS based SLAM, and even odometry or reconstruction algorithms, which demonstrates the flexibility and extensibility. We also conduct a comprehensive comparison and evaluation of these integrated algorithms, analyzing the characteristics of each. Finally, we contribute all the code, configuration and data to the open-source community, which aims to promote the widespread research and development of SLAM technology within the open-source ecosystem.", + "arxiv_url": "http://arxiv.org/abs/2410.23690v1", + "pdf_url": "http://arxiv.org/pdf/2410.23690v1", + "published_date": "2024-10-31", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-Blur: A 3D Scene-Based Dataset for Realistic Image Deblurring", + "authors": [ + "Dongwoo Lee", + "Joonkyu Park", + "Kyoung Mu Lee" + ], + "abstract": "To train a deblurring network, an appropriate dataset with paired blurry and sharp images is essential. Existing datasets collect blurry images either synthetically by aggregating consecutive sharp frames or using sophisticated camera systems to capture real blur. However, these methods offer limited diversity in blur types (blur trajectories) or require extensive human effort to reconstruct large-scale datasets, failing to fully reflect real-world blur scenarios. 
To address this, we propose GS-Blur, a dataset of synthesized realistic blurry images created using a novel approach. To this end, we first reconstruct 3D scenes from multi-view images using 3D Gaussian Splatting (3DGS), then render blurry images by moving the camera view along the randomly generated motion trajectories. By adopting various camera trajectories in reconstructing our GS-Blur, our dataset contains realistic and diverse types of blur, offering a large-scale dataset that generalizes well to real-world blur. Using GS-Blur with various deblurring methods, we demonstrate its ability to generalize effectively compared to previous synthetic or real blur datasets, showing significant improvements in deblurring performance.", + "arxiv_url": "http://arxiv.org/abs/2410.23658v1", + "pdf_url": "http://arxiv.org/pdf/2410.23658v1", + "published_date": "2024-10-31", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ELMGS: Enhancing memory and computation scaLability through coMpression for 3D Gaussian Splatting", + "authors": [ + "Muhammad Salman Ali", + "Sung-Ho Bae", + "Enzo Tartaglione" + ], + "abstract": "3D models have recently been popularized by the potentiality of end-to-end training offered first by Neural Radiance Fields and most recently by 3D Gaussian Splatting models. The latter has the big advantage of naturally providing fast training convergence and high editability. However, as the research around these is still in its infancy, there is still a gap in the literature regarding the model's scalability. In this work, we propose an approach enabling both memory and computation scalability of such models. More specifically, we propose an iterative pruning strategy that removes redundant information encoded in the model. We also enhance compressibility for the model by including in the optimization strategy a differentiable quantization and entropy coding estimator. Our results on popular benchmarks showcase the effectiveness of the proposed approach and open the road to the broad deployability of such a solution even on resource-constrained devices.", + "arxiv_url": "http://arxiv.org/abs/2410.23213v1", + "pdf_url": "http://arxiv.org/pdf/2410.23213v1", + "published_date": "2024-10-30", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis", + "authors": [ + "Zhiyuan Min", + "Yawei Luo", + "Jianwen Sun", + "Yi Yang" + ], + "abstract": "Generalizable 3D Gaussian splitting (3DGS) can reconstruct new scenes from sparse-view observations in a feed-forward inference manner, eliminating the need for scene-specific retraining required in conventional 3DGS. However, existing methods rely heavily on epipolar priors, which can be unreliable in complex realworld scenes, particularly in non-overlapping and occluded regions. In this paper, we propose eFreeSplat, an efficient feed-forward 3DGS-based model for generalizable novel view synthesis that operates independently of epipolar line constraints. To enhance multiview feature extraction with 3D perception, we employ a selfsupervised Vision Transformer (ViT) with cross-view completion pre-training on large-scale datasets. 
Additionally, we introduce an Iterative Cross-view Gaussians Alignment method to ensure consistent depth scales across different views. Our eFreeSplat represents an innovative approach for generalizable novel view synthesis. Different from the existing pure geometry-free methods, eFreeSplat focuses more on achieving epipolar-free feature matching and encoding by providing 3D priors through cross-view pretraining. We evaluate eFreeSplat on wide-baseline novel view synthesis tasks using the RealEstate10K and ACID datasets. Extensive experiments demonstrate that eFreeSplat surpasses state-of-the-art baselines that rely on epipolar priors, achieving superior geometry reconstruction and novel view synthesis quality. Project page: https://tatakai1.github.io/efreesplat/.", + "arxiv_url": "http://arxiv.org/abs/2410.22817v2", + "pdf_url": "http://arxiv.org/pdf/2410.22817v2", + "published_date": "2024-10-30", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Geometry Cloak: Preventing TGS-based 3D Reconstruction from Copyrighted Images", + "authors": [ + "Qi Song", + "Ziyuan Luo", + "Ka Chun Cheung", + "Simon See", + "Renjie Wan" + ], + "abstract": "Single-view 3D reconstruction methods like Triplane Gaussian Splatting (TGS) have enabled high-quality 3D model generation from just a single image input within seconds. However, this capability raises concerns about potential misuse, where malicious users could exploit TGS to create unauthorized 3D models from copyrighted images. To prevent such infringement, we propose a novel image protection approach that embeds invisible geometry perturbations, termed \"geometry cloaks\", into images before supplying them to TGS. These carefully crafted perturbations encode a customized message that is revealed when TGS attempts 3D reconstructions of the cloaked image. Unlike conventional adversarial attacks that simply degrade output quality, our method forces TGS to fail the 3D reconstruction in a specific way - by generating an identifiable customized pattern that acts as a watermark. This watermark allows copyright holders to assert ownership over any attempted 3D reconstructions made from their protected images. Extensive experiments have verified the effectiveness of our geometry cloak. Our project is available at https://qsong2001.github.io/geometry_cloak.", + "arxiv_url": "http://arxiv.org/abs/2410.22705v1", + "pdf_url": "http://arxiv.org/pdf/2410.22705v1", + "published_date": "2024-10-30", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting", + "authors": [ + "Sunghwan Hong", + "Jaewoo Jung", + "Heeseong Shin", + "Jisang Han", + "Jiaolong Yang", + "Chong Luo", + "Seungryong Kim" + ], + "abstract": "We consider the problem of novel view synthesis from unposed images in a single feed-forward. Our framework capitalizes on fast speed, scalability, and high-quality 3D reconstruction and view synthesis capabilities of 3DGS, where we further extend it to offer a practical solution that relaxes common assumptions such as dense image views, accurate camera poses, and substantial image overlaps. 
We achieve this through identifying and addressing unique challenges arising from the use of pixel-aligned 3DGS: misaligned 3D Gaussians across different views induce noisy or sparse gradients that destabilize training and hinder convergence, especially when above assumptions are not met. To mitigate this, we employ pre-trained monocular depth estimation and visual correspondence models to achieve coarse alignments of 3D Gaussians. We then introduce lightweight, learnable modules to refine depth and pose estimates from the coarse alignments, improving the quality of 3D reconstruction and novel view synthesis. Furthermore, the refined estimates are leveraged to estimate geometry confidence scores, which assess the reliability of 3D Gaussian centers and condition the prediction of Gaussian parameters accordingly. Extensive evaluations on large-scale real-world datasets demonstrate that PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices.", + "arxiv_url": "http://arxiv.org/abs/2410.22128v1", + "pdf_url": "http://arxiv.org/pdf/2410.22128v1", + "published_date": "2024-10-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FreeGaussian: Guidance-free Controllable 3D Gaussian Splats with Flow Derivatives", + "authors": [ + "Qizhi Chen", + "Delin Qu", + "Yiwen Tang", + "Haoming Song", + "Yiting Zhang", + "Dong Wang", + "Bin Zhao", + "Xuelong Li" + ], + "abstract": "Reconstructing controllable Gaussian splats from monocular video is a challenging task due to its inherently insufficient constraints. Widely adopted approaches supervise complex interactions with additional masks and control signal annotations, limiting their real-world applications. In this paper, we propose an annotation guidance-free method, dubbed FreeGaussian, that mathematically derives dynamic Gaussian motion from optical flow and camera motion using novel dynamic Gaussian constraints. By establishing a connection between 2D flows and 3D Gaussian dynamic control, our method enables self-supervised optimization and continuity of dynamic Gaussian motions from flow priors. Furthermore, we introduce a 3D spherical vector controlling scheme, which represents the state with a 3D Gaussian trajectory, thereby eliminating the need for complex 1D control signal calculations and simplifying controllable Gaussian modeling. Quantitative and qualitative evaluations on extensive experiments demonstrate the state-of-the-art visual performance and control capability of our method. Project page: https://freegaussian.github.io.", + "arxiv_url": "http://arxiv.org/abs/2410.22070v1", + "pdf_url": "http://arxiv.org/pdf/2410.22070v1", + "published_date": "2024-10-29", + "categories": [ + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ActiveSplat: High-Fidelity Scene Reconstruction through Active Gaussian Splatting", + "authors": [ + "Yuetao Li", + "Zijia Kuang", + "Ting Li", + "Guyue Zhou", + "Shaohui Zhang", + "Zike Yan" + ], + "abstract": "We propose ActiveSplat, an autonomous high-fidelity reconstruction system leveraging Gaussian splatting. Taking advantage of efficient and realistic rendering, the system establishes a unified framework for online mapping, viewpoint selection, and path planning. 
The key to ActiveSplat is a hybrid map representation that integrates both dense information about the environment and a sparse abstraction of the workspace. Therefore, the system leverages sparse topology for efficient viewpoint sampling and path planning, while exploiting view-dependent dense prediction for viewpoint selection, facilitating efficient decision-making with promising accuracy and completeness. A hierarchical planning strategy based on the topological map is adopted to mitigate repetitive trajectories and improve local granularity given limited budgets, ensuring high-fidelity reconstruction with photorealistic view synthesis. Extensive experiments and ablation studies validate the efficacy of the proposed method in terms of reconstruction accuracy, data coverage, and exploration efficiency. Project page: https://li-yuetao.github.io/ActiveSplat/.", + "arxiv_url": "http://arxiv.org/abs/2410.21955v1", + "pdf_url": "http://arxiv.org/pdf/2410.21955v1", + "published_date": "2024-10-29", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps", + "authors": [ + "Yating Xu", + "Chen Li", + "Gim Hee Lee" + ], + "abstract": "The key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection. Previous method relies on NeRF for geometry reasoning. However, the geometry extracted from NeRF is generally inaccurate, which leads to sub-optimal detection performance. In this paper, we propose MVSDet which utilizes plane sweep for geometry-aware 3D object detection. To circumvent the requirement for a large number of depth planes for accurate depth prediction, we design a probabilistic sampling and soft weighting mechanism to decide the placement of pixel features on the 3D volume. We select multiple locations that score top in the probability volume for each pixel and use their probability score to indicate the confidence. We further apply recent pixel-aligned Gaussian Splatting to regularize depth prediction and improve detection performance with little computation overhead. Extensive experiments on ScanNet and ARKitScenes datasets are conducted to show the superiority of our model. Our code is available at https://github.com/Pixie8888/MVSDet.", + "arxiv_url": "http://arxiv.org/abs/2410.21566v1", + "pdf_url": "http://arxiv.org/pdf/2410.21566v1", + "published_date": "2024-10-28", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/Pixie8888/MVSDet", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Grid4D: 4D Decomposed Hash Encoding for High-fidelity Dynamic Gaussian Splatting", + "authors": [ + "Jiawei Xu", + "Zexin Fan", + "Jian Yang", + "Jin Xie" + ], + "abstract": "Recently, Gaussian splatting has received more and more attention in the field of static scene rendering. Due to the low computational overhead and inherent flexibility of explicit representations, plane-based explicit methods are popular ways to predict deformations for Gaussian-based dynamic scene rendering models. However, plane-based methods rely on the inappropriate low-rank assumption and excessively decompose the space-time 4D encoding, resulting in overmuch feature overlap and unsatisfactory rendering quality. 
To tackle these problems, we propose Grid4D, a dynamic scene rendering model based on Gaussian splatting and employing a novel explicit encoding method for the 4D input through the hash encoding. Different from plane-based explicit representations, we decompose the 4D encoding into one spatial and three temporal 3D hash encodings without the low-rank assumption. Additionally, we design a novel attention module that generates the attention scores in a directional range to aggregate the spatial and temporal features. The directional attention enables Grid4D to more accurately fit the diverse deformations across distinct scene components based on the spatial encoded features. Moreover, to mitigate the inherent lack of smoothness in explicit representation methods, we introduce a smooth regularization term that keeps our model from the chaos of deformation prediction. Our experiments demonstrate that Grid4D significantly outperforms the state-of-the-art models in visual quality and rendering speed.", + "arxiv_url": "http://arxiv.org/abs/2410.20815v1", + "pdf_url": "http://arxiv.org/pdf/2410.20815v1", + "published_date": "2024-10-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LoDAvatar: Hierarchical Embedding and Adaptive Levels of Detail with Gaussian Splatting for Enhanced Human Avatars", + "authors": [ + "Xiaonuo Dongye", + "Hanzhi Guo", + "Le Luo", + "Haiyan Jiang", + "Yihua Bao", + "Zeyu Tian", + "Dongdong Weng" + ], + "abstract": "With the advancement of virtual reality, the demand for 3D human avatars is increasing. The emergence of Gaussian Splatting technology has enabled the rendering of Gaussian avatars with superior visual quality and reduced computational costs. Despite numerous methods researchers propose for implementing drivable Gaussian avatars, limited attention has been given to balancing visual quality and computational costs. In this paper, we introduce LoDAvatar, a method that introduces levels of detail into Gaussian avatars through hierarchical embedding and selective detail enhancement methods. The key steps of LoDAvatar encompass data preparation, Gaussian embedding, Gaussian optimization, and selective detail enhancement. We conducted experiments involving Gaussian avatars at various levels of detail, employing both objective assessments and subjective evaluations. The outcomes indicate that incorporating levels of detail into Gaussian avatars can decrease computational costs during rendering while upholding commendable visual quality, thereby enhancing runtime frame rates. We advocate adopting LoDAvatar to render multiple dynamic Gaussian avatars or extensive Gaussian scenes to balance visual quality and computational costs.", + "arxiv_url": "http://arxiv.org/abs/2410.20789v1", + "pdf_url": "http://arxiv.org/pdf/2410.20789v1", + "published_date": "2024-10-28", + "categories": [ + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians", + "authors": [ + "Chongjian Ge", + "Chenfeng Xu", + "Yuanfeng Ji", + "Chensheng Peng", + "Masayoshi Tomizuka", + "Ping Luo", + "Mingyu Ding", + "Varun Jampani", + "Wei Zhan" + ], + "abstract": "Recent breakthroughs in text-guided image generation have significantly advanced the field of 3D generation. 
While generating a single high-quality 3D object is now feasible, generating multiple objects with reasonable interactions within a 3D space, a.k.a. compositional 3D generation, presents substantial challenges. This paper introduces CompGS, a novel generative framework that employs 3D Gaussian Splatting (GS) for efficient, compositional text-to-3D content generation. To achieve this goal, two core designs are proposed: (1) 3D Gaussians Initialization with 2D compositionality: We transfer the well-established 2D compositionality to initialize the Gaussian parameters on an entity-by-entity basis, ensuring both consistent 3D priors for each entity and reasonable interactions among multiple entities; (2) Dynamic Optimization: We propose a dynamic strategy to optimize 3D Gaussians using Score Distillation Sampling (SDS) loss. CompGS first automatically decomposes 3D Gaussians into distinct entity parts, enabling optimization at both the entity and composition levels. Additionally, CompGS optimizes across objects of varying scales by dynamically adjusting the spatial parameters of each entity, enhancing the generation of fine-grained details, particularly in smaller entities. Qualitative comparisons and quantitative evaluations on T3Bench demonstrate the effectiveness of CompGS in generating compositional 3D objects with superior image quality and semantic alignment over existing methods. CompGS can also be easily extended to controllable 3D editing, facilitating scene generation. We hope CompGS will provide new insights to the compositional 3D generation. Project page: https://chongjiange.github.io/compgs.html.", + "arxiv_url": "http://arxiv.org/abs/2410.20723v1", + "pdf_url": "http://arxiv.org/pdf/2410.20723v1", + "published_date": "2024-10-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ODGS: 3D Scene Reconstruction from Omnidirectional Images with 3D Gaussian Splattings", + "authors": [ + "Suyoung Lee", + "Jaeyoung Chung", + "Jaeyoo Huh", + "Kyoung Mu Lee" + ], + "abstract": "Omnidirectional (or 360-degree) images are increasingly being used for 3D applications since they allow the rendering of an entire scene with a single image. Existing works based on neural radiance fields demonstrate successful 3D reconstruction quality on egocentric videos, yet they suffer from long training and rendering times. Recently, 3D Gaussian splatting has gained attention for its fast optimization and real-time rendering. However, directly using a perspective rasterizer to omnidirectional images results in severe distortion due to the different optical properties between two image domains. In this work, we present ODGS, a novel rasterization pipeline for omnidirectional images, with geometric interpretation. For each Gaussian, we define a tangent plane that touches the unit sphere and is perpendicular to the ray headed toward the Gaussian center. We then leverage a perspective camera rasterizer to project the Gaussian onto the corresponding tangent plane. The projected Gaussians are transformed and combined into the omnidirectional image, finalizing the omnidirectional rasterization process. This interpretation reveals the implicit assumptions within the proposed pipeline, which we verify through mathematical proofs. The entire rasterization process is parallelized using CUDA, achieving optimization and rendering speeds 100 times faster than NeRF-based methods. 
Our comprehensive experiments highlight the superiority of ODGS by delivering the best reconstruction and perceptual quality across various datasets. Additionally, results on roaming datasets demonstrate that ODGS restores fine details effectively, even when reconstructing large 3D scenes. The source code is available on our project page (https://github.com/esw0116/ODGS).", + "arxiv_url": "http://arxiv.org/abs/2410.20686v1", + "pdf_url": "http://arxiv.org/pdf/2410.20686v1", + "published_date": "2024-10-28", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/esw0116/ODGS", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Normal-GS: 3D Gaussian Splatting with Normal-Involved Rendering", + "authors": [ + "Meng Wei", + "Qianyi Wu", + "Jianmin Zheng", + "Hamid Rezatofighi", + "Jianfei Cai" + ], + "abstract": "Rendering and reconstruction are long-standing topics in computer vision and graphics. Achieving both high rendering quality and accurate geometry is a challenge. Recent advancements in 3D Gaussian Splatting (3DGS) have enabled high-fidelity novel view synthesis at real-time speeds. However, the noisy and discrete nature of 3D Gaussian primitives hinders accurate surface estimation. Previous attempts to regularize 3D Gaussian normals often degrade rendering quality due to the fundamental disconnect between normal vectors and the rendering pipeline in 3DGS-based methods. Therefore, we introduce Normal-GS, a novel approach that integrates normal vectors into the 3DGS rendering pipeline. The core idea is to model the interaction between normals and incident lighting using the physically-based rendering equation. Our approach re-parameterizes surface colors as the product of normals and a designed Integrated Directional Illumination Vector (IDIV). To optimize memory usage and simplify optimization, we employ an anchor-based 3DGS to implicitly encode locally-shared IDIVs. Additionally, Normal-GS leverages optimized normals and Integrated Directional Encoding (IDE) to accurately model specular effects, enhancing both rendering quality and surface normal precision. Extensive experiments demonstrate that Normal-GS achieves near state-of-the-art visual quality while obtaining accurate surface normals and preserving real-time rendering performance.", + "arxiv_url": "http://arxiv.org/abs/2410.20593v1", + "pdf_url": "http://arxiv.org/pdf/2410.20593v1", + "published_date": "2024-10-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Neural Fields in Robotics: A Survey", + "authors": [ + "Muhammad Zubair Irshad", + "Mauro Comi", + "Yen-Chen Lin", + "Nick Heppert", + "Abhinav Valada", + "Rares Ambrus", + "Zsolt Kira", + "Jonathan Tremblay" + ], + "abstract": "Neural Fields have emerged as a transformative approach for 3D scene representation in computer vision and robotics, enabling accurate inference of geometry, 3D semantics, and dynamics from posed 2D data. Leveraging differentiable rendering, Neural Fields encompass both continuous implicit and explicit neural representations enabling high-fidelity 3D reconstruction, integration of multi-modal sensor data, and generation of novel viewpoints. 
This survey explores their applications in robotics, emphasizing their potential to enhance perception, planning, and control. Their compactness, memory efficiency, and differentiability, along with seamless integration with foundation and generative models, make them ideal for real-time applications, improving robot adaptability and decision-making. This paper provides a thorough review of Neural Fields in robotics, categorizing applications across various domains and evaluating their strengths and limitations, based on over 200 papers. First, we present four key Neural Fields frameworks: Occupancy Networks, Signed Distance Fields, Neural Radiance Fields, and Gaussian Splatting. Second, we detail Neural Fields' applications in five major robotics domains: pose estimation, manipulation, navigation, physics, and autonomous driving, highlighting key works and discussing takeaways and open challenges. Finally, we outline the current limitations of Neural Fields in robotics and propose promising directions for future research. Project page: https://robonerf.github.io", + "arxiv_url": "http://arxiv.org/abs/2410.20220v1", + "pdf_url": "http://arxiv.org/pdf/2410.20220v1", + "published_date": "2024-10-26", + "categories": [ + "cs.RO", + "cs.AI", + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SCube: Instant Large-Scale Scene Reconstruction using VoxSplats", + "authors": [ + "Xuanchi Ren", + "Yifan Lu", + "Hanxue Liang", + "Zhangjie Wu", + "Huan Ling", + "Mike Chen", + "Sanja Fidler", + "Francis Williams", + "Jiahui Huang" + ], + "abstract": "We present SCube, a novel method for reconstructing large-scale 3D scenes (geometry, appearance, and semantics) from a sparse set of posed images. Our method encodes reconstructed scenes using a novel representation VoxSplat, which is a set of 3D Gaussians supported on a high-resolution sparse-voxel scaffold. To reconstruct a VoxSplat from images, we employ a hierarchical voxel latent diffusion model conditioned on the input images followed by a feedforward appearance prediction model. The diffusion model generates high-resolution grids progressively in a coarse-to-fine manner, and the appearance network predicts a set of Gaussians within each voxel. From as few as 3 non-overlapping input images, SCube can generate millions of Gaussians with a 1024^3 voxel grid spanning hundreds of meters in 20 seconds. Past works tackling scene reconstruction from images either rely on per-scene optimization and fail to reconstruct the scene away from input views (thus requiring dense view coverage as input) or leverage geometric priors based on low-resolution models, which produce blurry results. In contrast, SCube leverages high-resolution sparse networks and produces sharp outputs from few views. 
We show the superiority of SCube compared to prior art using the Waymo self-driving dataset on 3D reconstruction and demonstrate its applications, such as LiDAR simulation and text-to-scene generation.", + "arxiv_url": "http://arxiv.org/abs/2410.20030v1", + "pdf_url": "http://arxiv.org/pdf/2410.20030v1", + "published_date": "2024-10-26", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DiffGS: Functional Gaussian Splatting Diffusion", + "authors": [ + "Junsheng Zhou", + "Weiqi Zhang", + "Yu-Shen Liu" + ], + "abstract": "3D Gaussian Splatting (3DGS) has shown convincing performance in rendering speed and fidelity, yet the generation of Gaussian Splatting remains a challenge due to its discreteness and unstructured nature. In this work, we propose DiffGS, a general Gaussian generator based on latent diffusion models. DiffGS is a powerful and efficient 3D generative model which is capable of generating Gaussian primitives at arbitrary numbers for high-fidelity rendering with rasterization. The key insight is to represent Gaussian Splatting in a disentangled manner via three novel functions to model Gaussian probabilities, colors and transforms. Through the novel disentanglement of 3DGS, we represent the discrete and unstructured 3DGS with continuous Gaussian Splatting functions, where we then train a latent diffusion model with the target of generating these Gaussian Splatting functions both unconditionally and conditionally. Meanwhile, we introduce a discretization algorithm to extract Gaussians at arbitrary numbers from the generated functions via octree-guided sampling and optimization. We explore DiffGS for various tasks, including unconditional generation, conditional generation from text, image, and partial 3DGS, as well as Point-to-Gaussian generation. We believe that DiffGS provides a new direction for flexibly modeling and generating Gaussian Splatting.", + "arxiv_url": "http://arxiv.org/abs/2410.19657v2", + "pdf_url": "http://arxiv.org/pdf/2410.19657v2", + "published_date": "2024-10-25", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Robotic Learning in your Backyard: A Neural Simulator from Open Source Components", + "authors": [ + "Liyou Zhou", + "Oleg Sinavski", + "Athanasios Polydoros" + ], + "abstract": "The emergence of 3D Gaussian Splatting for fast and high-quality novel view synthesize has opened up the possibility to construct photo-realistic simulations from video for robotic reinforcement learning. While the approach has been demonstrated in several research papers, the software tools used to build such a simulator remain unavailable or proprietary. We present SplatGym, an open source neural simulator for training data-driven robotic control policies. The simulator creates a photorealistic virtual environment from a single video. It supports ego camera view generation, collision detection, and virtual object in-painting. We demonstrate training several visual navigation policies via reinforcement learning. SplatGym represents a notable first step towards an open-source general-purpose neural environment for robotic learning. 
It broadens the range of applications that can effectively utilise reinforcement learning by providing convenient and unrestricted tooling, and by eliminating the need for the manual development of conventional 3D environments.", + "arxiv_url": "http://arxiv.org/abs/2410.19564v1", + "pdf_url": "http://arxiv.org/pdf/2410.19564v1", + "published_date": "2024-10-25", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization", + "authors": [ + "Weihang Liu", + "Xue Xian Zheng", + "Jingyi Yu", + "Xin Lou" + ], + "abstract": "The recent popular radiance field models, exemplified by Neural Radiance Fields (NeRF), Instant-NGP and 3D Gaussian Splatting, are designed to represent 3D content by that training models for each individual scene. This unique characteristic of scene representation and per-scene training distinguishes radiance field models from other neural models, because complex scenes necessitate models with higher representational capacity and vice versa. In this paper, we propose content-aware radiance fields, aligning the model complexity with the scene intricacies through Adversarial Content-Aware Quantization (A-CAQ). Specifically, we make the bitwidth of parameters differentiable and trainable, tailored to the unique characteristics of specific scenes and requirements. The proposed framework has been assessed on Instant-NGP, a well-known NeRF variant and evaluated using various datasets. Experimental results demonstrate a notable reduction in computational complexity, while preserving the requisite reconstruction and rendering quality, making it beneficial for practical deployment of radiance fields models. Codes are available at https://github.com/WeihangLiu2024/Content_Aware_NeRF.", + "arxiv_url": "http://arxiv.org/abs/2410.19483v1", + "pdf_url": "http://arxiv.org/pdf/2410.19483v1", + "published_date": "2024-10-25", + "categories": [ + "cs.CV", + "eess.IV" + ], + "github_url": "https://github.com/WeihangLiu2024/Content_Aware_NeRF", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ArCSEM: Artistic Colorization of SEM Images via Gaussian Splatting", + "authors": [ + "Takuma Nishimura", + "Andreea Dogaru", + "Martin Oeggerli", + "Bernhard Egger" + ], + "abstract": "Scanning Electron Microscopes (SEMs) are widely renowned for their ability to analyze the surface structures of microscopic objects, offering the capability to capture highly detailed, yet only grayscale, images. To create more expressive and realistic illustrations, these images are typically manually colorized by an artist with the support of image editing software. This task becomes highly laborious when multiple images of a scanned object require colorization. We propose facilitating this process by using the underlying 3D structure of the microscopic scene to propagate the color information to all the captured images, from as little as one colorized view. We explore several scene representation techniques and achieve high-quality colorized novel view synthesis of a SEM scene. In contrast to prior work, there is no manual intervention or labelling involved in obtaining the 3D representation. 
This enables an artist to color a single or few views of a sequence and automatically retrieve a fully colored scene or video. Project page: https://ronly2460.github.io/ArCSEM", + "arxiv_url": "http://arxiv.org/abs/2410.21310v1", + "pdf_url": "http://arxiv.org/pdf/2410.21310v1", + "published_date": "2024-10-25", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views", + "authors": [ + "Xin Fei", + "Wenzhao Zheng", + "Yueqi Duan", + "Wei Zhan", + "Masayoshi Tomizuka", + "Kurt Keutzer", + "Jiwen Lu" + ], + "abstract": "We propose PixelGaussian, an efficient feed-forward framework for learning generalizable 3D Gaussian reconstruction from arbitrary views. Most existing methods rely on uniform pixel-wise Gaussian representations, which learn a fixed number of 3D Gaussians for each view and cannot generalize well to more input views. Differently, our PixelGaussian dynamically adapts both the Gaussian distribution and quantity based on geometric complexity, leading to more efficient representations and significant improvements in reconstruction quality. Specifically, we introduce a Cascade Gaussian Adapter to adjust Gaussian distribution according to local geometry complexity identified by a keypoint scorer. CGA leverages deformable attention in context-aware hypernetworks to guide Gaussian pruning and splitting, ensuring accurate representation in complex regions while reducing redundancy. Furthermore, we design a transformer-based Iterative Gaussian Refiner module that refines Gaussian representations through direct image-Gaussian interactions. Our PixelGaussian can effectively reduce Gaussian redundancy as input views increase. We conduct extensive experiments on the large-scale ACID and RealEstate10K datasets, where our method achieves state-of-the-art performance with good generalization to various numbers of views. Code: https://github.com/Barrybarry-Smith/PixelGaussian.", + "arxiv_url": "http://arxiv.org/abs/2410.18979v1", + "pdf_url": "http://arxiv.org/pdf/2410.18979v1", + "published_date": "2024-10-24", + "categories": [ + "cs.CV", + "cs.AI", + "cs.LG" + ], + "github_url": "https://github.com/Barrybarry-Smith/PixelGaussian", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation", + "authors": [ + "Hansheng Chen", + "Bokui Shen", + "Yulin Liu", + "Ruoxi Shi", + "Linqi Zhou", + "Connor Z. Lin", + "Jiayuan Gu", + "Hao Su", + "Gordon Wetzstein", + "Leonidas Guibas" + ], + "abstract": "Multi-view image diffusion models have significantly advanced open-domain 3D object generation. However, most existing models rely on 2D network architectures that lack inherent 3D biases, resulting in compromised geometric consistency. To address this challenge, we introduce 3D-Adapter, a plug-in module designed to infuse 3D geometry awareness into pretrained image diffusion models. Central to our approach is the idea of 3D feedback augmentation: for each denoising step in the sampling loop, 3D-Adapter decodes intermediate multi-view features into a coherent 3D representation, then re-encodes the rendered RGBD views to augment the pretrained base model through feature addition. 
We study two variants of 3D-Adapter: a fast feed-forward version based on Gaussian splatting and a versatile training-free version utilizing neural fields and meshes. Our extensive experiments demonstrate that 3D-Adapter not only greatly enhances the geometry quality of text-to-multi-view models such as Instant3D and Zero123++, but also enables high-quality 3D generation using the plain text-to-image Stable Diffusion. Furthermore, we showcase the broad application potential of 3D-Adapter by presenting high quality results in text-to-3D, image-to-3D, text-to-texture, and text-to-avatar tasks.", + "arxiv_url": "http://arxiv.org/abs/2410.18974v1", + "pdf_url": "http://arxiv.org/pdf/2410.18974v1", + "published_date": "2024-10-24", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Sort-free Gaussian Splatting via Weighted Sum Rendering", + "authors": [ + "Qiqi Hou", + "Randall Rauwendaal", + "Zifeng Li", + "Hoang Le", + "Farzad Farhadzadeh", + "Fatih Porikli", + "Alexei Bourd", + "Amir Said" + ], + "abstract": "Recently, 3D Gaussian Splatting (3DGS) has emerged as a significant advancement in 3D scene reconstruction, attracting considerable attention due to its ability to recover high-fidelity details while maintaining low complexity. Despite the promising results achieved by 3DGS, its rendering performance is constrained by its dependence on costly non-commutative alpha-blending operations. These operations mandate complex view dependent sorting operations that introduce computational overhead, especially on the resource-constrained platforms such as mobile phones. In this paper, we propose Weighted Sum Rendering, which approximates alpha blending with weighted sums, thereby removing the need for sorting. This simplifies implementation, delivers superior performance, and eliminates the \"popping\" artifacts caused by sorting. Experimental results show that optimizing a generalized Gaussian splatting formulation to the new differentiable rendering yields competitive image quality. The method was implemented and tested in a mobile device GPU, achieving on average $1.23\\times$ faster rendering.", + "arxiv_url": "http://arxiv.org/abs/2410.18931v1", + "pdf_url": "http://arxiv.org/pdf/2410.18931v1", + "published_date": "2024-10-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling", + "authors": [ + "Mingtong Zhang", + "Kaifeng Zhang", + "Yunzhu Li" + ], + "abstract": "Videos of robots interacting with objects encode rich information about the objects' dynamics. However, existing video prediction approaches typically do not explicitly account for the 3D information from videos, such as robot actions and objects' 3D states, limiting their use in real-world robotic applications. In this work, we introduce a framework to learn object dynamics directly from multi-view RGB videos by explicitly considering the robot's action trajectories and their effects on scene dynamics. We utilize the 3D Gaussian representation of 3D Gaussian Splatting (3DGS) to train a particle-based dynamics model using Graph Neural Networks. This model operates on sparse control particles downsampled from the densely tracked 3D Gaussian reconstructions. 
By learning the neural dynamics model on offline robot interaction data, our method can predict object motions under varying initial configurations and unseen robot actions. The 3D transformations of Gaussians can be interpolated from the motions of control particles, enabling the rendering of predicted future object states and achieving action-conditioned video prediction. The dynamics model can also be applied to model-based planning frameworks for object manipulation tasks. We conduct experiments on various kinds of deformable materials, including ropes, clothes, and stuffed animals, demonstrating our framework's ability to model complex shapes and dynamics. Our project page is available at https://gs-dynamics.github.io.", + "arxiv_url": "http://arxiv.org/abs/2410.18912v1", + "pdf_url": "http://arxiv.org/pdf/2410.18912v1", + "published_date": "2024-10-24", + "categories": [ + "cs.RO", + "cs.AI", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Binocular-Guided 3D Gaussian Splatting with View Consistency for Sparse View Synthesis", + "authors": [ + "Liang Han", + "Junsheng Zhou", + "Yu-Shen Liu", + "Zhizhong Han" + ], + "abstract": "Novel view synthesis from sparse inputs is a vital yet challenging task in 3D computer vision. Previous methods explore 3D Gaussian Splatting with neural priors (e.g. depth priors) as an additional supervision, demonstrating promising quality and efficiency compared to NeRF-based methods. However, the neural priors from 2D pretrained models are often noisy and blurry, which struggle to precisely guide the learning of radiance fields. In this paper, we propose a novel method for synthesizing novel views from sparse views with Gaussian Splatting that does not require external priors as supervision. Our key idea lies in exploring the self-supervisions inherent in the binocular stereo consistency between each pair of binocular images constructed with disparity-guided image warping. To this end, we additionally introduce a Gaussian opacity constraint which regularizes the Gaussian locations and avoids Gaussian redundancy for improving the robustness and efficiency of inferring 3D Gaussians from sparse views. Extensive experiments on the LLFF, DTU, and Blender datasets demonstrate that our method significantly outperforms the state-of-the-art methods.", + "arxiv_url": "http://arxiv.org/abs/2410.18822v2", + "pdf_url": "http://arxiv.org/pdf/2410.18822v2", + "published_date": "2024-10-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "VR-Splatting: Foveated Radiance Field Rendering via 3D Gaussian Splatting and Neural Points", + "authors": [ + "Linus Franke", + "Laura Fink", + "Marc Stamminger" + ], + "abstract": "Recent advances in novel view synthesis (NVS), particularly neural radiance fields (NeRF) and Gaussian splatting (3DGS), have demonstrated impressive results in photorealistic scene rendering. These techniques hold great potential for applications in virtual tourism and teleportation, where immersive realism is crucial. However, the high-performance demands of virtual reality (VR) systems present challenges in directly utilizing even fast-to-render scene representations such as 3DGS due to latency and computational constraints.
In this paper, we propose foveated rendering as a promising solution to these obstacles. We analyze state-of-the-art NVS methods with respect to their rendering performance and compatibility with the human visual system. Our approach introduces a novel foveated rendering approach for Virtual Reality, that leverages the sharp, detailed output of neural point rendering for the foveal region, fused with a smooth rendering of 3DGS for the peripheral vision. Our evaluation confirms that perceived sharpness and detail-richness are increased by our approach compared to a standard VR-ready 3DGS configuration. Our system meets the necessary performance requirements for real-time VR interactions, ultimately enhancing the user's immersive experience. Project page: https://lfranke.github.io/vr_splatting", + "arxiv_url": "http://arxiv.org/abs/2410.17932v1", + "pdf_url": "http://arxiv.org/pdf/2410.17932v1", + "published_date": "2024-10-23", + "categories": [ + "cs.CV", + "cs.GR", + "I.3; I.4" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting", + "authors": [ + "Yu Wang", + "Xiaobao Wei", + "Ming Lu", + "Guoliang Kang" + ], + "abstract": "Previous methods utilize the Neural Radiance Field (NeRF) for panoptic lifting, while their training and rendering speed are unsatisfactory. In contrast, 3D Gaussian Splatting (3DGS) has emerged as a prominent technique due to its rapid training and rendering speed. However, unlike NeRF, the conventional 3DGS may not satisfy the basic smoothness assumption as it does not rely on any parameterized structures to render (e.g., MLPs). Consequently, the conventional 3DGS is, in nature, more susceptible to noisy 2D mask supervision. In this paper, we propose a new method called PLGS that enables 3DGS to generate consistent panoptic segmentation masks from noisy 2D segmentation masks while maintaining superior efficiency compared to NeRF-based methods. Specifically, we build a panoptic-aware structured 3D Gaussian model to introduce smoothness and design effective noise reduction strategies. For the semantic field, instead of initialization with structure from motion, we construct reliable semantic anchor points to initialize the 3D Gaussians. We then use these anchor points as smooth regularization during training. Additionally, we present a self-training approach using pseudo labels generated by merging the rendered masks with the noisy masks to enhance the robustness of PLGS. For the instance field, we project the 2D instance masks into 3D space and match them with oriented bounding boxes to generate cross-view consistent instance masks for supervision. Experiments on various benchmarks demonstrate that our method outperforms previous state-of-the-art methods in terms of both segmentation quality and speed.", + "arxiv_url": "http://arxiv.org/abs/2410.17505v1", + "pdf_url": "http://arxiv.org/pdf/2410.17505v1", + "published_date": "2024-10-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "AG-SLAM: Active Gaussian Splatting SLAM", + "authors": [ + "Wen Jiang", + "Boshu Lei", + "Katrina Ashton", + "Kostas Daniilidis" + ], + "abstract": "We present AG-SLAM, the first active SLAM system utilizing 3D Gaussian Splatting (3DGS) for online scene reconstruction. 
In recent years, radiance field scene representations, including 3DGS, have been widely used in SLAM and exploration, but actively planning trajectories for robotic exploration remains largely unexplored. In particular, many exploration methods assume precise localization and thus do not mitigate the significant risk of constructing a trajectory that is difficult for a SLAM system to operate on. This can cause camera tracking failure and lead to failures in real-world robotic applications. Our method leverages Fisher Information to balance the dual objectives of maximizing the information gain for the environment while minimizing the cost of localization errors. Experiments conducted on the Gibson and Habitat-Matterport 3D datasets demonstrate state-of-the-art results of the proposed method.", + "arxiv_url": "http://arxiv.org/abs/2410.17422v1", + "pdf_url": "http://arxiv.org/pdf/2410.17422v1", + "published_date": "2024-10-22", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes", + "authors": [ + "Cheng-De Fan", + "Chen-Wei Chang", + "Yi-Ruei Liu", + "Jie-Ying Lee", + "Jiun-Long Huang", + "Yu-Chee Tseng", + "Yu-Lun Liu" + ], + "abstract": "We present SpectroMotion, a novel approach that combines 3D Gaussian Splatting (3DGS) with physically-based rendering (PBR) and deformation fields to reconstruct dynamic specular scenes. Previous methods extending 3DGS to model dynamic scenes have struggled to accurately represent specular surfaces. Our method addresses this limitation by introducing a residual correction technique for accurate surface normal computation during deformation, complemented by a deformable environment map that adapts to time-varying lighting conditions. We implement a coarse-to-fine training strategy that significantly enhances both scene geometry and specular color prediction. We demonstrate that our model outperforms prior methods for view synthesis of scenes containing dynamic specular objects and that it is the only existing 3DGS method capable of synthesizing photorealistic real-world dynamic specular scenes, outperforming state-of-the-art methods in rendering complex, dynamic, and specular scenes.", + "arxiv_url": "http://arxiv.org/abs/2410.17249v1", + "pdf_url": "http://arxiv.org/pdf/2410.17249v1", + "published_date": "2024-10-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias", + "authors": [ + "Haian Jin", + "Hanwen Jiang", + "Hao Tan", + "Kai Zhang", + "Sai Bi", + "Tianyuan Zhang", + "Fujun Luan", + "Noah Snavely", + "Zexiang Xu" + ], + "abstract": "We propose the Large View Synthesis Model (LVSM), a novel transformer-based approach for scalable and generalizable novel view synthesis from sparse-view inputs. We introduce two architectures: (1) an encoder-decoder LVSM, which encodes input image tokens into a fixed number of 1D latent tokens, functioning as a fully learned scene representation, and decodes novel-view images from them; and (2) a decoder-only LVSM, which directly maps input images to novel-view outputs, completely eliminating intermediate scene representations.
Both models bypass the 3D inductive biases used in previous methods -- from 3D representations (e.g., NeRF, 3DGS) to network designs (e.g., epipolar projections, plane sweeps) -- addressing novel view synthesis with a fully data-driven approach. While the encoder-decoder model offers faster inference due to its independent latent representation, the decoder-only LVSM achieves superior quality, scalability, and zero-shot generalization, outperforming previous state-of-the-art methods by 1.5 to 3.5 dB PSNR. Comprehensive evaluations across multiple datasets demonstrate that both LVSM variants achieve state-of-the-art novel view synthesis quality. Notably, our models surpass all previous methods even with reduced computational resources (1-2 GPUs). Please see our website for more details: https://haian-jin.github.io/projects/LVSM/ .", + "arxiv_url": "http://arxiv.org/abs/2410.17242v1", + "pdf_url": "http://arxiv.org/pdf/2410.17242v1", + "published_date": "2024-10-22", + "categories": [ + "cs.CV", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "E-3DGS: Gaussian Splatting with Exposure and Motion Events", + "authors": [ + "Xiaoting Yin", + "Hao Shi", + "Yuhan Bao", + "Zhenshan Bing", + "Yiyi Liao", + "Kailun Yang", + "Kaiwei Wang" + ], + "abstract": "Estimating Neural Radiance Fields (NeRFs) from images captured under optimal conditions has been extensively explored in the vision community. However, robotic applications often face challenges such as motion blur, insufficient illumination, and high computational overhead, which adversely affect downstream tasks like navigation, inspection, and scene visualization. To address these challenges, we propose E-3DGS, a novel event-based approach that partitions events into motion (from camera or object movement) and exposure (from camera exposure), using the former to handle fast-motion scenes and using the latter to reconstruct grayscale images for high-quality training and optimization of event-based 3D Gaussian Splatting (3DGS). We introduce a novel integration of 3DGS with exposure events for high-quality reconstruction of explicit scene representations. Our versatile framework can operate on motion events alone for 3D reconstruction, enhance quality using exposure events, or adopt a hybrid mode that balances quality and effectiveness by optimizing with initial exposure events followed by high-speed motion events. We also introduce EME-3D, a real-world 3D dataset with exposure events, motion events, camera calibration parameters, and sparse point clouds. Our method is faster and delivers better reconstruction quality than event-based NeRF while being more cost-effective than NeRF methods that combine event and RGB data by using a single event sensor. By combining motion and exposure events, E-3DGS sets a new benchmark for event-based 3D reconstruction with robust performance in challenging conditions and lower hardware demands. 
The source code and dataset will be available at https://github.com/MasterHow/E-3DGS.", + "arxiv_url": "http://arxiv.org/abs/2410.16995v1", + "pdf_url": "http://arxiv.org/pdf/2410.16995v1", + "published_date": "2024-10-22", + "categories": [ + "cs.CV", + "cs.RO", + "eess.IV" + ], + "github_url": "https://github.com/MasterHow/E-3DGS", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Multi-Layer Gaussian Splatting for Immersive Anatomy Visualization", + "authors": [ + "Constantin Kleinbeck", + "Hannah Schieber", + "Klaus Engel", + "Ralf Gutjahr", + "Daniel Roth" + ], + "abstract": "In medical image visualization, path tracing of volumetric medical data like CT scans produces lifelike three-dimensional visualizations. Immersive VR displays can further enhance the understanding of complex anatomies. Going beyond the diagnostic quality of traditional 2D slices, they enable interactive 3D evaluation of anatomies, supporting medical education and planning. Rendering high-quality visualizations in real-time, however, is computationally intensive and impractical for compute-constrained devices like mobile headsets. We propose a novel approach utilizing GS to create an efficient but static intermediate representation of CT scans. We introduce a layered GS representation, incrementally including different anatomical structures while minimizing overlap and extending the GS training to remove inactive Gaussians. We further compress the created model with clustering across layers. Our approach achieves interactive frame rates while preserving anatomical structures, with quality adjustable to the target hardware. Compared to standard GS, our representation retains some of the explorative qualities initially enabled by immersive path tracing. Selective activation and clipping of layers are possible at rendering time, adding a degree of interactivity to otherwise static GS models. This could enable scenarios where high computational demands would otherwise prohibit using path-traced medical volumes.", + "arxiv_url": "http://arxiv.org/abs/2410.16978v1", + "pdf_url": "http://arxiv.org/pdf/2410.16978v1", + "published_date": "2024-10-22", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors", + "authors": [ + "Honghua Chen", + "Yushi Lan", + "Yongwei Chen", + "Yifan Zhou", + "Xingang Pan" + ], + "abstract": "Drag-based editing has become popular in 2D content creation, driven by the capabilities of image generative models. However, extending this technique to 3D remains a challenge. Existing 3D drag-based editing methods, whether employing explicit spatial transformations or relying on implicit latent optimization within limited-capacity 3D generative models, fall short in handling significant topology changes or generating new textures across diverse object categories. To overcome these limitations, we introduce MVDrag3D, a novel framework for more flexible and creative drag-based 3D editing that leverages multi-view generation and reconstruction priors. At the core of our approach is the usage of a multi-view diffusion model as a strong generative prior to perform consistent drag editing over multiple rendered views, which is followed by a reconstruction model that reconstructs 3D Gaussians of the edited object. 
While the initial 3D Gaussians may suffer from misalignment between different views, we address this via view-specific deformation networks that adjust the position of Gaussians to be well aligned. In addition, we propose a multi-view score function that distills generative priors from multiple views to further enhance the view consistency and visual quality. Extensive experiments demonstrate that MVDrag3D provides a precise, generative, and flexible solution for 3D drag-based editing, supporting more versatile editing effects across various object categories and 3D representations.", + "arxiv_url": "http://arxiv.org/abs/2410.16272v1", + "pdf_url": "http://arxiv.org/pdf/2410.16272v1", + "published_date": "2024-10-21", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors", + "authors": [ + "Xi Liu", + "Chaoyi Zhou", + "Siyu Huang" + ], + "abstract": "Novel-view synthesis aims to generate novel views of a scene from multiple input images or videos, and recent advancements like 3D Gaussian splatting (3DGS) have achieved notable success in producing photorealistic renderings with efficient pipelines. However, generating high-quality novel views under challenging settings, such as sparse input views, remains difficult due to insufficient information in under-sampled areas, often resulting in noticeable artifacts. This paper presents 3DGS-Enhancer, a novel pipeline for enhancing the representation quality of 3DGS representations. We leverage 2D video diffusion priors to address the challenging 3D view consistency problem, reformulating it as achieving temporal consistency within a video generation process. 3DGS-Enhancer restores view-consistent latent features of rendered novel views and integrates them with the input views through a spatial-temporal decoder. The enhanced views are then used to fine-tune the initial 3DGS model, significantly improving its rendering performance. Extensive experiments on large-scale datasets of unbounded scenes demonstrate that 3DGS-Enhancer yields superior reconstruction performance and high-fidelity rendering results compared to state-of-the-art methods. The project webpage is https://xiliu8006.github.io/3DGS-Enhancer-project .", + "arxiv_url": "http://arxiv.org/abs/2410.16266v1", + "pdf_url": "http://arxiv.org/pdf/2410.16266v1", + "published_date": "2024-10-21", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MSGField: A Unified Scene Representation Integrating Motion, Semantics, and Geometry for Robotic Manipulation", + "authors": [ + "Yu Sheng", + "Runfeng Lin", + "Lidian Wang", + "Quecheng Qiu", + "YanYong Zhang", + "Yu Zhang", + "Bei Hua", + "Jianmin Ji" + ], + "abstract": "Combining accurate geometry with rich semantics has been proven to be highly effective for language-guided robotic manipulation. Existing methods for dynamic scenes either fail to update in real-time or rely on additional depth sensors for simple scene editing, limiting their applicability in real-world. In this paper, we introduce MSGField, a representation that uses a collection of 2D Gaussians for high-quality reconstruction, further enhanced with attributes to encode semantic and motion information. 
Specifically, we represent the motion field compactly by decomposing each primitive's motion into a combination of a limited set of motion bases. Leveraging the differentiable real-time rendering of Gaussian splatting, we can quickly optimize object motion, even for complex non-rigid motions, with image supervision from only two camera views. Additionally, we designed a pipeline that utilizes object priors to efficiently obtain well-defined semantics. In our challenging dataset, which includes flexible and extremely small objects, our method achieves a success rate of 79.2% in static and 63.3% in dynamic environments for language-guided manipulation. For specified object grasping, we achieve a success rate of 90%, on par with point cloud-based methods. Code and dataset will be released at: https://shengyu724.github.io/MSGField.github.io.", + "arxiv_url": "http://arxiv.org/abs/2410.15730v1", + "pdf_url": "http://arxiv.org/pdf/2410.15730v1", + "published_date": "2024-10-21", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LucidFusion: Generating 3D Gaussians with Arbitrary Unposed Images", + "authors": [ + "Hao He", + "Yixun Liang", + "Luozhou Wang", + "Yuanhao Cai", + "Xinli Xu", + "Hao-Xiang Guo", + "Xiang Wen", + "Yingcong Chen" + ], + "abstract": "Recent large reconstruction models have made notable progress in generating high-quality 3D objects from single images. However, these methods often struggle with controllability, as they lack information from multiple views, leading to incomplete or inconsistent 3D reconstructions. To address this limitation, we introduce LucidFusion, a flexible end-to-end feed-forward framework that leverages the Relative Coordinate Map (RCM). Unlike traditional methods linking images to the 3D world through pose, LucidFusion utilizes RCM to align geometric features coherently across different views, making it highly adaptable for 3D generation from arbitrary, unposed images. Furthermore, LucidFusion seamlessly integrates with the original single-image-to-3D pipeline, producing detailed 3D Gaussians at a resolution of $512 \\times 512$, making it well-suited for a wide range of applications.", + "arxiv_url": "http://arxiv.org/abs/2410.15636v2", + "pdf_url": "http://arxiv.org/pdf/2410.15636v2", + "published_date": "2024-10-21", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Fully Explicit Dynamic Gaussian Splatting", + "authors": [ + "Junoh Lee", + "Chang-Yeon Won", + "Hyunjun Jung", + "Inhwan Bae", + "Hae-Gon Jeon" + ], + "abstract": "3D Gaussian Splatting has shown fast and high-quality rendering results in static scenes by leveraging dense 3D prior and explicit representations. Unfortunately, the benefits of the prior and representation do not involve novel view synthesis for dynamic motions. Ironically, this is because the main barrier is the reliance on them, which requires increasing training and rendering times to account for dynamic motions. In this paper, we design an Explicit 4D Gaussian Splatting (Ex4DGS). Our key idea is to first separate static and dynamic Gaussians during training, and to explicitly sample positions and rotations of the dynamic Gaussians at sparse timestamps.
The sampled positions and rotations are then interpolated to represent both spatially and temporally continuous motions of objects in dynamic scenes as well as reducing computational cost. Additionally, we introduce a progressive training scheme and a point-backtracking technique that improves Ex4DGS's convergence. We initially train Ex4DGS using short timestamps and progressively extend timestamps, which makes it work well with a few point clouds. The point-backtracking is used to quantify the cumulative error of each Gaussian over time, enabling the detection and removal of erroneous Gaussians in dynamic scenes. Comprehensive experiments on various scenes demonstrate the state-of-the-art rendering quality from our method, achieving fast rendering of 62 fps on a single 2080Ti GPU.", + "arxiv_url": "http://arxiv.org/abs/2410.15629v2", + "pdf_url": "http://arxiv.org/pdf/2410.15629v2", + "published_date": "2024-10-21", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EF-3DGS: Event-Aided Free-Trajectory 3D Gaussian Splatting", + "authors": [ + "Bohao Liao", + "Wei Zhai", + "Zengyu Wan", + "Tianzhu Zhang", + "Yang Cao", + "Zheng-Jun Zha" + ], + "abstract": "Scene reconstruction from casually captured videos has wide applications in real-world scenarios. With recent advancements in differentiable rendering techniques, several methods have attempted to simultaneously optimize scene representations (NeRF or 3DGS) and camera poses. Despite recent progress, existing methods relying on traditional camera input tend to fail in high-speed (or equivalently low-frame-rate) scenarios. Event cameras, inspired by biological vision, record pixel-wise intensity changes asynchronously with high temporal resolution, providing valuable scene and motion information in blind inter-frame intervals. In this paper, we introduce the event camera to aid scene construction from a casually captured video for the first time, and propose Event-Aided Free-Trajectory 3DGS, called EF-3DGS, which seamlessly integrates the advantages of event cameras into 3DGS through three key components. First, we leverage the Event Generation Model (EGM) to fuse events and frames, supervising the rendered views observed by the event stream. Second, we adopt the Contrast Maximization (CMax) framework in a piece-wise manner to extract motion information by maximizing the contrast of the Image of Warped Events (IWE), thereby calibrating the estimated poses. Besides, based on the Linear Event Generation Model (LEGM), the brightness information encoded in the IWE is also utilized to constrain the 3DGS in the gradient domain. Third, to mitigate the absence of color information of events, we introduce photometric bundle adjustment (PBA) to ensure view consistency across events and frames. We evaluate our method on the public Tanks and Temples benchmark and a newly collected real-world dataset, RealEv-DAVIS. 
Our project page is https://lbh666.github.io/ef-3dgs/.", + "arxiv_url": "http://arxiv.org/abs/2410.15392v2", + "pdf_url": "http://arxiv.org/pdf/2410.15392v2", + "published_date": "2024-10-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting", + "authors": [ + "Yusen Xie", + "Zhenmin Huang", + "Jin Wu", + "Jun Ma" + ], + "abstract": "In this paper, we introduce GS-LIVM, a real-time photo-realistic LiDAR-Inertial-Visual mapping framework with Gaussian Splatting tailored for outdoor scenes. Compared to existing methods based on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), our approach enables real-time photo-realistic mapping while ensuring high-quality image rendering in large-scale unbounded outdoor environments. In this work, Gaussian Process Regression (GPR) is employed to mitigate the issues resulting from sparse and unevenly distributed LiDAR observations. The voxel-based 3D Gaussians map representation facilitates real-time dense mapping in large outdoor environments with acceleration governed by custom CUDA kernels. Moreover, the overall framework is designed in a covariance-centered manner, where the estimated covariance is used to initialize the scale and rotation of 3D Gaussians, as well as update the parameters of the GPR. We evaluate our algorithm on several outdoor datasets, and the results demonstrate that our method achieves state-of-the-art performance in terms of mapping efficiency and rendering quality. The source code is available on GitHub.", + "arxiv_url": "http://arxiv.org/abs/2410.17084v1", + "pdf_url": "http://arxiv.org/pdf/2410.17084v1", + "published_date": "2024-10-18", + "categories": [ + "cs.RO", + "eess.IV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LUDVIG: Learning-free Uplifting of 2D Visual features to Gaussian Splatting scenes", + "authors": [ + "Juliette Marrie", + "Romain Ménégaux", + "Michael Arbel", + "Diane Larlus", + "Julien Mairal" + ], + "abstract": "We address the task of uplifting visual features or semantic masks from 2D vision models to 3D scenes represented by Gaussian Splatting. Whereas common approaches rely on iterative optimization-based procedures, we show that a simple yet effective aggregation technique yields excellent results. Applied to semantic masks from Segment Anything (SAM), our uplifting approach leads to segmentation quality comparable to the state of the art. We then extend this method to generic DINOv2 features, integrating 3D scene geometry through graph diffusion, and achieve competitive segmentation results despite DINOv2 not being trained on millions of annotated masks like SAM.", + "arxiv_url": "http://arxiv.org/abs/2410.14462v1", + "pdf_url": "http://arxiv.org/pdf/2410.14462v1", + "published_date": "2024-10-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Neural Signed Distance Function Inference through Splatting 3D Gaussians Pulled on Zero-Level Set", + "authors": [ + "Wenyuan Zhang", + "Yu-Shen Liu", + "Zhizhong Han" + ], + "abstract": "It is vital to infer a signed distance function (SDF) in multi-view based surface reconstruction. 
3D Gaussian splatting (3DGS) provides a novel perspective for volume rendering, and shows advantages in rendering efficiency and quality. Although 3DGS provides a promising neural rendering option, it is still hard to infer SDFs for surface reconstruction with 3DGS due to the discreteness, the sparseness, and the off-surface drift of 3D Gaussians. To resolve these issues, we propose a method that seamlessly merges 3DGS with the learning of neural SDFs. Our key idea is to more effectively constrain the SDF inference with the multi-view consistency. To this end, we dynamically align 3D Gaussians on the zero-level set of the neural SDF using neural pulling, and then render the aligned 3D Gaussians through the differentiable rasterization. Meanwhile, we update the neural SDF by pulling neighboring space to the pulled 3D Gaussians, which progressively refines the signed distance field near the surface. With both differentiable pulling and splatting, we jointly optimize 3D Gaussians and the neural SDF with both RGB and geometry constraints, which recovers more accurate, smooth, and complete surfaces with more geometry details. Our numerical and visual comparisons show our superiority over the state-of-the-art results on the widely used benchmarks.", + "arxiv_url": "http://arxiv.org/abs/2410.14189v1", + "pdf_url": "http://arxiv.org/pdf/2410.14189v1", + "published_date": "2024-10-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DaRePlane: Direction-aware Representations for Dynamic Scene Reconstruction", + "authors": [ + "Ange Lou", + "Benjamin Planche", + "Zhongpai Gao", + "Yamin Li", + "Tianyu Luan", + "Hao Ding", + "Meng Zheng", + "Terrence Chen", + "Ziyan Wu", + "Jack Noble" + ], + "abstract": "Numerous recent approaches to modeling and re-rendering dynamic scenes leverage plane-based explicit representations, addressing slow training times associated with models like neural radiance fields (NeRF) and Gaussian splatting (GS). However, merely decomposing 4D dynamic scenes into multiple 2D plane-based representations is insufficient for high-fidelity re-rendering of scenes with complex motions. In response, we present DaRePlane, a novel direction-aware representation approach that captures scene dynamics from six different directions. This learned representation undergoes an inverse dual-tree complex wavelet transformation (DTCWT) to recover plane-based information. Within NeRF pipelines, DaRePlane computes features for each space-time point by fusing vectors from these recovered planes, which are then passed to a tiny MLP for color regression. When applied to Gaussian splatting, DaRePlane computes the features of Gaussian points, followed by a tiny multi-head MLP for spatial-time deformation prediction. Notably, to address redundancy introduced by the six real and six imaginary direction-aware wavelet coefficients, we introduce a trainable masking approach, mitigating storage issues without significant performance decline. To demonstrate the generality and efficiency of DaRePlane, we test it on both regular and surgical dynamic scenes, for both NeRF and GS systems.
Extensive experiments show that DaRePlane yields state-of-the-art performance in novel view synthesis for various complex dynamic scenes.", + "arxiv_url": "http://arxiv.org/abs/2410.14169v1", + "pdf_url": "http://arxiv.org/pdf/2410.14169v1", + "published_date": "2024-10-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DepthSplat: Connecting Gaussian Splatting and Depth", + "authors": [ + "Haofei Xu", + "Songyou Peng", + "Fangjinhua Wang", + "Hermann Blum", + "Daniel Barath", + "Andreas Geiger", + "Marc Pollefeys" + ], + "abstract": "Gaussian splatting and single/multi-view depth estimation are typically studied in isolation. In this paper, we present DepthSplat to connect Gaussian splatting and depth estimation and study their interactions. More specifically, we first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features, leading to high-quality feed-forward 3D Gaussian splatting reconstructions. We also show that Gaussian splatting can serve as an unsupervised pre-training objective for learning powerful depth models from large-scale unlabeled datasets. We validate the synergy between Gaussian splatting and depth estimation through extensive ablation and cross-task transfer experiments. Our DepthSplat achieves state-of-the-art performance on ScanNet, RealEstate10K and DL3DV datasets in terms of both depth estimation and novel view synthesis, demonstrating the mutual benefits of connecting both tasks.", + "arxiv_url": "http://arxiv.org/abs/2410.13862v2", + "pdf_url": "http://arxiv.org/pdf/2410.13862v2", + "published_date": "2024-10-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Differentiable Robot Rendering", + "authors": [ + "Ruoshi Liu", + "Alper Canberk", + "Shuran Song", + "Carl Vondrick" + ], + "abstract": "Vision foundation models trained on massive amounts of visual data have shown unprecedented reasoning and planning skills in open-world settings. A key challenge in applying them to robotic tasks is the modality gap between visual data and action data. We introduce differentiable robot rendering, a method allowing the visual appearance of a robot body to be directly differentiable with respect to its control parameters. Our model integrates a kinematics-aware deformable model and Gaussian Splatting and is compatible with any robot form factors and degrees of freedom. We demonstrate its capability and usage in applications including reconstruction of robot poses from images and controlling robots through vision language models.
Quantitative and qualitative results show that our differentiable rendering model provides effective gradients for robotic control directly from pixels, setting the foundation for future applications of vision foundation models in robotics.", + "arxiv_url": "http://arxiv.org/abs/2410.13851v1", + "pdf_url": "http://arxiv.org/pdf/2410.13851v1", + "published_date": "2024-10-17", + "categories": [ + "cs.RO", + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MEGA: Memory-Efficient 4D Gaussian Splatting for Dynamic Scenes", + "authors": [ + "Xinjie Zhang", + "Zhening Liu", + "Yifan Zhang", + "Xingtong Ge", + "Dailan He", + "Tongda Xu", + "Yan Wang", + "Zehong Lin", + "Shuicheng Yan", + "Jun Zhang" + ], + "abstract": "4D Gaussian Splatting (4DGS) has recently emerged as a promising technique for capturing complex dynamic 3D scenes with high fidelity. It utilizes a 4D Gaussian representation and a GPU-friendly rasterizer, enabling rapid rendering speeds. Despite its advantages, 4DGS faces significant challenges, notably the requirement of millions of 4D Gaussians, each with extensive associated attributes, leading to substantial memory and storage cost. This paper introduces a memory-efficient framework for 4DGS. We streamline the color attribute by decomposing it into a per-Gaussian direct color component with only 3 parameters and a shared lightweight alternating current color predictor. This approach eliminates the need for spherical harmonics coefficients, which typically involve up to 144 parameters in classic 4DGS, thereby creating a memory-efficient 4D Gaussian representation. Furthermore, we introduce an entropy-constrained Gaussian deformation technique that uses a deformation field to expand the action range of each Gaussian and integrates an opacity-based entropy loss to limit the number of Gaussians, thus forcing our model to use as few Gaussians as possible to fit a dynamic scene well. With simple half-precision storage and zip compression, our framework achieves a storage reduction by approximately 190$\\times$ and 125$\\times$ on the Technicolor and Neural 3D Video datasets, respectively, compared to the original 4DGS. Meanwhile, it maintains comparable rendering speeds and scene representation quality, setting a new standard in the field.", + "arxiv_url": "http://arxiv.org/abs/2410.13613v1", + "pdf_url": "http://arxiv.org/pdf/2410.13613v1", + "published_date": "2024-10-17", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering", + "authors": [ + "Jiahao Lu", + "Jiacheng Deng", + "Ruijie Zhu", + "Yanzhe Liang", + "Wenfei Yang", + "Tianzhu Zhang", + "Xu Zhou" + ], + "abstract": "Dynamic scene rendering is an intriguing yet challenging problem. Although current methods based on NeRF have achieved satisfactory performance, they still cannot reach real-time levels. Recently, 3D Gaussian Splatting (3DGS) has garnered researchers' attention due to its outstanding rendering quality and real-time speed. Therefore, a new paradigm has been proposed: defining a canonical set of 3D Gaussians and deforming it to individual frames in deformable fields.
However, the coordinates of the canonical 3D Gaussians are filled with noise, which can transfer into the deformable fields, and there is currently no method that adequately considers the aggregation of 4D information. Therefore, we propose the Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering (DN-4DGS). Specifically, a Noise Suppression Strategy is introduced to change the distribution of the coordinates of the canonical 3D Gaussians and suppress noise. Additionally, a Decoupled Temporal-Spatial Aggregation Module is designed to aggregate information from adjacent points and frames. Extensive experiments on various real-world datasets demonstrate that our method achieves state-of-the-art rendering quality at a real-time level.", + "arxiv_url": "http://arxiv.org/abs/2410.13607v2", + "pdf_url": "http://arxiv.org/pdf/2410.13607v2", + "published_date": "2024-10-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation", + "authors": [ + "Guosheng Zhao", + "Chaojun Ni", + "Xiaofeng Wang", + "Zheng Zhu", + "Xueyang Zhang", + "Yida Wang", + "Guan Huang", + "Xinze Chen", + "Boyuan Wang", + "Youyi Zhang", + "Wenjun Mei", + "Xingang Wang" + ], + "abstract": "Closed-loop simulation is essential for advancing end-to-end autonomous driving systems. Contemporary sensor simulation methods, such as NeRF and 3DGS, rely predominantly on conditions closely aligned with training data distributions, which are largely confined to forward-driving scenarios. Consequently, these methods face limitations when rendering complex maneuvers (e.g., lane change, acceleration, deceleration). Recent advancements in autonomous-driving world models have demonstrated the potential to generate diverse driving videos. However, these approaches remain constrained to 2D video generation, inherently lacking the spatiotemporal coherence required to capture intricacies of dynamic driving environments. In this paper, we introduce DriveDreamer4D, which enhances 4D driving scene representation leveraging world model priors. Specifically, we utilize the world model as a data machine to synthesize novel trajectory videos, where structured conditions are explicitly leveraged to control the spatial-temporal consistency of traffic elements. Besides, the cousin data training strategy is proposed to facilitate merging real and synthetic data for optimizing 4DGS. To our knowledge, DriveDreamer4D is the first to utilize video generation models for improving 4D reconstruction in driving scenarios. Experimental results reveal that DriveDreamer4D significantly enhances generation quality under novel trajectory views, achieving a relative improvement in FID by 32.1%, 46.4%, and 16.3% compared to PVG, S3Gaussian, and Deformable-GS.
Moreover, DriveDreamer4D markedly enhances the spatiotemporal coherence of driving agents, which is verified by a comprehensive user study and the relative increases of 22.6%, 43.5%, and 15.6% in the NTA-IoU metric.", + "arxiv_url": "http://arxiv.org/abs/2410.13571v3", + "pdf_url": "http://arxiv.org/pdf/2410.13571v3", + "published_date": "2024-10-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "L3DG: Latent 3D Gaussian Diffusion", + "authors": [ + "Barbara Roessle", + "Norman Müller", + "Lorenzo Porzi", + "Samuel Rota Bulò", + "Peter Kontschieder", + "Angela Dai", + "Matthias Nießner" + ], + "abstract": "We propose L3DG, the first approach for generative 3D modeling of 3D Gaussians through a latent 3D Gaussian diffusion formulation. This enables effective generative 3D modeling, scaling to generation of entire room-scale scenes which can be very efficiently rendered. To enable effective synthesis of 3D Gaussians, we propose a latent diffusion formulation, operating in a compressed latent space of 3D Gaussians. This compressed latent space is learned by a vector-quantized variational autoencoder (VQ-VAE), for which we employ a sparse convolutional architecture to efficiently operate on room-scale scenes. This way, the complexity of the costly generation process via diffusion is substantially reduced, allowing higher detail on object-level generation, as well as scalability to large scenes. By leveraging the 3D Gaussian representation, the generated scenes can be rendered from arbitrary viewpoints in real-time. We demonstrate that our approach significantly improves visual quality over prior work on unconditional object-level radiance field synthesis and showcase its applicability to room-scale scene generation.", + "arxiv_url": "http://arxiv.org/abs/2410.13530v1", + "pdf_url": "http://arxiv.org/pdf/2410.13530v1", + "published_date": "2024-10-17", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GlossyGS: Inverse Rendering of Glossy Objects with 3D Gaussian Splatting", + "authors": [ + "Shuichang Lai", + "Letian Huang", + "Jie Guo", + "Kai Cheng", + "Bowen Pan", + "Xiaoxiao Long", + "Jiangjing Lyu", + "Chengfei Lv", + "Yanwen Guo" + ], + "abstract": "Reconstructing objects from posed images is a crucial and complex task in computer graphics and computer vision. While NeRF-based neural reconstruction methods have exhibited impressive reconstruction ability, they tend to be time-consuming. Recent strategies have adopted 3D Gaussian Splatting (3D-GS) for inverse rendering, which have led to quick and effective outcomes. However, these techniques generally have difficulty in producing believable geometries and materials for glossy objects, a challenge that stems from the inherent ambiguities of inverse rendering. To address this, we introduce GlossyGS, an innovative 3D-GS-based inverse rendering framework that aims to precisely reconstruct the geometry and materials of glossy objects by integrating material priors. The key idea is the use of micro-facet geometry segmentation prior, which helps to reduce the intrinsic ambiguities and improve the decomposition of geometries and materials. Additionally, we introduce a normal map prefiltering strategy to more accurately simulate the normal distribution of reflective surfaces.
These strategies are integrated into a hybrid geometry and material representation that employs both explicit and implicit methods to depict glossy objects. We demonstrate through quantitative analysis and qualitative visualization that the proposed method is effective in reconstructing high-fidelity geometries and materials of glossy objects, and performs favorably against state-of-the-art methods.", + "arxiv_url": "http://arxiv.org/abs/2410.13349v1", + "pdf_url": "http://arxiv.org/pdf/2410.13349v1", + "published_date": "2024-10-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Hybrid bundle-adjusting 3D Gaussians for view consistent rendering with pose optimization", + "authors": [ + "Yanan Guo", + "Ying Xie", + "Ying Chang", + "Benkui Zhang", + "Bo Jia", + "Lin Cao" + ], + "abstract": "Novel view synthesis has made significant progress in the field of 3D computer vision. However, the rendering of view-consistent novel views from imperfect camera poses remains challenging. In this paper, we introduce a hybrid bundle-adjusting 3D Gaussians model that enables view-consistent rendering with pose optimization. This model jointly extracts image-based and neural 3D representations to simultaneously generate view-consistent images and camera poses within forward-facing scenes. The effectiveness of our model is demonstrated through extensive experiments conducted on both real and synthetic datasets. These experiments clearly illustrate that our model can effectively optimize neural scene representations while simultaneously resolving significant camera pose misalignments. The source code is available at https://github.com/Bistu3DV/hybridBA.", + "arxiv_url": "http://arxiv.org/abs/2410.13280v1", + "pdf_url": "http://arxiv.org/pdf/2410.13280v1", + "published_date": "2024-10-17", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/Bistu3DV/hybridBA", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "UniG: Modelling Unitary 3D Gaussians for View-consistent 3D Reconstruction", + "authors": [ + "Jiamin Wu", + "Kenkun Liu", + "Yukai Shi", + "Xiaoke Jiang", + "Yuan Yao", + "Lei Zhang" + ], + "abstract": "In this work, we present UniG, a view-consistent 3D reconstruction and novel view synthesis model that generates a high-fidelity representation of 3D Gaussians from sparse images. Existing 3D Gaussians-based methods usually regress Gaussians per-pixel of each view, create 3D Gaussians per view separately, and merge them through point concatenation. Such a view-independent reconstruction approach often results in a view inconsistency issue, where the predicted positions of the same 3D point from different views may have discrepancies. To address this problem, we develop a DETR (DEtection TRansformer)-like framework, which treats 3D Gaussians as decoder queries and updates their parameters layer by layer by performing multi-view cross-attention (MVDFA) over multiple input images. In this way, multiple views naturally contribute to modeling a unitary representation of 3D Gaussians, thereby making 3D reconstruction more view-consistent. Moreover, as the number of 3D Gaussians used as decoder queries is irrespective of the number of input views, our model allows an arbitrary number of input images without causing memory explosion.
Extensive experiments validate the advantages of our approach, showcasing superior performance over existing methods quantitatively (improving PSNR by 4.2 dB when trained on Objaverse and tested on the GSO benchmark) and qualitatively. The code will be released at https://github.com/jwubz123/UNIG.", + "arxiv_url": "http://arxiv.org/abs/2410.13195v2", + "pdf_url": "http://arxiv.org/pdf/2410.13195v2", + "published_date": "2024-10-17", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/jwubz123/UNIG", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Long-LRM: Long-sequence Large Reconstruction Model for Wide-coverage Gaussian Splats", + "authors": [ + "Chen Ziwen", + "Hao Tan", + "Kai Zhang", + "Sai Bi", + "Fujun Luan", + "Yicong Hong", + "Li Fuxin", + "Zexiang Xu" + ], + "abstract": "We propose Long-LRM, a generalizable 3D Gaussian reconstruction model that is capable of reconstructing a large scene from a long sequence of input images. Specifically, our model can process 32 source images at 960x540 resolution within only 1.3 seconds on a single A100 80G GPU. Our architecture features a mixture of the recent Mamba2 blocks and the classical transformer blocks which allowed many more tokens to be processed than prior work, enhanced by efficient token merging and Gaussian pruning steps that balance between quality and efficiency. Unlike previous feed-forward models that are limited to processing 1~4 input images and can only reconstruct a small portion of a large scene, Long-LRM reconstructs the entire scene in a single feed-forward step. On large-scale scene datasets such as DL3DV-140 and Tanks and Temples, our method achieves performance comparable to optimization-based approaches while being two orders of magnitude more efficient. Project page: https://arthurhero.github.io/projects/llrm", + "arxiv_url": "http://arxiv.org/abs/2410.12781v1", + "pdf_url": "http://arxiv.org/pdf/2410.12781v1", + "published_date": "2024-10-16", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "TV-3DG: Mastering Text-to-3D Customized Generation with Visual Prompt", + "authors": [ + "Jiahui Yang", + "Donglin Di", + "Baorui Ma", + "Xun Yang", + "Yongjia Ma", + "Wenzhang Sun", + "Wei Chen", + "Jianxun Cui", + "Zhou Xue", + "Meng Wang", + "Yebin Liu" + ], + "abstract": "In recent years, advancements in generative models have significantly expanded the capabilities of text-to-3D generation. Many approaches rely on Score Distillation Sampling (SDS) technology. However, SDS struggles to accommodate multi-condition inputs, such as text and visual prompts, in customized generation tasks. To explore the core reasons, we decompose SDS into a difference term and a classifier-free guidance term. Our analysis identifies the core issue as arising from the difference term and the random noise addition during the optimization process, both contributing to deviations from the target mode during distillation. To address this, we propose a novel algorithm, Classifier Score Matching (CSM), which removes the difference term in SDS and uses a deterministic noise addition process to reduce noise during optimization, effectively overcoming the low-quality limitations of SDS in our customized generation framework. 
Based on CSM, we integrate visual prompt information with an attention fusion mechanism and sampling guidance techniques, forming the Visual Prompt CSM (VPCSM) algorithm. Furthermore, we introduce a Semantic-Geometry Calibration (SGC) module to enhance quality through improved textual information integration. We present our approach as TV-3DG, with extensive experiments demonstrating its capability to achieve stable, high-quality, customized 3D generation. Project page: \\url{https://yjhboy.github.io/TV-3DG}", + "arxiv_url": "http://arxiv.org/abs/2410.21299v2", + "pdf_url": "http://arxiv.org/pdf/2410.21299v2", + "published_date": "2024-10-16", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Gaussian Splatting in Robotics: A Survey", + "authors": [ + "Siting Zhu", + "Guangming Wang", + "Dezhi Kong", + "Hesheng Wang" + ], + "abstract": "Dense 3D representations of the environment have been a long-term goal in the robotics field. While previous Neural Radiance Fields (NeRF) representation have been prevalent for its implicit, coordinate-based model, the recent emergence of 3D Gaussian Splatting (3DGS) has demonstrated remarkable potential in its explicit radiance field representation. By leveraging 3D Gaussian primitives for explicit scene representation and enabling differentiable rendering, 3DGS has shown significant advantages over other radiance fields in real-time rendering and photo-realistic performance, which is beneficial for robotic applications. In this survey, we provide a comprehensive understanding of 3DGS in the field of robotics. We divide our discussion of the related works into two main categories: the application of 3DGS and the advancements in 3DGS techniques. In the application section, we explore how 3DGS has been utilized in various robotics tasks from scene understanding and interaction perspectives. The advance of 3DGS section focuses on the improvements of 3DGS own properties in its adaptability and efficiency, aiming to enhance its performance in robotics. We then summarize the most commonly used datasets and evaluation metrics in robotics. Finally, we identify the challenges and limitations of current 3DGS methods and discuss the future development of 3DGS in robotics.", + "arxiv_url": "http://arxiv.org/abs/2410.12262v1", + "pdf_url": "http://arxiv.org/pdf/2410.12262v1", + "published_date": "2024-10-16", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SplatPose+: Real-time Image-Based Pose-Agnostic 3D Anomaly Detection", + "authors": [ + "Yizhe Liu", + "Yan Song Hu", + "Yuhao Chen", + "John Zelek" + ], + "abstract": "Image-based Pose-Agnostic 3D Anomaly Detection is an important task that has emerged in industrial quality control. This task seeks to find anomalies from query images of a tested object given a set of reference images of an anomaly-free object. The challenge is that the query views (a.k.a poses) are unknown and can be different from the reference views. Currently, new methods such as OmniposeAD and SplatPose have emerged to bridge the gap by synthesizing pseudo reference images at the query views for pixel-to-pixel comparison. However, none of these methods can infer in real-time, which is critical in industrial quality control for massive production. 
For this reason, we propose SplatPose+, which employs a hybrid representation consisting of a Structure from Motion (SfM) model for localization and a 3D Gaussian Splatting (3DGS) model for Novel View Synthesis. Although our proposed pipeline requires the computation of an additional SfM model, it offers real-time inference speeds and faster training compared to SplatPose. Quality-wise, we achieved a new SOTA on the Pose-agnostic Anomaly Detection benchmark with the Multi-Pose Anomaly Detection (MAD-SIM) dataset.", + "arxiv_url": "http://arxiv.org/abs/2410.12080v1", + "pdf_url": "http://arxiv.org/pdf/2410.12080v1", + "published_date": "2024-10-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LoGS: Visual Localization via Gaussian Splatting with Fewer Training Images", + "authors": [ + "Yuzhou Cheng", + "Jianhao Jiao", + "Yue Wang", + "Dimitrios Kanoulas" + ], + "abstract": "Visual localization involves estimating a query image's 6-DoF (degrees of freedom) camera pose, which is a fundamental component in various computer vision and robotic tasks. This paper presents LoGS, a vision-based localization pipeline utilizing the 3D Gaussian Splatting (GS) technique as scene representation. This novel representation allows high-quality novel view synthesis. During the mapping phase, structure-from-motion (SfM) is applied first, followed by the generation of a GS map. During localization, the initial position is obtained through image retrieval, local feature matching coupled with a PnP solver, and then a high-precision pose is achieved through the analysis-by-synthesis manner on the GS map. Experimental results on four large-scale datasets demonstrate the proposed approach's SoTA accuracy in estimating camera poses and robustness under challenging few-shot conditions.", + "arxiv_url": "http://arxiv.org/abs/2410.11505v1", + "pdf_url": "http://arxiv.org/pdf/2410.11505v1", + "published_date": "2024-10-15", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS^3: Efficient Relighting with Triple Gaussian Splatting", + "authors": [ + "Zoubin Bi", + "Yixin Zeng", + "Chong Zeng", + "Fan Pei", + "Xiang Feng", + "Kun Zhou", + "Hongzhi Wu" + ], + "abstract": "We present a spatial and angular Gaussian based representation and a triple splatting process, for real-time, high-quality novel lighting-and-view synthesis from multi-view point-lit input images. To describe complex appearance, we employ a Lambertian plus a mixture of angular Gaussians as an effective reflectance function for each spatial Gaussian. To generate self-shadow, we splat all spatial Gaussians towards the light source to obtain shadow values, which are further refined by a small multi-layer perceptron. To compensate for other effects like global illumination, another network is trained to compute and add a per-spatial-Gaussian RGB tuple. The effectiveness of our representation is demonstrated on 30 samples with a wide variation in geometry (from solid to fluffy) and appearance (from translucent to anisotropic), as well as using different forms of input data, including rendered images of synthetic/reconstructed objects, photographs captured with a handheld camera and a flash, or from a professional lightstage. 
We achieve a training time of 40-70 minutes and a rendering speed of 90 fps on a single commodity GPU. Our results compare favorably with state-of-the-art techniques in terms of quality/performance. Our code and data are publicly available at https://GSrelight.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2410.11419v1", + "pdf_url": "http://arxiv.org/pdf/2410.11419v1", + "published_date": "2024-10-15", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MCGS: Multiview Consistency Enhancement for Sparse-View 3D Gaussian Radiance Fields", + "authors": [ + "Yuru Xiao", + "Deming Zhai", + "Wenbo Zhao", + "Kui Jiang", + "Junjun Jiang", + "Xianming Liu" + ], + "abstract": "Radiance fields represented by 3D Gaussians excel at synthesizing novel views, offering both high training efficiency and fast rendering. However, with sparse input views, the lack of multi-view consistency constraints results in poorly initialized point clouds and unreliable heuristics for optimization and densification, leading to suboptimal performance. Existing methods often incorporate depth priors from dense estimation networks but overlook the inherent multi-view consistency in input images. Additionally, they rely on multi-view stereo (MVS)-based initialization, which limits the efficiency of scene representation. To overcome these challenges, we propose a view synthesis framework based on 3D Gaussian Splatting, named MCGS, enabling photorealistic scene reconstruction from sparse input views. The key innovations of MCGS in enhancing multi-view consistency are as follows: i) We introduce an initialization method by leveraging a sparse matcher combined with a random filling strategy, yielding a compact yet sufficient set of initial points. This approach enhances the initial geometry prior, promoting efficient scene representation. ii) We develop a multi-view consistency-guided progressive pruning strategy to refine the Gaussian field by strengthening consistency and eliminating low-contribution Gaussians. These modular, plug-and-play strategies enhance robustness to sparse input views, accelerate rendering, and reduce memory consumption, making MCGS a practical and efficient framework for 3D Gaussian Splatting.", + "arxiv_url": "http://arxiv.org/abs/2410.11394v1", + "pdf_url": "http://arxiv.org/pdf/2410.11394v1", + "published_date": "2024-10-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GSORB-SLAM: Gaussian Splatting SLAM benefits from ORB features and Transmittance information", + "authors": [ + "Wancai Zheng", + "Xinyi Yu", + "Jintao Rong", + "Linlin Ou", + "Yan Wei", + "Libo Zhou" + ], + "abstract": "The emergence of 3D Gaussian Splatting (3DGS) has recently sparked a renewed wave of dense visual SLAM research. However, current methods face challenges such as sensitivity to artifacts and noise, sub-optimal selection of training viewpoints, and a lack of light global optimization. In this paper, we propose a dense SLAM system that tightly couples 3DGS with ORB features. We design a joint optimization approach for robust tracking and effectively reducing the impact of noise and artifacts. This involves combining novel geometric observations, derived from accumulated transmittance, with ORB features extracted from pixel data. 
Furthermore, to improve mapping quality, we propose an adaptive Gaussian expansion and regularization method that enables Gaussian primitives to represent the scene compactly. This is coupled with a viewpoint selection strategy based on the hybrid graph to mitigate over-fitting effects and enhance convergence quality. Finally, our approach achieves compact and high-quality scene representations and accurate localization. GSORB-SLAM has been evaluated on different datasets, demonstrating outstanding performance. The code will be available.", + "arxiv_url": "http://arxiv.org/abs/2410.11356v2", + "pdf_url": "http://arxiv.org/pdf/2410.11356v2", + "published_date": "2024-10-15", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Scalable Indoor Novel-View Synthesis using Drone-Captured 360 Imagery with 3D Gaussian Splatting", + "authors": [ + "Yuanbo Chen", + "Chengyu Zhang", + "Jason Wang", + "Xuefan Gao", + "Avideh Zakhor" + ], + "abstract": "Scene reconstruction and novel-view synthesis for large, complex, multi-story, indoor scenes is a challenging and time-consuming task. Prior methods have utilized drones for data capture and radiance fields for scene reconstruction, both of which present certain challenges. First, in order to capture diverse viewpoints with the drone's front-facing camera, some approaches fly the drone in an unstable zig-zag fashion, which hinders drone-piloting and generates motion blur in the captured data. Secondly, most radiance field methods do not easily scale to arbitrarily large number of images. This paper proposes an efficient and scalable pipeline for indoor novel-view synthesis from drone-captured 360 videos using 3D Gaussian Splatting. 360 cameras capture a wide set of viewpoints, allowing for comprehensive scene capture under a simple straightforward drone trajectory. To scale our method to large scenes, we devise a divide-and-conquer strategy to automatically split the scene into smaller blocks that can be reconstructed individually and in parallel. We also propose a coarse-to-fine alignment strategy to seamlessly match these blocks together to compose the entire scene. Our experiments demonstrate marked improvement in both reconstruction quality, i.e. PSNR and SSIM, and computation time compared to prior approaches.", + "arxiv_url": "http://arxiv.org/abs/2410.11285v1", + "pdf_url": "http://arxiv.org/pdf/2410.11285v1", + "published_date": "2024-10-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Few-shot Novel View Synthesis using Depth Aware 3D Gaussian Splatting", + "authors": [ + "Raja Kumar", + "Vanshika Vats" + ], + "abstract": "3D Gaussian splatting has surpassed neural radiance field methods in novel view synthesis by achieving lower computational costs and real-time high-quality rendering. Although it produces a high-quality rendering with a lot of input views, its performance drops significantly when only a few views are available. In this work, we address this by proposing a depth-aware Gaussian splatting method for few-shot novel view synthesis. We use monocular depth prediction as a prior, along with a scale-invariant depth loss, to constrain the 3D shape under just a few input views. We also model color using lower-order spherical harmonics to avoid overfitting. 
Further, we observe that removing splats with lower opacity periodically, as performed in the original work, leads to a very sparse point cloud and, hence, a lower-quality rendering. To mitigate this, we retain all the splats, leading to a better reconstruction in a few view settings. Experimental results show that our method outperforms the traditional 3D Gaussian splatting methods by achieving improvements of 10.5% in peak signal-to-noise ratio, 6% in structural similarity index, and 14.1% in perceptual similarity, thereby validating the effectiveness of our approach. The code will be made available at: https://github.com/raja-kumar/depth-aware-3DGS", + "arxiv_url": "http://arxiv.org/abs/2410.11080v1", + "pdf_url": "http://arxiv.org/pdf/2410.11080v1", + "published_date": "2024-10-14", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR" + ], + "github_url": "https://github.com/raja-kumar/depth-aware-3DGS", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DArticCyclists: Generating Simulated Dynamic 3D Cyclists for Human-Object Interaction (HOI) and Autonomous Driving Applications", + "authors": [ + "Eduardo R. Corral-Soto", + "Yang Liu", + "Tongtong Cao", + "Yuan Ren", + "Liu Bingbing" + ], + "abstract": "Human-object interaction (HOI) and human-scene interaction (HSI) are crucial for human-centric scene understanding applications in Embodied Artificial Intelligence (EAI), robotics, and augmented reality (AR). A common limitation faced in these research areas is the data scarcity problem: insufficient labeled human-scene object pairs on the input images, and limited interaction complexity and granularity between them. Recent HOI and HSI methods have addressed this issue by generating dynamic interactions with rigid objects. But more complex dynamic interactions such as a human rider pedaling an articulated bicycle have been unexplored. To address this limitation, and to enable research on complex dynamic human-articulated object interactions, in this paper we propose a method to generate simulated 3D dynamic cyclist assets and interactions. We designed a methodology for creating a new part-based multi-view articulated synthetic 3D bicycle dataset that we call 3DArticBikes that can be used to train NeRF and 3DGS-based 3D reconstruction methods. We then propose a 3DGS-based parametric bicycle composition model to assemble 8-DoF pose-controllable 3D bicycles. Finally, using dynamic information from cyclist videos, we build a complete synthetic dynamic 3D cyclist (rider pedaling a bicycle) by re-posing a selectable synthetic 3D person while automatically placing the rider onto one of our new articulated 3D bicycles using a proposed 3D Keypoint optimization-based Inverse Kinematics pose refinement. 
We present both, qualitative and quantitative results where we compare our generated cyclists against those from a recent stable diffusion-based method.", + "arxiv_url": "http://arxiv.org/abs/2410.10782v1", + "pdf_url": "http://arxiv.org/pdf/2410.10782v1", + "published_date": "2024-10-14", + "categories": [ + "cs.CV", + "cs.HC" + ], + "github_url": "", + "keywords": [ + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "4-LEGS: 4D Language Embedded Gaussian Splatting", + "authors": [ + "Gal Fiebelman", + "Tamir Cohen", + "Ayellet Morgenstern", + "Peter Hedman", + "Hadar Averbuch-Elor" + ], + "abstract": "The emergence of neural representations has revolutionized our means for digitally viewing a wide range of 3D scenes, enabling the synthesis of photorealistic images rendered from novel views. Recently, several techniques have been proposed for connecting these low-level representations with the high-level semantics understanding embodied within the scene. These methods elevate the rich semantic understanding from 2D imagery to 3D representations, distilling high-dimensional spatial features onto 3D space. In our work, we are interested in connecting language with a dynamic modeling of the world. We show how to lift spatio-temporal features to a 4D representation based on 3D Gaussian Splatting. This enables an interactive interface where the user can spatiotemporally localize events in the video from text prompts. We demonstrate our system on public 3D video datasets of people and animals performing various actions.", + "arxiv_url": "http://arxiv.org/abs/2410.10719v2", + "pdf_url": "http://arxiv.org/pdf/2410.10719v2", + "published_date": "2024-10-14", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "4DStyleGaussian: Zero-shot 4D Style Transfer with Gaussian Splatting", + "authors": [ + "Wanlin Liang", + "Hongbin Xu", + "Weitao Chen", + "Feng Xiao", + "Wenxiong Kang" + ], + "abstract": "3D neural style transfer has gained significant attention for its potential to provide user-friendly stylization with spatial consistency. However, existing 3D style transfer methods often fall short in terms of inference efficiency, generalization ability, and struggle to handle dynamic scenes with temporal consistency. In this paper, we introduce 4DStyleGaussian, a novel 4D style transfer framework designed to achieve real-time stylization of arbitrary style references while maintaining reasonable content affinity, multi-view consistency, and temporal coherence. Our approach leverages an embedded 4D Gaussian Splatting technique, which is trained using a reversible neural network for reducing content loss in the feature distillation process. Utilizing the 4D embedded Gaussians, we predict a 4D style transformation matrix that facilitates spatially and temporally consistent style transfer with Gaussian Splatting. 
Experiments demonstrate that our method can achieve high-quality and zero-shot stylization for 4D scenarios with enhanced efficiency and spatial-temporal consistency.", + "arxiv_url": "http://arxiv.org/abs/2410.10412v1", + "pdf_url": "http://arxiv.org/pdf/2410.10412v1", + "published_date": "2024-10-14", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GUISE: Graph GaUssIan Shading watErmark", + "authors": [ + "Renyi Yang" + ], + "abstract": "In the expanding field of generative artificial intelligence, integrating robust watermarking technologies is essential to protect intellectual property and maintain content authenticity. Traditionally, watermarking techniques have been developed primarily for rich information media such as images and audio. However, these methods have not been adequately adapted for graph-based data, particularly molecular graphs. Latent 3D graph diffusion(LDM-3DG) is an ascendant approach in the molecular graph generation field. This model effectively manages the complexities of molecular structures, preserving essential symmetries and topological features. We adapt the Gaussian Shading, a proven performance lossless watermarking technique, to the latent graph diffusion domain to protect this sophisticated new technology. Our adaptation simplifies the watermark diffusion process through duplication and padding, making it adaptable and suitable for various message types. We conduct several experiments using the LDM-3DG model on publicly available datasets QM9 and Drugs, to assess the robustness and effectiveness of our technique. Our results demonstrate that the watermarked molecules maintain statistical parity in 9 out of 10 performance metrics compared to the original. Moreover, they exhibit a 100% detection rate and a 99% extraction rate in a 2D decoded pipeline, while also showing robustness against post-editing attacks.", + "arxiv_url": "http://arxiv.org/abs/2410.10178v1", + "pdf_url": "http://arxiv.org/pdf/2410.10178v1", + "published_date": "2024-10-14", + "categories": [ + "cs.LG", + "cs.MM" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Splatting Visual MPC for Granular Media Manipulation", + "authors": [ + "Wei-Cheng Tseng", + "Ellina Zhang", + "Krishna Murthy Jatavallabhula", + "Florian Shkurti" + ], + "abstract": "Recent advancements in learned 3D representations have enabled significant progress in solving complex robotic manipulation tasks, particularly for rigid-body objects. However, manipulating granular materials such as beans, nuts, and rice, remains challenging due to the intricate physics of particle interactions, high-dimensional and partially observable state, inability to visually track individual particles in a pile, and the computational demands of accurate dynamics prediction. Current deep latent dynamics models often struggle to generalize in granular material manipulation due to a lack of inductive biases. In this work, we propose a novel approach that learns a visual dynamics model over Gaussian splatting representations of scenes and leverages this model for manipulating granular media via Model-Predictive Control. Our method enables efficient optimization for complex manipulation tasks on piles of granular media. 
We evaluate our approach in both simulated and real-world settings, demonstrating its ability to solve unseen planning tasks and generalize to new environments in a zero-shot transfer. We also show significant prediction and manipulation performance improvements compared to existing granular media manipulation methods.", + "arxiv_url": "http://arxiv.org/abs/2410.09740v1", + "pdf_url": "http://arxiv.org/pdf/2410.09740v1", + "published_date": "2024-10-13", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors", + "authors": [ + "Hritam Basak", + "Hadi Tabatabaee", + "Shreekant Gayaka", + "Ming-Feng Li", + "Xin Yang", + "Cheng-Hao Kuo", + "Arnie Sen", + "Min Sun", + "Zhaozheng Yin" + ], + "abstract": "3D object generation from a single image involves estimating the full 3D geometry and texture of unseen views from an unposed RGB image captured in the wild. Accurately reconstructing an object's complete 3D structure and texture has numerous applications in real-world scenarios, including robotic manipulation, grasping, 3D scene understanding, and AR/VR. Recent advancements in 3D object generation have introduced techniques that reconstruct an object's 3D shape and texture by optimizing the efficient representation of Gaussian Splatting, guided by pre-trained 2D or 3D diffusion models. However, a notable disparity exists between the training datasets of these models, leading to distinct differences in their outputs. While 2D models generate highly detailed visuals, they lack cross-view consistency in geometry and texture. In contrast, 3D models ensure consistency across different views but often result in overly smooth textures. We propose bridging the gap between 2D and 3D diffusion models to address this limitation by integrating a two-stage frequency-based distillation loss with Gaussian Splatting. Specifically, we leverage geometric priors in the low-frequency spectrum from a 3D diffusion model to maintain consistent geometry and use a 2D diffusion model to refine the fidelity and texture in the high-frequency spectrum of the generated 3D structure, resulting in more detailed and fine-grained outcomes. Our approach enhances geometric consistency and visual quality, outperforming the current SOTA. Additionally, we demonstrate the easy adaptability of our method for efficient object pose estimation and tracking.", + "arxiv_url": "http://arxiv.org/abs/2410.09467v2", + "pdf_url": "http://arxiv.org/pdf/2410.09467v2", + "published_date": "2024-10-12", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SurgicalGS: Dynamic 3D Gaussian Splatting for Accurate Robotic-Assisted Surgical Scene Reconstruction", + "authors": [ + "Jialei Chen", + "Xin Zhang", + "Mobarakol Islam", + "Francisco Vasconcelos", + "Danail Stoyanov", + "Daniel S. Elson", + "Baoru Huang" + ], + "abstract": "Accurate 3D reconstruction of dynamic surgical scenes from endoscopic video is essential for robotic-assisted surgery. While recent 3D Gaussian Splatting methods have shown promise in achieving high-quality reconstructions with fast rendering speeds, their use of inverse depth loss functions compresses depth variations. 
This can lead to a loss of fine geometric details, limiting their ability to capture precise 3D geometry and effectiveness in intraoperative application. To address these challenges, we present SurgicalGS, a dynamic 3D Gaussian Splatting framework specifically designed for surgical scene reconstruction with improved geometric accuracy. Our approach first initialises a Gaussian point cloud using depth priors, employing binary motion masks to identify pixels with significant depth variations and fusing point clouds from depth maps across frames for initialisation. We use the Flexible Deformation Model to represent dynamic scene and introduce a normalised depth regularisation loss along with an unsupervised depth smoothness constraint to ensure more accurate geometric reconstruction. Extensive experiments on two real surgical datasets demonstrate that SurgicalGS achieves state-of-the-art reconstruction quality, especially in terms of accurate geometry, advancing the usability of 3D Gaussian Splatting in robotic-assisted surgery.", + "arxiv_url": "http://arxiv.org/abs/2410.09292v1", + "pdf_url": "http://arxiv.org/pdf/2410.09292v1", + "published_date": "2024-10-11", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MeshGS: Adaptive Mesh-Aligned Gaussian Splatting for High-Quality Rendering", + "authors": [ + "Jaehoon Choi", + "Yonghan Lee", + "Hyungtae Lee", + "Heesung Kwon", + "Dinesh Manocha" + ], + "abstract": "Recently, 3D Gaussian splatting has gained attention for its capability to generate high-fidelity rendering results. At the same time, most applications such as games, animation, and AR/VR use mesh-based representations to represent and render 3D scenes. We propose a novel approach that integrates mesh representation with 3D Gaussian splats to perform high-quality rendering of reconstructed real-world scenes. In particular, we introduce a distance-based Gaussian splatting technique to align the Gaussian splats with the mesh surface and remove redundant Gaussian splats that do not contribute to the rendering. We consider the distance between each Gaussian splat and the mesh surface to distinguish between tightly-bound and loosely-bound Gaussian splats. The tightly-bound splats are flattened and aligned well with the mesh geometry. The loosely-bound Gaussian splats are used to account for the artifacts in reconstructed 3D meshes in terms of rendering. We present a training strategy of binding Gaussian splats to the mesh geometry, and take into account both types of splats. In this context, we introduce several regularization techniques aimed at precisely aligning tightly-bound Gaussian splats with the mesh surface during the training process. We validate the effectiveness of our method on large and unbounded scene from mip-NeRF 360 and Deep Blending datasets. Our method surpasses recent mesh-based neural rendering techniques by achieving a 2dB higher PSNR, and outperforms mesh-based Gaussian splatting methods by 1.3 dB PSNR, particularly on the outdoor mip-NeRF 360 dataset, demonstrating better rendering quality. 
We provide analyses for each type of Gaussian splat and achieve a reduction in the number of Gaussian splats by 30% compared to the original 3D Gaussian splatting.", + "arxiv_url": "http://arxiv.org/abs/2410.08941v1", + "pdf_url": "http://arxiv.org/pdf/2410.08941v1", + "published_date": "2024-10-11", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars", + "authors": [ + "Xuan Huang", + "Hanhui Li", + "Wanquan Liu", + "Xiaodan Liang", + "Yiqiang Yan", + "Yuhao Cheng", + "Chengqiang Gao" + ], + "abstract": "In this paper, we propose to create animatable avatars for interacting hands with 3D Gaussian Splatting (GS) and single-image inputs. Existing GS-based methods designed for single subjects often yield unsatisfactory results due to limited input views, various hand poses, and occlusions. To address these challenges, we introduce a novel two-stage interaction-aware GS framework that exploits cross-subject hand priors and refines 3D Gaussians in interacting areas. Particularly, to handle hand variations, we disentangle the 3D presentation of hands into optimization-based identity maps and learning-based latent geometric features and neural texture maps. Learning-based features are captured by trained networks to provide reliable priors for poses, shapes, and textures, while optimization-based identity maps enable efficient one-shot fitting of out-of-distribution hands. Furthermore, we devise an interaction-aware attention module and a self-adaptive Gaussian refinement module. These modules enhance image rendering quality in areas with intra- and inter-hand interactions, overcoming the limitations of existing GS-based methods. Our proposed method is validated via extensive experiments on the large-scale InterHand2.6M dataset, and it significantly improves the state-of-the-art performance in image quality. Project Page: \\url{https://github.com/XuanHuang0/GuassianHand}.", + "arxiv_url": "http://arxiv.org/abs/2410.08840v1", + "pdf_url": "http://arxiv.org/pdf/2410.08840v1", + "published_date": "2024-10-11", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/XuanHuang0/GuassianHand", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization", + "authors": [ + "Christian Schmidt", + "Jens Piekenbrinck", + "Bastian Leibe" + ], + "abstract": "3D Gaussian Splatting has recently emerged as a powerful tool for fast and accurate novel-view synthesis from a set of posed input images. However, like most novel-view synthesis approaches, it relies on accurate camera pose information, limiting its applicability in real-world scenarios where acquiring accurate camera poses can be challenging or even impossible. We propose an extension to the 3D Gaussian Splatting framework by optimizing the extrinsic camera parameters with respect to photometric residuals. We derive the analytical gradients and integrate their computation with the existing high-performance CUDA implementation. This enables downstream tasks such as 6-DoF camera pose estimation as well as joint reconstruction and camera refinement. In particular, we achieve rapid convergence and high accuracy for pose estimation on real-world scenes. 
Our method enables fast reconstruction of 3D scenes without requiring accurate pose information by jointly optimizing geometry and camera poses, while achieving state-of-the-art results in novel-view synthesis. Our approach is considerably faster to optimize than most competing methods, and several times faster in rendering. We show results on real-world scenes and complex trajectories through simulated environments, achieving state-of-the-art results on LLFF while reducing runtime by two to four times compared to the most efficient competing method. Source code will be available at https://github.com/Schmiddo/noposegs .", + "arxiv_url": "http://arxiv.org/abs/2410.08743v1", + "pdf_url": "http://arxiv.org/pdf/2410.08743v1", + "published_date": "2024-10-11", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/Schmiddo/noposegs", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction", + "authors": [ + "Irving Fang", + "Kairui Shi", + "Xujin He", + "Siqi Tan", + "Yifan Wang", + "Hanwen Zhao", + "Hung-Jui Huang", + "Wenzhen Yuan", + "Chen Feng", + "Jing Zhang" + ], + "abstract": "Humans effortlessly integrate common-sense knowledge with sensory input from vision and touch to understand their surroundings. Emulating this capability, we introduce FusionSense, a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors. FusionSense addresses three key challenges: (i) How can robots efficiently acquire robust global shape information about the surrounding scene and objects? (ii) How can robots strategically select touch points on the object using geometric and common-sense priors? (iii) How can partial observations such as tactile signals improve the overall representation of the object? Our framework employs 3D Gaussian Splatting as a core representation and incorporates a hierarchical optimization strategy involving global structure construction, object visual hull pruning and local geometric constraints. This advancement results in fast and robust perception in environments with traditionally challenging objects that are transparent, reflective, or dark, enabling more downstream manipulation or navigation tasks. Experiments on real-world data suggest that our framework outperforms previously state-of-the-art sparse-view methods. All code and data are open-sourced on the project website.", + "arxiv_url": "http://arxiv.org/abs/2410.08282v1", + "pdf_url": "http://arxiv.org/pdf/2410.08282v1", + "published_date": "2024-10-10", + "categories": [ + "cs.RO", + "cs.AI", + "cs.CV", + "cs.GR", + "I.4.5; I.4.8" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Poison-splat: Computation Cost Attack on 3D Gaussian Splatting", + "authors": [ + "Jiahao Lu", + "Yifan Zhang", + "Qiuhong Shen", + "Xinchao Wang", + "Shuicheng Yan" + ], + "abstract": "3D Gaussian splatting (3DGS), known for its groundbreaking performance and efficiency, has become a dominant 3D representation and brought progress to many 3D vision tasks. However, in this work, we reveal a significant security vulnerability that has been largely overlooked in 3DGS: the computation cost of training 3DGS could be maliciously tampered by poisoning the input data. 
By developing an attack named Poison-splat, we reveal a novel attack surface where the adversary can poison the input images to drastically increase the computation memory and time needed for 3DGS training, pushing the algorithm towards its worst computation complexity. In extreme cases, the attack can even consume all allocable memory, leading to a Denial-of-Service (DoS) that disrupts servers, resulting in practical damages to real-world 3DGS service vendors. Such a computation cost attack is achieved by addressing a bi-level optimization problem through three tailored strategies: attack objective approximation, proxy model rendering, and optional constrained optimization. These strategies not only ensure the effectiveness of our attack but also make it difficult to defend with simple defensive measures. We hope the revelation of this novel attack surface can spark attention to this crucial yet overlooked vulnerability of 3DGS systems.", + "arxiv_url": "http://arxiv.org/abs/2410.08190v1", + "pdf_url": "http://arxiv.org/pdf/2410.08190v1", + "published_date": "2024-10-10", + "categories": [ + "cs.CV", + "cs.CR", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DifFRelight: Diffusion-Based Facial Performance Relighting", + "authors": [ + "Mingming He", + "Pascal Clausen", + "Ahmet Levent Taşel", + "Li Ma", + "Oliver Pilarski", + "Wenqi Xian", + "Laszlo Rikker", + "Xueming Yu", + "Ryan Burgert", + "Ning Yu", + "Paul Debevec" + ], + "abstract": "We present a novel framework for free-viewpoint facial performance relighting using diffusion-based image-to-image translation. Leveraging a subject-specific dataset containing diverse facial expressions captured under various lighting conditions, including flat-lit and one-light-at-a-time (OLAT) scenarios, we train a diffusion model for precise lighting control, enabling high-fidelity relit facial images from flat-lit inputs. Our framework includes spatially-aligned conditioning of flat-lit captures and random noise, along with integrated lighting information for global control, utilizing prior knowledge from the pre-trained Stable Diffusion model. This model is then applied to dynamic facial performances captured in a consistent flat-lit environment and reconstructed for novel-view synthesis using a scalable dynamic 3D Gaussian Splatting method to maintain quality and consistency in the relit results. In addition, we introduce unified lighting control by integrating a novel area lighting representation with directional lighting, allowing for joint adjustments in light size and direction. We also enable high dynamic range imaging (HDRI) composition using multiple directional lights to produce dynamic sequences under complex lighting conditions. Our evaluations demonstrate the model's efficiency in achieving precise lighting control and generalizing across various facial expressions while preserving detailed features such as skin texture and hair.
The model accurately reproduces complex lighting effects like eye reflections, subsurface scattering, self-shadowing, and translucency, advancing photorealism within our framework.", + "arxiv_url": "http://arxiv.org/abs/2410.08188v1", + "pdf_url": "http://arxiv.org/pdf/2410.08188v1", + "published_date": "2024-10-10", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RGM: Reconstructing High-fidelity 3D Car Assets with Relightable 3D-GS Generative Model from a Single Image", + "authors": [ + "Xiaoxue Chen", + "Jv Zheng", + "Hao Huang", + "Haoran Xu", + "Weihao Gu", + "Kangliang Chen", + "He xiang", + "Huan-ang Gao", + "Hao Zhao", + "Guyue Zhou", + "Yaqin Zhang" + ], + "abstract": "The generation of high-quality 3D car assets is essential for various applications, including video games, autonomous driving, and virtual reality. Current 3D generation methods utilizing NeRF or 3D-GS as representations for 3D objects, generate a Lambertian object under fixed lighting and lack separated modelings for material and global illumination. As a result, the generated assets are unsuitable for relighting under varying lighting conditions, limiting their applicability in downstream tasks. To address this challenge, we propose a novel relightable 3D object generative framework that automates the creation of 3D car assets, enabling the swift and accurate reconstruction of a vehicle's geometry, texture, and material properties from a single input image. Our approach begins with introducing a large-scale synthetic car dataset comprising over 1,000 high-precision 3D vehicle models. We represent 3D objects using global illumination and relightable 3D Gaussian primitives integrating with BRDF parameters. Building on this representation, we introduce a feed-forward model that takes images as input and outputs both relightable 3D Gaussians and global illumination parameters. Experimental results demonstrate that our method produces photorealistic 3D car assets that can be seamlessly integrated into road scenes with different illuminations, which offers substantial practical benefits for industrial applications.", + "arxiv_url": "http://arxiv.org/abs/2410.08181v1", + "pdf_url": "http://arxiv.org/pdf/2410.08181v1", + "published_date": "2024-10-10", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics", + "authors": [ + "Junyi Cao", + "Shanyan Guan", + "Yanhao Ge", + "Wei Li", + "Xiaokang Yang", + "Chao Ma" + ], + "abstract": "While humans effortlessly discern intrinsic dynamics and adapt to new scenarios, modern AI systems often struggle. Current methods for visual grounding of dynamics either use pure neural-network-based simulators (black box), which may violate physical laws, or traditional physical simulators (white box), which rely on expert-defined equations that may not fully capture actual dynamics. We propose the Neural Material Adaptor (NeuMA), which integrates existing physical laws with learned corrections, facilitating accurate learning of actual dynamics while maintaining the generalizability and interpretability of physical priors. 
Additionally, we propose Particle-GS, a particle-driven 3D Gaussian Splatting variant that bridges simulation and observed images, allowing image gradients to be back-propagated to optimize the simulator. Comprehensive experiments on various dynamics in terms of grounded particle accuracy, dynamic rendering quality, and generalization ability demonstrate that NeuMA can accurately capture intrinsic dynamics.", + "arxiv_url": "http://arxiv.org/abs/2410.08257v1", + "pdf_url": "http://arxiv.org/pdf/2410.08257v1", + "published_date": "2024-10-10", + "categories": [ + "cs.CV", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Efficient Perspective-Correct 3D Gaussian Splatting Using Hybrid Transparency", + "authors": [ + "Florian Hahlbohm", + "Fabian Friederichs", + "Tim Weyrich", + "Linus Franke", + "Moritz Kappel", + "Susana Castillo", + "Marc Stamminger", + "Martin Eisemann", + "Marcus Magnor" + ], + "abstract": "3D Gaussian Splats (3DGS) have proven a versatile rendering primitive, both for inverse rendering as well as real-time exploration of scenes. In these applications, coherence across camera frames and multiple views is crucial, be it for robust convergence of a scene reconstruction or for artifact-free fly-throughs. Recent work started mitigating artifacts that break multi-view coherence, including popping artifacts due to inconsistent transparency sorting and perspective-correct outlines of (2D) splats. At the same time, real-time requirements forced such implementations to accept compromises in how transparency of large assemblies of 3D Gaussians is resolved, in turn breaking coherence in other ways. In our work, we aim at achieving maximum coherence, by rendering fully perspective-correct 3D Gaussians while using a high-quality approximation of accurate blending, hybrid transparency, on a per-pixel level, in order to retain real-time frame rates. Our fast and perspectively accurate approach for evaluation of 3D Gaussians does not require matrix inversions, thereby ensuring numerical stability and eliminating the need for special handling of degenerate splats, and the hybrid transparency formulation for blending maintains similar quality as fully resolved per-pixel transparencies at a fraction of the rendering costs. We further show that each of these two components can be independently integrated into Gaussian splatting systems. In combination, they achieve up to 2$\\times$ higher frame rates, 2$\\times$ faster optimization, and equal or better image quality with fewer rendering artifacts compared to traditional 3DGS on common benchmarks.", + "arxiv_url": "http://arxiv.org/abs/2410.08129v2", + "pdf_url": "http://arxiv.org/pdf/2410.08129v2", + "published_date": "2024-10-10", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera", + "authors": [ + "Jian Huang", + "Chengrui Dong", + "Peidong Liu" + ], + "abstract": "Implicit neural representation and explicit 3D Gaussian Splatting (3D-GS) for novel view synthesis have achieved remarkable progress with frame-based camera (e.g. RGB and RGB-D cameras) recently. Compared to frame-based camera, a novel type of bio-inspired visual sensor, i.e.
event camera, has demonstrated advantages in high temporal resolution, high dynamic range, low power consumption and low latency. Due to its unique asynchronous and irregular data capturing process, limited work has been proposed to apply neural representation or 3D Gaussian splatting for an event camera. In this work, we present IncEventGS, an incremental 3D Gaussian Splatting reconstruction algorithm with a single event camera. To recover the 3D scene representation incrementally, we exploit the tracking and mapping paradigm of conventional SLAM pipelines for IncEventGS. Given the incoming event stream, the tracker firstly estimates an initial camera motion based on prior reconstructed 3D-GS scene representation. The mapper then jointly refines both the 3D scene representation and camera motion based on the previously estimated motion trajectory from the tracker. The experimental results demonstrate that IncEventGS delivers superior performance compared to prior NeRF-based methods and other related baselines, even though we do not have the ground-truth camera poses. Furthermore, our method can also deliver better performance compared to state-of-the-art event visual odometry methods in terms of camera motion estimation. Code is publicly available at: https://github.com/wu-cvgl/IncEventGS.", + "arxiv_url": "http://arxiv.org/abs/2410.08107v2", + "pdf_url": "http://arxiv.org/pdf/2410.08107v2", + "published_date": "2024-10-10", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/wu-cvgl/IncEventGS", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Fast Feedforward 3D Gaussian Splatting Compression", + "authors": [ + "Yihang Chen", + "Qianyi Wu", + "Mengyao Li", + "Weiyao Lin", + "Mehrtash Harandi", + "Jianfei Cai" + ], + "abstract": "With 3D Gaussian Splatting (3DGS) advancing real-time and high-fidelity rendering for novel view synthesis, storage requirements pose challenges for their widespread adoption. Although various compression techniques have been proposed, previous art suffers from a common limitation: for any existing 3DGS, per-scene optimization is needed to achieve compression, making the compression sluggish and slow. To address this issue, we introduce Fast Compression of 3D Gaussian Splatting (FCGS), an optimization-free model that can compress 3DGS representations rapidly in a single feed-forward pass, which significantly reduces compression time from minutes to seconds. To enhance compression efficiency, we propose a multi-path entropy module that assigns Gaussian attributes to different entropy constraint paths for balance between size and fidelity. We also carefully design both inter- and intra-Gaussian context models to remove redundancies among the unstructured Gaussian blobs. Overall, FCGS achieves a compression ratio of over 20X while maintaining fidelity, surpassing most per-scene SOTA optimization-based methods.
Our code is available at: https://github.com/YihangChen-ee/FCGS.", + "arxiv_url": "http://arxiv.org/abs/2410.08017v2", + "pdf_url": "http://arxiv.org/pdf/2410.08017v2", + "published_date": "2024-10-10", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/YihangChen-ee/FCGS", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Generalizable and Animatable Gaussian Head Avatar", + "authors": [ + "Xuangeng Chu", + "Tatsuya Harada" + ], + "abstract": "In this paper, we propose Generalizable and Animatable Gaussian head Avatar (GAGAvatar) for one-shot animatable head avatar reconstruction. Existing methods rely on neural radiance fields, leading to heavy rendering consumption and low reenactment speeds. To address these limitations, we generate the parameters of 3D Gaussians from a single image in a single forward pass. The key innovation of our work is the proposed dual-lifting method, which produces high-fidelity 3D Gaussians that capture identity and facial details. Additionally, we leverage global image features and the 3D morphable model to construct 3D Gaussians for controlling expressions. After training, our model can reconstruct unseen identities without specific optimizations and perform reenactment rendering at real-time speeds. Experiments show that our method exhibits superior performance compared to previous methods in terms of reconstruction quality and expression accuracy. We believe our method can establish new benchmarks for future research and advance applications of digital avatars. Code and demos are available https://github.com/xg-chu/GAGAvatar.", + "arxiv_url": "http://arxiv.org/abs/2410.07971v1", + "pdf_url": "http://arxiv.org/pdf/2410.07971v1", + "published_date": "2024-10-10", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "https://github.com/xg-chu/GAGAvatar", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "L-VITeX: Light-weight Visual Intuition for Terrain Exploration", + "authors": [ + "Antar Mazumder", + "Zarin Anjum Madhiha" + ], + "abstract": "This paper presents L-VITeX, a lightweight visual intuition system for terrain exploration designed for resource-constrained robots and swarms. L-VITeX aims to provide a hint of Regions of Interest (RoIs) without computationally expensive processing. By utilizing the Faster Objects, More Objects (FOMO) tinyML architecture, the system achieves high accuracy (>99%) in RoI detection while operating on minimal hardware resources (Peak RAM usage < 50 KB) with near real-time inference (<200 ms). The paper evaluates L-VITeX's performance across various terrains, including mountainous areas, underwater shipwreck debris regions, and Martian rocky surfaces. 
Additionally, it demonstrates the system's application in 3D mapping using a small mobile robot run by ESP32-Cam and Gaussian Splats (GS), showcasing its potential to enhance exploration efficiency and decision-making.", + "arxiv_url": "http://arxiv.org/abs/2410.07872v1", + "pdf_url": "http://arxiv.org/pdf/2410.07872v1", + "published_date": "2024-10-10", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting", + "authors": [ + "Ruijie Zhu", + "Yanzhe Liang", + "Hanzhi Chang", + "Jiacheng Deng", + "Jiahao Lu", + "Wenfei Yang", + "Tianzhu Zhang", + "Yongdong Zhang" + ], + "abstract": "Dynamic scene reconstruction is a long-term challenge in the field of 3D vision. Recently, the emergence of 3D Gaussian Splatting has provided new insights into this problem. Although subsequent efforts rapidly extend static 3D Gaussian to dynamic scenes, they often lack explicit constraints on object motion, leading to optimization difficulties and performance degradation. To address the above issues, we propose a novel deformable 3D Gaussian splatting framework called MotionGS, which explores explicit motion priors to guide the deformation of 3D Gaussians. Specifically, we first introduce an optical flow decoupling module that decouples optical flow into camera flow and motion flow, corresponding to camera movement and object motion respectively. Then the motion flow can effectively constrain the deformation of 3D Gaussians, thus simulating the motion of dynamic objects. Additionally, a camera pose refinement module is proposed to alternately optimize 3D Gaussians and camera poses, mitigating the impact of inaccurate camera poses. Extensive experiments in the monocular dynamic scenes validate that MotionGS surpasses state-of-the-art methods and exhibits significant superiority in both qualitative and quantitative results. Project page: https://ruijiezhu94.github.io/MotionGS_page", + "arxiv_url": "http://arxiv.org/abs/2410.07707v1", + "pdf_url": "http://arxiv.org/pdf/2410.07707v1", + "published_date": "2024-10-10", + "categories": [ + "cs.CV", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Vision-Language Gaussian Splatting", + "authors": [ + "Qucheng Peng", + "Benjamin Planche", + "Zhongpai Gao", + "Meng Zheng", + "Anwesa Choudhuri", + "Terrence Chen", + "Chen Chen", + "Ziyan Wu" + ], + "abstract": "Recent advancements in 3D reconstruction methods and vision-language models have propelled the development of multi-modal 3D scene understanding, which has vital applications in robotics, autonomous driving, and virtual/augmented reality. However, current multi-modal scene understanding approaches have naively embedded semantic representations into 3D reconstruction methods without striking a balance between visual and language modalities, which leads to unsatisfying semantic rasterization of translucent or reflective objects, as well as over-fitting on color modality. To alleviate these limitations, we propose a solution that adequately handles the distinct visual and semantic modalities, i.e., a 3D vision-language Gaussian splatting model for scene understanding, to put emphasis on the representation learning of language modality. 
We propose a novel cross-modal rasterizer, using modality fusion along with a smoothed semantic indicator for enhancing semantic rasterization. We also employ a camera-view blending technique to improve semantic consistency between existing and synthesized views, thereby effectively mitigating over-fitting. Extensive experiments demonstrate that our method achieves state-of-the-art performance in open-vocabulary semantic segmentation, surpassing existing methods by a significant margin.", + "arxiv_url": "http://arxiv.org/abs/2410.07577v1", + "pdf_url": "http://arxiv.org/pdf/2410.07577v1", + "published_date": "2024-10-10", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Thing2Reality: Transforming 2D Content into Conditioned Multiviews and 3D Gaussian Objects for XR Communication", + "authors": [ + "Erzhen Hu", + "Mingyi Li", + "Jungtaek Hong", + "Xun Qian", + "Alex Olwal", + "David Kim", + "Seongkook Heo", + "Ruofei Du" + ], + "abstract": "During remote communication, participants often share both digital and physical content, such as product designs, digital assets, and environments, to enhance mutual understanding. Recent advances in augmented communication have facilitated users to swiftly create and share digital 2D copies of physical objects from video feeds into a shared space. However, conventional 2D representations of digital objects restricts users' ability to spatially reference items in a shared immersive environment. To address this, we propose Thing2Reality, an Extended Reality (XR) communication platform that enhances spontaneous discussions of both digital and physical items during remote sessions. With Thing2Reality, users can quickly materialize ideas or physical objects in immersive environments and share them as conditioned multiview renderings or 3D Gaussians. Thing2Reality enables users to interact with remote objects or discuss concepts in a collaborative manner. Our user study revealed that the ability to interact with and manipulate 3D representations of objects significantly enhances the efficiency of discussions, with the potential to augment discussion of 2D artifacts.", + "arxiv_url": "http://arxiv.org/abs/2410.07119v1", + "pdf_url": "http://arxiv.org/pdf/2410.07119v1", + "published_date": "2024-10-09", + "categories": [ + "cs.HC", + "cs.AI", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation", + "authors": [ + "Zhiqi Li", + "Yiming Chen", + "Peidong Liu" + ], + "abstract": "Recent advancements in 2D/3D generative techniques have facilitated the generation of dynamic 3D objects from monocular videos. Previous methods mainly rely on the implicit neural radiance fields (NeRF) or explicit Gaussian Splatting as the underlying representation, and struggle to achieve satisfactory spatial-temporal consistency and surface appearance. Drawing inspiration from modern 3D animation pipelines, we introduce DreamMesh4D, a novel framework combining mesh representation with geometric skinning technique to generate high-quality 4D object from a monocular video. Instead of utilizing classical texture map for appearance, we bind Gaussian splats to triangle face of mesh for differentiable optimization of both the texture and mesh vertices. 
In particular, DreamMesh4D begins with a coarse mesh obtained through an image-to-3D generation procedure. Sparse points are then uniformly sampled across the mesh surface, and are used to build a deformation graph to drive the motion of the 3D object for the sake of computational efficiency and providing additional constraint. For each step, transformations of sparse control points are predicted using a deformation network, and the mesh vertices as well as the surface Gaussians are deformed via a novel geometric skinning algorithm, which is a hybrid approach combining LBS (linear blending skinning) and DQS (dual-quaternion skinning), mitigating drawbacks associated with both approaches. The static surface Gaussians and mesh vertices as well as the deformation network are learned via reference view photometric loss, score distillation loss as well as other regularizers in a two-stage manner. Extensive experiments demonstrate superior performance of our method. Furthermore, our method is compatible with modern graphic pipelines, showcasing its potential in the 3D gaming and film industry.", + "arxiv_url": "http://arxiv.org/abs/2410.06756v1", + "pdf_url": "http://arxiv.org/pdf/2410.06756v1", + "published_date": "2024-10-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ES-Gaussian: Gaussian Splatting Mapping via Error Space-Based Gaussian Completion", + "authors": [ + "Lu Chen", + "Yingfu Zeng", + "Haoang Li", + "Zhitao Deng", + "Jiafu Yan", + "Zhenjun Zhao" + ], + "abstract": "Accurate and affordable indoor 3D reconstruction is critical for effective robot navigation and interaction. Traditional LiDAR-based mapping provides high precision but is costly, heavy, and power-intensive, with limited ability for novel view rendering. Vision-based mapping, while cost-effective and capable of capturing visual data, often struggles with high-quality 3D reconstruction due to sparse point clouds. We propose ES-Gaussian, an end-to-end system using a low-altitude camera and single-line LiDAR for high-quality 3D indoor reconstruction. Our system features Visual Error Construction (VEC) to enhance sparse point clouds by identifying and correcting areas with insufficient geometric detail from 2D error maps. Additionally, we introduce a novel 3DGS initialization method guided by single-line LiDAR, overcoming the limitations of traditional multi-view setups and enabling effective reconstruction in resource-constrained environments. Extensive experimental results on our new Dreame-SR dataset and a publicly available dataset demonstrate that ES-Gaussian outperforms existing methods, particularly in challenging scenarios. The project page is available at https://chenlu-china.github.io/ES-Gaussian/.", + "arxiv_url": "http://arxiv.org/abs/2410.06613v2", + "pdf_url": "http://arxiv.org/pdf/2410.06613v2", + "published_date": "2024-10-09", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Representation Methods: A Survey", + "authors": [ + "Zhengren Wang" + ], + "abstract": "The field of 3D representation has experienced significant advancements, driven by the increasing demand for high-fidelity 3D models in various applications such as computer graphics, virtual reality, and autonomous systems. 
This review examines the development and current state of 3D representation methods, highlighting their research trajectories, innovations, strengths and weaknesses. Key techniques such as Voxel Grid, Point Cloud, Mesh, Signed Distance Function (SDF), Neural Radiance Field (NeRF), 3D Gaussian Splatting, Tri-Plane, and Deep Marching Tetrahedra (DMTet) are reviewed. The review also introduces essential datasets that have been pivotal in advancing the field, highlighting their characteristics and impact on research progress. Finally, we explore potential research directions that hold promise for further expanding the capabilities and applications of 3D representation methods.", + "arxiv_url": "http://arxiv.org/abs/2410.06475v1", + "pdf_url": "http://arxiv.org/pdf/2410.06475v1", + "published_date": "2024-10-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Spiking GS: Towards High-Accuracy and Low-Cost Surface Reconstruction via Spiking Neuron-based Gaussian Splatting", + "authors": [ + "Weixing Zhang", + "Zongrui Li", + "De Ma", + "Huajin Tang", + "Xudong Jiang", + "Qian Zheng", + "Gang Pan" + ], + "abstract": "3D Gaussian Splatting is capable of reconstructing 3D scenes in minutes. Despite recent advances in improving surface reconstruction accuracy, the reconstructed results still exhibit bias and suffer from inefficiency in storage and training. This paper provides a different observation on the cause of the inefficiency and the reconstruction bias, which is attributed to the integration of the low-opacity parts (LOPs) of the generated Gaussians. We show that LOPs consist of Gaussians with overall low-opacity (LOGs) and the low-opacity tails (LOTs) of Gaussians. We propose Spiking GS to reduce these two types of LOPs by integrating spiking neurons into the Gaussian Splatting pipeline. Specifically, we introduce global and local full-precision integrate-and-fire spiking neurons to the opacity and representation function of flattened 3D Gaussians, respectively. Furthermore, we enhance the density control strategy with spiking neurons' thresholds and a new criterion on the scale of Gaussians. Our method can represent more accurate reconstructed surfaces at a lower cost. The supplementary material and code are available at https://github.com/zju-bmi-lab/SpikingGS.", + "arxiv_url": "http://arxiv.org/abs/2410.07266v4", + "pdf_url": "http://arxiv.org/pdf/2410.07266v4", + "published_date": "2024-10-09", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/zju-bmi-lab/SpikingGS", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HiSplat: Hierarchical 3D Gaussian Splatting for Generalizable Sparse-View Reconstruction", + "authors": [ + "Shengji Tang", + "Weicai Ye", + "Peng Ye", + "Weihao Lin", + "Yang Zhou", + "Tao Chen", + "Wanli Ouyang" + ], + "abstract": "Reconstructing 3D scenes from multiple viewpoints is a fundamental task in stereo vision. Recently, advances in generalizable 3D Gaussian Splatting have enabled high-quality novel view synthesis for unseen scenes from sparse input views by feed-forward predicting per-pixel Gaussian parameters without extra optimization. However, existing methods typically generate single-scale 3D Gaussians, which lack representation of both large-scale structure and texture details, resulting in mislocation and artefacts.
In this paper, we propose a novel framework, HiSplat, which introduces a hierarchical design into generalizable 3D Gaussian Splatting to construct hierarchical 3D Gaussians via a coarse-to-fine strategy. Specifically, HiSplat generates large coarse-grained Gaussians to capture large-scale structures, followed by fine-grained Gaussians to enhance delicate texture details. To promote inter-scale interactions, we propose an Error Aware Module for Gaussian compensation and a Modulating Fusion Module for Gaussian repair. Our method achieves joint optimization of hierarchical representations, allowing for novel view synthesis using only two-view reference images. Comprehensive experiments on various datasets demonstrate that HiSplat significantly enhances reconstruction quality and cross-dataset generalization compared to prior single-scale methods. The corresponding ablation study and analysis of different-scale 3D Gaussians reveal the mechanism behind the effectiveness. Project website: https://open3dvlab.github.io/HiSplat/", + "arxiv_url": "http://arxiv.org/abs/2410.06245v1", + "pdf_url": "http://arxiv.org/pdf/2410.06245v1", + "published_date": "2024-10-08", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RelitLRM: Generative Relightable Radiance for Large Reconstruction Models", + "authors": [ + "Tianyuan Zhang", + "Zhengfei Kuang", + "Haian Jin", + "Zexiang Xu", + "Sai Bi", + "Hao Tan", + "He Zhang", + "Yiwei Hu", + "Milos Hasan", + "William T. Freeman", + "Kai Zhang", + "Fujun Luan" + ], + "abstract": "We propose RelitLRM, a Large Reconstruction Model (LRM) for generating high-quality Gaussian splatting representations of 3D objects under novel illuminations from sparse (4-8) posed images captured under unknown static lighting. Unlike prior inverse rendering methods requiring dense captures and slow optimization, often causing artifacts like incorrect highlights or shadow baking, RelitLRM adopts a feed-forward transformer-based model with a novel combination of a geometry reconstructor and a relightable appearance generator based on diffusion. The model is trained end-to-end on synthetic multi-view renderings of objects under varying known illuminations. This architecture design enables the model to effectively decompose geometry and appearance, resolve the ambiguity between material and lighting, and capture the multi-modal distribution of shadows and specularity in the relit appearance. We show our sparse-view feed-forward RelitLRM offers relighting results competitive with state-of-the-art dense-view optimization-based baselines while being significantly faster. Our project page is available at: https://relit-lrm.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2410.06231v2", + "pdf_url": "http://arxiv.org/pdf/2410.06231v2", + "published_date": "2024-10-08", + "categories": [ + "cs.CV", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GSLoc: Visual Localization with 3D Gaussian Splatting", + "authors": [ + "Kazii Botashev", + "Vladislav Pyatov", + "Gonzalo Ferrer", + "Stamatios Lefkimmiatis" + ], + "abstract": "We present GSLoc: a new visual localization method that performs dense camera alignment using 3D Gaussian Splatting as a map representation of the scene.
GSLoc backpropagates pose gradients over the rendering pipeline to align the rendered and target images, while it adopts a coarse-to-fine strategy by utilizing blurring kernels to mitigate the non-convexity of the problem and improve the convergence. The results show that our approach succeeds at visual localization in challenging conditions of relatively small overlap between initial and target frames inside textureless environments when state-of-the-art neural sparse methods provide inferior results. Using the byproduct of realistic rendering from the 3DGS map representation, we show how to enhance localization results by mixing a set of observed and virtual reference keyframes when solving the image retrieval problem. We evaluate our method both on synthetic and real-world data, discussing its advantages and application potential.", + "arxiv_url": "http://arxiv.org/abs/2410.06165v1", + "pdf_url": "http://arxiv.org/pdf/2410.06165v1", + "published_date": "2024-10-08", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SplaTraj: Camera Trajectory Generation with Semantic Gaussian Splatting", + "authors": [ + "Xinyi Liu", + "Tianyi Zhang", + "Matthew Johnson-Roberson", + "Weiming Zhi" + ], + "abstract": "Many recent developments for robots to represent environments have focused on photorealistic reconstructions. This paper particularly focuses on generating sequences of images from the photorealistic Gaussian Splatting models, that match instructions that are given by user-inputted language. We contribute a novel framework, SplaTraj, which formulates the generation of images within photorealistic environment representations as a continuous-time trajectory optimization problem. Costs are designed so that a camera following the trajectory poses will smoothly traverse through the environment and render the specified spatial information in a photogenic manner. This is achieved by querying a photorealistic representation with language embedding to isolate regions that correspond to the user-specified inputs. These regions are then projected to the camera's view as it moves over time and a cost is constructed. We can then apply gradient-based optimization and differentiate through the rendering to optimize the trajectory for the defined cost. The resulting trajectory moves to photogenically view each of the specified objects. We empirically evaluate our approach on a suite of environments and instructions, and demonstrate the quality of generated image sequences.", + "arxiv_url": "http://arxiv.org/abs/2410.06014v1", + "pdf_url": "http://arxiv.org/pdf/2410.06014v1", + "published_date": "2024-10-08", + "categories": [ + "cs.RO", + "cs.AI", + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Comparative Analysis of Novel View Synthesis and Photogrammetry for 3D Forest Stand Reconstruction and extraction of individual tree parameters", + "authors": [ + "Guoji Tian", + "Chongcheng Chen", + "Hongyu Huang" + ], + "abstract": "Accurate and efficient 3D reconstruction of trees is crucial for forest resource assessments and management. Close-Range Photogrammetry (CRP) is commonly used for reconstructing forest scenes but faces challenges like low efficiency and poor quality. 
Recently, Novel View Synthesis (NVS) technologies, including Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have shown promise for 3D plant reconstruction with limited images. However, existing research mainly focuses on small plants in orchards or individual trees, leaving uncertainty regarding their application in larger, complex forest stands. In this study, we collected sequential images of forest plots with varying complexity and performed dense reconstruction using NeRF and 3DGS. The resulting point clouds were compared with those from photogrammetry and laser scanning. Results indicate that NVS methods significantly enhance reconstruction efficiency. Photogrammetry struggles with complex stands, leading to point clouds with excessive canopy noise and incorrectly reconstructed trees, such as duplicated trunks. NeRF, while better for canopy regions, may produce errors in ground areas with limited views. The 3DGS method generates sparser point clouds, particularly in trunk areas, affecting diameter at breast height (DBH) accuracy. All three methods can extract tree height information, with NeRF yielding the highest accuracy; however, photogrammetry remains superior for DBH accuracy. These findings suggest that NVS methods have significant potential for 3D reconstruction of forest stands, offering valuable support for complex forest resource inventory and visualization tasks.", + "arxiv_url": "http://arxiv.org/abs/2410.05772v1", + "pdf_url": "http://arxiv.org/pdf/2410.05772v1", + "published_date": "2024-10-08", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PH-Dropout: Practical Epistemic Uncertainty Quantification for View Synthesis", + "authors": [ + "Chuanhao Sun", + "Thanos Triantafyllou", + "Anthos Makris", + "Maja Drmač", + "Kai Xu", + "Luo Mai", + "Mahesh K. Marina" + ], + "abstract": "View synthesis using Neural Radiance Fields (NeRF) and Gaussian Splatting (GS) has demonstrated impressive fidelity in rendering real-world scenarios. However, practical methods for accurate and efficient epistemic Uncertainty Quantification (UQ) in view synthesis are lacking. Existing approaches for NeRF either introduce significant computational overhead (e.g., ``10x increase in training time\" or ``10x repeated training\") or are limited to specific uncertainty conditions or models. Notably, GS models lack any systematic approach for comprehensive epistemic UQ. This capability is crucial for improving the robustness and scalability of neural view synthesis, enabling active model updates, error estimation, and scalable ensemble modeling based on uncertainty. In this paper, we revisit NeRF and GS-based methods from a function approximation perspective, identifying key differences and connections in 3D representation learning. Building on these insights, we introduce PH-Dropout (Post hoc Dropout), the first real-time and accurate method for epistemic uncertainty estimation that operates directly on pre-trained NeRF and GS models. 
Extensive evaluations validate our theoretical findings and demonstrate the effectiveness of PH-Dropout.", + "arxiv_url": "http://arxiv.org/abs/2410.05468v2", + "pdf_url": "http://arxiv.org/pdf/2410.05468v2", + "published_date": "2024-10-07", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting", + "authors": [ + "Yukang Cao", + "Masoud Hadi", + "Liang Pan", + "Ziwei Liu" + ], + "abstract": "Diffusion-based 2D virtual try-on (VTON) techniques have recently demonstrated strong performance, while the development of 3D VTON has largely lagged behind. Despite recent advances in text-guided 3D scene editing, integrating 2D VTON into these pipelines to achieve vivid 3D VTON remains challenging. The reasons are twofold. First, text prompts cannot provide sufficient details in describing clothing. Second, 2D VTON results generated from different viewpoints of the same 3D scene lack coherence and spatial relationships, hence frequently leading to appearance inconsistencies and geometric distortions. To resolve these problems, we introduce an image-prompted 3D VTON method (dubbed GS-VTON) which, by leveraging 3D Gaussian Splatting (3DGS) as the 3D representation, enables the transfer of pre-trained knowledge from 2D VTON models to 3D while improving cross-view consistency. (1) Specifically, we propose a personalized diffusion model that utilizes low-rank adaptation (LoRA) fine-tuning to incorporate personalized information into pre-trained 2D VTON models. To achieve effective LoRA training, we introduce a reference-driven image editing approach that enables the simultaneous editing of multi-view images while ensuring consistency. (2) Furthermore, we propose a persona-aware 3DGS editing framework to facilitate effective editing while maintaining consistent cross-view appearance and high-quality 3D geometry. (3) Additionally, we have established a new 3D VTON benchmark, 3D-VTONBench, which facilitates comprehensive qualitative and quantitative 3D VTON evaluations. Through extensive experiments and comparative analyses with existing methods, the proposed GS-VTON has demonstrated superior fidelity and advanced editing capabilities, affirming its effectiveness for 3D VTON.", + "arxiv_url": "http://arxiv.org/abs/2410.05259v1", + "pdf_url": "http://arxiv.org/pdf/2410.05259v1", + "published_date": "2024-10-07", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LiDAR-GS: Real-time LiDAR Re-Simulation using Gaussian Splatting", + "authors": [ + "Qifeng Chen", + "Sheng Yang", + "Sicong Du", + "Tao Tang", + "Peng Chen", + "Yuchi Huo" + ], + "abstract": "LiDAR simulation plays a crucial role in closed-loop simulation for autonomous driving. Although recent advancements, such as the use of reconstructed mesh and Neural Radiance Fields (NeRF), have made progress in simulating the physical properties of LiDAR, these methods have struggled to achieve satisfactory frame rates and rendering quality. To address these limitations, we present LiDAR-GS, the first LiDAR Gaussian Splatting method, for real-time high-fidelity re-simulation of LiDAR sensor scans in public urban road scenes. The vanilla Gaussian Splatting, designed for camera models, cannot be directly applied to LiDAR re-simulation.
To bridge the gap between passive camera and active LiDAR, our LiDAR-GS designs a differentiable laser beam splatting, grounded in the LiDAR range view model. This innovation allows for precise surface splatting by projecting lasers onto micro cross-sections, effectively eliminating artifacts associated with local affine approximations. Additionally, LiDAR-GS leverages Neural Gaussian Fields, which further integrate view-dependent clues, to represent key LiDAR properties that are influenced by the incident angle and external factors. Combining these practices with some essential adaptations, e.g., dynamic instance decomposition, our approach succeeds in simultaneously re-simulating depth, intensity, and ray-drop channels, achieving state-of-the-art results in both rendering frame rate and quality on publicly available large scene datasets. Our source code will be made publicly available.", + "arxiv_url": "http://arxiv.org/abs/2410.05111v1", + "pdf_url": "http://arxiv.org/pdf/2410.05111v1", + "published_date": "2024-10-07", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DreamSat: Towards a General 3D Model for Novel View Synthesis of Space Objects", + "authors": [ + "Nidhi Mathihalli", + "Audrey Wei", + "Giovanni Lavezzi", + "Peng Mun Siew", + "Victor Rodriguez-Fernandez", + "Hodei Urrutxua", + "Richard Linares" + ], + "abstract": "Novel view synthesis (NVS) enables the generation of new images of a scene or the conversion of a set of 2D images into a comprehensive 3D model. In the context of Space Domain Awareness, since space is becoming increasingly congested, NVS can accurately map space objects and debris, improving the safety and efficiency of space operations. Similarly, in Rendezvous and Proximity Operations missions, 3D models can provide details about a target object's shape, size, and orientation, allowing for better planning and prediction of the target's behavior. In this work, we explore the generalization abilities of these reconstruction techniques, aiming to avoid the necessity of retraining for each new scene, by presenting DreamSat, a novel approach to 3D spacecraft reconstruction from single-view images that fine-tunes Zero123 XL, a state-of-the-art single-view reconstruction model, on a dataset of 190 high-quality spacecraft models and integrates it into the DreamGaussian framework. We demonstrate consistent improvements in reconstruction quality across multiple metrics, including Contrastive Language-Image Pretraining (CLIP) score (+0.33%), Peak Signal-to-Noise Ratio (PSNR) (+2.53%), Structural Similarity Index (SSIM) (+2.38%), and Learned Perceptual Image Patch Similarity (LPIPS) (+0.16%) on a test set of 30 previously unseen spacecraft images. Our method addresses the lack of domain-specific 3D reconstruction tools in the space industry by leveraging state-of-the-art diffusion models and 3D Gaussian splatting techniques. This approach maintains the efficiency of the DreamGaussian framework while enhancing the accuracy and detail of spacecraft reconstructions.
The code for this work can be accessed on GitHub (https://github.com/ARCLab-MIT/space-nvs).", + "arxiv_url": "http://arxiv.org/abs/2410.05097v1", + "pdf_url": "http://arxiv.org/pdf/2410.05097v1", + "published_date": "2024-10-07", + "categories": [ + "cs.CV", + "cs.LG" + ], + "github_url": "https://github.com/ARCLab-MIT/space-nvs", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PhotoReg: Photometrically Registering 3D Gaussian Splatting Models", + "authors": [ + "Ziwen Yuan", + "Tianyi Zhang", + "Matthew Johnson-Roberson", + "Weiming Zhi" + ], + "abstract": "Building accurate representations of the environment is critical for intelligent robots to make decisions during deployment. Advances in photorealistic environment models have enabled robots to develop hyper-realistic reconstructions, which can be used to generate images that are intuitive for human inspection. In particular, the recently introduced 3DGS, which describes the scene with up to millions of primitive ellipsoids, can be rendered in real time. 3DGS has rapidly gained prominence. However, a critical unsolved problem persists: how can we fuse multiple 3DGS into a single coherent model? Solving this problem will enable robot teams to jointly build 3DGS models of their surroundings. A key insight of this work is to leverage the duality between photorealistic reconstructions, which render realistic 2D images from 3D structure, and 3D foundation models, which predict 3D structure from image pairs. To this end, we develop PhotoReg, a framework to register multiple photorealistic 3DGS models with 3D foundation models. As 3DGS models are generally built from monocular camera images, they have arbitrary scale. To resolve this, PhotoReg actively enforces scale consistency among the different 3DGS models by considering depth estimates within these models. Then, the alignment is iteratively refined with fine-grained photometric losses to produce high-quality fused 3DGS models. We rigorously evaluate PhotoReg on both standard benchmark datasets and our custom-collected datasets, including with two quadruped robots. The code is released at ziweny11.github.io/photoreg.", + "arxiv_url": "http://arxiv.org/abs/2410.05044v1", + "pdf_url": "http://arxiv.org/pdf/2410.05044v1", + "published_date": "2024-10-07", + "categories": [ + "cs.RO", + "cs.AI", + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering", + "authors": [ + "Zhongpai Gao", + "Benjamin Planche", + "Meng Zheng", + "Anwesa Choudhuri", + "Terrence Chen", + "Ziyan Wu" + ], + "abstract": "Novel view synthesis has advanced significantly with the development of neural radiance fields (NeRF) and 3D Gaussian splatting (3DGS). However, achieving high quality without compromising real-time rendering remains challenging, particularly for physically-based ray tracing with view-dependent effects. Recently, N-dimensional Gaussians (N-DG) introduced a 6D spatial-angular representation to better incorporate view-dependent effects, but the Gaussian representation and control scheme are sub-optimal.
In this paper, we revisit 6D Gaussians and introduce 6D Gaussian Splatting (6DGS), which enhances color and opacity representations and leverages the additional directional information in the 6D space for optimized Gaussian control. Our approach is fully compatible with the 3DGS framework and significantly improves real-time radiance field rendering by better modeling view-dependent effects and fine details. Experiments demonstrate that 6DGS significantly outperforms 3DGS and N-DG, achieving up to a 15.73 dB improvement in PSNR with a reduction of 66.5% Gaussian points compared to 3DGS. The project page is: https://gaozhongpai.github.io/6dgs/", + "arxiv_url": "http://arxiv.org/abs/2410.04974v2", + "pdf_url": "http://arxiv.org/pdf/2410.04974v2", + "published_date": "2024-10-07", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Next Best Sense: Guiding Vision and Touch with FisherRF for 3D Gaussian Splatting", + "authors": [ + "Matthew Strong", + "Boshu Lei", + "Aiden Swann", + "Wen Jiang", + "Kostas Daniilidis", + "Monroe Kennedy III" + ], + "abstract": "We propose a framework for active next best view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it has the ability to represent scenes in a both photorealistic and geometrically accurate manner. However, in real-world, online robotic scenes where the number of views is limited given efficiency requirements, random view selection for 3DGS becomes impractical as views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2) that we supplement with Pearson depth and surface normal loss to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We motivate our improvements to few-shot GS scenes, and extend depth-based FisherRF to them, where we demonstrate both qualitative and quantitative improvements on challenging robot scenes. For more information, please see our project page at https://arm.stanford.edu/next-best-sense.", + "arxiv_url": "http://arxiv.org/abs/2410.04680v3", + "pdf_url": "http://arxiv.org/pdf/2410.04680v3", + "published_date": "2024-10-07", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering", + "authors": [ + "Yonghan Lee", + "Jaehoon Choi", + "Dongki Jung", + "Jaeseong Yun", + "Soohyun Ryu", + "Dinesh Manocha", + "Suyong Yeon" + ], + "abstract": "We present a novel-view rendering algorithm, Mode-GS, for ground-robot trajectory datasets. Our approach is based on using anchored Gaussian splats, which are designed to overcome the limitations of existing 3D Gaussian splatting algorithms. 
Prior neural rendering methods suffer from severe splat drift due to scene complexity and insufficient multi-view observation, and can fail to fix splats on the true geometry in ground-robot datasets. Our method integrates pixel-aligned anchors from monocular depths and generates Gaussian splats around these anchors using residual-form Gaussian decoders. To address the inherent scale ambiguity of monocular depth, we parameterize anchors with per-view depth-scales and employ scale-consistent depth loss for online scale calibration. Our method results in improved rendering performance, based on PSNR, SSIM, and LPIPS metrics, in ground scenes with free trajectory patterns, and achieves state-of-the-art rendering performance on the R3LIVE odometry dataset and the Tanks and Temples dataset.", + "arxiv_url": "http://arxiv.org/abs/2410.04646v1", + "pdf_url": "http://arxiv.org/pdf/2410.04646v1", + "published_date": "2024-10-06", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "StreetSurfGS: Scalable Urban Street Surface Reconstruction with Planar-based Gaussian Splatting", + "authors": [ + "Xiao Cui", + "Weicai Ye", + "Yifan Wang", + "Guofeng Zhang", + "Wengang Zhou", + "Houqiang Li" + ], + "abstract": "Reconstructing urban street scenes is crucial due to its vital role in applications such as autonomous driving and urban planning. These scenes are characterized by long and narrow camera trajectories, occlusion, complex object relationships, and data sparsity across multiple scales. Despite recent advancements, existing surface reconstruction methods, which are primarily designed for object-centric scenarios, struggle to adapt effectively to the unique characteristics of street scenes. To address this challenge, we introduce StreetSurfGS, the first method to employ Gaussian Splatting specifically tailored for scalable urban street scene surface reconstruction. StreetSurfGS utilizes a planar-based octree representation and segmented training to reduce memory costs, accommodate unique camera characteristics, and ensure scalability. Additionally, to mitigate depth inaccuracies caused by object overlap, we propose a guided smoothing strategy within regularization to eliminate inaccurate boundary points and outliers. Furthermore, to address sparse views and multi-scale challenges, we use a dual-step matching strategy that leverages adjacent and long-term information. Extensive experiments validate the efficacy of StreetSurfGS in both novel view synthesis and surface reconstruction.", + "arxiv_url": "http://arxiv.org/abs/2410.04354v2", + "pdf_url": "http://arxiv.org/pdf/2410.04354v2", + "published_date": "2024-10-06", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Variational Bayes Gaussian Splatting", + "authors": [ + "Toon Van de Maele", + "Ozan Catal", + "Alexander Tschantz", + "Christopher L. Buckley", + "Tim Verbelen" + ], + "abstract": "Recently, 3D Gaussian Splatting has emerged as a promising approach for modeling 3D scenes using mixtures of Gaussians. The predominant optimization method for these models relies on backpropagating gradients through a differentiable rendering pipeline, which struggles with catastrophic forgetting when dealing with continuous streams of data. 
To address this limitation, we propose Variational Bayes Gaussian Splatting (VBGS), a novel approach that frames training a Gaussian splat as variational inference over model parameters. By leveraging the conjugacy properties of multivariate Gaussians, we derive a closed-form variational update rule, allowing efficient updates from partial, sequential observations without the need for replay buffers. Our experiments show that VBGS not only matches state-of-the-art performance on static datasets, but also enables continual learning from sequentially streamed 2D and 3D data, drastically improving performance in this setting.", + "arxiv_url": "http://arxiv.org/abs/2410.03592v1", + "pdf_url": "http://arxiv.org/pdf/2410.03592v1", + "published_date": "2024-10-04", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Flash-Splat: 3D Reflection Removal with Flash Cues and Gaussian Splats", + "authors": [ + "Mingyang Xie", + "Haoming Cai", + "Sachin Shah", + "Yiran Xu", + "Brandon Y. Feng", + "Jia-Bin Huang", + "Christopher A. Metzler" + ], + "abstract": "We introduce a simple yet effective approach for separating transmitted and reflected light. Our key insight is that the powerful novel view synthesis capabilities provided by modern inverse rendering methods (e.g.,~3D Gaussian splatting) allow one to perform flash/no-flash reflection separation using unpaired measurements -- this relaxation dramatically simplifies image acquisition over conventional paired flash/no-flash reflection separation methods. Through extensive real-world experiments, we demonstrate our method, Flash-Splat, accurately reconstructs both transmitted and reflected scenes in 3D. Our method outperforms existing 3D reflection separation methods, which do not leverage illumination control, by a large margin. Our project webpage is at https://flash-splat.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2410.02764v1", + "pdf_url": "http://arxiv.org/pdf/2410.02764v1", + "published_date": "2024-10-03", + "categories": [ + "cs.CV", + "cs.LG", + "eess.IV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering", + "authors": [ + "Hongze Chen", + "Zehong Lin", + "Jun Zhang" + ], + "abstract": "We present GI-GS, a novel inverse rendering framework that leverages 3D Gaussian Splatting (3DGS) and deferred shading to achieve photo-realistic novel view synthesis and relighting. In inverse rendering, accurately modeling the shading processes of objects is essential for achieving high-fidelity results. Therefore, it is critical to incorporate global illumination to account for indirect lighting that reaches an object after multiple bounces across the scene. Previous 3DGS-based methods have attempted to model indirect lighting by characterizing indirect illumination as learnable lighting volumes or additional attributes of each Gaussian, while using baked occlusion to represent shadow effects. These methods, however, fail to accurately model the complex physical interactions between light and objects, making it impossible to construct realistic indirect illumination during relighting. To address this limitation, we propose to calculate indirect lighting using efficient path tracing with deferred shading. 
In our framework, we first render a G-buffer to capture the detailed geometry and material properties of the scene. Then, we perform physically-based rendering (PBR) only for direct lighting. With the G-buffer and previous rendering results, the indirect lighting can be calculated through a lightweight path tracing. Our method effectively models indirect lighting under any given lighting conditions, thereby achieving better novel view synthesis and relighting. Quantitative and qualitative results show that our GI-GS outperforms existing baselines in both rendering quality and efficiency.", + "arxiv_url": "http://arxiv.org/abs/2410.02619v1", + "pdf_url": "http://arxiv.org/pdf/2410.02619v1", + "published_date": "2024-10-03", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SuperGS: Super-Resolution 3D Gaussian Splatting via Latent Feature Field and Gradient-guided Splitting", + "authors": [ + "Shiyun Xie", + "Zhiru Wang", + "Yinghao Zhu", + "Chengwei Pan" + ], + "abstract": "Recently, 3D Gaussian Splatting (3DGS) has excelled in novel view synthesis with its real-time rendering capabilities and superior quality. However, it faces challenges for high-resolution novel view synthesis (HRNVS) due to the coarse nature of primitives derived from low-resolution input views. To address this issue, we propose Super-Resolution 3DGS (SuperGS), which is an expansion of 3DGS designed with a two-stage coarse-to-fine training framework, utilizing a pretrained low-resolution scene representation as an initialization for super-resolution optimization. Moreover, we introduce Multi-resolution Feature Gaussian Splatting (MFGS) to incorporate a latent feature field for flexible feature sampling and Gradient-guided Selective Splitting (GSS) for effective Gaussian upsampling. Integrating these strategies within the coarse-to-fine framework ensures both high fidelity and memory efficiency. Extensive experiments demonstrate that SuperGS surpasses state-of-the-art HRNVS methods on challenging real-world datasets using only low-resolution inputs.", + "arxiv_url": "http://arxiv.org/abs/2410.02571v2", + "pdf_url": "http://arxiv.org/pdf/2410.02571v2", + "published_date": "2024-10-03", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis", + "authors": [ + "Xiaobiao Du", + "Yida Wang", + "Xin Yu" + ], + "abstract": "Recent works in volume rendering, e.g., NeRF and 3D Gaussian Splatting (3DGS), significantly advance the rendering quality and efficiency with the help of the learned implicit neural radiance field or 3D Gaussians. Rendering on top of an explicit representation, the vanilla 3DGS and its variants deliver real-time efficiency by optimizing the parametric model with single-view supervision per iteration during training, which is adopted from NeRF. Consequently, certain views are overfitted, leading to unsatisfying appearance in novel-view synthesis and imprecise 3D geometries. To solve the aforementioned problems, we propose a new 3DGS optimization method embodying four key novel contributions: 1) We transform the conventional single-view training paradigm into a multi-view training strategy.
With our proposed multi-view regulation, 3D Gaussian attributes are further optimized without overfitting certain training views. As a general solution, we improve the overall accuracy in a variety of scenarios and different Gaussian variants. 2) Inspired by the benefit introduced by additional views, we further propose a cross-intrinsic guidance scheme, leading to a coarse-to-fine training procedure concerning different resolutions. 3) Built on top of our multi-view regulated training, we further propose a cross-ray densification strategy, densifying more Gaussian kernels in the ray-intersect regions from a selection of views. 4) By further investigating the densification strategy, we found that the effect of densification should be enhanced when certain views are dramatically distinct. As a solution, we propose a novel multi-view augmented densification strategy, where 3D Gaussians are encouraged to get densified to a sufficient number accordingly, resulting in improved reconstruction accuracy.", + "arxiv_url": "http://arxiv.org/abs/2410.02103v1", + "pdf_url": "http://arxiv.org/pdf/2410.02103v1", + "published_date": "2024-10-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis", + "authors": [ + "Alexander Mai", + "Peter Hedman", + "George Kopanas", + "Dor Verbin", + "David Futschik", + "Qiangeng Xu", + "Falko Kuester", + "Jonathan T. Barron", + "Yinda Zhang" + ], + "abstract": "We present Exact Volumetric Ellipsoid Rendering (EVER), a method for real-time differentiable emission-only volume rendering. Unlike the recent rasterization-based approach of 3D Gaussian Splatting (3DGS), our primitive-based representation allows for exact volume rendering, rather than alpha compositing 3D Gaussian billboards. As such, unlike 3DGS our formulation does not suffer from popping artifacts and view-dependent density, but still achieves frame rates of ~30 FPS at 720p on an NVIDIA RTX4090. Since our approach is built upon ray tracing, it enables effects such as defocus blur and camera distortion (e.g., from fisheye cameras), which are difficult to achieve by rasterization. We show that our method is more accurate with fewer blending issues than 3DGS and follow-up work on view-consistent rendering, especially on the challenging large-scale scenes from the Zip-NeRF dataset, where it achieves the sharpest results among real-time techniques.", + "arxiv_url": "http://arxiv.org/abs/2410.01804v5", + "pdf_url": "http://arxiv.org/pdf/2410.01804v5", + "published_date": "2024-10-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection", + "authors": [ + "Yang Cao", + "Yuanliang Jv", + "Dan Xu" + ], + "abstract": "Neural Radiance Fields (NeRF) are widely used for novel-view synthesis and have been adapted for 3D Object Detection (3DOD), offering a promising approach to 3DOD through view-synthesis representation. However, NeRF faces inherent limitations: (i) limited representational capacity for 3DOD due to its implicit nature, and (ii) slow rendering speeds.
Recently, 3D Gaussian Splatting (3DGS) has emerged as an explicit 3D representation that addresses these limitations. Inspired by these advantages, this paper introduces 3DGS into 3DOD for the first time, identifying two main challenges: (i) Ambiguous spatial distribution of Gaussian blobs: 3DGS primarily relies on 2D pixel-level supervision, resulting in unclear 3D spatial distribution of Gaussian blobs and poor differentiation between objects and background, which hinders 3DOD; (ii) Excessive background blobs: 2D images often include numerous background pixels, leading to densely reconstructed 3DGS with many noisy Gaussian blobs representing the background, negatively affecting detection. To tackle the challenge (i), we leverage the fact that 3DGS reconstruction is derived from 2D images, and propose an elegant and efficient solution by incorporating 2D Boundary Guidance to significantly enhance the spatial distribution of Gaussian blobs, resulting in clearer differentiation between objects and their background. To address the challenge (ii), we propose a Box-Focused Sampling strategy using 2D boxes to generate object probability distribution in 3D spaces, allowing effective probabilistic sampling in 3D to retain more object blobs and reduce noisy background blobs. Benefiting from our designs, our 3DGS-DET significantly outperforms the SOTA NeRF-based method, NeRF-Det, achieving improvements of +6.6 on mAP@0.25 and +8.1 on mAP@0.5 for the ScanNet dataset, and impressive +31.5 on mAP@0.25 for the ARKITScenes dataset.", + "arxiv_url": "http://arxiv.org/abs/2410.01647v1", + "pdf_url": "http://arxiv.org/pdf/2410.01647v1", + "published_date": "2024-10-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Splatting in Mirrors: Reflection-Aware Rendering via Virtual Camera Optimization", + "authors": [ + "Zihan Wang", + "Shuzhe Wang", + "Matias Turkulainen", + "Junyuan Fang", + "Juho Kannala" + ], + "abstract": "Recent advancements in 3D Gaussian Splatting (3D-GS) have revolutionized novel view synthesis, facilitating real-time, high-quality image rendering. However, in scenarios involving reflective surfaces, particularly mirrors, 3D-GS often misinterprets reflections as virtual spaces, resulting in blurred and inconsistent multi-view rendering within mirrors. Our paper presents a novel method aimed at obtaining high-quality multi-view consistent reflection rendering by modelling reflections as physically-based virtual cameras. We estimate mirror planes with depth and normal estimates from 3D-GS and define virtual cameras that are placed symmetrically about the mirror plane. These virtual cameras are then used to explain mirror reflections in the scene. To address imperfections in mirror plane estimates, we propose a straightforward yet effective virtual camera optimization method to enhance reflection quality. We collect a new mirror dataset including three real-world scenarios for more diverse evaluation. Experimental validation on both Mirror-Nerf and our real-world dataset demonstrate the efficacy of our approach. 
We achieve comparable or superior results while significantly reducing training time compared to previous state-of-the-art.", + "arxiv_url": "http://arxiv.org/abs/2410.01614v1", + "pdf_url": "http://arxiv.org/pdf/2410.01614v1", + "published_date": "2024-10-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians", + "authors": [ + "Shuyi Jiang", + "Qihao Zhao", + "Hossein Rahmani", + "De Wen Soh", + "Jun Liu", + "Na Zhao" + ], + "abstract": "Recently, with the development of Neural Radiance Fields and Gaussian Splatting, 3D reconstruction techniques have achieved remarkably high fidelity. However, the latent representations learnt by these methods are highly entangled and lack interpretability. In this paper, we propose a novel part-aware compositional reconstruction method, called GaussianBlock, that enables semantically coherent and disentangled representations, allowing for precise and physical editing akin to building blocks, while simultaneously maintaining high fidelity. Our GaussianBlock introduces a hybrid representation that leverages the advantages of both primitives, known for their flexible actionability and editability, and 3D Gaussians, which excel in reconstruction quality. Specifically, we achieve semantically coherent primitives through a novel attention-guided centering loss derived from 2D semantic priors, complemented by a dynamic splitting and fusion strategy. Furthermore, we utilize 3D Gaussians that hybridize with primitives to refine structural details and enhance fidelity. Additionally, a binding inheritance strategy is employed to strengthen and maintain the connection between the two. Our reconstructed scenes are evidenced to be disentangled, compositional, and compact across diverse benchmarks, enabling seamless, direct and precise editing while maintaining high quality.", + "arxiv_url": "http://arxiv.org/abs/2410.01535v2", + "pdf_url": "http://arxiv.org/pdf/2410.01535v2", + "published_date": "2024-10-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MiraGe: Editable 2D Images using Gaussian Splatting", + "authors": [ + "Joanna Waczyńska", + "Tomasz Szczepanik", + "Piotr Borycki", + "Sławomir Tadeja", + "Thomas Bohné", + "Przemysław Spurek" + ], + "abstract": "Implicit Neural Representations (INRs) approximate discrete data through continuous functions and are commonly used for encoding 2D images. Traditional image-based INRs employ neural networks to map pixel coordinates to RGB values, capturing shapes, colors, and textures within the network's weights. Recently, GaussianImage has been proposed as an alternative, using Gaussian functions instead of neural networks to achieve comparable quality and compression. Such a solution obtains a quality and compression ratio similar to classical INR models but does not allow image modification. In contrast, our work introduces a novel method, MiraGe, which uses mirror reflections to perceive 2D images in 3D space and employs flat-controlled Gaussians for precise 2D image editing. Our approach improves the rendering quality and allows realistic image modifications, including human-inspired perception of photos in the 3D world. 
Thanks to modeling images in 3D space, we obtain the illusion of 3D-based modification in 2D images. We also show that our Gaussian representation can be easily combined with a physics engine to produce physics-based modification of 2D images. Consequently, MiraGe allows for better quality than the standard approach and natural modification of 2D images.", + "arxiv_url": "http://arxiv.org/abs/2410.01521v1", + "pdf_url": "http://arxiv.org/pdf/2410.01521v1", + "published_date": "2024-10-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "UW-GS: Distractor-Aware 3D Gaussian Splatting for Enhanced Underwater Scene Reconstruction", + "authors": [ + "Haoran Wang", + "Nantheera Anantrasirichai", + "Fan Zhang", + "David Bull" + ], + "abstract": "3D Gaussian splatting (3DGS) offers the capability to achieve real-time, high-quality 3D scene rendering. However, 3DGS assumes that the scene is in a clear medium environment and struggles to generate satisfactory representations in underwater scenes, where light absorption and scattering are prevalent and moving objects are involved. To overcome these, we introduce a novel Gaussian Splatting-based method, UW-GS, designed specifically for underwater applications. It introduces a color appearance model that captures distance-dependent color variation, employs a new physics-based density control strategy to enhance clarity for distant objects, and uses a binary motion mask to handle dynamic content. Optimized with a well-designed loss function supporting scattering media and strengthened by pseudo-depth maps, UW-GS outperforms existing methods with PSNR gains of up to 1.26 dB. To fully verify the effectiveness of the model, we also developed a new underwater dataset, S-UW, with dynamic object masks.", + "arxiv_url": "http://arxiv.org/abs/2410.01517v1", + "pdf_url": "http://arxiv.org/pdf/2410.01517v1", + "published_date": "2024-10-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EVA-Gaussian: 3D Gaussian-based Real-time Human Novel View Synthesis under Diverse Camera Settings", + "authors": [ + "Yingdong Hu", + "Zhening Liu", + "Jiawei Shao", + "Zehong Lin", + "Jun Zhang" + ], + "abstract": "The feed-forward based 3D Gaussian Splatting method has demonstrated exceptional capability in real-time human novel view synthesis. However, existing approaches are restricted to dense viewpoint settings, which limits their flexibility in free-viewpoint rendering across a wide range of camera view angle discrepancies. To address this limitation, we propose a real-time pipeline named EVA-Gaussian for 3D human novel view synthesis across diverse camera settings. Specifically, we first introduce an Efficient cross-View Attention (EVA) module to accurately estimate the position of each 3D Gaussian from the source images. Then, we integrate the source images with the estimated Gaussian position map to predict the attributes and feature embeddings of the 3D Gaussians. Moreover, we employ a recurrent feature refiner to correct artifacts caused by geometric errors in position estimation and enhance visual fidelity. To further improve synthesis quality, we incorporate a powerful anchor loss function for both 3D Gaussian attributes and human face landmarks.
Experimental results on the THuman2.0 and THumansit datasets showcase the superiority of our EVA-Gaussian approach in rendering quality across diverse camera settings. Project page: https://zhenliuzju.github.io/huyingdong/EVA-Gaussian.", + "arxiv_url": "http://arxiv.org/abs/2410.01425v1", + "pdf_url": "http://arxiv.org/pdf/2410.01425v1", + "published_date": "2024-10-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian-Det: Learning Closed-Surface Gaussians for 3D Object Detection", + "authors": [ + "Hongru Yan", + "Yu Zheng", + "Yueqi Duan" + ], + "abstract": "Skins wrapping around our bodies, leathers covering over the sofa, sheet metal coating the car - it suggests that objects are enclosed by a series of continuous surfaces, which provides us with informative geometry prior for objectness deduction. In this paper, we propose Gaussian-Det which leverages Gaussian Splatting as surface representation for multi-view based 3D object detection. Unlike existing monocular or NeRF-based methods which depict the objects via discrete positional data, Gaussian-Det models the objects in a continuous manner by formulating the input Gaussians as feature descriptors on a mass of partial surfaces. Furthermore, to address the numerous outliers inherently introduced by Gaussian splatting, we accordingly devise a Closure Inferring Module (CIM) for the comprehensive surface-based objectness deduction. CIM firstly estimates the probabilistic feature residuals for partial surfaces given the underdetermined nature of Gaussian Splatting, which are then coalesced into a holistic representation on the overall surface closure of the object proposal. In this way, the surface information Gaussian-Det exploits serves as the prior on the quality and reliability of objectness and the information basis of proposal refinement. Experiments on both synthetic and real-world datasets demonstrate that Gaussian-Det outperforms various existing approaches, in terms of both average precision and recall.", + "arxiv_url": "http://arxiv.org/abs/2410.01404v1", + "pdf_url": "http://arxiv.org/pdf/2410.01404v1", + "published_date": "2024-10-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation", + "authors": [ + "Junlin Han", + "Jianyuan Wang", + "Andrea Vedaldi", + "Philip Torr", + "Filippos Kokkinos" + ], + "abstract": "Generating high-quality 3D content from text, single images, or sparse view images remains a challenging task with broad applications. Existing methods typically employ multi-view diffusion models to synthesize multi-view images, followed by a feed-forward process for 3D reconstruction. However, these approaches are often constrained by a small and fixed number of input views, limiting their ability to capture diverse viewpoints and, even worse, leading to suboptimal generation results if the synthesized views are of poor quality. To address these limitations, we propose Flex3D, a novel two-stage framework capable of leveraging an arbitrary number of high-quality input views. The first stage consists of a candidate view generation and curation pipeline. 
We employ a fine-tuned multi-view image diffusion model and a video diffusion model to generate a pool of candidate views, enabling a rich representation of the target 3D object. Subsequently, a view selection pipeline filters these views based on quality and consistency, ensuring that only the high-quality and reliable views are used for reconstruction. In the second stage, the curated views are fed into a Flexible Reconstruction Model (FlexRM), built upon a transformer architecture that can effectively process an arbitrary number of inputs. FlexRM directly outputs 3D Gaussian points leveraging a tri-plane representation, enabling efficient and detailed 3D generation. Through extensive exploration of design and training strategies, we optimize FlexRM to achieve superior performance in both reconstruction and generation tasks. Our results demonstrate that Flex3D achieves state-of-the-art performance, with a user study winning rate of over 92% in 3D generation tasks when compared to several of the latest feed-forward 3D generative models.", + "arxiv_url": "http://arxiv.org/abs/2410.00890v2", + "pdf_url": "http://arxiv.org/pdf/2410.00890v2", + "published_date": "2024-10-01", + "categories": [ + "cs.CV", + "cs.GR", + "eess.IV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CaRtGS: Computational Alignment for Real-Time Gaussian Splatting SLAM", + "authors": [ + "Dapeng Feng", + "Zhiqiang Chen", + "Yizhen Yin", + "Shipeng Zhong", + "Yuhua Qi", + "Hongbo Chen" + ], + "abstract": "Simultaneous Localization and Mapping (SLAM) is pivotal in robotics, with photorealistic scene reconstruction emerging as a key challenge. To address this, we introduce Computational Alignment for Real-Time Gaussian Splatting SLAM (CaRtGS), a novel method enhancing the efficiency and quality of photorealistic scene reconstruction in real-time environments. Leveraging 3D Gaussian Splatting (3DGS), CaRtGS achieves superior rendering quality and processing speed, which is crucial for scene photorealistic reconstruction. Our approach tackles computational misalignment in Gaussian Splatting SLAM (GS-SLAM) through an adaptive strategy that optimizes training, addresses long-tail optimization, and refines densification. Experiments on Replica and TUM-RGBD datasets demonstrate CaRtGS's effectiveness in achieving high-fidelity rendering with fewer Gaussian primitives. This work propels SLAM towards real-time, photorealistic dense rendering, significantly advancing photorealistic scene representation. For the benefit of the research community, we release the code on our project website: https://dapengfeng.github.io/cartgs.", + "arxiv_url": "http://arxiv.org/abs/2410.00486v2", + "pdf_url": "http://arxiv.org/pdf/2410.00486v2", + "published_date": "2024-10-01", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DGR-CAR: Coronary artery reconstruction from ultra-sparse 2D X-ray views with a 3D Gaussians representation", + "authors": [ + "Xueming Fu", + "Yingtai Li", + "Fenghe Tang", + "Jun Li", + "Mingyue Zhao", + "Gao-Jun Teng", + "S. Kevin Zhou" + ], + "abstract": "Reconstructing 3D coronary arteries is important for coronary artery disease diagnosis, treatment planning and operation navigation. 
Traditional reconstruction techniques often require many projections, while reconstruction from sparse-view X-ray projections is a potential way of reducing radiation dose. However, the extreme sparsity of coronary arteries in a 3D volume and ultra-limited number of projections pose significant challenges for efficient and accurate 3D reconstruction. To this end, we propose 3DGR-CAR, a 3D Gaussian Representation for Coronary Artery Reconstruction from ultra-sparse X-ray projections. We leverage 3D Gaussian representation to avoid the inefficiency caused by the extreme sparsity of coronary artery data and propose a Gaussian center predictor to overcome the noisy Gaussian initialization from ultra-sparse view projections. The proposed scheme enables fast and accurate 3D coronary artery reconstruction with only 2 views. Experimental results on two datasets indicate that the proposed approach significantly outperforms other methods in terms of voxel accuracy and visual quality of coronary arteries. The code will be available in https://github.com/windrise/3DGR-CAR.", + "arxiv_url": "http://arxiv.org/abs/2410.00404v1", + "pdf_url": "http://arxiv.org/pdf/2410.00404v1", + "published_date": "2024-10-01", + "categories": [ + "eess.IV", + "cs.CV" + ], + "github_url": "https://github.com/windrise/3DGR-CAR", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Seamless Augmented Reality Integration in Arthroscopy: A Pipeline for Articular Reconstruction and Guidance", + "authors": [ + "Hongchao Shu", + "Mingxu Liu", + "Lalithkumar Seenivasan", + "Suxi Gu", + "Ping-Cheng Ku", + "Jonathan Knopf", + "Russell Taylor", + "Mathias Unberath" + ], + "abstract": "Arthroscopy is a minimally invasive surgical procedure used to diagnose and treat joint problems. The clinical workflow of arthroscopy typically involves inserting an arthroscope into the joint through a small incision, during which surgeons navigate and operate largely by relying on their visual assessment through the arthroscope. However, the arthroscope's restricted field of view and lack of depth perception pose challenges in navigating complex articular structures and achieving surgical precision during procedures. Aiming at enhancing intraoperative awareness, we present a robust pipeline that incorporates simultaneous localization and mapping, depth estimation, and 3D Gaussian splatting to realistically reconstruct intra-articular structures solely based on monocular arthroscope video. Extending 3D reconstruction to Augmented Reality (AR) applications, our solution offers AR assistance for articular notch measurement and annotation anchoring in a human-in-the-loop manner. Compared to traditional Structure-from-Motion and Neural Radiance Field-based methods, our pipeline achieves dense 3D reconstruction and competitive rendering fidelity with explicit 3D representation in 7 minutes on average. When evaluated on four phantom datasets, our method achieves RMSE = 2.21mm reconstruction error, PSNR = 32.86 and SSIM = 0.89 on average. Because our pipeline enables AR reconstruction and guidance directly from monocular arthroscopy without any additional data and/or hardware, our solution may hold the potential for enhancing intraoperative awareness and facilitating surgical precision in arthroscopy. 
Our AR measurement tool achieves accuracy within 1.59 +/- 1.81mm and the AR annotation tool achieves a mIoU of 0.721.", + "arxiv_url": "http://arxiv.org/abs/2410.00386v1", + "pdf_url": "http://arxiv.org/pdf/2410.00386v1", + "published_date": "2024-10-01", + "categories": [ + "cs.CV", + "cs.LG", + "F.2.2; I.2.7" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving", + "authors": [ + "Zhangshuo Qi", + "Junyi Ma", + "Jingyi Xu", + "Zijie Zhou", + "Luqi Cheng", + "Guangming Xiong" + ], + "abstract": "Place recognition is a crucial module to ensure autonomous vehicles obtain usable localization information in GPS-denied environments. In recent years, multimodal place recognition methods have gained increasing attention due to their ability to overcome the weaknesses of unimodal sensor systems by leveraging complementary information from different modalities. However, challenges arise from the necessity of harmonizing data across modalities and exploiting the spatio-temporal correlations between them sufficiently. In this paper, we propose a 3D Gaussian Splatting-based multimodal place recognition neural network dubbed GSPR. It explicitly combines multi-view RGB images and LiDAR point clouds into a spatio-temporally unified scene representation with the proposed Multimodal Gaussian Splatting. A network composed of 3D graph convolution and transformer is designed to extract high-level spatio-temporal features and global descriptors from the Gaussian scenes for place recognition. We evaluate our method on the nuScenes dataset, and the experimental results demonstrate that our method can effectively leverage complementary strengths of both multi-view cameras and LiDAR, achieving SOTA place recognition performance while maintaining solid generalization ability. Our open-source code is available at https://github.com/QiZS-BIT/GSPR.", + "arxiv_url": "http://arxiv.org/abs/2410.00299v1", + "pdf_url": "http://arxiv.org/pdf/2410.00299v1", + "published_date": "2024-10-01", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/QiZS-BIT/GSPR", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DressRecon: Freeform 4D Human Reconstruction from Monocular Video", + "authors": [ + "Jeff Tan", + "Donglai Xiang", + "Shubham Tulsiani", + "Deva Ramanan", + "Gengshan Yang" + ], + "abstract": "We present a method to reconstruct time-consistent human body models from monocular videos, focusing on extremely loose clothing or handheld object interactions. Prior work in human reconstruction is either limited to tight clothing with no object interactions, or requires calibrated multi-view captures or personalized template scans which are costly to collect at scale. Our key insight for high-quality yet flexible reconstruction is the careful combination of generic human priors about articulated body shape (learned from large-scale training data) with video-specific articulated \"bag-of-bones\" deformation (fit to a single video via test-time optimization). We accomplish this by learning a neural implicit model that disentangles body versus clothing deformations as separate motion model layers. To capture subtle geometry of clothing, we leverage image-based priors such as human body pose, surface normals, and optical flow during optimization. 
The resulting neural fields can be extracted into time-consistent meshes, or further optimized as explicit 3D Gaussians for high-fidelity interactive rendering. On datasets with highly challenging clothing deformations and object interactions, DressRecon yields higher-fidelity 3D reconstructions than prior art. Project page: https://jefftan969.github.io/dressrecon/", + "arxiv_url": "http://arxiv.org/abs/2409.20563v2", + "pdf_url": "http://arxiv.org/pdf/2409.20563v2", + "published_date": "2024-09-30", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RL-GSBridge: 3D Gaussian Splatting Based Real2Sim2Real Method for Robotic Manipulation Learning", + "authors": [ + "Yuxuan Wu", + "Lei Pan", + "Wenhua Wu", + "Guangming Wang", + "Yanzi Miao", + "Hesheng Wang" + ], + "abstract": "Sim-to-Real refers to the process of transferring policies learned in simulation to the real world, which is crucial for achieving practical robotics applications. However, recent Sim2real methods either rely on a large amount of augmented data or large learning models, which is inefficient for specific tasks. In recent years, radiance field-based reconstruction methods, especially the emergence of 3D Gaussian Splatting, have made it possible to reproduce realistic real-world scenarios. To this end, we propose a novel real-to-sim-to-real reinforcement learning framework, RL-GSBridge, which introduces a mesh-based 3D Gaussian Splatting method to realize zero-shot sim-to-real transfer for vision-based deep reinforcement learning. We improve the mesh-based 3D GS modeling method by using soft binding constraints, enhancing the rendering quality of mesh models. We then employ a GS editing approach to synchronize rendering with the physics simulator, reflecting the interactions of the physical robot more accurately. Through a series of sim-to-real robotic arm experiments, including grasping and pick-and-place tasks, we demonstrate that RL-GSBridge maintains a satisfactory success rate in real-world task completion during sim-to-real transfer. Furthermore, a series of rendering metrics and visualization results indicate that our proposed mesh-based 3D Gaussian reduces artifacts in unstructured objects, demonstrating more realistic rendering performance.", + "arxiv_url": "http://arxiv.org/abs/2409.20291v1", + "pdf_url": "http://arxiv.org/pdf/2409.20291v1", + "published_date": "2024-09-30", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Robust Gaussian Splatting SLAM by Leveraging Loop Closure", + "authors": [ + "Zunjie Zhu", + "Youxu Fang", + "Xin Li", + "Chengang Yan", + "Feng Xu", + "Chau Yuen", + "Yanyan Li" + ], + "abstract": "3D Gaussian Splatting algorithms excel in novel view rendering applications and have been adapted to extend the capabilities of traditional SLAM systems. However, current Gaussian Splatting SLAM methods, designed mainly for hand-held RGB or RGB-D sensors, struggle with tracking drifts when used with rotating RGB-D camera setups. In this paper, we propose a robust Gaussian Splatting SLAM architecture that utilizes inputs from rotating multiple RGB-D cameras to achieve accurate localization and photorealistic rendering performance. 
The carefully designed Gaussian Splatting Loop Closure module effectively addresses the issue of accumulated tracking and mapping errors found in conventional Gaussian Splatting SLAM systems. First, each Gaussian is associated with an anchor frame and categorized as historical or novel based on its timestamp. By rendering different types of Gaussians at the same viewpoint, the proposed loop detection strategy considers both co-visibility relationships and distinct rendering outcomes. Furthermore, a loop closure optimization approach is proposed to remove camera pose drift and maintain the high quality of 3D Gaussian models. The approach uses a lightweight pose graph optimization algorithm to correct pose drift and updates Gaussians based on the optimized poses. Additionally, a bundle adjustment scheme further refines camera poses using photometric and geometric constraints, ultimately enhancing the global consistency of scenarios. Quantitative and qualitative evaluations on both synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art methods in camera pose estimation and novel view rendering tasks. The code will be open-sourced for the community.", + "arxiv_url": "http://arxiv.org/abs/2409.20111v1", + "pdf_url": "http://arxiv.org/pdf/2409.20111v1", + "published_date": "2024-09-30", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RNG: Relightable Neural Gaussians", + "authors": [ + "Jiahui Fan", + "Fujun Luan", + "Jian Yang", + "Miloš Hašan", + "Beibei Wang" + ], + "abstract": "3D Gaussian Splatting (3DGS) has shown impressive results for the novel view synthesis task, where lighting is assumed to be fixed. However, creating relightable 3D assets, especially for objects with ill-defined shapes (fur, fabric, etc.), remains a challenging task. The decomposition between light, geometry, and material is ambiguous, especially if either smooth surface assumptions or surface-based analytical shading models do not apply. We propose Relightable Neural Gaussians (RNG), a novel 3DGS-based framework that enables the relighting of objects with either hard surfaces or soft boundaries, while avoiding assumptions on the shading model. We condition the radiance at each point on both view and light directions. We also introduce a shadow cue, as well as a depth refinement network to improve shadow accuracy. Finally, we propose a hybrid forward-deferred fitting strategy to balance geometry and appearance quality. Our method achieves significantly faster training (1.3 hours) and rendering (60 frames per second) compared to a prior method based on neural radiance fields and produces higher-quality shadows than a concurrent 3DGS-based method.", + "arxiv_url": "http://arxiv.org/abs/2409.19702v4", + "pdf_url": "http://arxiv.org/pdf/2409.19702v4", + "published_date": "2024-09-29", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "G3R: Gradient Guided Generalizable Reconstruction", + "authors": [ + "Yun Chen", + "Jingkang Wang", + "Ze Yang", + "Sivabalan Manivasagam", + "Raquel Urtasun" + ], + "abstract": "Large scale 3D scene reconstruction is important for applications such as virtual reality and simulation. 
Existing neural rendering approaches (e.g., NeRF, 3DGS) have achieved realistic reconstructions on large scenes, but optimize per scene, which is expensive and slow, and exhibit noticeable artifacts under large view changes due to overfitting. Generalizable approaches or large reconstruction models are fast, but primarily work for small scenes/objects and often produce lower quality rendering results. In this work, we introduce G3R, a generalizable reconstruction approach that can efficiently predict high-quality 3D scene representations for large scenes. We propose to learn a reconstruction network that takes the gradient feedback signals from differentiable rendering to iteratively update a 3D scene representation, combining the benefits of high photorealism from per-scene optimization with data-driven priors from fast feed-forward prediction methods. Experiments on urban-driving and drone datasets show that G3R generalizes across diverse large scenes and accelerates the reconstruction process by at least 10x while achieving comparable or better realism compared to 3DGS, and also being more robust to large view changes.", + "arxiv_url": "http://arxiv.org/abs/2409.19405v1", + "pdf_url": "http://arxiv.org/pdf/2409.19405v1", + "published_date": "2024-09-28", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "neural rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-EVT: Cross-Modal Event Camera Tracking based on Gaussian Splatting", + "authors": [ + "Tao Liu", + "Runze Yuan", + "Yi'ang Ju", + "Xun Xu", + "Jiaqi Yang", + "Xiangting Meng", + "Xavier Lagorce", + "Laurent Kneip" + ], + "abstract": "Reliable self-localization is a foundational skill for many intelligent mobile platforms. This paper explores the use of event cameras for motion tracking thereby providing a solution with inherent robustness under difficult dynamics and illumination. In order to circumvent the challenge of event camera-based mapping, the solution is framed in a cross-modal way. It tracks a map representation that comes directly from frame-based cameras. Specifically, the proposed method operates on top of gaussian splatting, a state-of-the-art representation that permits highly efficient and realistic novel view synthesis. The key of our approach consists of a novel pose parametrization that uses a reference pose plus first order dynamics for local differential image rendering. The latter is then compared against images of integrated events in a staggered coarse-to-fine optimization scheme. As demonstrated by our results, the realistic view rendering ability of gaussian splatting leads to stable and accurate tracking across a variety of both publicly available and newly recorded data sequences.", + "arxiv_url": "http://arxiv.org/abs/2409.19228v1", + "pdf_url": "http://arxiv.org/pdf/2409.19228v1", + "published_date": "2024-09-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "1st Place Solution to the 8th HANDS Workshop Challenge -- ARCTIC Track: 3DGS-based Bimanual Category-agnostic Interaction Reconstruction", + "authors": [ + "Jeongwan On", + "Kyeonghwan Gwak", + "Gunyoung Kang", + "Hyein Hwang", + "Soohyun Hwang", + "Junuk Cha", + "Jaewook Han", + "Seungryul Baek" + ], + "abstract": "This report describes our 1st place solution to the 8th HANDS workshop challenge (ARCTIC track) in conjunction with ECCV 2024. 
In this challenge, we address the task of bimanual category-agnostic hand-object interaction reconstruction, which aims to generate 3D reconstructions of both hands and the object from a monocular video, without relying on predefined templates. This task is particularly challenging due to the significant occlusion and dynamic contact between the hands and the object during bimanual manipulation. We worked to resolve these issues by introducing a mask loss and a 3D contact loss, respectively. Moreover, we applied 3D Gaussian Splatting (3DGS) to this task. As a result, our method achieved a value of 38.69 in the main metric, CD$_h$, on the ARCTIC test set.", + "arxiv_url": "http://arxiv.org/abs/2409.19215v2", + "pdf_url": "http://arxiv.org/pdf/2409.19215v2", + "published_date": "2024-09-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Space-time 2D Gaussian Splatting for Accurate Surface Reconstruction under Complex Dynamic Scenes", + "authors": [ + "Shuo Wang", + "Binbin Huang", + "Ruoyu Wang", + "Shenghua Gao" + ], + "abstract": "Previous surface reconstruction methods either suffer from low geometric accuracy or lengthy training times when dealing with real-world complex dynamic scenes involving multi-person activities, and human-object interactions. To tackle the dynamic contents and the occlusions in complex scenes, we present a space-time 2D Gaussian Splatting approach. Specifically, to improve geometric quality in dynamic scenes, we learn canonical 2D Gaussian splats and deform these 2D Gaussian splats while enforcing the disks of the Gaussian located on the surface of the objects by introducing depth and normal regularizers. Further, to tackle the occlusion issues in complex scenes, we introduce a compositional opacity deformation strategy, which further reduces the surface recovery of those occluded areas. Experiments on real-world sparse-view video datasets and monocular dynamic datasets demonstrate that our reconstructions outperform state-of-the-art methods, especially for the surface of the details. The project page and more visualizations can be found at: https://tb2-sy.github.io/st-2dgs/.", + "arxiv_url": "http://arxiv.org/abs/2409.18852v1", + "pdf_url": "http://arxiv.org/pdf/2409.18852v1", + "published_date": "2024-09-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Heritage: 3D Digitization of Cultural Heritage with Integrated Object Segmentation", + "authors": [ + "Mahtab Dahaghin", + "Myrna Castillo", + "Kourosh Riahidehkordi", + "Matteo Toso", + "Alessio Del Bue" + ], + "abstract": "The creation of digital replicas of physical objects has valuable applications for the preservation and dissemination of tangible cultural heritage. However, existing methods are often slow, expensive, and require expert knowledge. We propose a pipeline to generate a 3D replica of a scene using only RGB images (e.g. photos of a museum) and then extract a model for each item of interest (e.g. pieces in the exhibit). We do this by leveraging the advancements in novel view synthesis and Gaussian Splatting, modified to enable efficient 3D segmentation. This approach does not need manual annotation, and the visual inputs can be captured using a standard smartphone, making it both affordable and easy to deploy. 
We provide an overview of the method and baseline evaluation of the accuracy of object segmentation. The code is available at https://mahtaabdn.github.io/gaussian_heritage.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2409.19039v1", + "pdf_url": "http://arxiv.org/pdf/2409.19039v1", + "published_date": "2024-09-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RT-GuIDE: Real-Time Gaussian splatting for Information-Driven Exploration", + "authors": [ + "Yuezhan Tao", + "Dexter Ong", + "Varun Murali", + "Igor Spasojevic", + "Pratik Chaudhari", + "Vijay Kumar" + ], + "abstract": "We propose a framework for active mapping and exploration that leverages Gaussian splatting for constructing information-rich maps. Further, we develop a parallelized motion planning algorithm that can exploit the Gaussian map for real-time navigation. The Gaussian map constructed onboard the robot is optimized for both photometric and geometric quality while enabling real-time situational awareness for autonomy. We show through simulation experiments that our method is competitive with approaches that use alternate information gain metrics, while being orders of magnitude faster to compute. In real-world experiments, our algorithm achieves better map quality (10% higher Peak Signal-to-Noise Ratio (PSNR) and 30% higher geometric reconstruction accuracy) than Gaussian maps constructed by traditional exploration baselines. Experiment videos and more details can be found on our project page: https://tyuezhan.github.io/RT_GuIDE/", + "arxiv_url": "http://arxiv.org/abs/2409.18122v1", + "pdf_url": "http://arxiv.org/pdf/2409.18122v1", + "published_date": "2024-09-26", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Language-Embedded Gaussian Splats (LEGS): Incrementally Building Room-Scale Representations with a Mobile Robot", + "authors": [ + "Justin Yu", + "Kush Hari", + "Kishore Srinivas", + "Karim El-Refai", + "Adam Rashid", + "Chung Min Kim", + "Justin Kerr", + "Richard Cheng", + "Muhammad Zubair Irshad", + "Ashwin Balakrishna", + "Thomas Kollar", + "Ken Goldberg" + ], + "abstract": "Building semantic 3D maps is valuable for searching for objects of interest in offices, warehouses, stores, and homes. We present a mapping system that incrementally builds a Language-Embedded Gaussian Splat (LEGS): a detailed 3D scene representation that encodes both appearance and semantics in a unified representation. LEGS is trained online as a robot traverses its environment to enable localization of open-vocabulary object queries. We evaluate LEGS on 4 room-scale scenes where we query for objects in the scene to assess how LEGS can capture semantic meaning. We compare LEGS to LERF and find that while both systems have comparable object query success rates, LEGS trains over 3.5x faster than LERF. 
Results suggest that a multi-camera setup and incremental bundle adjustment can boost visual reconstruction quality in constrained robot trajectories, and suggest LEGS can localize open-vocabulary and long-tail object queries with up to 66% accuracy.", + "arxiv_url": "http://arxiv.org/abs/2409.18108v1", + "pdf_url": "http://arxiv.org/pdf/2409.18108v1", + "published_date": "2024-09-26", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians", + "authors": [ + "Dmytro Kotovenko", + "Olga Grebenkova", + "Nikolaos Sarafianos", + "Avinash Paliwal", + "Pingchuan Ma", + "Omid Poursaeed", + "Sreyas Mohan", + "Yuchen Fan", + "Yilei Li", + "Rakesh Ranjan", + "Björn Ommer" + ], + "abstract": "While style transfer techniques have been well-developed for 2D image stylization, the extension of these methods to 3D scenes remains relatively unexplored. Existing approaches demonstrate proficiency in transferring colors and textures but often struggle with replicating the geometry of the scenes. In our work, we leverage an explicit Gaussian Splatting (GS) representation and directly match the distributions of Gaussians between style and content scenes using the Earth Mover's Distance (EMD). By employing the entropy-regularized Wasserstein-2 distance, we ensure that the transformation maintains spatial smoothness. Additionally, we decompose the scene stylization problem into smaller chunks to enhance efficiency. This paradigm shift reframes stylization from a pure generative process driven by latent space losses to an explicit matching of distributions between two Gaussian representations. Our method achieves high-resolution 3D stylization by faithfully transferring details from 3D style scenes onto the content scene. Furthermore, WaSt-3D consistently delivers results across diverse content and style scenes without necessitating any training, as it relies solely on optimization-based techniques. See our project page for additional results and source code: $\\href{https://compvis.github.io/wast3d/}{https://compvis.github.io/wast3d/}$.", + "arxiv_url": "http://arxiv.org/abs/2409.17917v1", + "pdf_url": "http://arxiv.org/pdf/2409.17917v1", + "published_date": "2024-09-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HGS-Planner: Hierarchical Planning Framework for Active Scene Reconstruction Using 3D Gaussian Splatting", + "authors": [ + "Zijun Xu", + "Rui Jin", + "Ke Wu", + "Yi Zhao", + "Zhiwei Zhang", + "Jieru Zhao", + "Fei Gao", + "Zhongxue Gan", + "Wenchao Ding" + ], + "abstract": "In complex missions such as search and rescue, robots must make intelligent decisions in unknown environments, relying on their ability to perceive and understand their surroundings. High-quality and real-time reconstruction enhances situational awareness and is crucial for intelligent robotics. Traditional methods often struggle with poor scene representation or are too slow for real-time use. Inspired by the efficacy of 3D Gaussian Splatting (3DGS), we propose a hierarchical planning framework for fast and high-fidelity active reconstruction. Our method evaluates completion and quality gain to adaptively guide reconstruction, integrating global and local planning for efficiency. 
Experiments in simulated and real-world environments show our approach outperforms existing real-time methods.", + "arxiv_url": "http://arxiv.org/abs/2409.17624v2", + "pdf_url": "http://arxiv.org/pdf/2409.17624v2", + "published_date": "2024-09-26", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SeaSplat: Representing Underwater Scenes with 3D Gaussian Splatting and a Physically Grounded Image Formation Model", + "authors": [ + "Daniel Yang", + "John J. Leonard", + "Yogesh Girdhar" + ], + "abstract": "We introduce SeaSplat, a method to enable real-time rendering of underwater scenes leveraging recent advances in 3D radiance fields. Underwater scenes are challenging visual environments, as rendering through a medium such as water introduces both range and color dependent effects on image capture. We constrain 3D Gaussian Splatting (3DGS), a recent advance in radiance fields enabling rapid training and real-time rendering of full 3D scenes, with a physically grounded underwater image formation model. Applying SeaSplat to the real-world scenes from SeaThru-NeRF dataset, a scene collected by an underwater vehicle in the US Virgin Islands, and simulation-degraded real-world scenes, not only do we see increased quantitative performance on rendering novel viewpoints from the scene with the medium present, but are also able to recover the underlying true color of the scene and restore renders to be without the presence of the intervening medium. We show that the underwater image formation helps learn scene structure, with better depth maps, as well as show that our improvements maintain the significant computational improvements afforded by leveraging a 3D Gaussian representation.", + "arxiv_url": "http://arxiv.org/abs/2409.17345v1", + "pdf_url": "http://arxiv.org/pdf/2409.17345v1", + "published_date": "2024-09-25", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Disco4D: Disentangled 4D Human Generation and Animation from a Single Image", + "authors": [ + "Hui En Pang", + "Shuai Liu", + "Zhongang Cai", + "Lei Yang", + "Tianwei Zhang", + "Ziwei Liu" + ], + "abstract": "We present \\textbf{Disco4D}, a novel Gaussian Splatting framework for 4D human generation and animation from a single image. Different from existing methods, Disco4D distinctively disentangles clothings (with Gaussian models) from the human body (with SMPL-X model), significantly enhancing the generation details and flexibility. It has the following technical innovations. \\textbf{1)} Disco4D learns to efficiently fit the clothing Gaussians over the SMPL-X Gaussians. \\textbf{2)} It adopts diffusion models to enhance the 3D generation process, \\textit{e.g.}, modeling occluded parts not visible in the input image. \\textbf{3)} It learns an identity encoding for each clothing Gaussian to facilitate the separation and extraction of clothing assets. Furthermore, Disco4D naturally supports 4D human animation with vivid dynamics. Extensive experiments demonstrate the superiority of Disco4D on 4D human generation and animation tasks. 
Our visualizations can be found in \\url{https://disco-4d.github.io/}.", + "arxiv_url": "http://arxiv.org/abs/2409.17280v1", + "pdf_url": "http://arxiv.org/pdf/2409.17280v1", + "published_date": "2024-09-25", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion", + "authors": [ + "Yukun Huang", + "Jianan Wang", + "Ailing Zeng", + "Zheng-Jun Zha", + "Lei Zhang", + "Xihui Liu" + ], + "abstract": "Leveraging pretrained 2D diffusion models and score distillation sampling (SDS), recent methods have shown promising results for text-to-3D avatar generation. However, generating high-quality 3D avatars capable of expressive animation remains challenging. In this work, we present DreamWaltz-G, a novel learning framework for animatable 3D avatar generation from text. The core of this framework lies in Skeleton-guided Score Distillation and Hybrid 3D Gaussian Avatar representation. Specifically, the proposed skeleton-guided score distillation integrates skeleton controls from 3D human templates into 2D diffusion models, enhancing the consistency of SDS supervision in terms of view and human pose. This facilitates the generation of high-quality avatars, mitigating issues such as multiple faces, extra limbs, and blurring. The proposed hybrid 3D Gaussian avatar representation builds on the efficient 3D Gaussians, combining neural implicit fields and parameterized 3D meshes to enable real-time rendering, stable SDS optimization, and expressive animation. Extensive experiments demonstrate that DreamWaltz-G is highly effective in generating and animating 3D avatars, outperforming existing methods in both visual quality and animation expressiveness. Our framework further supports diverse applications, including human video reenactment and multi-subject scene composition.", + "arxiv_url": "http://arxiv.org/abs/2409.17145v1", + "pdf_url": "http://arxiv.org/pdf/2409.17145v1", + "published_date": "2024-09-25", + "categories": [ + "cs.CV", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Go-SLAM: Grounded Object Segmentation and Localization with Gaussian Splatting SLAM", + "authors": [ + "Phu Pham", + "Dipam Patel", + "Damon Conover", + "Aniket Bera" + ], + "abstract": "We introduce Go-SLAM, a novel framework that utilizes 3D Gaussian Splatting SLAM to reconstruct dynamic environments while embedding object-level information within the scene representations. This framework employs advanced object segmentation techniques, assigning a unique identifier to each Gaussian splat that corresponds to the object it represents. Consequently, our system facilitates open-vocabulary querying, allowing users to locate objects using natural language descriptions. Furthermore, the framework features an optimal path generation module that calculates efficient navigation paths for robots toward queried objects, considering obstacles and environmental uncertainties. Comprehensive evaluations in various scene settings demonstrate the effectiveness of our approach in delivering high-fidelity scene reconstructions, precise object segmentation, flexible object querying, and efficient robot path planning. 
This work represents an additional step forward in bridging the gap between 3D scene reconstruction, semantic object understanding, and real-time environment interactions.", + "arxiv_url": "http://arxiv.org/abs/2409.16944v1", + "pdf_url": "http://arxiv.org/pdf/2409.16944v1", + "published_date": "2024-09-25", + "categories": [ + "cs.RO", + "cs.AI", + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model", + "authors": [ + "Hongliang Zhong", + "Can Wang", + "Jingbo Zhang", + "Jing Liao" + ], + "abstract": "Generating and inserting new objects into 3D content is a compelling approach for achieving versatile scene recreation. Existing methods, which rely on SDS optimization or single-view inpainting, often struggle to produce high-quality results. To address this, we propose a novel method for object insertion in 3D content represented by Gaussian Splatting. Our approach introduces a multi-view diffusion model, dubbed MVInpainter, which is built upon a pre-trained stable video diffusion model to facilitate view-consistent object inpainting. Within MVInpainter, we incorporate a ControlNet-based conditional injection module to enable controlled and more predictable multi-view generation. After generating the multi-view inpainted results, we further propose a mask-aware 3D reconstruction technique to refine Gaussian Splatting reconstruction from these sparse inpainted views. By leveraging these techniques, our approach yields diverse results, ensures view-consistent and harmonious insertions, and produces better object quality. Extensive experiments demonstrate that our approach outperforms existing methods.", + "arxiv_url": "http://arxiv.org/abs/2409.16938v1", + "pdf_url": "http://arxiv.org/pdf/2409.16938v1", + "published_date": "2024-09-25", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Let's Make a Splan: Risk-Aware Trajectory Optimization in a Normalized Gaussian Splat", + "authors": [ + "Jonathan Michaux", + "Seth Isaacson", + "Challen Enninful Adu", + "Adam Li", + "Rahul Kashyap Swayampakula", + "Parker Ewen", + "Sean Rice", + "Katherine A. Skinner", + "Ram Vasudevan" + ], + "abstract": "Neural Radiance Fields and Gaussian Splatting have transformed the field of computer vision by enabling photo-realistic representation of complex scenes. Despite this success, they have seen only limited use in real-world robotics tasks such as trajectory optimization. Two key factors have contributed to this limited success. First, it is challenging to reason about collisions in radiance models. Second, it is difficult to perform inference of radiance models fast enough for real-time trajectory synthesis. This paper addresses these challenges by proposing SPLANNING, a risk-aware trajectory optimizer that operates in a Gaussian Splatting model. This paper first derives a method for rigorously upper-bounding the probability of collision between a robot and a radiance field. Second, this paper introduces a normalized reformulation of Gaussian Splatting that enables the efficient computation of the collision bound in a Gaussian Splat. 
Third, a method is presented to optimize trajectories while avoiding collisions with a scene represented by a Gaussian Splat. Experiments demonstrate that SPLANNING outperforms state-of-the-art methods in generating collision-free trajectories in highly cluttered environments. The proposed system is also tested on a real-world robot manipulator. A project page is available at https://roahmlab.github.io/splanning.", + "arxiv_url": "http://arxiv.org/abs/2409.16915v1", + "pdf_url": "http://arxiv.org/pdf/2409.16915v1", + "published_date": "2024-09-25", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Towards Unified 3D Hair Reconstruction from Single-View Portraits", + "authors": [ + "Yujian Zheng", + "Yuda Qiu", + "Leyang Jin", + "Chongyang Ma", + "Haibin Huang", + "Di Zhang", + "Pengfei Wan", + "Xiaoguang Han" + ], + "abstract": "Single-view 3D hair reconstruction is challenging, due to the wide range of shape variations among diverse hairstyles. Current state-of-the-art methods are specialized in recovering un-braided 3D hairs and often take braided styles as their failure cases, because of the inherent difficulty of defining priors for complex hairstyles, whether rule-based or data-based. We propose a novel strategy to enable single-view 3D reconstruction for a variety of hair types via a unified pipeline. To achieve this, we first collect a large-scale synthetic multi-view hair dataset SynMvHair with diverse 3D hair in both braided and un-braided styles, and learn two diffusion priors specialized on hair. Then we optimize 3D Gaussian-based hair from the priors with two specially designed modules, i.e. view-wise and pixel-wise Gaussian refinement. Our experiments demonstrate that reconstructing braided and un-braided 3D hair from single-view images via a unified approach is possible and our method achieves the state-of-the-art performance in recovering complex hairstyles. It is worth mentioning that our method shows good generalization ability to real images, although it learns hair priors from synthetic data.", + "arxiv_url": "http://arxiv.org/abs/2409.16863v1", + "pdf_url": "http://arxiv.org/pdf/2409.16863v1", + "published_date": "2024-09-25", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization", + "authors": [ + "Gennady Sidorov", + "Malik Mohrat", + "Ksenia Lebedeva", + "Ruslan Rakhimov", + "Sergey Kolyubin" + ], + "abstract": "Although various visual localization approaches exist, such as scene coordinate and pose regression, these methods often struggle with high memory consumption or extensive optimization requirements. To address these challenges, we utilize recent advancements in novel view synthesis, particularly 3D Gaussian Splatting (3DGS), to enhance localization. 3DGS allows for the compact encoding of both 3D geometry and scene appearance with its spatial features. Our method leverages the dense description maps produced by XFeat's lightweight keypoint detection and description model. We propose distilling these dense keypoint descriptors into 3DGS to improve the model's spatial understanding, leading to more accurate camera pose predictions through 2D-3D correspondences. After estimating an initial pose, we refine it using a photometric warping loss. 
Benchmarking on popular indoor and outdoor datasets shows that our approach surpasses state-of-the-art Neural Render Pose (NRP) methods, including NeRFMatch and PNeRFLoc.", + "arxiv_url": "http://arxiv.org/abs/2409.16502v1", + "pdf_url": "http://arxiv.org/pdf/2409.16502v1", + "published_date": "2024-09-24", + "categories": [ + "cs.CV", + "cs.AI", + "cs.LG", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Frequency-based View Selection in Gaussian Splatting Reconstruction", + "authors": [ + "Monica M. Q. Li", + "Pierre-Yves Lajoie", + "Giovanni Beltrame" + ], + "abstract": "Three-dimensional reconstruction is a fundamental problem in robotics perception. We examine the problem of active view selection to perform 3D Gaussian Splatting reconstructions with as few input images as possible. Although 3D Gaussian Splatting has made significant progress in image rendering and 3D reconstruction, the quality of the reconstruction is strongly impacted by the selection of 2D images and the estimation of camera poses through Structure-from-Motion (SfM) algorithms. Current methods to select views that rely on uncertainties from occlusions, depth ambiguities, or neural network predictions directly are insufficient to handle the issue and struggle to generalize to new scenes. By ranking the potential views in the frequency domain, we are able to effectively estimate the potential information gain of new viewpoints without ground truth data. By overcoming current constraints on model architecture and efficacy, our method achieves state-of-the-art results in view selection, demonstrating its potential for efficient image-based 3D reconstruction.", + "arxiv_url": "http://arxiv.org/abs/2409.16470v1", + "pdf_url": "http://arxiv.org/pdf/2409.16470v1", + "published_date": "2024-09-24", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "AIR-Embodied: An Efficient Active 3DGS-based Interaction and Reconstruction Framework with Embodied Large Language Model", + "authors": [ + "Zhenghao Qi", + "Shenghai Yuan", + "Fen Liu", + "Haozhi Cao", + "Tianchen Deng", + "Jianfei Yang", + "Lihua Xie" + ], + "abstract": "Recent advancements in 3D reconstruction and neural rendering have enhanced the creation of high-quality digital assets, yet existing methods struggle to generalize across varying object shapes, textures, and occlusions. While Next Best View (NBV) planning and Learning-based approaches offer solutions, they are often limited by predefined criteria and fail to manage occlusions with human-like common sense. To address these problems, we present AIR-Embodied, a novel framework that integrates embodied AI agents with large-scale pretrained multi-modal language models to improve active 3DGS reconstruction. AIR-Embodied utilizes a three-stage process: understanding the current reconstruction state via multi-modal prompts, planning tasks with viewpoint selection and interactive actions, and employing closed-loop reasoning to ensure accurate execution. The agent dynamically refines its actions based on discrepancies between the planned and actual outcomes. 
Experimental evaluations across virtual and real-world environments demonstrate that AIR-Embodied significantly enhances reconstruction efficiency and quality, providing a robust solution to challenges in active 3D reconstruction.", + "arxiv_url": "http://arxiv.org/abs/2409.16019v1", + "pdf_url": "http://arxiv.org/pdf/2409.16019v1", + "published_date": "2024-09-24", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "neural rendering", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Semantics-Controlled Gaussian Splatting for Outdoor Scene Reconstruction and Rendering in Virtual Reality", + "authors": [ + "Hannah Schieber", + "Jacob Young", + "Tobias Langlotz", + "Stefanie Zollmann", + "Daniel Roth" + ], + "abstract": "Advancements in 3D rendering like Gaussian Splatting (GS) allow novel view synthesis and real-time rendering in virtual reality (VR). However, GS-created 3D environments are often difficult to edit. For scene enhancement or to incorporate 3D assets, segmenting Gaussians by class is essential. Existing segmentation approaches are typically limited to certain types of scenes, e.g., ''circular'' scenes, to determine clear object boundaries. However, this method is ineffective when removing large objects in non-''circling'' scenes such as large outdoor scenes. We propose Semantics-Controlled GS (SCGS), a segmentation-driven GS approach, enabling the separation of large scene parts in uncontrolled, natural environments. SCGS allows scene editing and the extraction of scene parts for VR. Additionally, we introduce a challenging outdoor dataset, overcoming the ''circling'' setup. We outperform the state-of-the-art in visual quality on our dataset and in segmentation quality on the 3D-OVS dataset. We conducted an exploratory user study, comparing a 360-video, plain GS, and SCGS in VR with a fixed viewpoint. In our subsequent main study, users were allowed to move freely, evaluating plain GS and SCGS. Our main study results show that participants clearly prefer SCGS over plain GS. We overall present an innovative approach that surpasses the state-of-the-art both technically and in user experience.", + "arxiv_url": "http://arxiv.org/abs/2409.15959v1", + "pdf_url": "http://arxiv.org/pdf/2409.15959v1", + "published_date": "2024-09-24", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Plenoptic PNG: Real-Time Neural Radiance Fields in 150 KB", + "authors": [ + "Jae Yong Lee", + "Yuqun Wu", + "Chuhang Zou", + "Derek Hoiem", + "Shenlong Wang" + ], + "abstract": "The goal of this paper is to encode a 3D scene into an extremely compact representation from 2D images and to enable its transmittance, decoding and rendering in real-time across various platforms. Despite the progress in NeRFs and Gaussian Splats, their large model size and specialized renderers make it challenging to distribute free-viewpoint 3D content as easily as images. To address this, we have designed a novel 3D representation that encodes the plenoptic function into sinusoidal function indexed dense volumes. This approach facilitates feature sharing across different locations, improving compactness over traditional spatial voxels. The memory footprint of the dense 3D feature grid can be further reduced using spatial decomposition techniques. 
This design combines the strengths of spatial hashing functions and voxel decomposition, resulting in a model size as small as 150 KB for each 3D scene. Moreover, PPNG features a lightweight rendering pipeline with only 300 lines of code that decodes its representation into standard GL textures and fragment shaders. This enables real-time rendering using the traditional GL pipeline, ensuring universal compatibility and efficiency across various platforms without additional dependencies.", + "arxiv_url": "http://arxiv.org/abs/2409.15689v1", + "pdf_url": "http://arxiv.org/pdf/2409.15689v1", + "published_date": "2024-09-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SpikeGS: Learning 3D Gaussian Fields from Continuous Spike Stream", + "authors": [ + "Jinze Yu", + "Xin Peng", + "Zhengda Lu", + "Laurent Kneip", + "Yiqun Wang" + ], + "abstract": "A spike camera is a specialized high-speed visual sensor that offers advantages such as high temporal resolution and high dynamic range compared to conventional frame cameras. These features provide the camera with significant advantages in many computer vision tasks. However, the tasks of novel view synthesis based on spike cameras remain underdeveloped. Although there are existing methods for learning neural radiance fields from spike stream, they either lack robustness in extremely noisy, low-quality lighting conditions or suffer from high computational complexity due to the deep fully connected neural networks and ray marching rendering strategies used in neural radiance fields, making it difficult to recover fine texture details. In contrast, the latest advancements in 3DGS have achieved high-quality real-time rendering by optimizing the point cloud representation into Gaussian ellipsoids. Building on this, we introduce SpikeGS, the method to learn 3D Gaussian fields solely from spike stream. We designed a differentiable spike stream rendering framework based on 3DGS, incorporating noise embedding and spiking neurons. By leveraging the multi-view consistency of 3DGS and the tile-based multi-threaded parallel rendering mechanism, we achieved high-quality real-time rendering results. Additionally, we introduced a spike rendering loss function that generalizes under varying illumination conditions. Our method can reconstruct view synthesis results with fine texture details from a continuous spike stream captured by a moving spike camera, while demonstrating high robustness in extremely noisy low-light scenarios. Experimental results on both real and synthetic datasets demonstrate that our method surpasses existing approaches in terms of rendering quality and speed. Our code will be available at https://github.com/520jz/SpikeGS.", + "arxiv_url": "http://arxiv.org/abs/2409.15176v5", + "pdf_url": "http://arxiv.org/pdf/2409.15176v5", + "published_date": "2024-09-23", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/520jz/SpikeGS", + "keywords": [ + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "TextToon: Real-Time Text Toonify Head Avatar from Single Video", + "authors": [ + "Luchuan Song", + "Lele Chen", + "Celong Liu", + "Pinxin Liu", + "Chenliang Xu" + ], + "abstract": "We propose TextToon, a method to generate a drivable toonified avatar. 
Given a short monocular video sequence and a written instruction about the avatar style, our model can generate a high-fidelity toonified avatar that can be driven in real-time by another video with arbitrary identities. Existing related works heavily rely on multi-view modeling to recover geometry via texture embeddings, presented in a static manner, leading to control limitations. The multi-view video input also makes it difficult to deploy these models in real-world applications. To address these issues, we adopt a conditional embedding Tri-plane to learn realistic and stylized facial representations in a Gaussian deformation field. Additionally, we expand the stylization capabilities of 3D Gaussian Splatting by introducing an adaptive pixel-translation neural network and leveraging patch-aware contrastive learning to achieve high-quality images. To push our work into consumer applications, we develop a real-time system that can operate at 48 FPS on a GPU machine and 15-18 FPS on a mobile machine. Extensive experiments demonstrate the efficacy of our approach in generating textual avatars over existing methods in terms of quality and real-time animation. Please refer to our project page for more details: https://songluchuan.github.io/TextToon/.", + "arxiv_url": "http://arxiv.org/abs/2410.07160v1", + "pdf_url": "http://arxiv.org/pdf/2410.07160v1", + "published_date": "2024-09-23", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Human Hair Reconstruction with Strand-Aligned 3D Gaussians", + "authors": [ + "Egor Zakharov", + "Vanessa Sklyarova", + "Michael Black", + "Giljoo Nam", + "Justus Thies", + "Otmar Hilliges" + ], + "abstract": "We introduce a new hair modeling method that uses a dual representation of classical hair strands and 3D Gaussians to produce accurate and realistic strand-based reconstructions from multi-view data. In contrast to recent approaches that leverage unstructured Gaussians to model human avatars, our method reconstructs the hair using 3D polylines, or strands. This fundamental difference allows the use of the resulting hairstyles out-of-the-box in modern computer graphics engines for editing, rendering, and simulation. Our 3D lifting method relies on unstructured Gaussians to generate multi-view ground truth data to supervise the fitting of hair strands. The hairstyle itself is represented in the form of the so-called strand-aligned 3D Gaussians. This representation allows us to combine strand-based hair priors, which are essential for realistic modeling of the inner structure of hairstyles, with the differentiable rendering capabilities of 3D Gaussian Splatting. 
Our method, named Gaussian Haircut, is evaluated on synthetic and real scenes and demonstrates state-of-the-art performance in the task of strand-based hair reconstruction.", +    "arxiv_url": "http://arxiv.org/abs/2409.14778v1", +    "pdf_url": "http://arxiv.org/pdf/2409.14778v1", +    "published_date": "2024-09-23", +    "categories": [ +      "cs.CV", +      "cs.GR" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "Gaussian Deja-vu: Creating Controllable 3D Gaussian Head-Avatars with Enhanced Generalization and Personalization Abilities", +    "authors": [ +      "Peizhi Yan", +      "Rabab Ward", +      "Qiang Tang", +      "Shan Du" +    ], +    "abstract": "Recent advancements in 3D Gaussian Splatting (3DGS) have unlocked significant potential for modeling 3D head avatars, providing greater flexibility than mesh-based methods and more efficient rendering compared to NeRF-based approaches. Despite these advancements, the creation of controllable 3DGS-based head avatars remains time-intensive, often requiring tens of minutes to hours. To expedite this process, we here introduce the \"Gaussian Deja-vu\" framework, which first obtains a generalized model of the head avatar and then personalizes the result. The generalized model is trained on large 2D (synthetic and real) image datasets. This model provides a well-initialized 3D Gaussian head that is further refined using a monocular video to achieve the personalized head avatar. For personalizing, we propose learnable expression-aware rectification blendmaps to correct the initial 3D Gaussians, ensuring rapid convergence without the reliance on neural networks. Experiments demonstrate that the proposed method meets its objectives. It outperforms state-of-the-art 3D Gaussian head avatars in terms of photorealistic quality as well as reduces training time consumption to at least a quarter of the existing methods, producing the avatar in minutes.", +    "arxiv_url": "http://arxiv.org/abs/2409.16147v3", +    "pdf_url": "http://arxiv.org/pdf/2409.16147v3", +    "published_date": "2024-09-23", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian", +      "nerf" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views", +    "authors": [ +      "Wangze Xu", +      "Huachen Gao", +      "Shihe Shen", +      "Rui Peng", +      "Jianbo Jiao", +      "Ronggang Wang" +    ], +    "abstract": "Recently, the Neural Radiance Field (NeRF) advancement has facilitated few-shot Novel View Synthesis (NVS), which is a significant challenge in 3D vision applications. Despite numerous attempts to reduce the dense input requirement in NeRF, it still suffers from time-consuming training and rendering processes. More recently, 3D Gaussian Splatting (3DGS) achieves real-time high-quality rendering with an explicit point-based representation. However, similar to NeRF, it tends to overfit the train views for lack of constraints. In this paper, we propose \textbf{MVPGS}, a few-shot NVS method that excavates the multi-view priors based on 3D Gaussian Splatting. We leverage the recent learning-based Multi-view Stereo (MVS) to enhance the quality of geometric initialization for 3DGS. To mitigate overfitting, we propose a forward-warping method for additional appearance constraints conforming to scenes based on the computed geometry. 
Furthermore, we introduce a view-consistent geometry constraint for Gaussian parameters to facilitate proper optimization convergence and utilize a monocular depth regularization as compensation. Experiments show that the proposed method achieves state-of-the-art performance with real-time rendering speed. Project page: https://zezeaaa.github.io/projects/MVPGS/", + "arxiv_url": "http://arxiv.org/abs/2409.14316v1", + "pdf_url": "http://arxiv.org/pdf/2409.14316v1", + "published_date": "2024-09-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SplatLoc: 3D Gaussian Splatting-based Visual Localization for Augmented Reality", + "authors": [ + "Hongjia Zhai", + "Xiyu Zhang", + "Boming Zhao", + "Hai Li", + "Yijia He", + "Zhaopeng Cui", + "Hujun Bao", + "Guofeng Zhang" + ], + "abstract": "Visual localization plays an important role in the applications of Augmented Reality (AR), which enable AR devices to obtain their 6-DoF pose in the pre-build map in order to render virtual content in real scenes. However, most existing approaches can not perform novel view rendering and require large storage capacities for maps. To overcome these limitations, we propose an efficient visual localization method capable of high-quality rendering with fewer parameters. Specifically, our approach leverages 3D Gaussian primitives as the scene representation. To ensure precise 2D-3D correspondences for pose estimation, we develop an unbiased 3D scene-specific descriptor decoder for Gaussian primitives, distilled from a constructed feature volume. Additionally, we introduce a salient 3D landmark selection algorithm that selects a suitable primitive subset based on the saliency score for localization. We further regularize key Gaussian primitives to prevent anisotropic effects, which also improves localization performance. Extensive experiments on two widely used datasets demonstrate that our method achieves superior or comparable rendering and localization performance to state-of-the-art implicit-based visual localization approaches. Project page: \\href{https://zju3dv.github.io/splatloc}{https://zju3dv.github.io/splatloc}.", + "arxiv_url": "http://arxiv.org/abs/2409.14067v1", + "pdf_url": "http://arxiv.org/pdf/2409.14067v1", + "published_date": "2024-09-21", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians", + "authors": [ + "Penghao Wang", + "Zhirui Zhang", + "Liao Wang", + "Kaixin Yao", + "Siyuan Xie", + "Jingyi Yu", + "Minye Wu", + "Lan Xu" + ], + "abstract": "Experiencing high-fidelity volumetric video as seamlessly as 2D videos is a long-held dream. However, current dynamic 3DGS methods, despite their high rendering quality, face challenges in streaming on mobile devices due to computational and bandwidth constraints. In this paper, we introduce V^3 (Viewing Volumetric Videos), a novel approach that enables high-quality mobile rendering through the streaming of dynamic Gaussians. Our key innovation is to view dynamic 3DGS as 2D videos, facilitating the use of hardware video codecs. Additionally, we propose a two-stage training strategy to reduce storage requirements with rapid training speed. 
The first stage employs hash encoding and shallow MLP to learn motion, then reduces the number of Gaussians through pruning to meet the streaming requirements, while the second stage fine tunes other Gaussian attributes using residual entropy loss and temporal loss to improve temporal continuity. This strategy, which disentangles motion and appearance, maintains high rendering quality with compact storage requirements. Meanwhile, we designed a multi-platform player to decode and render 2D Gaussian videos. Extensive experiments demonstrate the effectiveness of V^3, outperforming other methods by enabling high-quality rendering and streaming on common devices, which is unseen before. As the first to stream dynamic Gaussians on mobile devices, our companion player offers users an unprecedented volumetric video experience, including smooth scrolling and instant sharing. Our project page with source code is available at https://authoritywang.github.io/v3/.", + "arxiv_url": "http://arxiv.org/abs/2409.13648v2", + "pdf_url": "http://arxiv.org/pdf/2409.13648v2", + "published_date": "2024-09-20", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Portrait Video Editing Empowered by Multimodal Generative Priors", + "authors": [ + "Xuan Gao", + "Haiyao Xiao", + "Chenglai Zhong", + "Shimin Hu", + "Yudong Guo", + "Juyong Zhang" + ], + "abstract": "We introduce PortraitGen, a powerful portrait video editing method that achieves consistent and expressive stylization with multimodal prompts. Traditional portrait video editing methods often struggle with 3D and temporal consistency, and typically lack in rendering quality and efficiency. To address these issues, we lift the portrait video frames to a unified dynamic 3D Gaussian field, which ensures structural and temporal coherence across frames. Furthermore, we design a novel Neural Gaussian Texture mechanism that not only enables sophisticated style editing but also achieves rendering speed over 100FPS. Our approach incorporates multimodal inputs through knowledge distilled from large-scale 2D generative models. Our system also incorporates expression similarity guidance and a face-aware portrait editing module, effectively mitigating degradation issues associated with iterative dataset updates. Extensive experiments demonstrate the temporal consistency, editing efficiency, and superior rendering quality of our method. The broad applicability of the proposed approach is demonstrated through various applications, including text-driven editing, image-driven editing, and relighting, highlighting its great potential to advance the field of video editing. Demo videos and released code are provided in our project page: https://ustc3dv.github.io/PortraitGen/", + "arxiv_url": "http://arxiv.org/abs/2409.13591v1", + "pdf_url": "http://arxiv.org/pdf/2409.13591v1", + "published_date": "2024-09-20", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors", + "authors": [ + "Zixin Zhang", + "Kanghao Chen", + "Lin Wang" + ], + "abstract": "Event cameras are bio-inspired sensors that output asynchronous and sparse event streams, instead of fixed frames. 
Benefiting from their distinct advantages, such as high dynamic range and high temporal resolution, event cameras have been applied to address 3D reconstruction, important for robotic mapping. Recently, neural rendering techniques, such as 3D Gaussian splatting (3DGS), have been shown successful in 3D reconstruction. However, it still remains under-explored how to develop an effective event-based 3DGS pipeline. In particular, as 3DGS typically depends on high-quality initialization and dense multiview constraints, a potential problem appears for the 3DGS optimization with events given its inherent sparse property. To this end, we propose a novel event-based 3DGS framework, named Elite-EvGS. Our key idea is to distill the prior knowledge from the off-the-shelf event-to-video (E2V) models to effectively reconstruct 3D scenes from events in a coarse-to-fine optimization manner. Specifically, to address the complexity of 3DGS initialization from events, we introduce a novel warm-up initialization strategy that optimizes a coarse 3DGS from the frames generated by E2V models and then incorporates events to refine the details. Then, we propose a progressive event supervision strategy that employs the window-slicing operation to progressively reduce the number of events used for supervision. This subtly relieves the temporal randomness of the event frames, benefiting the optimization of local textural and global structural details. Experiments on the benchmark datasets demonstrate that Elite-EvGS can reconstruct 3D scenes with better textural and structural details. Meanwhile, our method yields plausible performance on the captured real-world data, including diverse challenging conditions, such as fast motion and low light scenes.", +    "arxiv_url": "http://arxiv.org/abs/2409.13392v1", +    "pdf_url": "http://arxiv.org/pdf/2409.13392v1", +    "published_date": "2024-09-20", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian", +      "neural rendering", +      "3d reconstruction" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "3D-GSW: 3D Gaussian Splatting for Robust Watermarking", +    "authors": [ +      "Youngdong Jang", +      "Hyunje Park", +      "Feng Yang", +      "Heeju Ko", +      "Euijin Choo", +      "Sangpil Kim" +    ], +    "abstract": "As 3D Gaussian Splatting (3D-GS) gains significant attention and its commercial usage increases, the need for watermarking technologies to prevent unauthorized use of the 3D-GS models and rendered images has become increasingly important. In this paper, we introduce a robust watermarking method for 3D-GS that secures ownership of both the model and its rendered images. Our proposed method remains robust against distortions in rendered images and model attacks while maintaining high rendering quality. To achieve these objectives, we present Frequency-Guided Densification (FGD), which removes 3D Gaussians based on their contribution to rendering quality, enhancing real-time rendering and the robustness of the message. FGD utilizes Discrete Fourier Transform to split 3D Gaussians in high-frequency areas, improving rendering quality. Furthermore, we employ a gradient mask for 3D Gaussians and design a wavelet-subband loss to enhance rendering quality. Our experiments show that our method embeds the message in the rendered images invisibly and robustly against various attacks, including model distortion. 
Our method achieves state-of-the-art performance.", + "arxiv_url": "http://arxiv.org/abs/2409.13222v2", + "pdf_url": "http://arxiv.org/pdf/2409.13222v2", + "published_date": "2024-09-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MGSO: Monocular Real-time Photometric SLAM with Efficient 3D Gaussian Splatting", + "authors": [ + "Yan Song Hu", + "Nicolas Abboud", + "Muhammad Qasim Ali", + "Adam Srebrnjak Yang", + "Imad Elhajj", + "Daniel Asmar", + "Yuhao Chen", + "John S. Zelek" + ], + "abstract": "Real-time SLAM with dense 3D mapping is computationally challenging, especially on resource-limited devices. The recent development of 3D Gaussian Splatting (3DGS) offers a promising approach for real-time dense 3D reconstruction. However, existing 3DGS-based SLAM systems struggle to balance hardware simplicity, speed, and map quality. Most systems excel in one or two of the aforementioned aspects but rarely achieve all. A key issue is the difficulty of initializing 3D Gaussians while concurrently conducting SLAM. To address these challenges, we present Monocular GSO (MGSO), a novel real-time SLAM system that integrates photometric SLAM with 3DGS. Photometric SLAM provides dense structured point clouds for 3DGS initialization, accelerating optimization and producing more efficient maps with fewer Gaussians. As a result, experiments show that our system generates reconstructions with a balance of quality, memory efficiency, and speed that outperforms the state-of-the-art. Furthermore, our system achieves all results using RGB inputs. We evaluate the Replica, TUM-RGBD, and EuRoC datasets against current live dense reconstruction systems. Not only do we surpass contemporary systems, but experiments also show that we maintain our performance on laptop hardware, making it a practical solution for robotics, A/R, and other real-time applications.", + "arxiv_url": "http://arxiv.org/abs/2409.13055v1", + "pdf_url": "http://arxiv.org/pdf/2409.13055v1", + "published_date": "2024-09-19", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GStex: Per-Primitive Texturing of 2D Gaussian Splatting for Decoupled Appearance and Geometry Modeling", + "authors": [ + "Victor Rong", + "Jingxiang Chen", + "Sherwin Bahmani", + "Kiriakos N. Kutulakos", + "David B. Lindell" + ], + "abstract": "Gaussian splatting has demonstrated excellent performance for view synthesis and scene reconstruction. The representation achieves photorealistic quality by optimizing the position, scale, color, and opacity of thousands to millions of 2D or 3D Gaussian primitives within a scene. However, since each Gaussian primitive encodes both appearance and geometry, these attributes are strongly coupled--thus, high-fidelity appearance modeling requires a large number of Gaussian primitives, even when the scene geometry is simple (e.g., for a textured planar surface). We propose to texture each 2D Gaussian primitive so that even a single Gaussian can be used to capture appearance details. By employing per-primitive texturing, our appearance representation is agnostic to the topology and complexity of the scene's geometry. We show that our approach, GStex, yields improved visual quality over prior work in texturing Gaussian splats. 
Furthermore, we demonstrate that our decoupling enables improved novel view synthesis performance compared to 2D Gaussian splatting when reducing the number of Gaussian primitives, and that GStex can be used for scene appearance editing and re-texturing.", +    "arxiv_url": "http://arxiv.org/abs/2409.12954v2", +    "pdf_url": "http://arxiv.org/pdf/2409.12954v2", +    "published_date": "2024-09-19", +    "categories": [ +      "cs.CV", +      "cs.GR", +      "I.3; I.4" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "LI-GS: Gaussian Splatting with LiDAR Incorporated for Accurate Large-Scale Reconstruction", +    "authors": [ +      "Changjian Jiang", +      "Ruilan Gao", +      "Kele Shao", +      "Yue Wang", +      "Rong Xiong", +      "Yu Zhang" +    ], +    "abstract": "Large-scale 3D reconstruction is critical in the field of robotics, and the potential of 3D Gaussian Splatting (3DGS) for achieving accurate object-level reconstruction has been demonstrated. However, ensuring geometric accuracy in outdoor and unbounded scenes remains a significant challenge. This study introduces LI-GS, a reconstruction system that incorporates LiDAR and Gaussian Splatting to enhance geometric accuracy in large-scale scenes. 2D Gaussian surfels are employed as the map representation to enhance surface alignment. Additionally, a novel modeling method is proposed to convert LiDAR point clouds to plane-constrained multimodal Gaussian Mixture Models (GMMs). The GMMs are utilized during both initialization and optimization stages to ensure sufficient and continuous supervision over the entire scene while mitigating the risk of over-fitting. Furthermore, GMMs are employed in mesh extraction to eliminate artifacts and improve the overall geometric quality. Experiments demonstrate that our method outperforms state-of-the-art methods in large-scale 3D reconstruction, achieving higher accuracy compared to both LiDAR-based methods and Gaussian-based methods with improvements of 52.6% and 68.7%, respectively.", +    "arxiv_url": "http://arxiv.org/abs/2409.12899v1", +    "pdf_url": "http://arxiv.org/pdf/2409.12899v1", +    "published_date": "2024-09-19", +    "categories": [ +      "cs.RO" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian", +      "3d reconstruction" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt", +    "authors": [ +      "Lukas Höllein", +      "Aljaž Božič", +      "Michael Zollhöfer", +      "Matthias Nießner" +    ], +    "abstract": "We present 3DGS-LM, a new method that accelerates the reconstruction of 3D Gaussian Splatting (3DGS) by replacing its ADAM optimizer with a tailored Levenberg-Marquardt (LM). Existing methods reduce the optimization time by decreasing the number of Gaussians or by improving the implementation of the differentiable rasterizer. However, they still rely on the ADAM optimizer to fit Gaussian parameters of a scene in thousands of iterations, which can take up to an hour. To this end, we change the optimizer to LM that runs in conjunction with the 3DGS differentiable rasterizer. For efficient GPU parallelization, we propose a caching data structure for intermediate gradients that allows us to efficiently calculate Jacobian-vector products in custom CUDA kernels. In every LM iteration, we calculate update directions from multiple image subsets using these kernels and combine them in a weighted mean. 
Overall, our method is 30% faster than the original 3DGS while obtaining the same reconstruction quality. Our optimization is also agnostic to other methods that accelerate 3DGS, thus enabling even faster speedups compared to vanilla 3DGS.", +    "arxiv_url": "http://arxiv.org/abs/2409.12892v1", +    "pdf_url": "http://arxiv.org/pdf/2409.12892v1", +    "published_date": "2024-09-19", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "EdgeGaussians -- 3D Edge Mapping via Gaussian Splatting", +    "authors": [ +      "Kunal Chelani", +      "Assia Benbihi", +      "Torsten Sattler", +      "Fredrik Kahl" +    ], +    "abstract": "With their meaningful geometry and their omnipresence in the 3D world, edges are extremely useful primitives in computer vision. 3D edges comprise lines and curves, and methods to reconstruct them use either multi-view images or point clouds as input. State-of-the-art image-based methods first learn a 3D edge point cloud then fit 3D edges to it. The edge point cloud is obtained by learning a 3D neural implicit edge field from which the 3D edge points are sampled on a specific level set (0 or 1). However, such methods present two important drawbacks: i) it is not realistic to sample points on exact level sets due to float imprecision and training inaccuracies. Instead, they are sampled within a range of levels so the points do not lie accurately on the 3D edges and require further processing. ii) Such implicit representations are computationally expensive and require long training times. In this paper, we address these two limitations and propose a 3D edge mapping that is simpler, more efficient, and preserves accuracy. Our method learns explicitly the 3D edge points and their edge direction hence bypassing the need for point sampling. It casts a 3D edge point as the center of a 3D Gaussian and the edge direction as the principal axis of the Gaussian. Such a representation has the advantage of being not only geometrically meaningful but also compatible with the efficient training optimization defined in Gaussian Splatting. Results show that the proposed method produces edges as accurate and complete as the state-of-the-art while being an order of magnitude faster. Code is released at https://github.com/kunalchelani/EdgeGaussians.", +    "arxiv_url": "http://arxiv.org/abs/2409.12886v1", +    "pdf_url": "http://arxiv.org/pdf/2409.12886v1", +    "published_date": "2024-09-19", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "https://github.com/kunalchelani/EdgeGaussians", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "GaRField++: Reinforced Gaussian Radiance Fields for Large-Scale 3D Scene Reconstruction", +    "authors": [ +      "Hanyue Zhang", +      "Zhiliu Yang", +      "Xinhe Zuo", +      "Yuxin Tong", +      "Ying Long", +      "Chen Liu" +    ], +    "abstract": "This paper proposes a novel framework for large-scale scene reconstruction based on 3D Gaussian splatting (3DGS) and aims to address the scalability and accuracy challenges faced by existing methods. For tackling the scalability issue, we split the large scene into multiple cells, and the candidate point-cloud and camera views of each cell are correlated through a visibility-based camera selection and a progressive point-cloud extension. 
To reinforce the rendering quality, three highlighted improvements are made in comparison with vanilla 3DGS, which are a strategy of the ray-Gaussian intersection and the novel Gaussians density control for learning efficiency, an appearance decoupling module based on ConvKAN network to solve uneven lighting conditions in large-scale scenes, and a refined final loss with the color loss, the depth distortion loss, and the normal consistency loss. Finally, the seamless stitching procedure is executed to merge the individual Gaussian radiance field for novel view synthesis across different cells. Evaluation of Mill19, Urban3D, and MatrixCity datasets shows that our method consistently generates more high-fidelity rendering results than state-of-the-art methods of large-scale scene reconstruction. We further validate the generalizability of the proposed approach by rendering on self-collected video clips recorded by a commercial drone.", + "arxiv_url": "http://arxiv.org/abs/2409.12774v3", + "pdf_url": "http://arxiv.org/pdf/2409.12774v3", + "published_date": "2024-09-19", + "categories": [ + "cs.CV", + "cs.AI", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Spectral-GS: Taming 3D Gaussian Splatting with Spectral Entropy", + "authors": [ + "Letian Huang", + "Jie Guo", + "Jialin Dan", + "Ruoyu Fu", + "Shujie Wang", + "Yuanqi Li", + "Yanwen Guo" + ], + "abstract": "Recently, 3D Gaussian Splatting (3D-GS) has achieved impressive results in novel view synthesis, demonstrating high fidelity and efficiency. However, it easily exhibits needle-like artifacts, especially when increasing the sampling rate. Mip-Splatting tries to remove these artifacts with a 3D smoothing filter for frequency constraints and a 2D Mip filter for approximated supersampling. Unfortunately, it tends to produce over-blurred results, and sometimes needle-like Gaussians still persist. Our spectral analysis of the covariance matrix during optimization and densification reveals that current 3D-GS lacks shape awareness, relying instead on spectral radius and view positional gradients to determine splitting. As a result, needle-like Gaussians with small positional gradients and low spectral entropy fail to split and overfit high-frequency details. Furthermore, both the filters used in 3D-GS and Mip-Splatting reduce the spectral entropy and increase the condition number during zooming in to synthesize novel view, causing view inconsistencies and more pronounced artifacts. 
Our Spectral-GS, based on spectral analysis, introduces 3D shape-aware splitting and 2D view-consistent filtering strategies, effectively addressing these issues, enhancing 3D-GS's capability to represent high-frequency details without noticeable artifacts, and achieving high-quality photorealistic rendering.", + "arxiv_url": "http://arxiv.org/abs/2409.12771v2", + "pdf_url": "http://arxiv.org/pdf/2409.12771v2", + "published_date": "2024-09-19", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input", + "authors": [ + "Qijian Tian", + "Xin Tan", + "Yuan Xie", + "Lizhuang Ma" + ], + "abstract": "We propose DrivingForward, a feed-forward Gaussian Splatting model that reconstructs driving scenes from flexible surround-view input. Driving scene images from vehicle-mounted cameras are typically sparse, with limited overlap, and the movement of the vehicle further complicates the acquisition of camera extrinsics. To tackle these challenges and achieve real-time reconstruction, we jointly train a pose network, a depth network, and a Gaussian network to predict the Gaussian primitives that represent the driving scenes. The pose network and depth network determine the position of the Gaussian primitives in a self-supervised manner, without using depth ground truth and camera extrinsics during training. The Gaussian network independently predicts primitive parameters from each input image, including covariance, opacity, and spherical harmonics coefficients. At the inference stage, our model can achieve feed-forward reconstruction from flexible multi-frame surround-view input. Experiments on the nuScenes dataset show that our model outperforms existing state-of-the-art feed-forward and scene-optimized reconstruction methods in terms of reconstruction.", + "arxiv_url": "http://arxiv.org/abs/2409.12753v1", + "pdf_url": "http://arxiv.org/pdf/2409.12753v1", + "published_date": "2024-09-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CrossRT: A cross platform programming technology for hardware-accelerated ray tracing in CG and CV applications", + "authors": [ + "Vladimir Frolov", + "Vadim Sanzharov", + "Garifullin Albert", + "Maxim Raenchuk", + "Alexei Voloboy" + ], + "abstract": "We propose a programming technology that bridges cross-platform compatibility and hardware acceleration in ray tracing applications. Our methodology enables developers to define algorithms while our translator manages implementation specifics for different hardware or APIs. Features include: generating hardware-accelerated code from hardware-agnostic, object-oriented C++ algorithm descriptions; enabling users to define software fallbacks for non-hardware-accelerated CPUs and GPUs; producing GPU programming API-based algorithm implementations resembling manually ported C++ versions. The generated code is editable and readable, allowing for additional hardware acceleration. Our translator supports single megakernel and multiple kernel path tracing implementations without altering the programming model or input source code. Wavefront mode is crucial for NeRF and SDF, ensuring efficient evaluation with multiple kernels. 
Validation on tasks such as BVH tree build/traversal, ray-surface intersection for SDF, ray-volume intersection for 3D Gaussian Splatting, and complex Path Tracing models showed comparable performance levels to expert-written implementations for GPUs. Our technology outperformed existing Path Tracing implementations.", + "arxiv_url": "http://arxiv.org/abs/2409.12617v1", + "pdf_url": "http://arxiv.org/pdf/2409.12617v1", + "published_date": "2024-09-19", + "categories": [ + "cs.GR", + "I.3" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Hi-SLAM: Scaling-up Semantics in SLAM with a Hierarchically Categorical Gaussian Splatting", + "authors": [ + "Boying Li", + "Zhixi Cai", + "Yuan-Fang Li", + "Ian Reid", + "Hamid Rezatofighi" + ], + "abstract": "We propose Hi-SLAM, a semantic 3D Gaussian Splatting SLAM method featuring a novel hierarchical categorical representation, which enables accurate global 3D semantic mapping, scaling-up capability, and explicit semantic label prediction in the 3D world. The parameter usage in semantic SLAM systems increases significantly with the growing complexity of the environment, making it particularly challenging and costly for scene understanding. To address this problem, we introduce a novel hierarchical representation that encodes semantic information in a compact form into 3D Gaussian Splatting, leveraging the capabilities of large language models (LLMs). We further introduce a novel semantic loss designed to optimize hierarchical semantic information through both inter-level and cross-level optimization. Furthermore, we enhance the whole SLAM system, resulting in improved tracking and mapping performance. Our Hi-SLAM outperforms existing dense SLAM methods in both mapping and tracking accuracy, while achieving a 2x operation speed-up. Additionally, it exhibits competitive performance in rendering semantic segmentation in small synthetic scenes, with significantly reduced storage and training time requirements. Rendering FPS impressively reaches 2,000 with semantic information and 3,000 without it. Most notably, it showcases the capability of handling the complex real-world scene with more than 500 semantic classes, highlighting its valuable scaling-up capability.", + "arxiv_url": "http://arxiv.org/abs/2409.12518v2", + "pdf_url": "http://arxiv.org/pdf/2409.12518v2", + "published_date": "2024-09-19", + "categories": [ + "cs.RO", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Depth Estimation Based on 3D Gaussian Splatting Siamese Defocus", + "authors": [ + "Jinchang Zhang", + "Ningning Xu", + "Hao Zhang", + "Guoyu Lu" + ], + "abstract": "Depth estimation is a fundamental task in 3D geometry. While stereo depth estimation can be achieved through triangulation methods, it is not as straightforward for monocular methods, which require the integration of global and local information. The Depth from Defocus (DFD) method utilizes camera lens models and parameters to recover depth information from blurred images and has been proven to perform well. However, these methods rely on All-In-Focus (AIF) images for depth estimation, which is nearly impossible to obtain in real-world applications. To address this issue, we propose a self-supervised framework based on 3D Gaussian splatting and Siamese networks. 
By learning the blur levels at different focal distances of the same scene in the focal stack, the framework predicts the defocus map and Circle of Confusion (CoC) from a single defocused image, using the defocus map as input to DepthNet for monocular depth estimation. The 3D Gaussian splatting model renders defocused images using the predicted CoC, and the differences between these and the real defocused images provide additional supervision signals for the Siamese Defocus self-supervised network. This framework has been validated on both artificially synthesized and real blurred datasets. Subsequent quantitative and visualization experiments demonstrate that our proposed framework is highly effective as a DFD method.", + "arxiv_url": "http://arxiv.org/abs/2409.12323v1", + "pdf_url": "http://arxiv.org/pdf/2409.12323v1", + "published_date": "2024-09-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Vista3D: Unravel the 3D Darkside of a Single Image", + "authors": [ + "Qiuhong Shen", + "Xingyi Yang", + "Michael Bi Mi", + "Xinchao Wang" + ], + "abstract": "We embark on the age-old quest: unveiling the hidden dimensions of objects from mere glimpses of their visible parts. To address this, we present Vista3D, a framework that realizes swift and consistent 3D generation within a mere 5 minutes. At the heart of Vista3D lies a two-phase approach: the coarse phase and the fine phase. In the coarse phase, we rapidly generate initial geometry with Gaussian Splatting from a single image. In the fine phase, we extract a Signed Distance Function (SDF) directly from learned Gaussian Splatting, optimizing it with a differentiable isosurface representation. Furthermore, it elevates the quality of generation by using a disentangled representation with two independent implicit functions to capture both visible and obscured aspects of objects. Additionally, it harmonizes gradients from 2D diffusion prior with 3D-aware diffusion priors by angular diffusion prior composition. Through extensive evaluation, we demonstrate that Vista3D effectively sustains a balance between the consistency and diversity of the generated 3D objects. Demos and code will be available at https://github.com/florinshen/Vista3D.", + "arxiv_url": "http://arxiv.org/abs/2409.12193v1", + "pdf_url": "http://arxiv.org/pdf/2409.12193v1", + "published_date": "2024-09-18", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GT", + "cs.MM" + ], + "github_url": "https://github.com/florinshen/Vista3D", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianHeads: End-to-End Learning of Drivable Gaussian Head Avatars from Coarse-to-fine Representations", + "authors": [ + "Kartik Teotia", + "Hyeongwoo Kim", + "Pablo Garrido", + "Marc Habermann", + "Mohamed Elgharib", + "Christian Theobalt" + ], + "abstract": "Real-time rendering of human head avatars is a cornerstone of many computer graphics applications, such as augmented reality, video games, and films, to name a few. Recent approaches address this challenge with computationally efficient geometry primitives in a carefully calibrated multi-view setup. Albeit producing photorealistic head renderings, it often fails to represent complex motion changes such as the mouth interior and strongly varying head poses. 
We propose a new method to generate highly dynamic and deformable human head avatars from multi-view imagery in real-time. At the core of our method is a hierarchical representation of head models that allows to capture the complex dynamics of facial expressions and head movements. First, with rich facial features extracted from raw input frames, we learn to deform the coarse facial geometry of the template mesh. We then initialize 3D Gaussians on the deformed surface and refine their positions in a fine step. We train this coarse-to-fine facial avatar model along with the head pose as a learnable parameter in an end-to-end framework. This enables not only controllable facial animation via video inputs, but also high-fidelity novel view synthesis of challenging facial expressions, such as tongue deformations and fine-grained teeth structure under large motion changes. Moreover, it encourages the learned head avatar to generalize towards new facial expressions and head poses at inference time. We demonstrate the performance of our method with comparisons against the related methods on different datasets, spanning challenging facial expression sequences across multiple identities. We also show the potential application of our approach by demonstrating a cross-identity facial performance transfer application.", + "arxiv_url": "http://arxiv.org/abs/2409.11951v1", + "pdf_url": "http://arxiv.org/pdf/2409.11951v1", + "published_date": "2024-09-18", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SRIF: Semantic Shape Registration Empowered by Diffusion-based Image Morphing and Flow Estimation", + "authors": [ + "Mingze Sun", + "Chen Guo", + "Puhua Jiang", + "Shiwei Mao", + "Yurun Chen", + "Ruqi Huang" + ], + "abstract": "In this paper, we propose SRIF, a novel Semantic shape Registration framework based on diffusion-based Image morphing and Flow estimation. More concretely, given a pair of extrinsically aligned shapes, we first render them from multi-views, and then utilize an image interpolation framework based on diffusion models to generate sequences of intermediate images between them. The images are later fed into a dynamic 3D Gaussian splatting framework, with which we reconstruct and post-process for intermediate point clouds respecting the image morphing processing. In the end, tailored for the above, we propose a novel registration module to estimate continuous normalizing flow, which deforms source shape consistently towards the target, with intermediate point clouds as weak guidance. Our key insight is to leverage large vision models (LVMs) to associate shapes and therefore obtain much richer semantic information on the relationship between shapes than the ad-hoc feature extraction and alignment. As a consequence, SRIF achieves high-quality dense correspondences on challenging shape pairs, but also delivers smooth, semantically meaningful interpolation in between. Empirical evidence justifies the effectiveness and superiority of our method as well as specific design choices. 
The code is released at https://github.com/rqhuang88/SRIF.", +    "arxiv_url": "http://arxiv.org/abs/2409.11682v2", +    "pdf_url": "http://arxiv.org/pdf/2409.11682v2", +    "published_date": "2024-09-18", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "https://github.com/rqhuang88/SRIF", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks", +    "authors": [ +      "Joji Joseph", +      "Bharadwaj Amrutur", +      "Shalabh Bhatnagar" +    ], +    "abstract": "3D Gaussian Splatting has emerged as a powerful 3D scene representation technique, capturing fine details with high efficiency. In this paper, we introduce a novel voting-based method that extends 2D segmentation models to 3D Gaussian splats. Our approach leverages masked gradients, where gradients are filtered by input 2D masks, and these gradients are used as votes to achieve accurate segmentation. As a byproduct, we discovered that inference-time gradients can also be used to prune Gaussians, resulting in up to 21% compression. Additionally, we explore few-shot affordance transfer, allowing annotations from 2D images to be effectively transferred onto 3D Gaussian splats. The robust yet straightforward mathematical formulation underlying this approach makes it a highly effective tool for numerous downstream applications, such as augmented reality (AR), object editing, and robotics. The project code and additional resources are available at https://jojijoseph.github.io/3dgs-segmentation.", +    "arxiv_url": "http://arxiv.org/abs/2409.11681v1", +    "pdf_url": "http://arxiv.org/pdf/2409.11681v1", +    "published_date": "2024-09-18", +    "categories": [ +      "cs.CV", +      "cs.RO" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "RenderWorld: World Model with Self-Supervised 3D Label", +    "authors": [ +      "Ziyang Yan", +      "Wenzhen Dong", +      "Yihua Shao", +      "Yuhang Lu", +      "Liu Haiyang", +      "Jingwen Liu", +      "Haozhe Wang", +      "Zhe Wang", +      "Yan Wang", +      "Fabio Remondino", +      "Yuexin Ma" +    ], +    "abstract": "End-to-end autonomous driving with vision-only is not only more cost-effective compared to LiDAR-vision fusion but also more reliable than traditional methods. To achieve an economical and robust purely visual autonomous driving system, we propose RenderWorld, a vision-only end-to-end autonomous driving framework, which generates 3D occupancy labels using a self-supervised gaussian-based Img2Occ Module, then encodes the labels by AM-VAE, and uses a world model for forecasting and planning. RenderWorld employs Gaussian Splatting to represent 3D scenes and render 2D images, which greatly improves segmentation accuracy and reduces GPU memory consumption compared with NeRF-based methods. 
By applying AM-VAE to encode air and non-air separately, RenderWorld achieves more fine-grained scene element representation, leading to state-of-the-art performance in both 4D occupancy forecasting and motion planning from autoregressive world model.", + "arxiv_url": "http://arxiv.org/abs/2409.11356v1", + "pdf_url": "http://arxiv.org/pdf/2409.11356v1", + "published_date": "2024-09-17", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-Net: Generalizable Plug-and-Play 3D Gaussian Splatting Module", + "authors": [ + "Yichen Zhang", + "Zihan Wang", + "Jiali Han", + "Peilin Li", + "Jiaxun Zhang", + "Jianqiang Wang", + "Lei He", + "Keqiang Li" + ], + "abstract": "3D Gaussian Splatting (3DGS) integrates the strengths of primitive-based representations and volumetric rendering techniques, enabling real-time, high-quality rendering. However, 3DGS models typically overfit to single-scene training and are highly sensitive to the initialization of Gaussian ellipsoids, heuristically derived from Structure from Motion (SfM) point clouds, which limits both generalization and practicality. To address these limitations, we propose GS-Net, a generalizable, plug-and-play 3DGS module that densifies Gaussian ellipsoids from sparse SfM point clouds, enhancing geometric structure representation. To the best of our knowledge, GS-Net is the first plug-and-play 3DGS module with cross-scene generalization capabilities. Additionally, we introduce the CARLA-NVS dataset, which incorporates additional camera viewpoints to thoroughly evaluate reconstruction and rendering quality. Extensive experiments demonstrate that applying GS-Net to 3DGS yields a PSNR improvement of 2.08 dB for conventional viewpoints and 1.86 dB for novel viewpoints, confirming the method's effectiveness and robustness.", + "arxiv_url": "http://arxiv.org/abs/2409.11307v1", + "pdf_url": "http://arxiv.org/pdf/2409.11307v1", + "published_date": "2024-09-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction", + "authors": [ + "Marko Mihajlovic", + "Sergey Prokudin", + "Siyu Tang", + "Robert Maier", + "Federica Bogo", + "Tony Tung", + "Edmond Boyer" + ], + "abstract": "Digitizing 3D static scenes and 4D dynamic events from multi-view images has long been a challenge in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) has emerged as a practical and scalable reconstruction method, gaining popularity due to its impressive reconstruction quality, real-time rendering capabilities, and compatibility with widely used visualization tools. However, the method requires a substantial number of input views to achieve high-quality scene reconstruction, introducing a significant practical bottleneck. This challenge is especially severe in capturing dynamic scenes, where deploying an extensive camera array can be prohibitively costly. In this work, we identify the lack of spatial autocorrelation of splat features as one of the factors contributing to the suboptimal performance of the 3DGS technique in sparse reconstruction settings. To address the issue, we propose an optimization strategy that effectively regularizes splat features by modeling them as the outputs of a corresponding implicit neural field. 
This results in a consistent enhancement of reconstruction quality across various scenarios. Our approach effectively handles static and dynamic cases, as demonstrated by extensive testing across different setups and scene complexities.", +    "arxiv_url": "http://arxiv.org/abs/2409.11211v1", +    "pdf_url": "http://arxiv.org/pdf/2409.11211v1", +    "published_date": "2024-09-17", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian", +      "real-time rendering" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "GLC-SLAM: Gaussian Splatting SLAM with Efficient Loop Closure", +    "authors": [ +      "Ziheng Xu", +      "Qingfeng Li", +      "Chen Chen", +      "Xuefeng Liu", +      "Jianwei Niu" +    ], +    "abstract": "3D Gaussian Splatting (3DGS) has gained significant attention for its application in dense Simultaneous Localization and Mapping (SLAM), enabling real-time rendering and high-fidelity mapping. However, existing 3DGS-based SLAM methods often suffer from accumulated tracking errors and map drift, particularly in large-scale environments. To address these issues, we introduce GLC-SLAM, a Gaussian Splatting SLAM system that integrates global optimization of camera poses and scene models. Our approach employs frame-to-model tracking and triggers hierarchical loop closure using a global-to-local strategy to minimize drift accumulation. By dividing the scene into 3D Gaussian submaps, we facilitate efficient map updates following loop corrections in large scenes. Additionally, our uncertainty-minimized keyframe selection strategy prioritizes keyframes observing more valuable 3D Gaussians to enhance submap optimization. Experimental results on various datasets demonstrate that GLC-SLAM achieves superior or competitive tracking and mapping performance compared to state-of-the-art dense RGB-D SLAM systems.", +    "arxiv_url": "http://arxiv.org/abs/2409.10982v1", +    "pdf_url": "http://arxiv.org/pdf/2409.10982v1", +    "published_date": "2024-09-17", +    "categories": [ +      "cs.RO" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian", +      "real-time rendering" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "HGSLoc: 3DGS-based Heuristic Camera Pose Refinement", +    "authors": [ +      "Zhongyan Niu", +      "Zhen Tan", +      "Jinpu Zhang", +      "Xueliang Yang", +      "Dewen Hu" +    ], +    "abstract": "Visual localization refers to the process of determining camera poses and orientation within a known scene representation. This task is often complicated by factors such as illumination changes and variations in viewing angles. In this paper, we propose HGSLoc, a novel lightweight, plug-and-play pose optimization framework, which integrates 3D reconstruction with a heuristic refinement strategy to achieve higher pose estimation accuracy. Specifically, we introduce an explicit geometric map for 3D representation and high-fidelity rendering, allowing the generation of high-quality synthesized views to support accurate visual localization. Our method demonstrates a faster rendering speed and higher localization accuracy compared to NeRF-based neural rendering localization approaches. We introduce a heuristic refinement strategy; its efficient optimization capability can quickly locate the target node, while we set the step-level optimization step to enhance the pose accuracy in the scenarios with small errors. With carefully designed heuristic functions, it offers efficient optimization capabilities, enabling rapid error reduction in rough localization estimations. 
Our method mitigates the dependence on complex neural network models while demonstrating improved robustness against noise and higher localization accuracy in challenging environments, as compared to neural network joint optimization strategies. The optimization framework proposed in this paper introduces novel approaches to visual localization by integrating the advantages of 3D reconstruction and heuristic refinement strategy, which demonstrates strong performance across multiple benchmark datasets, including 7Scenes and DB dataset.", + "arxiv_url": "http://arxiv.org/abs/2409.10925v2", + "pdf_url": "http://arxiv.org/pdf/2409.10925v2", + "published_date": "2024-09-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "neural rendering", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Phys3DGS: Physically-based 3D Gaussian Splatting for Inverse Rendering", + "authors": [ + "Euntae Choi", + "Sungjoo Yoo" + ], + "abstract": "We propose two novel ideas (adoption of deferred rendering and mesh-based representation) to improve the quality of 3D Gaussian splatting (3DGS) based inverse rendering. We first report a problem incurred by hidden Gaussians, where Gaussians beneath the surface adversely affect the pixel color in the volume rendering adopted by the existing methods. In order to resolve the problem, we propose applying deferred rendering and report new problems incurred in a naive application of deferred rendering to the existing 3DGS-based inverse rendering. In an effort to improve the quality of 3DGS-based inverse rendering under deferred rendering, we propose a novel two-step training approach which (1) exploits mesh extraction and utilizes a hybrid mesh-3DGS representation and (2) applies novel regularization methods to better exploit the mesh. Our experiments show that, under relighting, the proposed method offers significantly better rendering quality than the existing 3DGS-based inverse rendering methods. Compared with the SOTA voxel grid-based inverse rendering method, it gives better rendering quality while offering real-time rendering.", + "arxiv_url": "http://arxiv.org/abs/2409.10335v1", + "pdf_url": "http://arxiv.org/pdf/2409.10335v1", + "published_date": "2024-09-16", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "BEINGS: Bayesian Embodied Image-goal Navigation with Gaussian Splatting", + "authors": [ + "Wugang Meng", + "Tianfu Wu", + "Huan Yin", + "Fumin Zhang" + ], + "abstract": "Image-goal navigation enables a robot to reach the location where a target image was captured, using visual cues for guidance. However, current methods either rely heavily on data and computationally expensive learning-based approaches or lack efficiency in complex environments due to insufficient exploration strategies. To address these limitations, we propose Bayesian Embodied Image-goal Navigation Using Gaussian Splatting, a novel method that formulates ImageNav as an optimal control problem within a model predictive control framework. BEINGS leverages 3D Gaussian Splatting as a scene prior to predict future observations, enabling efficient, real-time navigation decisions grounded in the robot's sensory experiences. By integrating Bayesian updates, our method dynamically refines the robot's strategy without requiring extensive prior experience or data. 
Our algorithm is validated through extensive simulations and physical experiments, showcasing its potential for embodied robot systems in visually complex scenarios.", +    "arxiv_url": "http://arxiv.org/abs/2409.10216v1", +    "pdf_url": "http://arxiv.org/pdf/2409.10216v1", +    "published_date": "2024-09-16", +    "categories": [ +      "cs.RO" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "SplatSim: Zero-Shot Sim2Real Transfer of RGB Manipulation Policies Using Gaussian Splatting", +    "authors": [ +      "Mohammad Nomaan Qureshi", +      "Sparsh Garg", +      "Francisco Yandun", +      "David Held", +      "George Kantor", +      "Abhisesh Silwal" +    ], +    "abstract": "Sim2Real transfer, particularly for manipulation policies relying on RGB images, remains a critical challenge in robotics due to the significant domain shift between synthetic and real-world visual data. In this paper, we propose SplatSim, a novel framework that leverages Gaussian Splatting as the primary rendering primitive to reduce the Sim2Real gap for RGB-based manipulation policies. By replacing traditional mesh representations with Gaussian Splats in simulators, SplatSim produces highly photorealistic synthetic data while maintaining the scalability and cost-efficiency of simulation. We demonstrate the effectiveness of our framework by training manipulation policies within SplatSim and deploying them in the real world in a zero-shot manner, achieving an average success rate of 86.25%, compared to 97.5% for policies trained on real-world data. Videos can be found on our project page: https://splatsim.github.io", +    "arxiv_url": "http://arxiv.org/abs/2409.10161v3", +    "pdf_url": "http://arxiv.org/pdf/2409.10161v3", +    "published_date": "2024-09-16", +    "categories": [ +      "cs.RO", +      "cs.AI", +      "cs.CV", +      "cs.LG" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression", +    "authors": [ +      "Yi-Hsin Li", +      "Sebastian Knorr", +      "Mårten Sjöström", +      "Thomas Sikora" +    ], +    "abstract": "Kernel image regression methods have been shown to provide excellent efficiency in many image processing tasks, such as image and light-field compression, Gaussian Splatting, denoising and super-resolution. The estimation of parameters for these methods frequently employs gradient descent iterative optimization, which poses significant computational burden for many applications. In this paper, we introduce a novel adaptive segmentation-based initialization method targeted for optimizing Steered-Mixture-of-Experts (SMoE) gating networks and Radial-Basis-Function (RBF) networks with steering kernels. The novel initialization method allocates kernels into pre-calculated image segments. The optimal number of kernels, kernel positions, and steering parameters are derived per segment in an iterative optimization and kernel sparsification procedure. The kernel information from \"local\" segments is then transferred into a \"global\" initialization, ready for use in iterative optimization of SMoE, RBF, and related kernel image regression methods. 
Results show that drastic objective and subjective quality improvements are achievable compared to widely used regular grid initialization, \"state-of-the-art\" K-Means initialization and previously introduced segmentation-based initialization methods, while also drastically improving the sparsity of the regression models. For same quality, the novel initialization results in models with around 50% reduction of kernels. In addition, a significant reduction of convergence time is achieved, with overall run-time savings of up to 50%. The segmentation-based initialization strategy itself admits heavy parallel computation; in theory, it may be divided into as many tasks as there are segments in the images. By accessing only four parallel GPUs, run-time savings of already 50% for initialization are achievable.", + "arxiv_url": "http://arxiv.org/abs/2409.10101v1", + "pdf_url": "http://arxiv.org/pdf/2409.10101v1", + "published_date": "2024-09-16", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DENSER: 3D Gaussians Splatting for Scene Reconstruction of Dynamic Urban Environments", + "authors": [ + "Mahmud A. Mohamad", + "Gamal Elghazaly", + "Arthur Hubert", + "Raphael Frank" + ], + "abstract": "This paper presents DENSER, an efficient and effective approach leveraging 3D Gaussian splatting (3DGS) for the reconstruction of dynamic urban environments. While several methods for photorealistic scene representations, both implicitly using neural radiance fields (NeRF) and explicitly using 3DGS have shown promising results in scene reconstruction of relatively complex dynamic scenes, modeling the dynamic appearance of foreground objects tend to be challenging, limiting the applicability of these methods to capture subtleties and details of the scenes, especially far dynamic objects. To this end, we propose DENSER, a framework that significantly enhances the representation of dynamic objects and accurately models the appearance of dynamic objects in the driving scene. Instead of directly using Spherical Harmonics (SH) to model the appearance of dynamic objects, we introduce and integrate a new method aiming at dynamically estimating SH bases using wavelets, resulting in better representation of dynamic objects appearance in both space and time. Besides object appearance, DENSER enhances object shape representation through densification of its point cloud across multiple scene frames, resulting in faster convergence of model training. Extensive evaluations on KITTI dataset show that the proposed approach significantly outperforms state-of-the-art methods by a wide margin. 
Source codes and models will be uploaded to this repository https://github.com/sntubix/denser", + "arxiv_url": "http://arxiv.org/abs/2409.10041v1", + "pdf_url": "http://arxiv.org/pdf/2409.10041v1", + "published_date": "2024-09-16", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/sntubix/denser", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SAFER-Splat: A Control Barrier Function for Safe Navigation with Online Gaussian Splatting Maps", + "authors": [ + "Timothy Chen", + "Aiden Swann", + "Javier Yu", + "Ola Shorinwa", + "Riku Murai", + "Monroe Kennedy III", + "Mac Schwager" + ], + "abstract": "SAFER-Splat (Simultaneous Action Filtering and Environment Reconstruction) is a real-time, scalable, and minimally invasive action filter, based on control barrier functions, for safe robotic navigation in a detailed map constructed at runtime using Gaussian Splatting (GSplat). We propose a novel Control Barrier Function (CBF) that not only induces safety with respect to all Gaussian primitives in the scene, but when synthesized into a controller, is capable of processing hundreds of thousands of Gaussians while maintaining a minimal memory footprint and operating at 15 Hz during online Splat training. Of the total compute time, a small fraction of it consumes GPU resources, enabling uninterrupted training. The safety layer is minimally invasive, correcting robot actions only when they are unsafe. To showcase the safety filter, we also introduce SplatBridge, an open-source software package built with ROS for real-time GSplat mapping for robots. We demonstrate the safety and robustness of our pipeline first in simulation, where our method is 20-50x faster, safer, and less conservative than competing methods based on neural radiance fields. Further, we demonstrate simultaneous GSplat mapping and safety filtering on a drone hardware platform using only on-board perception. We verify that under teleoperation a human pilot cannot invoke a collision. Our videos and codebase can be found at https://chengine.github.io/safer-splat.", + "arxiv_url": "http://arxiv.org/abs/2409.09868v1", + "pdf_url": "http://arxiv.org/pdf/2409.09868v1", + "published_date": "2024-09-15", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation", + "authors": [ + "Shuzhao Xie", + "Weixiang Zhang", + "Chen Tang", + "Yunpeng Bai", + "Rongwei Lu", + "Shijia Ge", + "Zhi Wang" + ], + "abstract": "3D Gaussian Splatting demonstrates excellent quality and speed in novel view synthesis. Nevertheless, the huge file size of the 3D Gaussians presents challenges for transmission and storage. Current works design compact models to replace the substantial volume and attributes of 3D Gaussians, along with intensive training to distill information. These endeavors demand considerable training time, presenting formidable hurdles for practical deployment. To this end, we propose MesonGS, a codec for post-training compression of 3D Gaussians. Initially, we introduce a measurement criterion that considers both view-dependent and view-independent factors to assess the impact of each Gaussian point on the rendering output, enabling the removal of insignificant points. 
Subsequently, we decrease the entropy of attributes through two transformations that complement subsequent entropy coding techniques to enhance the file compression rate. More specifically, we first replace rotation quaternions with Euler angles; then, we apply region adaptive hierarchical transform to key attributes to reduce entropy. Lastly, we adopt finer-grained quantization to avoid excessive information loss. Moreover, a well-crafted finetune scheme is devised to restore quality. Extensive experiments demonstrate that MesonGS significantly reduces the size of 3D Gaussians while preserving competitive quality.", + "arxiv_url": "http://arxiv.org/abs/2409.09756v1", + "pdf_url": "http://arxiv.org/pdf/2409.09756v1", + "published_date": "2024-09-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GEVO: Memory-Efficient Monocular Visual Odometry Using Gaussians", + "authors": [ + "Dasong Gao", + "Peter Zhi Xuan Li", + "Vivienne Sze", + "Sertac Karaman" + ], + "abstract": "Constructing a high-fidelity representation of the 3D scene using a monocular camera can enable a wide range of applications on mobile devices, such as micro-robots, smartphones, and AR/VR headsets. On these devices, memory is often limited in capacity and its access often dominates the consumption of compute energy. Although Gaussian Splatting (GS) allows for high-fidelity reconstruction of 3D scenes, current GS-based SLAM is not memory efficient as a large number of past images is stored to retrain Gaussians for reducing catastrophic forgetting. These images often require two-orders-of-magnitude higher memory than the map itself and thus dominate the total memory usage. In this work, we present GEVO, a GS-based monocular SLAM framework that achieves comparable fidelity as prior methods by rendering (instead of storing) them from the existing map. Novel Gaussian initialization and optimization techniques are proposed to remove artifacts from the map and delay the degradation of the rendered images over time. Across a variety of environments, GEVO achieves comparable map fidelity while reducing the memory overhead to around 58 MBs, which is up to 94x lower than prior works.", + "arxiv_url": "http://arxiv.org/abs/2409.09295v1", + "pdf_url": "http://arxiv.org/pdf/2409.09295v1", + "published_date": "2024-09-14", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis", + "authors": [ + "Yohan Poirier-Ginter", + "Alban Gauthier", + "Julien Philip", + "Jean-Francois Lalonde", + "George Drettakis" + ], + "abstract": "Relighting radiance fields is severely underconstrained for multi-view data, which is most often captured under a single illumination condition; It is especially hard for full scenes containing multiple objects. We introduce a method to create relightable radiance fields using such single-illumination data by exploiting priors extracted from 2D image diffusion models. We first fine-tune a 2D diffusion model on a multi-illumination dataset conditioned by light direction, allowing us to augment a single-illumination capture into a realistic -- but possibly inconsistent -- multi-illumination dataset from directly defined light directions. 
We use this augmented data to create a relightable radiance field represented by 3D Gaussian splats. To allow direct control of light direction for low-frequency lighting, we represent appearance with a multi-layer perceptron parameterized on light direction. To enforce multi-view consistency and overcome inaccuracies we optimize a per-image auxiliary feature vector. We show results on synthetic and real multi-view data under single illumination, demonstrating that our method successfully exploits 2D diffusion model priors to allow realistic 3D relighting for complete scenes. Project site https://repo-sam.inria.fr/fungraph/generative-radiance-field-relighting/", + "arxiv_url": "http://arxiv.org/abs/2409.08947v2", + "pdf_url": "http://arxiv.org/pdf/2409.08947v2", + "published_date": "2024-09-13", + "categories": [ + "cs.CV", + "cs.GR", + "I.3; I.4" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "AdR-Gaussian: Accelerating Gaussian Splatting with Adaptive Radius", + "authors": [ + "Xinzhe Wang", + "Ran Yi", + "Lizhuang Ma" + ], + "abstract": "3D Gaussian Splatting (3DGS) is a recent explicit 3D representation that has achieved high-quality reconstruction and real-time rendering of complex scenes. However, the rasterization pipeline still suffers from unnecessary overhead resulting from avoidable serial Gaussian culling, and uneven load due to the distinct number of Gaussian to be rendered across pixels, which hinders wider promotion and application of 3DGS. In order to accelerate Gaussian splatting, we propose AdR-Gaussian, which moves part of serial culling in Render stage into the earlier Preprocess stage to enable parallel culling, employing adaptive radius to narrow the rendering pixel range for each Gaussian, and introduces a load balancing method to minimize thread waiting time during the pixel-parallel rendering. Our contributions are threefold, achieving a rendering speed of 310% while maintaining equivalent or even better quality than the state-of-the-art. Firstly, we propose to early cull Gaussian-Tile pairs of low splatting opacity based on an adaptive radius in the Gaussian-parallel Preprocess stage, which reduces the number of affected tile through the Gaussian bounding circle, thus reducing unnecessary overhead and achieving faster rendering speed. Secondly, we further propose early culling based on axis-aligned bounding box for Gaussian splatting, which achieves a more significant reduction in ineffective expenses by accurately calculating the Gaussian size in the 2D directions. Thirdly, we propose a balancing algorithm for pixel thread load, which compresses the information of heavy-load pixels to reduce thread waiting time, and enhance information of light-load pixels to hedge against rendering quality loss. 
Experiments on three datasets demonstrate that our algorithm can significantly improve the Gaussian Splatting rendering speed.", + "arxiv_url": "http://arxiv.org/abs/2409.08669v1", + "pdf_url": "http://arxiv.org/pdf/2409.08669v1", + "published_date": "2024-09-13", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Dense Point Clouds Matter: Dust-GS for Scene Reconstruction from Sparse Viewpoints", + "authors": [ + "Shan Chen", + "Jiale Zhou", + "Lei Li" + ], + "abstract": "3D Gaussian Splatting (3DGS) has demonstrated remarkable performance in scene synthesis and novel view synthesis tasks. Typically, the initialization of 3D Gaussian primitives relies on point clouds derived from Structure-from-Motion (SfM) methods. However, in scenarios requiring scene reconstruction from sparse viewpoints, the effectiveness of 3DGS is significantly constrained by the quality of these initial point clouds and the limited number of input images. In this study, we present Dust-GS, a novel framework specifically designed to overcome the limitations of 3DGS in sparse viewpoint conditions. Instead of relying solely on SfM, Dust-GS introduces an innovative point cloud initialization technique that remains effective even with sparse input data. Our approach leverages a hybrid strategy that integrates an adaptive depth-based masking technique, thereby enhancing the accuracy and detail of reconstructed scenes. Extensive experiments conducted on several benchmark datasets demonstrate that Dust-GS surpasses traditional 3DGS methods in scenarios with sparse viewpoints, achieving superior scene reconstruction quality with a reduced number of input images.", + "arxiv_url": "http://arxiv.org/abs/2409.08613v1", + "pdf_url": "http://arxiv.org/pdf/2409.08613v1", + "published_date": "2024-09-13", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CSS: Overcoming Pose and Scene Challenges in Crowd-Sourced 3D Gaussian Splatting", + "authors": [ + "Runze Chen", + "Mingyu Xiao", + "Haiyong Luo", + "Fang Zhao", + "Fan Wu", + "Hao Xiong", + "Qi Liu", + "Meng Song" + ], + "abstract": "We introduce Crowd-Sourced Splatting (CSS), a novel 3D Gaussian Splatting (3DGS) pipeline designed to overcome the challenges of pose-free scene reconstruction using crowd-sourced imagery. The dream of reconstructing historically significant but inaccessible scenes from collections of photographs has long captivated researchers. However, traditional 3D techniques struggle with missing camera poses, limited viewpoints, and inconsistent lighting. CSS addresses these challenges through robust geometric priors and advanced illumination modeling, enabling high-quality novel view synthesis under complex, real-world conditions. 
Our method demonstrates clear improvements over existing approaches, paving the way for more accurate and flexible applications in AR, VR, and large-scale 3D reconstruction.", + "arxiv_url": "http://arxiv.org/abs/2409.08562v1", + "pdf_url": "http://arxiv.org/pdf/2409.08562v1", + "published_date": "2024-09-13", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos", + "authors": [ + "Yuheng Jiang", + "Zhehao Shen", + "Yu Hong", + "Chengcheng Guo", + "Yize Wu", + "Yingliang Zhang", + "Jingyi Yu", + "Lan Xu" + ], + "abstract": "Volumetric video represents a transformative advancement in visual media, enabling users to freely navigate immersive virtual experiences and narrowing the gap between digital and real worlds. However, the need for extensive manual intervention to stabilize mesh sequences and the generation of excessively large assets in existing workflows impedes broader adoption. In this paper, we present a novel Gaussian-based approach, dubbed \\textit{DualGS}, for real-time and high-fidelity playback of complex human performance with excellent compression ratios. Our key idea in DualGS is to separately represent motion and appearance using the corresponding skin and joint Gaussians. Such an explicit disentanglement can significantly reduce motion redundancy and enhance temporal coherence. We begin by initializing the DualGS and anchoring skin Gaussians to joint Gaussians at the first frame. Subsequently, we employ a coarse-to-fine training strategy for frame-by-frame human performance modeling. It includes a coarse alignment phase for overall motion prediction as well as a fine-grained optimization for robust tracking and high-fidelity rendering. To integrate volumetric video seamlessly into VR environments, we efficiently compress motion using entropy encoding and appearance using codec compression coupled with a persistent codebook. Our approach achieves a compression ratio of up to 120 times, only requiring approximately 350KB of storage per frame. We demonstrate the efficacy of our representation through photo-realistic, free-view experiences on VR headsets, enabling users to immersively watch musicians in performance and feel the rhythm of the notes at the performers' fingertips.", + "arxiv_url": "http://arxiv.org/abs/2409.08353v1", + "pdf_url": "http://arxiv.org/pdf/2409.08353v1", + "published_date": "2024-09-12", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally", + "authors": [ + "Qiuhong Shen", + "Xingyi Yang", + "Xinchao Wang" + ], + "abstract": "This study addresses the challenge of accurately segmenting 3D Gaussian Splatting from 2D masks. Conventional methods often rely on iterative gradient descent to assign each Gaussian a unique label, leading to lengthy optimization and sub-optimal solutions. Instead, we propose a straightforward yet globally optimal solver for 3D-GS segmentation. The core insight of our method is that, with a reconstructed 3D-GS scene, the rendering of the 2D masks is essentially a linear function with respect to the labels of each Gaussian. As such, the optimal label assignment can be solved via linear programming in closed form. 
This solution capitalizes on the alpha blending characteristic of the splatting process for single step optimization. By incorporating the background bias in our objective function, our method shows superior robustness in 3D segmentation against noises. Remarkably, our optimization completes within 30 seconds, about 50$\\times$ faster than the best existing methods. Extensive experiments demonstrate the efficiency and robustness of our method in segmenting various scenes, and its superior performance in downstream tasks such as object removal and inpainting. Demos and code will be available at https://github.com/florinshen/FlashSplat.", + "arxiv_url": "http://arxiv.org/abs/2409.08270v1", + "pdf_url": "http://arxiv.org/pdf/2409.08270v1", + "published_date": "2024-09-12", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR", + "cs.MM" + ], + "github_url": "https://github.com/florinshen/FlashSplat", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis", + "authors": [ + "Qian Chen", + "Shihao Shu", + "Xiangzhi Bai" + ], + "abstract": "Novel-view synthesis based on visible light has been extensively studied. In comparison to visible light imaging, thermal infrared imaging offers the advantage of all-weather imaging and strong penetration, providing increased possibilities for reconstruction in nighttime and adverse weather scenarios. However, thermal infrared imaging is influenced by physical characteristics such as atmospheric transmission effects and thermal conduction, hindering the precise reconstruction of intricate details in thermal infrared scenes, manifesting as issues of floaters and indistinct edge features in synthesized images. To address these limitations, this paper introduces a physics-induced 3D Gaussian splatting method named Thermal3D-GS. Thermal3D-GS begins by modeling atmospheric transmission effects and thermal conduction in three-dimensional media using neural networks. Additionally, a temperature consistency constraint is incorporated into the optimization objective to enhance the reconstruction accuracy of thermal infrared images. Furthermore, to validate the effectiveness of our method, the first large-scale benchmark dataset for this field named Thermal Infrared Novel-view Synthesis Dataset (TI-NSD) is created. This dataset comprises 20 authentic thermal infrared video scenes, covering indoor, outdoor, and UAV(Unmanned Aerial Vehicle) scenarios, totaling 6,664 frames of thermal infrared image data. Based on this dataset, this paper experimentally verifies the effectiveness of Thermal3D-GS. The results indicate that our method outperforms the baseline method with a 3.03 dB improvement in PSNR and significantly addresses the issues of floaters and indistinct edge features present in the baseline method. 
Our dataset and codebase will be released in \\href{https://github.com/mzzcdf/Thermal3DGS}{\\textcolor{red}{Thermal3DGS}}.", + "arxiv_url": "http://arxiv.org/abs/2409.08042v1", + "pdf_url": "http://arxiv.org/pdf/2409.08042v1", + "published_date": "2024-09-12", + "categories": [ + "cs.CV", + "cs.GR", + "I.3.3; I.4.5" + ], + "github_url": "https://github.com/mzzcdf/Thermal3DGS", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length", + "authors": [ + "Bangya Liu", + "Suman Banerjee" + ], + "abstract": "Recent advances in 3D Gaussian Splatting (3DGS) have garnered significant attention in computer vision and computer graphics due to its high rendering speed and remarkable quality. While extant research has endeavored to extend the application of 3DGS from static to dynamic scenes, such efforts have been consistently impeded by excessive model sizes, constraints on video duration, and content deviation. These limitations significantly compromise the streamability of dynamic 3D Gaussian models, thereby restricting their utility in downstream applications, including volumetric video, autonomous vehicle, and immersive technologies such as virtual, augmented, and mixed reality. This paper introduces SwinGS, a novel framework for training, delivering, and rendering volumetric video in a real-time streaming fashion. To address the aforementioned challenges and enhance streamability, SwinGS integrates spacetime Gaussian with Markov Chain Monte Carlo (MCMC) to adapt the model to fit various 3D scenes across frames, in the meantime employing a sliding window captures Gaussian snapshots for each frame in an accumulative way. We implement a prototype of SwinGS and demonstrate its streamability across various datasets and scenes. Additionally, we develop an interactive WebGL viewer enabling real-time volumetric video playback on most devices with modern browsers, including smartphones and tablets. Experimental results show that SwinGS reduces transmission costs by 83.6% compared to previous work with ignorable compromise in PSNR. Moreover, SwinGS easily scales to long video sequences without compromising quality.", + "arxiv_url": "http://arxiv.org/abs/2409.07759v1", + "pdf_url": "http://arxiv.org/pdf/2409.07759v1", + "published_date": "2024-09-12", + "categories": [ + "cs.MM", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs", + "authors": [ + "Sadra Safadoust", + "Fabio Tosi", + "Fatma Güney", + "Matteo Poggi" + ], + "abstract": "3D Gaussian Splatting (GS) significantly struggles to accurately represent the underlying 3D scene geometry, resulting in inaccuracies and floating artifacts when rendering depth maps. In this paper, we address this limitation, undertaking a comprehensive analysis of the integration of depth priors throughout the optimization process of Gaussian primitives, and present a novel strategy for this purpose. This latter dynamically exploits depth cues from a readily available stereo network, processing virtual stereo pairs rendered by the GS model itself during training and achieving consistent self-improvement of the scene representation. 
Experimental results on three popular datasets, breaking ground as the first to assess depth accuracy for these models, validate our findings.", + "arxiv_url": "http://arxiv.org/abs/2409.07456v1", + "pdf_url": "http://arxiv.org/pdf/2409.07456v1", + "published_date": "2024-09-11", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models", + "authors": [ + "Haibo Yang", + "Yang Chen", + "Yingwei Pan", + "Ting Yao", + "Zhineng Chen", + "Chong-Wah Ngo", + "Tao Mei" + ], + "abstract": "Despite having tremendous progress in image-to-3D generation, existing methods still struggle to produce multi-view consistent images with high-resolution textures in detail, especially in the paradigm of 2D diffusion that lacks 3D awareness. In this work, we present High-resolution Image-to-3D model (Hi3D), a new video diffusion based paradigm that redefines a single image to multi-view images as 3D-aware sequential image generation (i.e., orbital video generation). This methodology delves into the underlying temporal consistency knowledge in video diffusion model that generalizes well to geometry consistency across multiple views in 3D generation. Technically, Hi3D first empowers the pre-trained video diffusion model with 3D-aware prior (camera pose condition), yielding multi-view images with low-resolution texture details. A 3D-aware video-to-video refiner is learnt to further scale up the multi-view images with high-resolution texture details. Such high-resolution multi-view images are further augmented with novel views through 3D Gaussian Splatting, which are finally leveraged to obtain high-fidelity meshes via 3D reconstruction. Extensive experiments on both novel view synthesis and single view reconstruction demonstrate that our Hi3D manages to produce superior multi-view consistency images with highly-detailed textures. Source code and data are available at \\url{https://github.com/yanghb22-fdu/Hi3D-Official}.", + "arxiv_url": "http://arxiv.org/abs/2409.07452v1", + "pdf_url": "http://arxiv.org/pdf/2409.07452v1", + "published_date": "2024-09-11", + "categories": [ + "cs.CV", + "cs.MM" + ], + "github_url": "https://github.com/yanghb22-fdu/Hi3D-Official", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering", + "authors": [ + "Dafei Qin", + "Hongyang Lin", + "Qixuan Zhang", + "Kaichun Qiao", + "Longwen Zhang", + "Zijun Zhao", + "Jun Saito", + "Jingyi Yu", + "Lan Xu", + "Taku Komura" + ], + "abstract": "We propose GauFace, a novel Gaussian Splatting representation, tailored for efficient animation and rendering of physically-based facial assets. Leveraging strong geometric priors and constrained optimization, GauFace ensures a neat and structured Gaussian representation, delivering high fidelity and real-time facial interaction of 30fps@1440p on a Snapdragon 8 Gen 2 mobile platform. Then, we introduce TransGS, a diffusion transformer that instantly translates physically-based facial assets into the corresponding GauFace representations. Specifically, we adopt a patch-based pipeline to handle the vast number of Gaussians effectively. 
We also introduce a novel pixel-aligned sampling scheme with UV positional encoding to ensure the throughput and rendering quality of GauFace assets generated by our TransGS. Once trained, TransGS can instantly translate facial assets with lighting conditions to GauFace representation, With the rich conditioning modalities, it also enables editing and animation capabilities reminiscent of traditional CG pipelines. We conduct extensive evaluations and user studies, compared to traditional offline and online renderers, as well as recent neural rendering methods, which demonstrate the superior performance of our approach for facial asset rendering. We also showcase diverse immersive applications of facial assets using our TransGS approach and GauFace representation, across various platforms like PCs, phones and even VR headsets.", + "arxiv_url": "http://arxiv.org/abs/2409.07441v2", + "pdf_url": "http://arxiv.org/pdf/2409.07441v2", + "published_date": "2024-09-11", + "categories": [ + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks", + "authors": [ + "Ruihan Xu", + "Anthony Opipari", + "Joshua Mah", + "Stanley Lewis", + "Haoran Zhang", + "Hanzhe Guo", + "Odest Chadwicke Jenkins" + ], + "abstract": "This paper introduces SO(2)-Equivariant Gaussian Sculpting Networks (GSNs) as an approach for SO(2)-Equivariant 3D object reconstruction from single-view image observations. GSNs take a single observation as input to generate a Gaussian splat representation describing the observed object's geometry and texture. By using a shared feature extractor before decoding Gaussian colors, covariances, positions, and opacities, GSNs achieve extremely high throughput (>150FPS). Experiments demonstrate that GSNs can be trained efficiently using a multi-view rendering loss and are competitive, in quality, with expensive diffusion-based reconstruction algorithms. The GSN model is validated on multiple benchmark experiments. Moreover, we demonstrate the potential for GSNs to be used within a robotic manipulation pipeline for object-centric grasping.", + "arxiv_url": "http://arxiv.org/abs/2409.07245v1", + "pdf_url": "http://arxiv.org/pdf/2409.07245v1", + "published_date": "2024-09-11", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ThermalGaussian: Thermal 3D Gaussian Splatting", + "authors": [ + "Rongfeng Lu", + "Hangyu Chen", + "Zunjie Zhu", + "Yuhang Qin", + "Ming Lu", + "Le Zhang", + "Chenggang Yan", + "Anke Xue" + ], + "abstract": "Thermography is especially valuable for the military and other users of surveillance cameras. Some recent methods based on Neural Radiance Fields (NeRF) are proposed to reconstruct the thermal scenes in 3D from a set of thermal and RGB images. However, unlike NeRF, 3D Gaussian splatting (3DGS) prevails due to its rapid training and real-time rendering. In this work, we propose ThermalGaussian, the first thermal 3DGS approach capable of rendering high-quality images in RGB and thermal modalities. We first calibrate the RGB camera and the thermal camera to ensure that both modalities are accurately aligned. Subsequently, we use the registered images to learn the multimodal 3D Gaussians. To prevent the overfitting of any single modality, we introduce several multimodal regularization constraints. 
We also develop smoothing constraints tailored to the physical characteristics of the thermal modality. Besides, we contribute a real-world dataset named RGBT-Scenes, captured by a hand-hold thermal-infrared camera, facilitating future research on thermal scene reconstruction. We conduct comprehensive experiments to show that ThermalGaussian achieves photorealistic rendering of thermal images and improves the rendering quality of RGB images. With the proposed multimodal regularization constraints, we also reduced the model's storage cost by 90\\%. The code and dataset will be released.", + "arxiv_url": "http://arxiv.org/abs/2409.07200v1", + "pdf_url": "http://arxiv.org/pdf/2409.07200v1", + "published_date": "2024-09-11", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "gsplat: An Open-Source Library for Gaussian Splatting", + "authors": [ + "Vickie Ye", + "Ruilong Li", + "Justin Kerr", + "Matias Turkulainen", + "Brent Yi", + "Zhuoyang Pan", + "Otto Seiskari", + "Jianbo Ye", + "Jeffrey Hu", + "Matthew Tancik", + "Angjoo Kanazawa" + ], + "abstract": "gsplat is an open-source library designed for training and developing Gaussian Splatting methods. It features a front-end with Python bindings compatible with the PyTorch library and a back-end with highly optimized CUDA kernels. gsplat offers numerous features that enhance the optimization of Gaussian Splatting models, which include optimization improvements for speed, memory, and convergence times. Experimental results demonstrate that gsplat achieves up to 10% less training time and 4x less memory than the original implementation. Utilized in several research projects, gsplat is actively maintained on GitHub. Source code is available at https://github.com/nerfstudio-project/gsplat under Apache License 2.0. We welcome contributions from the open-source community.", + "arxiv_url": "http://arxiv.org/abs/2409.06765v1", + "pdf_url": "http://arxiv.org/pdf/2409.06765v1", + "published_date": "2024-09-10", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/nerfstudio-project/gsplat", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction", + "authors": [ + "Junyi Chen", + "Weicai Ye", + "Yifan Wang", + "Danpeng Chen", + "Di Huang", + "Wanli Ouyang", + "Guofeng Zhang", + "Yu Qiao", + "Tong He" + ], + "abstract": "3D Gaussian Splatting (3DGS) has shown promising performance in novel view synthesis. Previous methods adapt it to obtaining surfaces of either individual 3D objects or within limited scenes. In this paper, we make the first attempt to tackle the challenging task of large-scale scene surface reconstruction. This task is particularly difficult due to the high GPU memory consumption, different levels of details for geometric representation, and noticeable inconsistencies in appearance. To this end, we propose GigaGS, the first work for high-quality surface reconstruction for large-scale scenes using 3DGS. GigaGS first applies a partitioning strategy based on the mutual visibility of spatial regions, which effectively grouping cameras for parallel processing. To enhance the quality of the surface, we also propose novel multi-view photometric and geometric consistency constraints based on Level-of-Detail representation. 
In doing so, our method can reconstruct detailed surface structures. Comprehensive experiments are conducted on various datasets. The consistent improvement demonstrates the superiority of GigaGS.", + "arxiv_url": "http://arxiv.org/abs/2409.06685v1", + "pdf_url": "http://arxiv.org/pdf/2409.06685v1", + "published_date": "2024-09-10", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MVGaussian: High-Fidelity text-to-3D Content Generation with Multi-View Guidance and Surface Densification", + "authors": [ + "Phu Pham", + "Aradhya N. Mathur", + "Ojaswa Sharma", + "Aniket Bera" + ], + "abstract": "The field of text-to-3D content generation has made significant progress in generating realistic 3D objects, with existing methodologies like Score Distillation Sampling (SDS) offering promising guidance. However, these methods often encounter the \"Janus\" problem-multi-face ambiguities due to imprecise guidance. Additionally, while recent advancements in 3D gaussian splitting have shown its efficacy in representing 3D volumes, optimization of this representation remains largely unexplored. This paper introduces a unified framework for text-to-3D content generation that addresses these critical gaps. Our approach utilizes multi-view guidance to iteratively form the structure of the 3D model, progressively enhancing detail and accuracy. We also introduce a novel densification algorithm that aligns gaussians close to the surface, optimizing the structural integrity and fidelity of the generated models. Extensive experiments validate our approach, demonstrating that it produces high-quality visual outputs with minimal time cost. Notably, our method achieves high-quality results within half an hour of training, offering a substantial efficiency gain over most existing methods, which require hours of training time to achieve comparable results.", + "arxiv_url": "http://arxiv.org/abs/2409.06620v1", + "pdf_url": "http://arxiv.org/pdf/2409.06620v1", + "published_date": "2024-09-10", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Sources of Uncertainty in 3D Scene Reconstruction", + "authors": [ + "Marcus Klasson", + "Riccardo Mereu", + "Juho Kannala", + "Arno Solin" + ], + "abstract": "The process of 3D scene reconstruction can be affected by numerous uncertainty sources in real-world scenes. While Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (GS) achieve high-fidelity rendering, they lack built-in mechanisms to directly address or quantify uncertainties arising from the presence of noise, occlusions, confounding outliers, and imprecise camera pose inputs. In this paper, we introduce a taxonomy that categorizes different sources of uncertainty inherent in these methods. Moreover, we extend NeRF- and GS-based methods with uncertainty estimation techniques, including learning uncertainty outputs and ensembles, and perform an empirical study to assess their ability to capture the sensitivity of the reconstruction. 
Our study highlights the need for addressing various uncertainty aspects when designing NeRF/GS-based methods for uncertainty-aware 3D reconstruction.", + "arxiv_url": "http://arxiv.org/abs/2409.06407v1", + "pdf_url": "http://arxiv.org/pdf/2409.06407v1", + "published_date": "2024-09-10", + "categories": [ + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Online 3D reconstruction and dense tracking in endoscopic videos", + "authors": [ + "Michel Hayoz", + "Christopher Hahne", + "Thomas Kurmann", + "Max Allan", + "Guido Beldi", + "Daniel Candinas", + "Pablo Márquez-Neila", + "Raphael Sznitman" + ], + "abstract": "3D scene reconstruction from stereo endoscopic video data is crucial for advancing surgical interventions. In this work, we present an online framework for online, dense 3D scene reconstruction and tracking, aimed at enhancing surgical scene understanding and assisting interventions. Our method dynamically extends a canonical scene representation using Gaussian splatting, while modeling tissue deformations through a sparse set of control points. We introduce an efficient online fitting algorithm that optimizes the scene parameters, enabling consistent tracking and accurate reconstruction. Through experiments on the StereoMIS dataset, we demonstrate the effectiveness of our approach, outperforming state-of-the-art tracking methods and achieving comparable performance to offline reconstruction techniques. Our work enables various downstream applications thus contributing to advancing the capabilities of surgical assistance systems.", + "arxiv_url": "http://arxiv.org/abs/2409.06037v1", + "pdf_url": "http://arxiv.org/pdf/2409.06037v1", + "published_date": "2024-09-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GASP: Gaussian Splatting for Physic-Based Simulations", + "authors": [ + "Piotr Borycki", + "Weronika Smolak", + "Joanna Waczyńska", + "Marcin Mazur", + "Sławomir Tadeja", + "Przemysław Spurek" + ], + "abstract": "Physics simulation is paramount for modeling and utilization of 3D scenes in various real-world applications. However, its integration with state-of-the-art 3D scene rendering techniques such as Gaussian Splatting (GS) remains challenging. Existing models use additional meshing mechanisms, including triangle or tetrahedron meshing, marching cubes, or cage meshes. As an alternative, we can modify the physics grounded Newtonian dynamics to align with 3D Gaussian components. Current models take the first-order approximation of a deformation map, which locally approximates the dynamics by linear transformations. In contrast, our Gaussian Splatting for Physics-Based Simulations (GASP) model uses such a map (without any modifications) and flat Gaussian distributions, which are parameterized by three points (mesh faces). Subsequently, each 3D point (mesh face node) is treated as a discrete entity within a 3D space. Consequently, the problem of modeling Gaussian components is reduced to working with 3D points. Additionally, the information on mesh faces can be used to incorporate further properties into the physics model, facilitating the use of triangles. Resulting solution can be integrated into any physics engine that can be treated as a black box.
As demonstrated in our studies, the proposed model exhibits superior performance on a diverse range of benchmark datasets designed for 3D object rendering.", + "arxiv_url": "http://arxiv.org/abs/2409.05819v1", + "pdf_url": "http://arxiv.org/pdf/2409.05819v1", + "published_date": "2024-09-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LiDAR-3DGS: LiDAR Reinforced 3D Gaussian Splatting for Multimodal Radiance Field Rendering", + "authors": [ + "Hansol Lim", + "Hanbeom Chang", + "Jongseong Brad Choi", + "Chul Min Yeum" + ], + "abstract": "In this paper, we explore the capabilities of multimodal inputs to 3D Gaussian Splatting (3DGS) based Radiance Field Rendering. We present LiDAR-3DGS, a novel method of reinforcing 3DGS inputs with LiDAR generated point clouds to significantly improve the accuracy and detail of 3D models. We demonstrate a systematic approach of LiDAR reinforcement to 3DGS to enable capturing of important features such as bolts, apertures, and other details that are often missed by image-based features alone. These details are crucial for engineering applications such as remote monitoring and maintenance. Without modifying the underlying 3DGS algorithm, we demonstrate that even a modest addition of LiDAR generated point cloud significantly enhances the perceptual quality of the models. At 30k iterations, the model generated by our method resulted in an increase of 7.064% in PSNR and 0.565% in SSIM, respectively. Since the LiDAR used in this research was a commonly used commercial-grade device, the improvements observed were modest and can be further enhanced with higher-grade LiDAR systems. Additionally, these improvements can be supplementary to other derivative works of Radiance Field Rendering and also provide a new insight for future LiDAR and computer vision integrated modeling.", + "arxiv_url": "http://arxiv.org/abs/2409.16296v1", + "pdf_url": "http://arxiv.org/pdf/2409.16296v1", + "published_date": "2024-09-09", + "categories": [ + "cs.CV", + "cs.GR", + "eess.IV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Lagrangian Hashing for Compressed Neural Field Representations", + "authors": [ + "Shrisudhan Govindarajan", + "Zeno Sambugaro", + "Akhmedkhan Shabanov", + "Towaki Takikawa", + "Daniel Rebain", + "Weiwei Sun", + "Nicola Conci", + "Kwang Moo Yi", + "Andrea Tagliasacchi" + ], + "abstract": "We present Lagrangian Hashing, a representation for neural fields combining the characteristics of fast training NeRF methods that rely on Eulerian grids (i.e., InstantNGP), with those that employ points equipped with features as a way to represent information (e.g. 3D Gaussian Splatting or PointNeRF). We achieve this by incorporating a point-based representation into the high-resolution layers of the hierarchical hash tables of an InstantNGP representation. As our points are equipped with a field of influence, our representation can be interpreted as a mixture of Gaussians stored within the hash table. We propose a loss that encourages the movement of our Gaussians towards regions that require more representation budget to be sufficiently well represented.
Our main finding is that our representation allows the reconstruction of signals using a more compact representation without compromising quality.", + "arxiv_url": "http://arxiv.org/abs/2409.05334v1", + "pdf_url": "http://arxiv.org/pdf/2409.05334v1", + "published_date": "2024-09-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DreamMapping: High-Fidelity Text-to-3D Generation via Variational Distribution Mapping", + "authors": [ + "Zeyu Cai", + "Duotun Wang", + "Yixun Liang", + "Zhijing Shao", + "Ying-Cong Chen", + "Xiaohang Zhan", + "Zeyu Wang" + ], + "abstract": "Score Distillation Sampling (SDS) has emerged as a prevalent technique for text-to-3D generation, enabling 3D content creation by distilling view-dependent information from text-to-2D guidance. However, they frequently exhibit shortcomings such as over-saturated color and excess smoothness. In this paper, we conduct a thorough analysis of SDS and refine its formulation, finding that the core design is to model the distribution of rendered images. Following this insight, we introduce a novel strategy called Variational Distribution Mapping (VDM), which expedites the distribution modeling process by regarding the rendered images as instances of degradation from diffusion-based generation. This special design enables the efficient training of variational distribution by skipping the calculations of the Jacobians in the diffusion U-Net. We also introduce timestep-dependent Distribution Coefficient Annealing (DCA) to further improve distilling precision. Leveraging VDM and DCA, we use Gaussian Splatting as the 3D representation and build a text-to-3D generation framework. Extensive experiments and evaluations demonstrate the capability of VDM and DCA to generate high-fidelity and realistic assets with optimization efficiency.", + "arxiv_url": "http://arxiv.org/abs/2409.05099v4", + "pdf_url": "http://arxiv.org/pdf/2409.05099v4", + "published_date": "2024-09-08", + "categories": [ + "cs.CV", + "cs.GR", + "I.4.9; I.3.6" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-PT: Exploiting 3D Gaussian Splatting for Comprehensive Point Cloud Understanding via Self-supervised Learning", + "authors": [ + "Keyi Liu", + "Yeqi Luo", + "Weidong Yang", + "Jingyi Xu", + "Zhijun Li", + "Wen-Ming Chen", + "Ben Fei" + ], + "abstract": "Self-supervised learning of point cloud aims to leverage unlabeled 3D data to learn meaningful representations without reliance on manual annotations. However, current approaches face challenges such as limited data diversity and inadequate augmentation for effective feature learning. To address these challenges, we propose GS-PT, which integrates 3D Gaussian Splatting (3DGS) into point cloud self-supervised learning for the first time. Our pipeline utilizes transformers as the backbone for self-supervised pre-training and introduces novel contrastive learning tasks through 3DGS. Specifically, the transformers aim to reconstruct the masked point cloud. 3DGS utilizes multi-view rendered images as input to generate enhanced point cloud distributions and novel view images, facilitating data augmentation and cross-modal contrastive learning. Additionally, we incorporate features from depth maps. 
By optimizing these tasks collectively, our method enriches the tri-modal self-supervised learning process, enabling the model to leverage the correlation across 3D point clouds and 2D images from various modalities. We freeze the encoder after pre-training and test the model's performance on multiple downstream tasks. Experimental results indicate that GS-PT outperforms the off-the-shelf self-supervised learning methods on various downstream tasks including 3D object classification, real-world classifications, and few-shot learning and segmentation.", + "arxiv_url": "http://arxiv.org/abs/2409.04963v1", + "pdf_url": "http://arxiv.org/pdf/2409.04963v1", + "published_date": "2024-09-08", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Fisheye-GS: Lightweight and Extensible Gaussian Splatting Module for Fisheye Cameras", + "authors": [ + "Zimu Liao", + "Siyan Chen", + "Rong Fu", + "Yi Wang", + "Zhongling Su", + "Hao Luo", + "Li Ma", + "Linning Xu", + "Bo Dai", + "Hengjie Li", + "Zhilin Pei", + "Xingcheng Zhang" + ], + "abstract": "Recently, 3D Gaussian Splatting (3DGS) has garnered attention for its high fidelity and real-time rendering. However, adapting 3DGS to different camera models, particularly fisheye lenses, poses challenges due to the unique 3D to 2D projection calculation. Additionally, there are inefficiencies in the tile-based splatting, especially for the extreme curvature and wide field of view of fisheye lenses, which are crucial for its broader real-life applications. To tackle these challenges, we introduce Fisheye-GS.This innovative method recalculates the projection transformation and its gradients for fisheye cameras. Our approach can be seamlessly integrated as a module into other efficient 3D rendering methods, emphasizing its extensibility, lightweight nature, and modular design. Since we only modified the projection component, it can also be easily adapted for use with different camera models. Compared to methods that train after undistortion, our approach demonstrates a clear improvement in visual quality.", + "arxiv_url": "http://arxiv.org/abs/2409.04751v2", + "pdf_url": "http://arxiv.org/pdf/2409.04751v2", + "published_date": "2024-09-07", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GST: Precise 3D Human Body from a Single Image with Gaussian Splatting Transformers", + "authors": [ + "Lorenza Prospero", + "Abdullah Hamdi", + "Joao F. Henriques", + "Christian Rupprecht" + ], + "abstract": "Reconstructing realistic 3D human models from monocular images has significant applications in creative industries, human-computer interfaces, and healthcare. We base our work on 3D Gaussian Splatting (3DGS), a scene representation composed of a mixture of Gaussians. Predicting such mixtures for a human from a single input image is challenging, as it is a non-uniform density (with a many-to-one relationship with input pixels) with strict physical constraints. At the same time, it needs to be flexible to accommodate a variety of clothes and poses. Our key observation is that the vertices of standardized human meshes (such as SMPL) can provide an adequate density and approximate initial position for Gaussians. 
We can then train a transformer model to jointly predict comparatively small adjustments to these positions, as well as the other Gaussians' attributes and the SMPL parameters. We show empirically that this combination (using only multi-view supervision) can achieve fast inference of 3D human models from a single image without test-time optimization, expensive diffusion models, or 3D points supervision. We also show that it can improve 3D pose estimation by better fitting human models that account for clothes and other variations. The code is available on the project website https://abdullahamdi.com/gst/ .", + "arxiv_url": "http://arxiv.org/abs/2409.04196v1", + "pdf_url": "http://arxiv.org/pdf/2409.04196v1", + "published_date": "2024-09-06", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D-GP-LMVIC: Learning-based Multi-View Image Coding with 3D Gaussian Geometric Priors", + "authors": [ + "Yujun Huang", + "Bin Chen", + "Niu Lian", + "Baoyi An", + "Shu-Tao Xia" + ], + "abstract": "Multi-view image compression is vital for 3D-related applications. To effectively model correlations between views, existing methods typically predict disparity between two views on a 2D plane, which works well for small disparities, such as in stereo images, but struggles with larger disparities caused by significant view changes. To address this, we propose a novel approach: learning-based multi-view image coding with 3D Gaussian geometric priors (3D-GP-LMVIC). Our method leverages 3D Gaussian Splatting to derive geometric priors of the 3D scene, enabling more accurate disparity estimation across views within the compression model. Additionally, we introduce a depth map compression model to reduce redundancy in geometric information between views. A multi-view sequence ordering method is also proposed to enhance correlations between adjacent views. Experimental results demonstrate that 3D-GP-LMVIC surpasses both traditional and learning-based methods in performance, while maintaining fast encoding and decoding speed.", + "arxiv_url": "http://arxiv.org/abs/2409.04013v1", + "pdf_url": "http://arxiv.org/pdf/2409.04013v1", + "published_date": "2024-09-06", + "categories": [ + "cs.CV", + "cs.IT", + "cs.MM", + "math.IT" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors", + "authors": [ + "Hanyang Yu", + "Xiaoxiao Long", + "Ping Tan" + ], + "abstract": "We aim to address sparse-view reconstruction of a 3D scene by leveraging priors from large-scale vision models. While recent advancements such as 3D Gaussian Splatting (3DGS) have demonstrated remarkable successes in 3D reconstruction, these methods typically necessitate hundreds of input images that densely capture the underlying scene, making them time-consuming and impractical for real-world applications. However, sparse-view reconstruction is inherently ill-posed and under-constrained, often resulting in inferior and incomplete outcomes. This is due to issues such as failed initialization, overfitting on input images, and a lack of details. To mitigate these challenges, we introduce LM-Gaussian, a method capable of generating high-quality reconstructions from a limited number of images. 
Specifically, we propose a robust initialization module that leverages stereo priors to aid in the recovery of camera poses and the reliable point clouds. Additionally, a diffusion-based refinement is iteratively applied to incorporate image diffusion priors into the Gaussian optimization process to preserve intricate scene details. Finally, we utilize video diffusion priors to further enhance the rendered images for realistic visual effects. Overall, our approach significantly reduces the data acquisition requirements compared to previous 3DGS methods. We validate the effectiveness of our framework through experiments on various public datasets, demonstrating its potential for high-quality 360-degree scene reconstruction. Visual results are on our website.", + "arxiv_url": "http://arxiv.org/abs/2409.03456v2", + "pdf_url": "http://arxiv.org/pdf/2409.03456v2", + "published_date": "2024-09-05", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction", + "authors": [ + "Shen Chen", + "Jiale Zhou", + "Lei Li" + ], + "abstract": "3D Gaussian Splatting (3DGS) has emerged as a promising approach for 3D scene representation, offering a reduction in computational overhead compared to Neural Radiance Fields (NeRF). However, 3DGS is susceptible to high-frequency artifacts and demonstrates suboptimal performance under sparse viewpoint conditions, thereby limiting its applicability in robotics and computer vision. To address these limitations, we introduce SVS-GS, a novel framework for Sparse Viewpoint Scene reconstruction that integrates a 3D Gaussian smoothing filter to suppress artifacts. Furthermore, our approach incorporates a Depth Gradient Profile Prior (DGPP) loss with a dynamic depth mask to sharpen edges and 2D diffusion with Score Distillation Sampling (SDS) loss to enhance geometric consistency in novel view synthesis. Experimental evaluations on the MipNeRF-360 and SeaThru-NeRF datasets demonstrate that SVS-GS markedly improves 3D reconstruction from sparse viewpoints, offering a robust and efficient solution for scene understanding in robotics and computer vision applications.", + "arxiv_url": "http://arxiv.org/abs/2409.03213v1", + "pdf_url": "http://arxiv.org/pdf/2409.03213v1", + "published_date": "2024-09-05", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models", + "authors": [ + "Zhibin Liu", + "Haoye Dong", + "Aviral Chharia", + "Hefeng Wu" + ], + "abstract": "Generating lifelike 3D humans from a single RGB image remains a challenging task in computer vision, as it requires accurate modeling of geometry, high-quality texture, and plausible unseen parts. Existing methods typically use multi-view diffusion models for 3D generation, but they often face inconsistent view issues, which hinder high-quality 3D human generation. To address this, we propose Human-VDM, a novel method for generating 3D human from a single RGB image using Video Diffusion Models. Human-VDM provides temporally consistent views for 3D human generation using Gaussian Splatting. 
It consists of three modules: a view-consistent human video diffusion module, a video augmentation module, and a Gaussian Splatting module. First, a single image is fed into a human video diffusion module to generate a coherent human video. Next, the video augmentation module applies super-resolution and video interpolation to enhance the textures and geometric smoothness of the generated video. Finally, the 3D Human Gaussian Splatting module learns lifelike humans under the guidance of these high-resolution and view-consistent images. Experiments demonstrate that Human-VDM achieves high-quality 3D human from a single image, outperforming state-of-the-art methods in both generation quality and quantity. Project page: https://human-vdm.github.io/Human-VDM/", + "arxiv_url": "http://arxiv.org/abs/2409.02851v1", + "pdf_url": "http://arxiv.org/pdf/2409.02851v1", + "published_date": "2024-09-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Object Gaussian for Monocular 6D Pose Estimation from Sparse Views", + "authors": [ + "Luqing Luo", + "Shichu Sun", + "Jiangang Yang", + "Linfang Zheng", + "Jinwei Du", + "Jian Liu" + ], + "abstract": "Monocular object pose estimation, as a pivotal task in computer vision and robotics, heavily depends on accurate 2D-3D correspondences, which often demand costly CAD models that may not be readily available. Object 3D reconstruction methods offer an alternative, among which recent advancements in 3D Gaussian Splatting (3DGS) afford a compelling potential. Yet its performance still suffers and tends to overfit with fewer input views. Embracing this challenge, we introduce SGPose, a novel framework for sparse view object pose estimation using Gaussian-based methods. Given as few as ten views, SGPose generates a geometric-aware representation by starting with a random cuboid initialization, eschewing reliance on Structure-from-Motion (SfM) pipeline-derived geometry as required by traditional 3DGS methods. SGPose removes the dependence on CAD models by regressing dense 2D-3D correspondences between images and the reconstructed model from sparse input and random initialization, while the geometric-consistent depth supervision and online synthetic view warping are key to the success. Experiments on typical benchmarks, especially on the Occlusion LM-O dataset, demonstrate that SGPose outperforms existing methods even under sparse view constraints, under-scoring its potential in real-world applications.", + "arxiv_url": "http://arxiv.org/abs/2409.02581v1", + "pdf_url": "http://arxiv.org/pdf/2409.02581v1", + "published_date": "2024-09-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GGS: Generalizable Gaussian Splatting for Lane Switching in Autonomous Driving", + "authors": [ + "Huasong Han", + "Kaixuan Zhou", + "Xiaoxiao Long", + "Yusen Wang", + "Chunxia Xiao" + ], + "abstract": "We propose GGS, a Generalizable Gaussian Splatting method for Autonomous Driving which can achieve realistic rendering under large viewpoint changes. Previous generalizable 3D gaussian splatting methods are limited to rendering novel views that are very close to the original pair of images, which cannot handle large differences in viewpoint. 
Especially in autonomous driving scenarios, images are typically collected from a single lane. The limited training perspective makes rendering images of a different lane very challenging. To further improve the rendering capability of GGS under large viewpoint changes, we introduce a novel virtual lane generation module into the GGS method to enable high-quality lane switching even without a multi-lane dataset. Besides, we design a diffusion loss to supervise the generation of virtual lane images, further addressing the lack of data for the virtual lanes. Finally, we also propose a depth refinement module to optimize depth estimation in the GGS model. Extensive validation of our method, compared to existing approaches, demonstrates state-of-the-art performance.", + "arxiv_url": "http://arxiv.org/abs/2409.02382v1", + "pdf_url": "http://arxiv.org/pdf/2409.02382v1", + "published_date": "2024-09-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction", + "authors": [ + "Jenny Seidenschwarz", + "Qunjie Zhou", + "Bardienus Duisterhof", + "Deva Ramanan", + "Laura Leal-Taixé" + ], + "abstract": "Reconstructing scenes and tracking motion are two sides of the same coin. Tracking points allow for geometric reconstruction [14], while geometric reconstruction of (dynamic) scenes allows for 3D tracking of points over time [24, 39]. The latter was recently also exploited for 2D point tracking to overcome occlusion ambiguities by lifting tracking directly into 3D [38]. However, the above approaches either require offline processing or multi-view camera setups, both unrealistic for real-world applications like robot navigation or mixed reality. We target the challenge of online 2D and 3D point tracking from unposed monocular camera input, introducing Dynamic Online Monocular Reconstruction (DynOMo). We leverage 3D Gaussian splatting to reconstruct dynamic scenes in an online fashion. Our approach extends 3D Gaussians to capture new content and object motions while estimating camera movements from a single RGB frame. DynOMo stands out by enabling the emergence of point trajectories through robust image feature reconstruction and a novel similarity-enhanced regularization term, without requiring any correspondence-level supervision. It sets the first baseline for online point tracking with monocular unposed cameras, achieving performance on par with existing methods. We aim to inspire the community to advance online point tracking and reconstruction, expanding the applicability to diverse real-world scenarios.", + "arxiv_url": "http://arxiv.org/abs/2409.02104v1", + "pdf_url": "http://arxiv.org/pdf/2409.02104v1", + "published_date": "2024-09-03", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PRoGS: Progressive Rendering of Gaussian Splats", + "authors": [ + "Brent Zoomers", + "Maarten Wijnants", + "Ivan Molenaers", + "Joni Vanherck", + "Jeroen Put", + "Lode Jorissen", + "Nick Michiels" + ], + "abstract": "Over the past year, 3D Gaussian Splatting (3DGS) has received significant attention for its ability to represent 3D scenes in a perceptually accurate manner. However, it can require a substantial amount of storage since each splat's individual data must be stored. 
While compression techniques offer a potential solution by reducing the memory footprint, they still necessitate retrieving the entire scene before any part of it can be rendered. In this work, we introduce a novel approach for progressively rendering such scenes, aiming to display visible content that closely approximates the final scene as early as possible without loading the entire scene into memory. This approach benefits both on-device rendering applications limited by memory constraints and streaming applications where minimal bandwidth usage is preferred. To achieve this, we approximate the contribution of each Gaussian to the final scene and construct an order of prioritization on their inclusion in the rendering process. Additionally, we demonstrate that our approach can be combined with existing compression methods to progressively render (and stream) 3DGS scenes, optimizing bandwidth usage by focusing on the most important splats within a scene. Overall, our work establishes a foundation for making remotely hosted 3DGS content more quickly accessible to end-users in over-the-top consumption scenarios, with our results showing significant improvements in quality across all metrics compared to existing methods.", + "arxiv_url": "http://arxiv.org/abs/2409.01761v1", + "pdf_url": "http://arxiv.org/pdf/2409.01761v1", + "published_date": "2024-09-03", + "categories": [ + "cs.CV", + "cs.MM" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianPU: A Hybrid 2D-3D Upsampling Framework for Enhancing Color Point Clouds via 3D Gaussian Splatting", + "authors": [ + "Zixuan Guo", + "Yifan Xie", + "Weijing Xie", + "Peng Huang", + "Fei Ma", + "Fei Richard Yu" + ], + "abstract": "Dense colored point clouds enhance visual perception and are of significant value in various robotic applications. However, existing learning-based point cloud upsampling methods are constrained by computational resources and batch processing strategies, which often require subdividing point clouds into smaller patches, leading to distortions that degrade perceptual quality. To address this challenge, we propose a novel 2D-3D hybrid colored point cloud upsampling framework (GaussianPU) based on 3D Gaussian Splatting (3DGS) for robotic perception. This approach leverages 3DGS to bridge 3D point clouds with their 2D rendered images in robot vision systems. A dual scale rendered image restoration network transforms sparse point cloud renderings into dense representations, which are then input into 3DGS along with precise robot camera poses and interpolated sparse point clouds to reconstruct dense 3D point clouds. We have made a series of enhancements to the vanilla 3DGS, enabling precise control over the number of points and significantly boosting the quality of the upsampled point cloud for robotic scene understanding. Our framework supports processing entire point clouds on a single consumer-grade GPU, such as the NVIDIA GeForce RTX 3090, eliminating the need for segmentation and thus producing high-quality, dense colored point clouds with millions of points for robot navigation and manipulation tasks. 
Extensive experimental results on generating million-level point cloud data validate the effectiveness of our method, substantially improving the quality of colored point clouds and demonstrating significant potential for applications involving large-scale point clouds in autonomous robotics and human-robot interaction scenarios.", + "arxiv_url": "http://arxiv.org/abs/2409.01581v1", + "pdf_url": "http://arxiv.org/pdf/2409.01581v1", + "published_date": "2024-09-03", + "categories": [ + "cs.RO", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Free-DyGS: Camera-Pose-Free Scene Reconstruction based on Gaussian Splatting for Dynamic Surgical Videos", + "authors": [ + "Qian Li", + "Shuojue Yang", + "Daiyun Shen", + "Yueming Jin" + ], + "abstract": "Reconstructing endoscopic videos is crucial for high-fidelity visualization and the efficiency of surgical operations. Despite the importance, existing 3D reconstruction methods encounter several challenges, including stringent demands for accuracy, imprecise camera positioning, intricate dynamic scenes, and the necessity for rapid reconstruction. Addressing these issues, this paper presents the first camera-pose-free scene reconstruction framework, Free-DyGS, tailored for dynamic surgical videos, leveraging 3D Gaussian splatting technology. Our approach employs a frame-by-frame reconstruction strategy and is delineated into four distinct phases: Scene Initialization, Joint Learning, Scene Expansion, and Retrospective Learning. We introduce a Generalizable Gaussians Parameterization module within the Scene Initialization and Expansion phases to proficiently generate Gaussian attributes for each pixel from the RGBD frames. The Joint Learning phase is crafted to concurrently deduce scene deformation and camera pose, facilitated by an innovative flexible deformation module. In the scene expansion stage, the Gaussian points gradually grow as the camera moves. The Retrospective Learning phase is dedicated to enhancing the precision of scene deformation through the reassessment of prior frames. The efficacy of the proposed Free-DyGS is substantiated through experiments on two datasets: the StereoMIS and Hamlyn datasets. The experimental outcomes underscore that Free-DyGS surpasses conventional baseline models in both rendering fidelity and computational efficiency.", + "arxiv_url": "http://arxiv.org/abs/2409.01003v2", + "pdf_url": "http://arxiv.org/pdf/2409.01003v2", + "published_date": "2024-09-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Gaussian Splatting for Large-scale Surface Reconstruction from Aerial Images", + "authors": [ + "YuanZheng Wu", + "Jin Liu", + "Shunping Ji" + ], + "abstract": "Recently, 3D Gaussian Splatting (3DGS) has demonstrated excellent ability in small-scale 3D surface reconstruction. However, extending 3DGS to large-scale scenes remains a significant challenge. To address this gap, we propose a novel 3DGS-based method for large-scale surface reconstruction using aerial multi-view stereo (MVS) images, named Aerial Gaussian Splatting (AGS). First, we introduce a data chunking method tailored for large-scale aerial images, making 3DGS feasible for surface reconstruction over extensive scenes. 
Second, we integrate the Ray-Gaussian Intersection method into 3DGS to obtain depth and normal information. Finally, we implement multi-view geometric consistency constraints to enhance the geometric consistency across different views. Our experiments on multiple datasets demonstrate, for the first time, that a 3DGS-based method can match conventional aerial MVS methods in geometric accuracy for aerial large-scale surface reconstruction, and that our method also outperforms state-of-the-art GS-based methods in both geometry and rendering quality.", + "arxiv_url": "http://arxiv.org/abs/2409.00381v3", + "pdf_url": "http://arxiv.org/pdf/2409.00381v3", + "published_date": "2024-08-31", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "UDGS-SLAM : UniDepth Assisted Gaussian Splatting for Monocular SLAM", + "authors": [ + "Mostafa Mansour", + "Ahmed Abdelsalam", + "Ari Happonen", + "Jari Porras", + "Esa Rahtu" + ], + "abstract": "Recent advancements in monocular neural depth estimation, particularly those achieved by the UniDepth network, have prompted the investigation of integrating UniDepth within a Gaussian splatting framework for monocular SLAM. This study presents UDGS-SLAM, a novel approach that eliminates the necessity of RGB-D sensors for depth estimation within the Gaussian splatting framework. UDGS-SLAM employs statistical filtering to ensure local consistency of the estimated depth and jointly optimizes camera trajectory and Gaussian scene representation parameters. The proposed method achieves high-fidelity rendered images and a low ATE RMSE of the camera trajectory. The performance of UDGS-SLAM is rigorously evaluated using the TUM RGB-D dataset and benchmarked against several baseline methods, demonstrating superior performance across various scenarios. Additionally, an ablation study is conducted to validate design choices and investigate the impact of different network backbone encoders on system performance.", + "arxiv_url": "http://arxiv.org/abs/2409.00362v1", + "pdf_url": "http://arxiv.org/pdf/2409.00362v1", + "published_date": "2024-08-31", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "OG-Mapping: Octree-based Structured 3D Gaussians for Online Dense Mapping", + "authors": [ + "Meng Wang", + "Junyi Wang", + "Changqun Xia", + "Chen Wang", + "Yue Qi" + ], + "abstract": "3D Gaussian splatting (3DGS) has recently demonstrated promising advancements in RGB-D online dense mapping. Nevertheless, existing methods excessively rely on per-pixel depth cues to perform map densification, which leads to significant redundancy and increased sensitivity to depth noise. Additionally, explicitly storing the 3D Gaussian parameters of a room-scale scene poses a significant storage challenge. In this paper, we introduce OG-Mapping, which leverages the robust scene structural representation capability of sparse octrees, combined with structured 3D Gaussian representations, to achieve efficient and robust online dense mapping. Moreover, OG-Mapping employs an anchor-based progressive map refinement strategy to recover the scene structures at multiple levels of detail. 
Instead of maintaining a small number of active keyframes with a fixed keyframe window as previous approaches do, a dynamic keyframe window is employed to allow OG-Mapping to better tackle false local minima and forgetting issues. Experimental results demonstrate that OG-Mapping delivers more robust and superior realism mapping results than existing Gaussian-based RGB-D online mapping methods with a compact model, and no additional post-processing is required.", + "arxiv_url": "http://arxiv.org/abs/2408.17223v1", + "pdf_url": "http://arxiv.org/pdf/2408.17223v1", + "published_date": "2024-08-30", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "2DGH: 2D Gaussian-Hermite Splatting for High-quality Rendering and Better Geometry Reconstruction", + "authors": [ + "Ruihan Yu", + "Tianyu Huang", + "Jingwang Ling", + "Feng Xu" + ], + "abstract": "2D Gaussian Splatting has recently emerged as a significant method in 3D reconstruction, enabling novel view synthesis and geometry reconstruction simultaneously. While the well-known Gaussian kernel is broadly used, its lack of anisotropy and deformation ability leads to dim and vague edges at object silhouettes, limiting the reconstruction quality of current Gaussian splatting methods. To enhance the representation power, we draw inspiration from quantum physics and propose to use the Gaussian-Hermite kernel as the new primitive in Gaussian splatting. The new kernel takes a unified mathematical form and extends the Gaussian function, which serves as the zero-rank term in the updated formulation. Our experiments demonstrate the extraordinary performance of Gaussian-Hermite kernel in both geometry reconstruction and novel-view synthesis tasks. The proposed kernel outperforms traditional Gaussian Splatting kernels, showcasing its potential for high-quality 3D reconstruction and rendering.", + "arxiv_url": "http://arxiv.org/abs/2408.16982v1", + "pdf_url": "http://arxiv.org/pdf/2408.16982v1", + "published_date": "2024-08-30", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model", + "authors": [ + "Fangfu Liu", + "Wenqiang Sun", + "Hanyang Wang", + "Yikai Wang", + "Haowen Sun", + "Junliang Ye", + "Jun Zhang", + "Yueqi Duan" + ], + "abstract": "Advancements in 3D scene reconstruction have transformed 2D images from the real world into 3D models, producing realistic 3D results from hundreds of input photos. Despite great success in dense-view reconstruction scenarios, rendering a detailed scene from insufficient captured views is still an ill-posed optimization problem, often resulting in artifacts and distortions in unseen areas. In this paper, we propose ReconX, a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task. The key insight is to unleash the strong generative prior of large pre-trained video diffusion models for sparse-view reconstruction. However, 3D view consistency struggles to be accurately preserved in directly generated video frames from pre-trained models. To address this, given limited input views, the proposed ReconX first constructs a global point cloud and encodes it into a contextual space as the 3D structure condition. 
Guided by the condition, the video diffusion model then synthesizes video frames that are both detail-preserved and exhibit a high degree of 3D consistency, ensuring the coherence of the scene from various perspectives. Finally, we recover the 3D scene from the generated video through a confidence-aware 3D Gaussian Splatting optimization scheme. Extensive experiments on various real-world datasets show the superiority of our ReconX over state-of-the-art methods in terms of quality and generalizability.", + "arxiv_url": "http://arxiv.org/abs/2408.16767v2", + "pdf_url": "http://arxiv.org/pdf/2408.16767v2", + "published_date": "2024-08-29", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "OmniRe: Omni Urban Scene Reconstruction", + "authors": [ + "Ziyu Chen", + "Jiawei Yang", + "Jiahui Huang", + "Riccardo de Lutio", + "Janick Martinez Esturo", + "Boris Ivanovic", + "Or Litany", + "Zan Gojcic", + "Sanja Fidler", + "Marco Pavone", + "Li Song", + "Yue Wang" + ], + "abstract": "We introduce OmniRe, a holistic approach for efficiently reconstructing high-fidelity dynamic urban scenes from on-device logs. Recent methods for modeling driving sequences using neural radiance fields or Gaussian Splatting have demonstrated the potential of reconstructing challenging dynamic scenes, but often overlook pedestrians and other non-vehicle dynamic actors, hindering a complete pipeline for dynamic urban scene reconstruction. To that end, we propose a comprehensive 3DGS framework for driving scenes, named OmniRe, that allows for accurate, full-length reconstruction of diverse dynamic objects in a driving log. OmniRe builds dynamic neural scene graphs based on Gaussian representations and constructs multiple local canonical spaces that model various dynamic actors, including vehicles, pedestrians, and cyclists, among many others. This capability is unmatched by existing methods. OmniRe allows us to holistically reconstruct different objects present in the scene, subsequently enabling the simulation of reconstructed scenarios with all actors participating in real-time (~60Hz). Extensive evaluations on the Waymo dataset show that our approach outperforms prior state-of-the-art methods quantitatively and qualitatively by a large margin. We believe our work fills a critical gap in driving reconstruction.", + "arxiv_url": "http://arxiv.org/abs/2408.16760v1", + "pdf_url": "http://arxiv.org/pdf/2408.16760v1", + "published_date": "2024-08-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Generic Objects as Pose Probes for Few-Shot View Synthesis", + "authors": [ + "Zhirui Gao", + "Renjiao Yi", + "Chenyang Zhu", + "Ke Zhuang", + "Wei Chen", + "Kai Xu" + ], + "abstract": "Radiance fields including NeRFs and 3D Gaussians demonstrate great potential in high-fidelity rendering and scene reconstruction, while they require a substantial number of posed images as inputs. COLMAP is frequently employed for preprocessing to estimate poses, while it necessitates a large number of feature matches to operate effectively, and it struggles with scenes characterized by sparse features, large baselines between images, or a limited number of input images. We aim to tackle few-view NeRF reconstruction using only 3 to 6 unposed scene images. 
Traditional methods often use calibration boards but they are not common in images. We propose a novel idea of utilizing everyday objects, commonly found in both images and real life, as \"pose probes\". The probe object is automatically segmented by SAM, whose shape is initialized from a cube. We apply a dual-branch volume rendering optimization (object NeRF and scene NeRF) to constrain the pose optimization and jointly refine the geometry. Specifically, object poses of two views are first estimated by PnP matching in an SDF representation, which serves as initial poses. PnP matching, requiring only a few features, is suitable for feature-sparse scenes. Additional views are incrementally incorporated to refine poses from preceding views. In experiments, PoseProbe achieves state-of-the-art performance in both pose estimation and novel view synthesis across multiple datasets. We demonstrate its effectiveness, particularly in few-view and large-baseline scenes where COLMAP struggles. In ablations, using different objects in a scene yields comparable performance. Our project page is available at: \\href{https://zhirui-gao.github.io/PoseProbe.github.io/}{this https URL}", + "arxiv_url": "http://arxiv.org/abs/2408.16690v2", + "pdf_url": "http://arxiv.org/pdf/2408.16690v2", + "published_date": "2024-08-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Towards Realistic Example-based Modeling via 3D Gaussian Stitching", + "authors": [ + "Xinyu Gao", + "Ziyi Yang", + "Bingchen Gong", + "Xiaoguang Han", + "Sipeng Yang", + "Xiaogang Jin" + ], + "abstract": "Using parts of existing models to rebuild new models, commonly termed as example-based modeling, is a classical methodology in the realm of computer graphics. Previous works mostly focus on shape composition, making them very hard to use for realistic composition of 3D objects captured from real-world scenes. This leads to combining multiple NeRFs into a single 3D scene to achieve seamless appearance blending. However, the current SeamlessNeRF method struggles to achieve interactive editing and harmonious stitching for real-world scenes due to its gradient-based strategy and grid-based representation. To this end, we present an example-based modeling method that combines multiple Gaussian fields in a point-based representation using sample-guided synthesis. Specifically, as for composition, we create a GUI to segment and transform multiple fields in real time, easily obtaining a semantically meaningful composition of models represented by 3D Gaussian Splatting (3DGS). For texture blending, due to the discrete and irregular nature of 3DGS, straightforwardly applying gradient propagation as SeamlssNeRF is not supported. Thus, a novel sampling-based cloning method is proposed to harmonize the blending while preserving the original rich texture and content. Our workflow consists of three steps: 1) real-time segmentation and transformation of a Gaussian model using a well-tailored GUI, 2) KNN analysis to identify boundary points in the intersecting area between the source and target models, and 3) two-phase optimization of the target model using sampling-based cloning and gradient constraints. Extensive experimental results validate that our approach significantly outperforms previous works in terms of realistic synthesis, demonstrating its practicality. 
More demos are available at https://ingra14m.github.io/gs_stitching_website.", + "arxiv_url": "http://arxiv.org/abs/2408.15708v1", + "pdf_url": "http://arxiv.org/pdf/2408.15708v1", + "published_date": "2024-08-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "G-Style: Stylized Gaussian Splatting", + "authors": [ + "Áron Samuel Kovács", + "Pedro Hermosilla", + "Renata G. Raidou" + ], + "abstract": "We introduce G-Style, a novel algorithm designed to transfer the style of an image onto a 3D scene represented using Gaussian Splatting. Gaussian Splatting is a powerful 3D representation for novel view synthesis, as -- compared to other approaches based on Neural Radiance Fields -- it provides fast scene renderings and user control over the scene. Recent pre-prints have demonstrated that the style of Gaussian Splatting scenes can be modified using an image exemplar. However, since the scene geometry remains fixed during the stylization process, current solutions fall short of producing satisfactory results. Our algorithm aims to address these limitations by following a three-step process: In a pre-processing step, we remove undesirable Gaussians with large projection areas or highly elongated shapes. Subsequently, we combine several losses carefully designed to preserve different scales of the style in the image, while maintaining as much as possible the integrity of the original scene content. During the stylization process and following the original design of Gaussian Splatting, we split Gaussians where additional detail is necessary within our scene by tracking the gradient of the stylized color. Our experiments demonstrate that G-Style generates high-quality stylizations within just a few minutes, outperforming existing methods both qualitatively and quantitatively.", + "arxiv_url": "http://arxiv.org/abs/2408.15695v2", + "pdf_url": "http://arxiv.org/pdf/2408.15695v2", + "published_date": "2024-08-28", + "categories": [ + "cs.GR", + "cs.AI", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty", + "authors": [ + "Saining Zhang", + "Baijun Ye", + "Xiaoxue Chen", + "Yuantao Chen", + "Zongzheng Zhang", + "Cheng Peng", + "Yongliang Shi", + "Hao Zhao" + ], + "abstract": "Robust and realistic rendering for large-scale road scenes is essential in autonomous driving simulation. Recently, 3D Gaussian Splatting (3D-GS) has made groundbreaking progress in neural rendering, but the general fidelity of large-scale road scene renderings is often limited by the input imagery, which usually has a narrow field of view and focuses mainly on the street-level local area. Intuitively, the data from the drone's perspective can provide a complementary viewpoint for the data from the ground vehicle's perspective, enhancing the completeness of scene reconstruction and rendering. However, training naively with aerial and ground images, which exhibit large view disparity, poses a significant convergence challenge for 3D-GS, and does not demonstrate remarkable improvements in performance on road views. 
In order to enhance the novel view synthesis of road views and to effectively use the aerial information, we design an uncertainty-aware training method that allows aerial images to assist in the synthesis of areas where ground images have poor learning outcomes instead of weighting all pixels equally in 3D-GS training like prior work did. We are the first to introduce the cross-view uncertainty to 3D-GS by matching the car-view ensemble-based rendering uncertainty to aerial images, weighting the contribution of each pixel to the training process. Additionally, to systematically quantify evaluation metrics, we assemble a high-quality synthesized dataset comprising both aerial and ground images for road scenes.", + "arxiv_url": "http://arxiv.org/abs/2408.15242v1", + "pdf_url": "http://arxiv.org/pdf/2408.15242v1", + "published_date": "2024-08-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Learning-based Multi-View Stereo: A Survey", + "authors": [ + "Fangjinhua Wang", + "Qingtian Zhu", + "Di Chang", + "Quankai Gao", + "Junlin Han", + "Tong Zhang", + "Richard Hartley", + "Marc Pollefeys" + ], + "abstract": "3D reconstruction aims to recover the dense 3D structure of a scene. It plays an essential role in various applications such as Augmented/Virtual Reality (AR/VR), autonomous driving and robotics. Leveraging multiple views of a scene captured from different viewpoints, Multi-View Stereo (MVS) algorithms synthesize a comprehensive 3D representation, enabling precise reconstruction in complex environments. Due to its efficiency and effectiveness, MVS has become a pivotal method for image-based 3D reconstruction. Recently, with the success of deep learning, many learning-based MVS methods have been proposed, achieving impressive performance against traditional methods. We categorize these learning-based methods as: depth map-based, voxel-based, NeRF-based, 3D Gaussian Splatting-based, and large feed-forward methods. Among these, we focus significantly on depth map-based methods, which are the main family of MVS due to their conciseness, flexibility and scalability. In this survey, we provide a comprehensive review of the literature at the time of this writing. We investigate these learning-based methods, summarize their performances on popular benchmarks, and discuss promising future research directions in this area.", + "arxiv_url": "http://arxiv.org/abs/2408.15235v1", + "pdf_url": "http://arxiv.org/pdf/2408.15235v1", + "published_date": "2024-08-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Robo-GS: A Physics Consistent Spatial-Temporal Model for Robotic Arm with Hybrid Representation", + "authors": [ + "Haozhe Lou", + "Yurong Liu", + "Yike Pan", + "Yiran Geng", + "Jianteng Chen", + "Wenlong Ma", + "Chenglong Li", + "Lin Wang", + "Hengzhen Feng", + "Lu Shi", + "Liyi Luo", + "Yongliang Shi" + ], + "abstract": "Real2Sim2Real plays a critical role in robotic arm control and reinforcement learning, yet bridging this gap remains a significant challenge due to the complex physical properties of robots and the objects they manipulate. 
Existing methods lack a comprehensive solution to accurately reconstruct real-world objects with spatial representations and their associated physics attributes. We propose a Real2Sim pipeline with a hybrid representation model that integrates mesh geometry, 3D Gaussian kernels, and physics attributes to enhance the digital asset representation of robotic arms. This hybrid representation is implemented through a Gaussian-Mesh-Pixel binding technique, which establishes an isomorphic mapping between mesh vertices and Gaussian models. This enables a fully differentiable rendering pipeline that can be optimized through numerical solvers, achieves high-fidelity rendering via Gaussian Splatting, and facilitates physically plausible simulation of the robotic arm's interaction with its environment using mesh-based methods. The code,full presentation and datasets will be made publicly available at our website https://robostudioapp.com", + "arxiv_url": "http://arxiv.org/abs/2408.14873v2", + "pdf_url": "http://arxiv.org/pdf/2408.14873v2", + "published_date": "2024-08-27", + "categories": [ + "cs.RO", + "cs.NA", + "math.NA", + "math.OC" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming", + "authors": [ + "Yuang Shi", + "Simone Gasparini", + "Géraldine Morin", + "Wei Tsang Ooi" + ], + "abstract": "The rise of Extended Reality (XR) requires efficient streaming of 3D online worlds, challenging current 3DGS representations to adapt to bandwidth-constrained environments. This paper proposes LapisGS, a layered 3DGS that supports adaptive streaming and progressive rendering. Our method constructs a layered structure for cumulative representation, incorporates dynamic opacity optimization to maintain visual fidelity, and utilizes occupancy maps to efficiently manage Gaussian splats. This proposed model offers a progressive representation supporting a continuous rendering quality adapted for bandwidth-aware streaming. Extensive experiments validate the effectiveness of our approach in balancing visual fidelity with the compactness of the model, with up to 50.71% improvement in SSIM, 286.53% improvement in LPIPS, and 318.41% reduction in model size, and shows its potential for bandwidth-adapted 3D streaming and rendering applications.", + "arxiv_url": "http://arxiv.org/abs/2408.14823v1", + "pdf_url": "http://arxiv.org/pdf/2408.14823v1", + "published_date": "2024-08-27", + "categories": [ + "cs.CV", + "cs.MM" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Avatar Concept Slider: Manipulate Concepts In Your Human Avatar With Fine-grained Control", + "authors": [ + "Yixuan He", + "Lin Geng Foo", + "Ajmal Saeed Mian", + "Hossein Rahmani", + "Jun Liu" + ], + "abstract": "Language based editing of 3D human avatars to precisely match user requirements is challenging due to the inherent ambiguity and limited expressiveness of natural language. To overcome this, we propose the Avatar Concept Slider (ACS), a 3D avatar editing method that allows precise manipulation of semantic concepts in human avatars towards a specified intermediate point between two extremes of concepts, akin to moving a knob along a slider track. To achieve this, our ACS has three designs. 1) A Concept Sliding Loss based on Linear Discriminant Analysis to pinpoint the concept-specific axis for precise editing. 
2) An Attribute Preserving Loss based on Principal Component Analysis for improved preservation of avatar identity during editing. 3) A 3D Gaussian Splatting primitive selection mechanism based on concept-sensitivity, which updates only the primitives that are the most sensitive to our target concept, to improve efficiency. Results demonstrate that our ACS enables fine-grained 3D avatar editing with efficient feedback, without harming the avatar quality or compromising the avatar's identifying attributes.", + "arxiv_url": "http://arxiv.org/abs/2408.13995v2", + "pdf_url": "http://arxiv.org/pdf/2408.13995v2", + "published_date": "2024-08-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DynaSurfGS: Dynamic Surface Reconstruction with Planar-based Gaussian Splatting", + "authors": [ + "Weiwei Cai", + "Weicai Ye", + "Peng Ye", + "Tong He", + "Tao Chen" + ], + "abstract": "Dynamic scene reconstruction has garnered significant attention in recent years due to its capabilities in high-quality and real-time rendering. Among various methodologies, constructing a 4D spatial-temporal representation, such as 4D-GS, has gained popularity for its high-quality rendered images. However, these methods often produce suboptimal surfaces, as the discrete 3D Gaussian point clouds fail to align with the object's surface precisely. To address this problem, we propose DynaSurfGS to achieve both photorealistic rendering and high-fidelity surface reconstruction of dynamic scenarios. Specifically, the DynaSurfGS framework first incorporates Gaussian features from 4D neural voxels with the planar-based Gaussian Splatting to facilitate precise surface reconstruction. It leverages normal regularization to enforce the smoothness of the surface of dynamic objects. It also incorporates the as-rigid-as-possible (ARAP) constraint to maintain the approximate rigidity of local neighborhoods of 3D Gaussians between timesteps and ensure that adjacent 3D Gaussians remain closely aligned throughout. Extensive experiments demonstrate that DynaSurfGS surpasses state-of-the-art methods in both high-fidelity surface reconstruction and photorealistic rendering.", + "arxiv_url": "http://arxiv.org/abs/2408.13972v1", + "pdf_url": "http://arxiv.org/pdf/2408.13972v1", + "published_date": "2024-08-26", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs", + "authors": [ + "Brandon Smart", + "Chuanxia Zheng", + "Iro Laina", + "Victor Adrian Prisacariu" + ], + "abstract": "In this paper, we introduce Splatt3R, a pose-free, feed-forward method for in-the-wild 3D reconstruction and novel view synthesis from stereo pairs. Given uncalibrated natural images, Splatt3R can predict 3D Gaussian Splats without requiring any camera parameters or depth information. For generalizability, we build Splatt3R upon a ``foundation'' 3D geometry reconstruction method, MASt3R, by extending it to deal with both 3D structure and appearance. Specifically, unlike the original MASt3R which reconstructs only 3D point clouds, we predict the additional Gaussian attributes required to construct a Gaussian primitive for each point. 
Hence, unlike other novel view synthesis methods, Splatt3R is first trained by optimizing the 3D point cloud's geometry loss, and then a novel view synthesis objective. By doing this, we avoid the local minima present in training 3D Gaussian Splats from stereo views. We also propose a novel loss masking strategy that we empirically find is critical for strong performance on extrapolated viewpoints. We train Splatt3R on the ScanNet++ dataset and demonstrate excellent generalisation to uncalibrated, in-the-wild images. Splatt3R can reconstruct scenes at 4FPS at 512 x 512 resolution, and the resultant splats can be rendered in real-time.", + "arxiv_url": "http://arxiv.org/abs/2408.13912v2", + "pdf_url": "http://arxiv.org/pdf/2408.13912v2", + "published_date": "2024-08-25", + "categories": [ + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DESI Peculiar Velocity Survey -- Fundamental Plane", + "authors": [ + "Khaled Said", + "Cullan Howlett", + "Tamara Davis", + "John Lucey", + "Christoph Saulder", + "Kelly Douglass", + "Alex G. Kim", + "Anthony Kremin", + "Caitlin Ross", + "Greg Aldering", + "Jessica Nicole Aguilar", + "Steven Ahlen", + "Segev BenZvi", + "Davide Bianchi", + "David Brooks", + "Todd Claybaugh", + "Kyle Dawson", + "Axel de la Macorra", + "Biprateep Dey", + "Peter Doel", + "Kevin Fanning", + "Simone Ferraro", + "Andreu Font-Ribera", + "Jaime E. Forero-Romero", + "Enrique Gaztañaga", + "Satya Gontcho A Gontcho", + "Julien Guy", + "Klaus Honscheid", + "Robert Kehoe", + "Theodore Kisner", + "Andrew Lambert", + "Martin Landriau", + "Laurent Le Guillou", + "Marc Manera", + "Aaron Meisner", + "Ramon Miquel", + "John Moustakas", + "Andrea Muñoz-Gutiérrez", + "Adam Myers", + "Jundan Nie", + "Nathalie Palanque-Delabrouille", + "Will Percival", + "Francisco Prada", + "Graziano Rossi", + "Eusebio Sanchez", + "David Schlegel", + "Michael Schubnell", + "Joseph Harry Silber", + "David Sprayberry", + "Gregory Tarlé", + "Mariana Vargas Magana", + "Benjamin Alan Weaver", + "Risa Wechsler", + "Zhimin Zhou", + "Hu Zou" + ], + "abstract": "The Dark Energy Spectroscopic Instrument (DESI) Peculiar Velocity Survey aims to measure the peculiar velocities of early and late type galaxies within the DESI footprint using both the Fundamental Plane and Tully-Fisher relations. Direct measurements of peculiar velocities can significantly improve constraints on the growth rate of structure, reducing uncertainty by a factor of approximately 2.5 at redshift 0.1 compared to the DESI Bright Galaxy Survey's redshift space distortion measurements alone. We assess the quality of stellar velocity dispersion measurements from DESI spectroscopic data. These measurements, along with photometric data from the Legacy Survey, establish the Fundamental Plane relation and determine distances and peculiar velocities of early-type galaxies. During Survey Validation, we obtain spectra for 6698 unique early-type galaxies, up to a photometric redshift of 0.15. 64\\% of observed galaxies (4267) have relative velocity dispersion errors below 10\\%. This percentage increases to 75\\% if we restrict our sample to galaxies with spectroscopic redshifts below 0.1. We use the measured central velocity dispersion, along with photometry from the DESI Legacy Imaging Surveys, to fit the Fundamental Plane parameters using a 3D Gaussian maximum likelihood algorithm that accounts for measurement uncertainties and selection cuts. 
In addition, we conduct zero-point calibration using the absolute distance measurements to the Coma cluster, leading to a value of the Hubble constant, $H_0 = 76.05 \\pm 0.35$(statistical) $\\pm 0.49$(systematic FP) $\\pm 4.86$(statistical due to calibration) $\\mathrm{km \\ s^{-1} Mpc^{-1}}$. This $H_0$ value is within $2\\sigma$ of Planck Cosmic Microwave Background results and within $1\\sigma$, of other low redshift distance indicator-based measurements.", + "arxiv_url": "http://arxiv.org/abs/2408.13842v1", + "pdf_url": "http://arxiv.org/pdf/2408.13842v1", + "published_date": "2024-08-25", + "categories": [ + "astro-ph.CO", + "astro-ph.GA" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers", + "authors": [ + "Chuanrui Zhang", + "Yingshuang Zou", + "Zhuoling Li", + "Minmin Yi", + "Haoqian Wang" + ], + "abstract": "Compared with previous 3D reconstruction methods like Nerf, recent Generalizable 3D Gaussian Splatting (G-3DGS) methods demonstrate impressive efficiency even in the sparse-view setting. However, the promising reconstruction performance of existing G-3DGS methods relies heavily on accurate multi-view feature matching, which is quite challenging. Especially for the scenes that have many non-overlapping areas between various views and contain numerous similar regions, the matching performance of existing methods is poor and the reconstruction precision is limited. To address this problem, we develop a strategy that utilizes a predicted depth confidence map to guide accurate local feature matching. In addition, we propose to utilize the knowledge of existing monocular depth estimation models as prior to boost the depth estimation precision in non-overlapping areas between views. Combining the proposed strategies, we present a novel G-3DGS method named TranSplat, which obtains the best performance on both the RealEstate10K and ACID benchmarks while maintaining competitive speed and presenting strong cross-dataset generalization ability. Our code, and demos will be available at: https://xingyoujun.github.io/transplat.", + "arxiv_url": "http://arxiv.org/abs/2408.13770v1", + "pdf_url": "http://arxiv.org/pdf/2408.13770v1", + "published_date": "2024-08-25", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting", + "authors": [ + "Wenrui Li", + "Fucheng Cai", + "Yapeng Mi", + "Zhe Yang", + "Wangmeng Zuo", + "Xingtao Wang", + "Xiaopeng Fan" + ], + "abstract": "Text-driven 3D scene generation has seen significant advancements recently. However, most existing methods generate single-view images using generative models and then stitch them together in 3D space. This independent generation for each view often results in spatial inconsistency and implausibility in the 3D scenes. To address this challenge, we proposed a novel text-driven 3D-consistent scene generation model: SceneDreamer360. Our proposed method leverages a text-driven panoramic image generation model as a prior for 3D scene generation and employs 3D Gaussian Splatting (3DGS) to ensure consistency across multi-view panoramic images. 
Specifically, SceneDreamer360 enhances the fine-tuned Panfusion generator with a three-stage panoramic enhancement, enabling the generation of high-resolution, detail-rich panoramic images. During the 3D scene construction, a novel point cloud fusion initialization method is used, producing higher quality and spatially consistent point clouds. Our extensive experiments demonstrate that compared to other methods, SceneDreamer360 with its panoramic image generation and 3DGS can produce higher quality, spatially consistent, and visually appealing 3D scenes from any text prompt. Our codes are available at \\url{https://github.com/liwrui/SceneDreamer360}.", + "arxiv_url": "http://arxiv.org/abs/2408.13711v2", + "pdf_url": "http://arxiv.org/pdf/2408.13711v2", + "published_date": "2024-08-25", + "categories": [ + "cs.CV", + "cs.MM" + ], + "github_url": "https://github.com/liwrui/SceneDreamer360", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "BiGS: Bidirectional Gaussian Primitives for Relightable 3D Gaussian Splatting", + "authors": [ + "Zhenyuan Liu", + "Yu Guo", + "Xinyuan Li", + "Bernd Bickel", + "Ran Zhang" + ], + "abstract": "We present Bidirectional Gaussian Primitives, an image-based novel view synthesis technique designed to represent and render 3D objects with surface and volumetric materials under dynamic illumination. Our approach integrates light intrinsic decomposition into the Gaussian splatting framework, enabling real-time relighting of 3D objects. To unify surface and volumetric material within a cohesive appearance model, we adopt a light- and view-dependent scattering representation via bidirectional spherical harmonics. Our model does not use a specific surface normal-related reflectance function, making it more compatible with volumetric representations like Gaussian splatting, where the normals are undefined. We demonstrate our method by reconstructing and rendering objects with complex materials. Using One-Light-At-a-Time (OLAT) data as input, we can reproduce photorealistic appearances under novel lighting conditions in real time.", + "arxiv_url": "http://arxiv.org/abs/2408.13370v1", + "pdf_url": "http://arxiv.org/pdf/2408.13370v1", + "published_date": "2024-08-23", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation", + "authors": [ + "Shuai Yang", + "Jing Tan", + "Mengchen Zhang", + "Tong Wu", + "Yixuan Li", + "Gordon Wetzstein", + "Ziwei Liu", + "Dahua Lin" + ], + "abstract": "3D immersive scene generation is a challenging yet critical task in computer vision and graphics. A desired virtual 3D scene should 1) exhibit omnidirectional view consistency, and 2) allow for free exploration in complex scene hierarchies. Existing methods either rely on successive scene expansion via inpainting or employ panorama representation to represent large FOV scene environments. However, the generated scene suffers from semantic drift during expansion and is unable to handle occlusion among scene hierarchies. To tackle these challenges, we introduce LayerPano3D, a novel framework for full-view, explorable panoramic 3D scene generation from a single text prompt. 
Our key insight is to decompose a reference 2D panorama into multiple layers at different depth levels, where each layer reveals the unseen space from the reference views via a diffusion prior. LayerPano3D comprises multiple dedicated designs: 1) we introduce a novel text-guided anchor view synthesis pipeline for high-quality, consistent panorama generation. 2) We pioneer the Layered 3D Panorama as the underlying representation to manage complex scene hierarchies and lift it into 3D Gaussians to splat detailed 360-degree omnidirectional scenes with unconstrained viewing paths. Extensive experiments demonstrate that our framework generates state-of-the-art 3D panoramic scenes in terms of both full-view consistency and immersive exploratory experience. We believe that LayerPano3D holds promise for advancing 3D panoramic scene creation with numerous applications.", + "arxiv_url": "http://arxiv.org/abs/2408.13252v1", + "pdf_url": "http://arxiv.org/pdf/2408.13252v1", + "published_date": "2024-08-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SpecGaussian with Latent Features: A High-quality Modeling of the View-dependent Appearance for 3D Gaussian Splatting", + "authors": [ + "Zhiru Wang", + "Shiyun Xie", + "Chengwei Pan", + "Guoping Wang" + ], + "abstract": "Recently, the 3D Gaussian Splatting (3D-GS) method has achieved great success in novel view synthesis, providing real-time rendering while ensuring high-quality rendering results. However, this method faces challenges in modeling specular reflections and handling anisotropic appearance components, especially in dealing with view-dependent color under complex lighting conditions. Additionally, 3D-GS uses spherical harmonics to learn the color representation, which has limited ability to represent complex scenes. To overcome these challenges, we introduce Lantent-SpecGS, an approach that utilizes a universal latent neural descriptor within each 3D Gaussian. This enables a more effective representation of 3D feature fields, including appearance and geometry. Moreover, two parallel CNNs are designed to decode the splatting feature maps into diffuse color and specular color separately. A mask that depends on the viewpoint is learned to merge these two colors, resulting in the final rendered image. Experimental results demonstrate that our method obtains competitive performance in novel view synthesis and extends the ability of 3D-GS to handle intricate scenarios with specular reflections.", + "arxiv_url": "http://arxiv.org/abs/2409.05868v1", + "pdf_url": "http://arxiv.org/pdf/2409.05868v1", + "published_date": "2024-08-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Atlas Gaussians Diffusion for 3D Generation", + "authors": [ + "Haitao Yang", + "Yuan Dong", + "Hanwen Jiang", + "Dejia Xu", + "Georgios Pavlakos", + "Qixing Huang" + ], + "abstract": "Using the latent diffusion model has proven effective in developing novel 3D generation techniques. To harness the latent diffusion model, a key challenge is designing a high-fidelity and efficient representation that links the latent space and the 3D space. In this paper, we introduce Atlas Gaussians, a novel representation for feed-forward native 3D generation. 
Atlas Gaussians represent a shape as the union of local patches, and each patch can decode 3D Gaussians. We parameterize a patch as a sequence of feature vectors and design a learnable function to decode 3D Gaussians from the feature vectors. In this process, we incorporate UV-based sampling, enabling the generation of a sufficiently large, and theoretically infinite, number of 3D Gaussian points. The large amount of 3D Gaussians enables the generation of high-quality details. Moreover, due to local awareness of the representation, the transformer-based decoding procedure operates on a patch level, ensuring efficiency. We train a variational autoencoder to learn the Atlas Gaussians representation, and then apply a latent diffusion model on its latent space for learning 3D Generation. Experiments show that our approach outperforms the prior arts of feed-forward native 3D generation.", + "arxiv_url": "http://arxiv.org/abs/2408.13055v2", + "pdf_url": "http://arxiv.org/pdf/2408.13055v2", + "published_date": "2024-08-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "S4D: Streaming 4D Real-World Reconstruction with Gaussians and 3D Control Points", + "authors": [ + "Bing He", + "Yunuo Chen", + "Guo Lu", + "Qi Wang", + "Qunshan Gu", + "Rong Xie", + "Li Song", + "Wenjun Zhang" + ], + "abstract": "Dynamic scene reconstruction using Gaussians has recently attracted increased interest. Mainstream approaches typically employ a global deformation field to warp a 3D scene in canonical space. However, the inherent low-frequency nature of implicit neural fields often leads to ineffective representations of complex motions. Moreover, their structural rigidity can hinder adaptation to scenes with varying resolutions and durations. To address these challenges, we introduce a novel approach for streaming 4D real-world reconstruction utilizing discrete 3D control points. This method physically models local rays and establishes a motion-decoupling coordinate system. By effectively merging traditional graphics with learnable pipelines, it provides a robust and efficient local 6-degrees-of-freedom (6-DoF) motion representation. Additionally, we have developed a generalized framework that integrates our control points with Gaussians. Starting from an initial 3D reconstruction, our workflow decomposes the streaming 4D reconstruction into four independent submodules: 3D segmentation, 3D control point generation, object-wise motion manipulation, and residual compensation. Experimental results demonstrate that our method outperforms existing state-of-the-art 4D Gaussian splatting techniques on both the Neu3DV and CMU-Panoptic datasets. 
Notably, the optimization of our 3D control points is achievable in 100 iterations and within just 2 seconds per frame on a single NVIDIA 4070 GPU.", + "arxiv_url": "http://arxiv.org/abs/2408.13036v2", + "pdf_url": "http://arxiv.org/pdf/2408.13036v2", + "published_date": "2024-08-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FLoD: Integrating Flexible Level of Detail into 3D Gaussian Splatting for Customizable Rendering", + "authors": [ + "Yunji Seo", + "Young Sun Choi", + "Hyun Seung Son", + "Youngjung Uh" + ], + "abstract": "3D Gaussian Splatting (3DGS) achieves fast and high-quality renderings by using numerous small Gaussians, which leads to significant memory consumption. This reliance on a large number of Gaussians restricts the application of 3DGS-based models on low-cost devices due to memory limitations. However, simply reducing the number of Gaussians to accommodate devices with less memory capacity leads to inferior quality compared to the quality that can be achieved on high-end hardware. To address this lack of scalability, we propose integrating a Flexible Level of Detail (FLoD) to 3DGS, to allow a scene to be rendered at varying levels of detail according to hardware capabilities. While existing 3DGSs with LoD focus on detailed reconstruction, our method provides reconstructions using a small number of Gaussians for reduced memory requirements, and a larger number of Gaussians for greater detail. Experiments demonstrate our various rendering options with tradeoffs between rendering quality and memory usage, thereby allowing real-time rendering across different memory constraints. Furthermore, we show that our method generalizes to different 3DGS frameworks, indicating its potential for integration into future state-of-the-art developments. Project page: https://3dgs-flod.github.io/flod.github.io/", + "arxiv_url": "http://arxiv.org/abs/2408.12894v1", + "pdf_url": "http://arxiv.org/pdf/2408.12894v1", + "published_date": "2024-08-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GSFusion: Online RGB-D Mapping Where Gaussian Splatting Meets TSDF Fusion", + "authors": [ + "Jiaxin Wei", + "Stefan Leutenegger" + ], + "abstract": "Traditional volumetric fusion algorithms preserve the spatial structure of 3D scenes, which is beneficial for many tasks in computer vision and robotics. However, they often lack realism in terms of visualization. Emerging 3D Gaussian splatting bridges this gap, but existing Gaussian-based reconstruction methods often suffer from artifacts and inconsistencies with the underlying 3D structure, and struggle with real-time optimization, unable to provide users with immediate feedback in high quality. One of the bottlenecks arises from the massive amount of Gaussian parameters that need to be updated during optimization. Instead of using 3D Gaussian as a standalone map representation, we incorporate it into a volumetric mapping system to take advantage of geometric information and propose to use a quadtree data structure on images to drastically reduce the number of splats initialized. In this way, we simultaneously generate a compact 3D Gaussian map with fewer artifacts and a volumetric map on the fly. 
Our method, GSFusion, significantly enhances computational efficiency without sacrificing rendering quality, as demonstrated on both synthetic and real datasets. Code will be available at https://github.com/goldoak/GSFusion.", + "arxiv_url": "http://arxiv.org/abs/2408.12677v3", + "pdf_url": "http://arxiv.org/pdf/2408.12677v3", + "published_date": "2024-08-22", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/goldoak/GSFusion", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Subsurface Scattering for 3D Gaussian Splatting", + "authors": [ + "Jan-Niklas Dihlmann", + "Arjun Majumdar", + "Andreas Engelhardt", + "Raphael Braun", + "Hendrik P. A. Lensch" + ], + "abstract": "3D reconstruction and relighting of objects made from scattering materials present a significant challenge due to the complex light transport beneath the surface. 3D Gaussian Splatting introduced high-quality novel view synthesis at real-time speeds. While 3D Gaussians efficiently approximate an object's surface, they fail to capture the volumetric properties of subsurface scattering. We propose a framework for optimizing an object's shape together with the radiance transfer field given multi-view OLAT (one light at a time) data. Our method decomposes the scene into an explicit surface represented as 3D Gaussians, with a spatially varying BRDF, and an implicit volumetric representation of the scattering component. A learned incident light field accounts for shadowing. We optimize all parameters jointly via ray-traced differentiable rendering. Our approach enables material editing, relighting and novel view synthesis at interactive rates. We show successful application on synthetic data and introduce a newly acquired multi-view multi-light dataset of objects in a light-stage setup. Compared to previous work we achieve comparable or better results at a fraction of optimization and rendering time while enabling detailed control over material attributes. Project page https://sss.jdihlmann.com/", + "arxiv_url": "http://arxiv.org/abs/2408.12282v2", + "pdf_url": "http://arxiv.org/pdf/2408.12282v2", + "published_date": "2024-08-22", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Robust 3D Gaussian Splatting for Novel View Synthesis in Presence of Distractors", + "authors": [ + "Paul Ungermann", + "Armin Ettenhofer", + "Matthias Nießner", + "Barbara Roessle" + ], + "abstract": "3D Gaussian Splatting has shown impressive novel view synthesis results; nonetheless, it is vulnerable to dynamic objects polluting the input data of an otherwise static scene, so called distractors. Distractors have severe impact on the rendering quality as they get represented as view-dependent effects or result in floating artifacts. Our goal is to identify and ignore such distractors during the 3D Gaussian optimization to obtain a clean reconstruction. To this end, we take a self-supervised approach that looks at the image residuals during the optimization to determine areas that have likely been falsified by a distractor. In addition, we leverage a pretrained segmentation network to provide object awareness, enabling more accurate exclusion of distractors. This way, we obtain segmentation masks of distractors to effectively ignore them in the loss formulation. 
We demonstrate that our approach is robust to various distractors and strongly improves rendering quality on distractor-polluted scenes, improving PSNR by 1.86dB compared to 3D Gaussian Splatting.", + "arxiv_url": "http://arxiv.org/abs/2408.11697v1", + "pdf_url": "http://arxiv.org/pdf/2408.11697v1", + "published_date": "2024-08-21", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DeRainGS: Gaussian Splatting for Enhanced Scene Reconstruction in Rainy Environments", + "authors": [ + "Shuhong Liu", + "Xiang Chen", + "Hongming Chen", + "Quanfeng Xu", + "Mingrui Li" + ], + "abstract": "Reconstruction under adverse rainy conditions poses significant challenges due to reduced visibility and the distortion of visual perception. These conditions can severely impair the quality of geometric maps, which is essential for applications ranging from autonomous planning to environmental monitoring. In response to these challenges, this study introduces the novel task of 3D Reconstruction in Rainy Environments (3DRRE), specifically designed to address the complexities of reconstructing 3D scenes under rainy conditions. To benchmark this task, we construct the HydroViews dataset that comprises a diverse collection of both synthesized and real-world scene images characterized by various intensities of rain streaks and raindrops. Furthermore, we propose DeRainGS, the first 3DGS method tailored for reconstruction in adverse rainy environments. Extensive experiments across a wide range of rain scenarios demonstrate that our method delivers state-of-the-art performance, remarkably outperforming existing occlusion-free methods.", + "arxiv_url": "http://arxiv.org/abs/2408.11540v4", + "pdf_url": "http://arxiv.org/pdf/2408.11540v4", + "published_date": "2024-08-21", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting", + "authors": [ + "Wanshui Gan", + "Fang Liu", + "Hongbin Xu", + "Ningkai Mo", + "Naoto Yokoya" + ], + "abstract": "We introduce GaussianOcc, a systematic method that investigates the two usages of Gaussian splatting for fully self-supervised and efficient 3D occupancy estimation in surround views. First, traditional methods for self-supervised 3D occupancy estimation still require ground truth 6D poses from sensors during training. To address this limitation, we propose Gaussian Splatting for Projection (GSP) module to provide accurate scale information for fully self-supervised training from adjacent view projection. Additionally, existing methods rely on volume rendering for final 3D voxel representation learning using 2D signals (depth maps, semantic maps), which is both time-consuming and less effective. We propose Gaussian Splatting from Voxel space (GSV) to leverage the fast rendering properties of Gaussian splatting. As a result, the proposed GaussianOcc method enables fully self-supervised (no ground truth pose) 3D occupancy estimation in competitive performance with low computational cost (2.7 times faster in training and 5 times faster in rendering). 
The relevant code will be available in https://github.com/GANWANSHUI/GaussianOcc.git.", + "arxiv_url": "http://arxiv.org/abs/2408.11447v2", + "pdf_url": "http://arxiv.org/pdf/2408.11447v2", + "published_date": "2024-08-21", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/GANWANSHUI/GaussianOcc.git", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Pano2Room: Novel View Synthesis from a Single Indoor Panorama", + "authors": [ + "Guo Pu", + "Yiming Zhao", + "Zhouhui Lian" + ], + "abstract": "Recent single-view 3D generative methods have made significant advancements by leveraging knowledge distilled from extensive 3D object datasets. However, challenges persist in the synthesis of 3D scenes from a single view, primarily due to the complexity of real-world environments and the limited availability of high-quality prior resources. In this paper, we introduce a novel approach called Pano2Room, designed to automatically reconstruct high-quality 3D indoor scenes from a single panoramic image. These panoramic images can be easily generated using a panoramic RGBD inpainter from captures at a single location with any camera. The key idea is to initially construct a preliminary mesh from the input panorama, and iteratively refine this mesh using a panoramic RGBD inpainter while collecting photo-realistic 3D-consistent pseudo novel views. Finally, the refined mesh is converted into a 3D Gaussian Splatting field and trained with the collected pseudo novel views. This pipeline enables the reconstruction of real-world 3D scenes, even in the presence of large occlusions, and facilitates the synthesis of photo-realistic novel views with detailed geometry. Extensive qualitative and quantitative experiments have been conducted to validate the superiority of our method in single-panorama indoor novel synthesis compared to the state-of-the-art. Our code and data are available at \\url{https://github.com/TrickyGo/Pano2Room}.", + "arxiv_url": "http://arxiv.org/abs/2408.11413v2", + "pdf_url": "http://arxiv.org/pdf/2408.11413v2", + "published_date": "2024-08-21", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "https://github.com/TrickyGo/Pano2Room", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting", + "authors": [ + "Changkun Liu", + "Shuai Chen", + "Yash Bhalgat", + "Siyan Hu", + "Ming Cheng", + "Zirui Wang", + "Victor Adrian Prisacariu", + "Tristan Braud" + ], + "abstract": "We leverage 3D Gaussian Splatting (3DGS) as a scene representation and propose a novel test-time camera pose refinement framework, GSLoc. This framework enhances the localization accuracy of state-of-the-art absolute pose regression and scene coordinate regression methods. The 3DGS model renders high-quality synthetic images and depth maps to facilitate the establishment of 2D-3D correspondences. GSLoc obviates the need for training feature extractors or descriptors by operating directly on RGB images, utilizing the 3D foundation model, MASt3R, for precise 2D matching. To improve the robustness of our model in challenging outdoor environments, we incorporate an exposure-adaptive module within the 3DGS framework. Consequently, GSLoc enables efficient one-shot pose refinement given a single RGB query and a coarse initial pose estimation. 
Our proposed approach surpasses leading NeRF-based optimization methods in both accuracy and runtime across indoor and outdoor visual localization benchmarks, achieving new state-of-the-art accuracy on two indoor datasets.", + "arxiv_url": "http://arxiv.org/abs/2408.11085v2", + "pdf_url": "http://arxiv.org/pdf/2408.11085v2", + "published_date": "2024-08-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Large Point-to-Gaussian Model for Image-to-3D Generation", + "authors": [ + "Longfei Lu", + "Huachen Gao", + "Tao Dai", + "Yaohua Zha", + "Zhi Hou", + "Junta Wu", + "Shu-Tao Xia" + ], + "abstract": "Recently, image-to-3D approaches have significantly advanced the generation quality and speed of 3D assets based on large reconstruction models, particularly 3D Gaussian reconstruction models. Existing large 3D Gaussian models directly map 2D image to 3D Gaussian parameters, while regressing 2D image to 3D Gaussian representations is challenging without 3D priors. In this paper, we propose a large Point-to-Gaussian model, that inputs the initial point cloud produced from large 3D diffusion model conditional on 2D image to generate the Gaussian parameters, for image-to-3D generation. The point cloud provides initial 3D geometry prior for Gaussian generation, thus significantly facilitating image-to-3D Generation. Moreover, we present the \\textbf{A}ttention mechanism, \\textbf{P}rojection mechanism, and \\textbf{P}oint feature extractor, dubbed as \\textbf{APP} block, for fusing the image features with point cloud features. The qualitative and quantitative experiments extensively demonstrate the effectiveness of the proposed approach on GSO and Objaverse datasets, and show the proposed method achieves state-of-the-art performance.", + "arxiv_url": "http://arxiv.org/abs/2408.10935v1", + "pdf_url": "http://arxiv.org/pdf/2408.10935v1", + "published_date": "2024-08-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining", + "authors": [ + "Qi Ma", + "Yue Li", + "Bin Ren", + "Nicu Sebe", + "Ender Konukoglu", + "Theo Gevers", + "Luc Van Gool", + "Danda Pani Paudel" + ], + "abstract": "3D Gaussian Splatting (3DGS) has become the de facto method of 3D representation in many vision tasks. This calls for the 3D understanding directly in this representation space. To facilitate the research in this direction, we first build a large-scale dataset of 3DGS using the commonly used ShapeNet and ModelNet datasets. Our dataset ShapeSplat consists of 65K objects from 87 unique categories, whose labels are in accordance with the respective datasets. The creation of this dataset utilized the compute equivalent of 2 GPU years on a TITAN XP GPU. We utilize our dataset for unsupervised pretraining and supervised finetuning for classification and segmentation tasks. To this end, we introduce \\textbf{\\textit{Gaussian-MAE}}, which highlights the unique benefits of representation learning from Gaussian parameters. Through exhaustive experiments, we provide several valuable insights. 
In particular, we show that (1) the distribution of the optimized GS centroids significantly differs from the uniformly sampled point cloud (used for initialization) counterpart; (2) this change in distribution results in degradation in classification but improvement in segmentation tasks when using only the centroids; (3) to leverage additional Gaussian parameters, we propose Gaussian feature grouping in a normalized feature space, along with splats pooling layer, offering a tailored solution to effectively group and embed similar Gaussians, which leads to notable improvement in finetuning tasks.", + "arxiv_url": "http://arxiv.org/abs/2408.10906v1", + "pdf_url": "http://arxiv.org/pdf/2408.10906v1", + "published_date": "2024-08-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PartGS:Learning Part-aware 3D Representations by Fusing 2D Gaussians and Superquadrics", + "authors": [ + "Zhirui Gao", + "Renjiao Yi", + "Yuhang Huang", + "Wei Chen", + "Chenyang Zhu", + "Kai Xu" + ], + "abstract": "Low-level 3D representations, such as point clouds, meshes, NeRFs, and 3D Gaussians, are commonly used to represent 3D objects or scenes. However, human perception typically understands 3D objects at a higher level as a composition of parts or structures rather than points or voxels. Representing 3D objects or scenes as semantic parts can benefit further understanding and applications. In this paper, we introduce $\\textbf{PartGS}$, $\\textbf{part}$-aware 3D reconstruction by a hybrid representation of 2D $\\textbf{G}$aussians and $\\textbf{S}$uperquadrics, which parses objects or scenes into semantic parts, digging 3D structural clues from multi-view image inputs. Accurate structured geometry reconstruction and high-quality rendering are achieved at the same time. Our method simultaneously optimizes superquadric meshes and Gaussians by coupling their parameters within our hybrid representation. On one hand, this hybrid representation inherits the advantage of superquadrics to represent different shape primitives, supporting flexible part decomposition of scenes. On the other hand, 2D Gaussians capture complex texture and geometry details, ensuring high-quality appearance and geometry reconstruction. Our method is fully unsupervised and outperforms existing state-of-the-art approaches in extensive experiments on DTU, ShapeNet, and real-life datasets.", + "arxiv_url": "http://arxiv.org/abs/2408.10789v2", + "pdf_url": "http://arxiv.org/pdf/2408.10789v2", + "published_date": "2024-08-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DEGAS: Detailed Expressions on Full-Body Gaussian Avatars", + "authors": [ + "Zhijing Shao", + "Duotun Wang", + "Qing-Yao Tian", + "Yao-Dong Yang", + "Hengyu Meng", + "Zeyu Cai", + "Bo Dong", + "Yu Zhang", + "Kang Zhang", + "Zeyu Wang" + ], + "abstract": "Although neural rendering has made significant advancements in creating lifelike, animatable full-body and head avatars, incorporating detailed expressions into full-body avatars remains largely unexplored. We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions. 
Trained on multiview videos of a given subject, our method learns a conditional variational autoencoder that takes both the body motion and facial expression as driving signals to generate Gaussian maps in the UV layout. To drive the facial expressions, instead of the commonly used 3D Morphable Models (3DMMs) in 3D head avatars, we propose to adopt the expression latent space trained solely on 2D portrait images, bridging the gap between 2D talking faces and 3D avatars. Leveraging the rendering capability of 3DGS and the rich expressiveness of the expression latent space, the learned avatars can be reenacted to reproduce photorealistic rendering images with subtle and accurate facial expressions. Experiments on an existing dataset and our newly proposed dataset of full-body talking avatars demonstrate the efficacy of our method. We also propose an audio-driven extension of our method with the help of 2D talking faces, opening new possibilities to interactive AI agents.", + "arxiv_url": "http://arxiv.org/abs/2408.10588v1", + "pdf_url": "http://arxiv.org/pdf/2408.10588v1", + "published_date": "2024-08-20", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LoopSplat: Loop Closure by Registering 3D Gaussian Splats", + "authors": [ + "Liyuan Zhu", + "Yue Li", + "Erik Sandström", + "Shengyu Huang", + "Konrad Schindler", + "Iro Armeni" + ], + "abstract": "Simultaneous Localization and Mapping (SLAM) based on 3D Gaussian Splats (3DGS) has recently shown promise towards more accurate, dense 3D scene maps. However, existing 3DGS-based methods fail to address the global consistency of the scene via loop closure and/or global bundle adjustment. To this end, we propose LoopSplat, which takes RGB-D images as input and performs dense mapping with 3DGS submaps and frame-to-model tracking. LoopSplat triggers loop closure online and computes relative loop edge constraints between submaps directly via 3DGS registration, leading to improvements in efficiency and accuracy over traditional global-to-local point cloud registration. It uses a robust pose graph optimization formulation and rigidly aligns the submaps to achieve global consistency. Evaluation on the synthetic Replica and real-world TUM-RGBD, ScanNet, and ScanNet++ datasets demonstrates competitive or superior tracking, mapping, and rendering compared to existing methods for dense RGB-D SLAM. Code is available at loopsplat.github.io.", + "arxiv_url": "http://arxiv.org/abs/2408.10154v2", + "pdf_url": "http://arxiv.org/pdf/2408.10154v2", + "published_date": "2024-08-19", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Implicit Gaussian Splatting with Efficient Multi-Level Tri-Plane Representation", + "authors": [ + "Minye Wu", + "Tinne Tuytelaars" + ], + "abstract": "Recent advancements in photo-realistic novel view synthesis have been significantly driven by Gaussian Splatting (3DGS). Nevertheless, the explicit nature of 3DGS data entails considerable storage requirements, highlighting a pressing need for more efficient data representations. To address this, we present Implicit Gaussian Splatting (IGS), an innovative hybrid model that integrates explicit point clouds with implicit feature embeddings through a multi-level tri-plane architecture. 
This architecture features 2D feature grids at various resolutions across different levels, facilitating continuous spatial domain representation and enhancing spatial correlations among Gaussian primitives. Building upon this foundation, we introduce a level-based progressive training scheme, which incorporates explicit spatial regularization. This method capitalizes on spatial correlations to enhance both the rendering quality and the compactness of the IGS representation. Furthermore, we propose a novel compression pipeline tailored for both point clouds and 2D feature grids, considering the entropy variations across different levels. Extensive experimental evaluations demonstrate that our algorithm can deliver high-quality rendering using only a few MBs, effectively balancing storage efficiency and rendering fidelity, and yielding results that are competitive with the state-of-the-art.", + "arxiv_url": "http://arxiv.org/abs/2408.10041v2", + "pdf_url": "http://arxiv.org/pdf/2408.10041v2", + "published_date": "2024-08-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Topology-aware Human Avatars with Semantically-guided Gaussian Splatting", + "authors": [ + "Haoyu Zhao", + "Chen Yang", + "Hao Wang", + "Xingyue Zhao", + "Wei Shen" + ], + "abstract": "Reconstructing photo-realistic and topology-aware animatable human avatars from monocular videos remains challenging in computer vision and graphics. Recently, methods using 3D Gaussians to represent the human body have emerged, offering faster optimization and real-time rendering. However, due to ignoring the crucial role of human body semantic information which represents the explicit topological and intrinsic structure within human body, they fail to achieve fine-detail reconstruction of human avatars. To address this issue, we propose SG-GS, which uses semantics-embedded 3D Gaussians, skeleton-driven rigid deformation, and non-rigid cloth dynamics deformation to create photo-realistic human avatars. We then design a Semantic Human-Body Annotator (SHA) which utilizes SMPL's semantic prior for efficient body part semantic labeling. The generated labels are used to guide the optimization of semantic attributes of Gaussian. To capture the explicit topological structure of the human body, we employ a 3D network that integrates both topological and geometric associations for human avatar deformation. We further implement three key strategies to enhance the semantic accuracy of 3D Gaussians and rendering quality: semantic projection with 2D regularization, semantic-guided density regularization and semantic-aware regularization with neighborhood consistency. 
Extensive experiments demonstrate that SG-GS achieves state-of-the-art geometry and appearance reconstruction performance.", + "arxiv_url": "http://arxiv.org/abs/2408.09665v2", + "pdf_url": "http://arxiv.org/pdf/2408.09665v2", + "published_date": "2024-08-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning", + "authors": [ + "Haoyu Zhao", + "Hao Wang", + "Chen Yang", + "Wei Shen" + ], + "abstract": "Existing approaches for human avatar generation--both NeRF-based and 3D Gaussian Splatting (3DGS) based--struggle with maintaining 3D consistency and exhibit degraded detail reconstruction, particularly when training with sparse inputs. To address this challenge, we propose CHASE, a novel framework that achieves dense-input-level performance using only sparse inputs through two key innovations: cross-pose intrinsic 3D consistency supervision and 3D geometry contrastive learning. Building upon prior skeleton-driven approaches that combine rigid deformation with non-rigid cloth dynamics, we first establish baseline avatars with fundamental 3D consistency. To enhance 3D consistency under sparse inputs, we introduce a Dynamic Avatar Adjustment (DAA) module, which refines deformed Gaussians by leveraging similar poses from the training set. By minimizing the rendering discrepancy between adjusted Gaussians and reference poses, DAA provides additional supervision for avatar reconstruction. We further maintain global 3D consistency through a novel geometry-aware contrastive learning strategy. While designed for sparse inputs, CHASE surpasses state-of-the-art methods across both full and sparse settings on ZJU-MoCap and H36M datasets, demonstrating that our enhanced 3D consistency leads to superior rendering quality.", + "arxiv_url": "http://arxiv.org/abs/2408.09663v3", + "pdf_url": "http://arxiv.org/pdf/2408.09663v3", + "published_date": "2024-08-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian in the Dark: Real-Time View Synthesis From Inconsistent Dark Images Using Gaussian Splatting", + "authors": [ + "Sheng Ye", + "Zhen-Hui Dong", + "Yubin Hu", + "Yu-Hui Wen", + "Yong-Jin Liu" + ], + "abstract": "3D Gaussian Splatting has recently emerged as a powerful representation that can synthesize remarkable novel views using consistent multi-view images as input. However, we notice that images captured in dark environments where the scenes are not fully illuminated can exhibit considerable brightness variations and multi-view inconsistency, which poses great challenges to 3D Gaussian Splatting and severely degrades its performance. To tackle this problem, we propose Gaussian-DK. Observing that inconsistencies are mainly caused by camera imaging, we represent a consistent radiance field of the physical world using a set of anisotropic 3D Gaussians, and design a camera response module to compensate for multi-view inconsistencies. We also introduce a step-based gradient scaling strategy to constrain Gaussians near the camera, which turn out to be floaters, from splitting and cloning. 
Experiments on our proposed benchmark dataset demonstrate that Gaussian-DK produces high-quality renderings without ghosting and floater artifacts and significantly outperforms existing methods. Furthermore, we can also synthesize light-up images by controlling exposure levels that clearly show details in shadow areas.", + "arxiv_url": "http://arxiv.org/abs/2408.09130v2", + "pdf_url": "http://arxiv.org/pdf/2408.09130v2", + "published_date": "2024-08-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Correspondence-Guided SfM-Free 3D Gaussian Splatting for NVS", + "authors": [ + "Wei Sun", + "Xiaosong Zhang", + "Fang Wan", + "Yanzhao Zhou", + "Yuan Li", + "Qixiang Ye", + "Jianbin Jiao" + ], + "abstract": "Novel View Synthesis (NVS) without Structure-from-Motion (SfM) pre-processed camera poses--referred to as SfM-free methods--is crucial for promoting rapid response capabilities and enhancing robustness against variable operating conditions. Recent SfM-free methods have integrated pose optimization, designing end-to-end frameworks for joint camera pose estimation and NVS. However, most existing works rely on per-pixel image loss functions, such as L2 loss. In SfM-free methods, inaccurate initial poses lead to misalignment issue, which, under the constraints of per-pixel image loss functions, results in excessive gradients, causing unstable optimization and poor convergence for NVS. In this study, we propose a correspondence-guided SfM-free 3D Gaussian splatting for NVS. We use correspondences between the target and the rendered result to achieve better pixel alignment, facilitating the optimization of relative poses between frames. We then apply the learned poses to optimize the entire scene. Each 2D screen-space pixel is associated with its corresponding 3D Gaussians through approximated surface rendering to facilitate gradient back propagation. Experimental results underline the superior performance and time efficiency of the proposed approach compared to the state-of-the-art baselines.", + "arxiv_url": "http://arxiv.org/abs/2408.08723v1", + "pdf_url": "http://arxiv.org/pdf/2408.08723v1", + "published_date": "2024-08-16", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-ID: Illumination Decomposition on Gaussian Splatting via Diffusion Prior and Parametric Light Source Optimization", + "authors": [ + "Kang Du", + "Zhihao Liang", + "Zeyu Wang" + ], + "abstract": "We present GS-ID, a novel framework for illumination decomposition on Gaussian Splatting, achieving photorealistic novel view synthesis and intuitive light editing. Illumination decomposition is an ill-posed problem facing three main challenges: 1) priors for geometry and material are often lacking; 2) complex illumination conditions involve multiple unknown light sources; and 3) calculating surface shading with numerous light sources is computationally expensive. To address these challenges, we first introduce intrinsic diffusion priors to estimate the attributes for physically based rendering. Then we divide the illumination into environmental and direct components for joint optimization. Last, we employ deferred rendering to reduce the computational load. 
Our framework uses a learnable environment map and Spherical Gaussians (SGs) to represent light sources parametrically, therefore enabling controllable and photorealistic relighting on Gaussian Splatting. Extensive experiments and applications demonstrate that GS-ID produces state-of-the-art illumination decomposition results while achieving better geometry reconstruction and rendering performance.", + "arxiv_url": "http://arxiv.org/abs/2408.08524v1", + "pdf_url": "http://arxiv.org/pdf/2408.08524v1", + "published_date": "2024-08-16", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "WaterSplatting: Fast Underwater 3D Scene Reconstruction Using Gaussian Splatting", + "authors": [ + "Huapeng Li", + "Wenxuan Song", + "Tianao Xu", + "Alexandre Elsig", + "Jonas Kulhanek" + ], + "abstract": "The underwater 3D scene reconstruction is a challenging, yet interesting problem with applications ranging from naval robots to VR experiences. The problem was successfully tackled by fully volumetric NeRF-based methods which can model both the geometry and the medium (water). Unfortunately, these methods are slow to train and do not offer real-time rendering. More recently, 3D Gaussian Splatting (3DGS) method offered a fast alternative to NeRFs. However, because it is an explicit method that renders only the geometry, it cannot render the medium and is therefore unsuited for underwater reconstruction. Therefore, we propose a novel approach that fuses volumetric rendering with 3DGS to handle underwater data effectively. Our method employs 3DGS for explicit geometry representation and a separate volumetric field (queried once per pixel) for capturing the scattering medium. This dual representation further allows the restoration of the scenes by removing the scattering medium. Our method outperforms state-of-the-art NeRF-based methods in rendering quality on the underwater SeaThru-NeRF dataset. Furthermore, it does so while offering real-time rendering performance, addressing the efficiency limitations of existing methods. Web: https://water-splatting.github.io", + "arxiv_url": "http://arxiv.org/abs/2408.08206v1", + "pdf_url": "http://arxiv.org/pdf/2408.08206v1", + "published_date": "2024-08-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering", + "authors": [ + "Guofeng Feng", + "Siyan Chen", + "Rong Fu", + "Zimu Liao", + "Yi Wang", + "Tao Liu", + "Zhilin Pei", + "Hengjie Li", + "Xingcheng Zhang", + "Bo Dai" + ], + "abstract": "This work introduces FlashGS, an open-source CUDA Python library, designed to facilitate the efficient differentiable rasterization of 3D Gaussian Splatting through algorithmic and kernel-level optimizations. FlashGS is developed based on the observations from a comprehensive analysis of the rendering process to enhance computational efficiency and bring the technique to wide adoption. The paper includes a suite of optimization strategies, encompassing redundancy elimination, efficient pipelining, refined control and scheduling mechanisms, and memory access optimizations, all of which are meticulously integrated to amplify the performance of the rasterization process. 
An extensive evaluation of FlashGS' performance has been conducted across a diverse spectrum of synthetic and real-world large-scale scenes, encompassing a variety of image resolutions. The empirical findings demonstrate that FlashGS consistently achieves an average 4x acceleration over mobile consumer GPUs, coupled with reduced memory consumption. These results underscore the superior performance and resource optimization capabilities of FlashGS, positioning it as a formidable tool in the domain of 3D rendering.", + "arxiv_url": "http://arxiv.org/abs/2408.07967v2", + "pdf_url": "http://arxiv.org/pdf/2408.07967v2", + "published_date": "2024-08-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Progressive Radiance Distillation for Inverse Rendering with Gaussian Splatting", + "authors": [ + "Keyang Ye", + "Qiming Hou", + "Kun Zhou" + ], + "abstract": "We propose progressive radiance distillation, an inverse rendering method that combines physically-based rendering with Gaussian-based radiance field rendering using a distillation progress map. Taking multi-view images as input, our method starts from a pre-trained radiance field guidance, and distills physically-based light and material parameters from the radiance field using an image-fitting process. The distillation progress map is initialized to a small value, which favors radiance field rendering. During early iterations when fitted light and material parameters are far from convergence, the radiance field fallback ensures the sanity of image loss gradients and avoids local minima that attracts under-fit states. As fitted parameters converge, the physical model gradually takes over and the distillation progress increases correspondingly. In presence of light paths unmodeled by the physical model, the distillation progress never finishes on affected pixels and the learned radiance field stays in the final rendering. With this designed tolerance for physical model limitations, we prevent unmodeled color components from leaking into light and material parameters, alleviating relighting artifacts. Meanwhile, the remaining radiance field compensates for the limitations of the physical model, guaranteeing high-quality novel views synthesis. Experimental results demonstrate that our method significantly outperforms state-of-the-art techniques quality-wise in both novel view synthesis and relighting. The idea of progressive radiance distillation is not limited to Gaussian splatting. We show that it also has positive effects for prominently specular scenes when adapted to a mesh-based inverse rendering method.", + "arxiv_url": "http://arxiv.org/abs/2408.07595v1", + "pdf_url": "http://arxiv.org/pdf/2408.07595v1", + "published_date": "2024-08-14", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Gaussian Editing with A Single Image", + "authors": [ + "Guan Luo", + "Tian-Xing Xu", + "Ying-Tian Liu", + "Xiao-Xiong Fan", + "Fang-Lue Zhang", + "Song-Hai Zhang" + ], + "abstract": "The modeling and manipulation of 3D scenes captured from the real world are pivotal in various applications, attracting growing research interest. 
While previous works on editing have achieved interesting results through manipulating 3D meshes, they often require accurately reconstructed meshes to perform editing, which limits their application in 3D content generation. To address this gap, we introduce a novel single-image-driven 3D scene editing approach based on 3D Gaussian Splatting, enabling intuitive manipulation via directly editing the content on a 2D image plane. Our method learns to optimize the 3D Gaussians to align with an edited version of the image rendered from a user-specified viewpoint of the original scene. To capture long-range object deformation, we introduce positional loss into the optimization process of 3D Gaussian Splatting and enable gradient propagation through reparameterization. To handle occluded 3D Gaussians when rendering from the specified viewpoint, we build an anchor-based structure and employ a coarse-to-fine optimization strategy capable of handling long-range deformation while maintaining structural stability. Furthermore, we design a novel masking strategy to adaptively identify non-rigid deformation regions for fine-scale modeling. Extensive experiments show the effectiveness of our method in handling geometric details, long-range, and non-rigid deformation, demonstrating superior editing flexibility and quality compared to previous approaches.", + "arxiv_url": "http://arxiv.org/abs/2408.07540v1", + "pdf_url": "http://arxiv.org/pdf/2408.07540v1", + "published_date": "2024-08-14", + "categories": [ + "cs.CV", + "cs.MM" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space", + "authors": [ + "Hyunjee Lee", + "Youngsik Yun", + "Jeongmin Bae", + "Seoha Kim", + "Youngjung Uh" + ], + "abstract": "Understanding the 3D semantics of a scene is a fundamental problem for various scenarios such as embodied agents. While NeRFs and 3DGS excel at novel-view synthesis, previous methods for understanding their semantics have been limited to incomplete 3D understanding: their segmentation results are 2D masks and their supervision is anchored at 2D pixels. This paper revisits the problem set to pursue a better 3D understanding of a scene modeled by NeRFs and 3DGS as follows. 1) We directly supervise the 3D points to train the language embedding field. It achieves state-of-the-art accuracy without relying on multi-scale language embeddings. 2) We transfer the pre-trained language field to 3DGS, achieving the first real-time rendering speed without sacrificing training time or accuracy. 3) We introduce a 3D querying and evaluation protocol for assessing the reconstructed geometry and semantics together. Code, checkpoints, and annotations will be available online. 
Project page: https://hyunji12.github.io/Open3DRF", + "arxiv_url": "http://arxiv.org/abs/2408.07416v2", + "pdf_url": "http://arxiv.org/pdf/2408.07416v2", + "published_date": "2024-08-14", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis", + "authors": [ + "Saptarshi Neil Sinha", + "Holger Graf", + "Michael Weinmann" + ], + "abstract": "We propose a novel cross-spectral rendering framework based on 3D Gaussian Splatting (3DGS) that generates realistic and semantically meaningful splats from registered multi-view spectrum and segmentation maps. This extension enhances the representation of scenes with multiple spectra, providing insights into the underlying materials and segmentation. We introduce an improved physically-based rendering approach for Gaussian splats, estimating reflectance and lights per spectra, thereby enhancing accuracy and realism. In a comprehensive quantitative and qualitative evaluation, we demonstrate the superior performance of our approach with respect to other recent learning-based spectral scene representation approaches (i.e., XNeRF and SpectralNeRF) as well as other non-spectral state-of-the-art learning-based approaches. Our work also demonstrates the potential of spectral scene understanding for precise scene editing techniques like style transfer, inpainting, and removal. Thereby, our contributions address challenges in multi-spectral scene representation, rendering, and editing, offering new possibilities for diverse applications.", + "arxiv_url": "http://arxiv.org/abs/2408.06975v1", + "pdf_url": "http://arxiv.org/pdf/2408.06975v1", + "published_date": "2024-08-13", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR", + "I.2.10; I.3.7; I.4.8; I.4.1" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HDRGS: High Dynamic Range Gaussian Splatting", + "authors": [ + "Jiahao Wu", + "Lu Xiao", + "Rui Peng", + "Kaiqiang Xiong", + "Ronggang Wang" + ], + "abstract": "Recent years have witnessed substantial advancements in the field of 3D reconstruction from 2D images, particularly following the introduction of the neural radiance field (NeRF) technique. However, reconstructing a 3D high dynamic range (HDR) radiance field, which aligns more closely with real-world conditions, from 2D multi-exposure low dynamic range (LDR) images continues to pose significant challenges. Approaches to this issue fall into two categories: grid-based and implicit-based. Implicit methods, using multi-layer perceptrons (MLP), face inefficiencies, limited solvability, and overfitting risks. Conversely, grid-based methods require significant memory and struggle with image quality and long training times. In this paper, we introduce Gaussian Splatting-a recent, high-quality, real-time 3D reconstruction technique-into this domain. We further develop the High Dynamic Range Gaussian Splatting (HDR-GS) method, designed to address the aforementioned challenges. This method enhances color dimensionality by including luminance and uses an asymmetric grid for tone-mapping, swiftly and precisely converting pixel irradiance to color. 
Our approach improves HDR scene recovery accuracy and integrates a novel coarse-to-fine strategy to speed up model convergence, enhancing robustness against sparse viewpoints and exposure extremes, and preventing local optima. Extensive testing confirms that our method surpasses current state-of-the-art techniques in both synthetic and real-world scenarios.", + "arxiv_url": "http://arxiv.org/abs/2408.06543v3", + "pdf_url": "http://arxiv.org/pdf/2408.06543v3", + "published_date": "2024-08-13", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Mipmap-GS: Let Gaussians Deform with Scale-specific Mipmap for Anti-aliasing Rendering", + "authors": [ + "Jiameng Li", + "Yue Shi", + "Jiezhang Cao", + "Bingbing Ni", + "Wenjun Zhang", + "Kai Zhang", + "Luc Van Gool" + ], + "abstract": "3D Gaussian Splatting (3DGS) has attracted great attention in novel view synthesis because of its superior rendering efficiency and high fidelity. However, the trained Gaussians suffer from severe zooming degradation due to non-adjustable representation derived from single-scale training. Though some methods attempt to tackle this problem via post-processing techniques such as selective rendering or filtering techniques towards primitives, the scale-specific information is not involved in Gaussians. In this paper, we propose a unified optimization method to make Gaussians adaptive for arbitrary scales by self-adjusting the primitive properties (e.g., color, shape and size) and distribution (e.g., position). Inspired by the mipmap technique, we design pseudo ground-truth for the target scale and propose a scale-consistency guidance loss to inject scale information into 3D Gaussians. Our method is a plug-in module, applicable for any 3DGS models to solve the zoom-in and zoom-out aliasing. Extensive experiments demonstrate the effectiveness of our method. Notably, our method outperforms 3DGS in PSNR by an average of 9.25 dB for zoom-in and 10.40 dB for zoom-out on the NeRF Synthetic dataset.", + "arxiv_url": "http://arxiv.org/abs/2408.06286v1", + "pdf_url": "http://arxiv.org/pdf/2408.06286v1", + "published_date": "2024-08-12", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Developing Smart MAVs for Autonomous Inspection in GPS-denied Constructions", + "authors": [ + "Paoqiang Pan", + "Kewei Hu", + "Xiao Huang", + "Wei Ying", + "Xiaoxuan Xie", + "Yue Ma", + "Naizhong Zhang", + "Hanwen Kang" + ], + "abstract": "Smart Micro Aerial Vehicles (MAVs) have transformed infrastructure inspection by enabling efficient, high-resolution monitoring at various stages of construction, including hard-to-reach areas. Traditional manual operation of drones in GPS-denied environments, such as industrial facilities and infrastructure, is labour-intensive, tedious and prone to error. This study presents an innovative framework for smart MAV inspections in such complex and GPS-denied indoor environments. The framework features a hierarchical perception and planning system that identifies regions of interest and optimises task paths. It also presents an advanced MAV system with enhanced localisation and motion planning capabilities, integrated with Neural Reconstruction technology for comprehensive 3D reconstruction of building structures. 
The effectiveness of the framework was empirically validated in a 4,000 square meters indoor infrastructure facility with an interior length of 80 metres, a width of 50 metres and a height of 7 metres. The main structure consists of columns and walls. Experimental results show that our MAV system performs exceptionally well in autonomous inspection tasks, achieving a 100\\% success rate in generating and executing scan paths. Extensive experiments validate the manoeuvrability of our developed MAV, achieving a 100\\% success rate in motion planning with a tracking error of less than 0.1 metres. In addition, the enhanced reconstruction method using 3D Gaussian Splatting technology enables the generation of high-fidelity rendering models from the acquired data. Overall, our novel method represents a significant advancement in the use of robotics for infrastructure inspection.", + "arxiv_url": "http://arxiv.org/abs/2408.06030v1", + "pdf_url": "http://arxiv.org/pdf/2408.06030v1", + "published_date": "2024-08-12", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors", + "authors": [ + "Xiaozheng Zheng", + "Chao Wen", + "Zhaohu Li", + "Weiyi Zhang", + "Zhuo Su", + "Xu Chang", + "Yang Zhao", + "Zheng Lv", + "Xiaoyuan Zhang", + "Yongjie Zhang", + "Guidong Wang", + "Lan Xu" + ], + "abstract": "In this paper, we present a novel 3D head avatar creation approach capable of generalizing from few-shot in-the-wild data with high-fidelity and animatable robustness. Given the underconstrained nature of this problem, incorporating prior knowledge is essential. Therefore, we propose a framework comprising prior learning and avatar creation phases. The prior learning phase leverages 3D head priors derived from a large-scale multi-view dynamic dataset, and the avatar creation phase applies these priors for few-shot personalization. Our approach effectively captures these priors by utilizing a Gaussian Splatting-based auto-decoder network with part-based dynamic modeling. Our method employs identity-shared encoding with personalized latent codes for individual identities to learn the attributes of Gaussian primitives. During the avatar creation phase, we achieve fast head avatar personalization by leveraging inversion and fine-tuning strategies. Extensive experiments demonstrate that our model effectively exploits head priors and successfully generalizes them to few-shot personalization, achieving photo-realistic rendering quality, multi-view consistency, and stable animation.", + "arxiv_url": "http://arxiv.org/abs/2408.06019v1", + "pdf_url": "http://arxiv.org/pdf/2408.06019v1", + "published_date": "2024-08-12", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Visual SLAM with 3D Gaussian Primitives and Depth Priors Enabling Novel View Synthesis", + "authors": [ + "Zhongche Qu", + "Zhi Zhang", + "Cong Liu", + "Jianhua Yin" + ], + "abstract": "Conventional geometry-based SLAM systems lack dense 3D reconstruction capabilities since their data association usually relies on feature correspondences. Additionally, learning-based SLAM systems often fall short in terms of real-time performance and accuracy. Balancing real-time performance with dense 3D reconstruction capabilities is a challenging problem. 
In this paper, we propose a real-time RGB-D SLAM system that incorporates a novel view synthesis technique, 3D Gaussian Splatting, for 3D scene representation and pose estimation. This technique leverages the real-time rendering performance of 3D Gaussian Splatting with rasterization and allows for differentiable optimization in real time through CUDA implementation. We also enable mesh reconstruction from 3D Gaussians for explicit dense 3D reconstruction. To estimate accurate camera poses, we utilize a rotation-translation decoupled strategy with inverse optimization. This involves iteratively updating both in several iterations through gradient-based optimization. This process includes differentiably rendering RGB, depth, and silhouette maps and updating the camera parameters to minimize a combined loss of photometric loss, depth geometry loss, and visibility loss, given the existing 3D Gaussian map. However, 3D Gaussian Splatting (3DGS) struggles to accurately represent surfaces due to the multi-view inconsistency of 3D Gaussians, which can lead to reduced accuracy in both camera pose estimation and scene reconstruction. To address this, we utilize depth priors as additional regularization to enforce geometric constraints, thereby improving the accuracy of both pose estimation and 3D reconstruction. We also provide extensive experimental results on public benchmark datasets to demonstrate the effectiveness of our proposed methods in terms of pose accuracy, geometric accuracy, and rendering performance.", + "arxiv_url": "http://arxiv.org/abs/2408.05635v2", + "pdf_url": "http://arxiv.org/pdf/2408.05635v2", + "published_date": "2024-08-10", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PRTGaussian: Efficient Relighting Using 3D Gaussians with Precomputed Radiance Transfer", + "authors": [ + "Libo Zhang", + "Yuxuan Han", + "Wenbin Lin", + "Jingwang Ling", + "Feng Xu" + ], + "abstract": "We present PRTGaussian, a realtime relightable novel-view synthesis method made possible by combining 3D Gaussians and Precomputed Radiance Transfer (PRT). By fitting relightable Gaussians to multi-view OLAT data, our method enables real-time, free-viewpoint relighting. By estimating the radiance transfer based on high-order spherical harmonics, we achieve a balance between capturing detailed relighting effects and maintaining computational efficiency. We utilize a two-stage process: in the first stage, we reconstruct a coarse geometry of the object from multi-view images. In the second stage, we initialize 3D Gaussians with the obtained point cloud, then simultaneously refine the coarse geometry and learn the light transport for each Gaussian. Extensive experiments on synthetic datasets show that our approach can achieve fast and high-quality relighting for general objects. 
Code and data are available at https://github.com/zhanglbthu/PRTGaussian.", + "arxiv_url": "http://arxiv.org/abs/2408.05631v1", + "pdf_url": "http://arxiv.org/pdf/2408.05631v1", + "published_date": "2024-08-10", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "https://github.com/zhanglbthu/PRTGaussian", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow", + "authors": [ + "Hangyu Li", + "Xiangxiang Chu", + "Dingyuan Shi", + "Wang Lin" + ], + "abstract": "Recent advances in text-to-3D generation have made significant progress. In particular, with the pretrained diffusion models, existing methods predominantly use Score Distillation Sampling (SDS) to train 3D models such as Neural RaRecent advances in text-to-3D generation have made significant progress. In particular, with the pretrained diffusion models, existing methods predominantly use Score Distillation Sampling (SDS) to train 3D models such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3D GS). However, a hurdle is that they often encounter difficulties with over-smoothing textures and over-saturating colors. The rectified flow model -- which utilizes a simple ordinary differential equation (ODE) to represent a straight trajectory -- shows promise as an alternative prior to text-to-3D generation. It learns a time-independent vector field, thereby reducing the ambiguity in 3D model update gradients that are calculated using time-dependent scores in the SDS framework. In light of this, we first develop a mathematical analysis to seamlessly integrate SDS with rectified flow model, paving the way for our initial framework known as Vector Field Distillation Sampling (VFDS). However, empirical findings indicate that VFDS still results in over-smoothing outcomes. Therefore, we analyze the grounding reasons for such a failure from the perspective of ODE trajectories. On top, we propose a novel framework, named FlowDreamer, which yields high fidelity results with richer textual details and faster convergence. The key insight is to leverage the coupling and reversible properties of the rectified flow model to search for the corresponding noise, rather than using randomly sampled noise as in VFDS. Accordingly, we introduce a novel Unique Couple Matching (UCM) loss, which guides the 3D model to optimize along the same trajectory.", + "arxiv_url": "http://arxiv.org/abs/2408.05008v3", + "pdf_url": "http://arxiv.org/pdf/2408.05008v3", + "published_date": "2024-08-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Self-augmented Gaussian Splatting with Structure-aware Masks for Sparse-view 3D Reconstruction", + "authors": [ + "Lingbei Meng", + "Bi'an Du", + "Wei Hu" + ], + "abstract": "Sparse-view 3D reconstruction stands as a formidable challenge in computer vision, aiming to build complete three-dimensional models from a limited array of viewing perspectives. This task confronts several difficulties: 1) the limited number of input images that lack consistent information; 2) dependence on the quality of input images; and 3) the substantial size of model parameters. To address these challenges, we propose a self-augmented coarse-to-fine Gaussian splatting paradigm, enhanced with a structure-aware mask, for sparse-view 3D reconstruction. 
In particular, our method initially employs a coarse Gaussian model to obtain a basic 3D representation from sparse-view inputs. Subsequently, we develop a fine Gaussian network to enhance consistent and detailed representation of the output with both 3D geometry augmentation and perceptual view augmentation. During training, we design a structure-aware masking strategy to further improve the model's robustness against sparse inputs and noise. Experimental results on the MipNeRF360 and OmniObject3D datasets demonstrate that the proposed method achieves state-of-the-art performances for sparse input views in both perceptual quality and efficiency.", +   "arxiv_url": "http://arxiv.org/abs/2408.04831v2", +   "pdf_url": "http://arxiv.org/pdf/2408.04831v2", +   "published_date": "2024-08-09", +   "categories": [ +    "cs.CV", +    "cs.AI" +   ], +   "github_url": "", +   "keywords": [ +    "gaussian splatting", +    "3d reconstruction", +    "nerf" +   ], +   "citations": 0, +   "semantic_url": "" +  }, +  { +   "title": "A Review of 3D Reconstruction Techniques for Deformable Tissues in Robotic Surgery", +   "authors": [ +    "Mengya Xu", +    "Ziqi Guo", +    "An Wang", +    "Long Bai", +    "Hongliang Ren" +   ], +   "abstract": "As a crucial and intricate task in robotic minimally invasive surgery, reconstructing surgical scenes using stereo or monocular endoscopic video holds immense potential for clinical applications. NeRF-based techniques have recently garnered attention for the ability to reconstruct scenes implicitly. On the other hand, Gaussian splatting-based 3D-GS represents scenes explicitly using 3D Gaussians and projects them onto a 2D plane as a replacement for the complex volume rendering in NeRF. However, these methods face challenges regarding surgical scene reconstruction, such as slow inference, dynamic scenes, and surgical tool occlusion. This work explores and reviews state-of-the-art (SOTA) approaches, discussing their innovations and implementation principles. Furthermore, we replicate the models and conduct testing and evaluation on two datasets. The test results demonstrate that with advancements in these techniques, achieving real-time, high-quality reconstructions becomes feasible.", +   "arxiv_url": "http://arxiv.org/abs/2408.04426v1", +   "pdf_url": "http://arxiv.org/pdf/2408.04426v1", +   "published_date": "2024-08-08", +   "categories": [ +    "cs.CV", +    "cs.RO" +   ], +   "github_url": "", +   "keywords": [ +    "gaussian splatting", +    "3d gaussian", +    "nerf" +   ], +   "citations": 0, +   "semantic_url": "" +  }, +  { +   "title": "InstantStyleGaussian: Efficient Art Style Transfer with 3D Gaussian Splatting", +   "authors": [ +    "Xin-Yi Yu", +    "Jun-Xin Yu", +    "Li-Bo Zhou", +    "Yan Wei", +    "Lin-Lin Ou" +   ], +   "abstract": "We present InstantStyleGaussian, an innovative 3D style transfer method based on the 3D Gaussian Splatting (3DGS) scene representation. By inputting a target-style image, it quickly generates new 3D GS scenes. Our method operates on pre-reconstructed GS scenes, combining diffusion models with an improved iterative dataset update strategy. It utilizes diffusion models to generate target style images, adds these new images to the training dataset, and uses this dataset to iteratively update and optimize the GS scenes, significantly accelerating the style editing process while ensuring the quality of the generated scenes.
Extensive experimental results demonstrate that our method ensures high-quality stylized scenes while offering significant advantages in style transfer speed and consistency.", + "arxiv_url": "http://arxiv.org/abs/2408.04249v2", + "pdf_url": "http://arxiv.org/pdf/2408.04249v2", + "published_date": "2024-08-08", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Towards Real-Time Gaussian Splatting: Accelerating 3DGS through Photometric SLAM", + "authors": [ + "Yan Song Hu", + "Dayou Mao", + "Yuhao Chen", + "John Zelek" + ], + "abstract": "Initial applications of 3D Gaussian Splatting (3DGS) in Visual Simultaneous Localization and Mapping (VSLAM) demonstrate the generation of high-quality volumetric reconstructions from monocular video streams. However, despite these promising advancements, current 3DGS integrations have reduced tracking performance and lower operating speeds compared to traditional VSLAM. To address these issues, we propose integrating 3DGS with Direct Sparse Odometry, a monocular photometric SLAM system. We have done preliminary experiments showing that using Direct Sparse Odometry point cloud outputs, as opposed to standard structure-from-motion methods, significantly shortens the training time needed to achieve high-quality renders. Reducing 3DGS training time enables the development of 3DGS-integrated SLAM systems that operate in real-time on mobile hardware. These promising initial findings suggest further exploration is warranted in combining traditional VSLAM systems with 3DGS.", + "arxiv_url": "http://arxiv.org/abs/2408.03825v1", + "pdf_url": "http://arxiv.org/pdf/2408.03825v1", + "published_date": "2024-08-07", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields", + "authors": [ + "Joo Chan Lee", + "Daniel Rho", + "Xiangyu Sun", + "Jong Hwan Ko", + "Eunbyung Park" + ], + "abstract": "3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussian-based representation and introduces an approximated volumetric rendering, achieving very fast rendering speed and promising image quality. Furthermore, subsequent studies have successfully extended 3DGS to dynamic 3D scenes, demonstrating its wide range of applications. However, a significant drawback arises as 3DGS and its following methods entail a substantial number of Gaussians to maintain the high fidelity of the rendered images, which requires a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks to compactly represent the geometric and temporal attributes by residual vector quantization. 
With model compression techniques such as quantization and entropy coding, we consistently show over 25x reduced storage and enhanced rendering speed compared to 3DGS for static scenes, while maintaining the quality of the scene representation. For dynamic scenes, our approach achieves more than 12x storage efficiency and retains a high-quality reconstruction compared to the existing state-of-the-art methods. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering. Our project page is available at https://maincold2.github.io/c3dgs/.", + "arxiv_url": "http://arxiv.org/abs/2408.03822v1", + "pdf_url": "http://arxiv.org/pdf/2408.03822v1", + "published_date": "2024-08-07", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting", + "authors": [ + "Zhe Jun Tang", + "Tat-Jen Cham" + ], + "abstract": "The use of 3D Gaussians as representation of radiance fields has enabled high quality novel view synthesis at real-time rendering speed. However, the choice of optimising the outgoing radiance of each Gaussian independently as spherical harmonics results in unsatisfactory view dependent effects. In response to these limitations, our work, Factorised Tensorial Illumination for 3D Gaussian Splatting, or 3iGS, improves upon 3D Gaussian Splatting (3DGS) rendering quality. Instead of optimising a single outgoing radiance parameter, 3iGS enhances 3DGS view-dependent effects by expressing the outgoing radiance as a function of a local illumination field and Bidirectional Reflectance Distribution Function (BRDF) features. We optimise a continuous incident illumination field through a Tensorial Factorisation representation, while separately fine-tuning the BRDF features of each 3D Gaussian relative to this illumination field. Our methodology significantly enhances the rendering quality of specular view-dependent effects of 3DGS, while maintaining rapid training and rendering speeds.", + "arxiv_url": "http://arxiv.org/abs/2408.03753v1", + "pdf_url": "http://arxiv.org/pdf/2408.03753v1", + "published_date": "2024-08-07", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PRTGS: Precomputed Radiance Transfer of Gaussian Splats for Real-Time High-Quality Relighting", + "authors": [ + "Yijia Guo", + "Yuanxi Bai", + "Liwen Hu", + "Ziyi Guo", + "Mianzhi Liu", + "Yu Cai", + "Tiejun Huang", + "Lei Ma" + ], + "abstract": "We proposed Precomputed RadianceTransfer of GaussianSplats (PRTGS), a real-time high-quality relighting method for Gaussian splats in low-frequency lighting environments that captures soft shadows and interreflections by precomputing 3D Gaussian splats' radiance transfer. Existing studies have demonstrated that 3D Gaussian splatting (3DGS) outperforms neural fields' efficiency for dynamic lighting scenarios. However, the current relighting method based on 3DGS still struggles to compute high-quality shadow and indirect illumination in real time for dynamic light, leading to unrealistic rendering results. 
We solve this problem by precomputing the expensive transport simulations required for complex transfer functions like shadowing, the resulting transfer functions are represented as dense sets of vectors or matrices for every Gaussian splat. We introduce distinct precomputing methods tailored for training and rendering stages, along with unique ray tracing and indirect lighting precomputation techniques for 3D Gaussian splats to accelerate training speed and compute accurate indirect lighting related to environment light. Experimental analyses demonstrate that our approach achieves state-of-the-art visual quality while maintaining competitive training times and allows high-quality real-time (30+ fps) relighting for dynamic light and relatively complex scenes at 1080p resolution.", + "arxiv_url": "http://arxiv.org/abs/2408.03538v1", + "pdf_url": "http://arxiv.org/pdf/2408.03538v1", + "published_date": "2024-08-07", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving", + "authors": [ + "Amirhosein Chahe", + "Lifeng Zhou" + ], + "abstract": "This paper introduces a novel method for open-vocabulary 3D scene understanding in autonomous driving by combining Language Embedded 3D Gaussians with Large Language Models (LLMs) for enhanced inference. We propose utilizing LLMs to generate contextually relevant canonical phrases for segmentation and scene interpretation. Our method leverages the contextual and semantic capabilities of LLMs to produce a set of canonical phrases, which are then compared with the language features embedded in the 3D Gaussians. This LLM-guided approach significantly improves zero-shot scene understanding and detection of objects of interest, even in the most challenging or unfamiliar environments. Experimental results on the WayveScenes101 dataset demonstrate that our approach surpasses state-of-the-art methods in terms of accuracy and flexibility for open-vocabulary object detection and segmentation. This work represents a significant advancement towards more intelligent, context-aware autonomous driving systems, effectively bridging 3D scene representation with high-level semantic understanding.", + "arxiv_url": "http://arxiv.org/abs/2408.03516v1", + "pdf_url": "http://arxiv.org/pdf/2408.03516v1", + "published_date": "2024-08-07", + "categories": [ + "cs.CV", + "cs.LG", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LumiGauss: High-Fidelity Outdoor Relighting with 2D Gaussian Splatting", + "authors": [ + "Joanna Kaleta", + "Kacper Kania", + "Tomasz Trzcinski", + "Marek Kowalski" + ], + "abstract": "Decoupling lighting from geometry using unconstrained photo collections is notoriously challenging. Solving it would benefit many users, as creating complex 3D assets takes days of manual labor. Many previous works have attempted to address this issue, often at the expense of output fidelity, which questions the practicality of such methods. We introduce LumiGauss, a technique that tackles 3D reconstruction of scenes and environmental lighting through 2D Gaussian Splatting. Our approach yields high-quality scene reconstructions and enables realistic lighting synthesis under novel environment maps. 
We also propose a method for enhancing the quality of shadows, common in outdoor scenes, by exploiting spherical harmonics properties. Our approach facilitates seamless integration with game engines and enables the use of fast precomputed radiance transfer. We validate our method on the NeRF-OSR dataset, demonstrating superior performance over baseline methods. Moreover, LumiGauss can synthesize realistic images when applying novel environment maps.", + "arxiv_url": "http://arxiv.org/abs/2408.04474v1", + "pdf_url": "http://arxiv.org/pdf/2408.04474v1", + "published_date": "2024-08-06", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness", + "authors": [ + "Lutao Jiang", + "Hangyu Li", + "Lin Wang" + ], + "abstract": "Text-to-3D content creation has recently received much attention, especially with the prevalence of 3D Gaussians Splatting. In general, GS-based methods comprise two key stages: initialization and rendering optimization. To achieve initialization, existing works directly apply random sphere initialization or 3D diffusion models, e.g., Point-E, to derive the initial shapes. However, such strategies suffer from two critical yet challenging problems: 1) the final shapes are still similar to the initial ones even after training; 2) shapes can be produced only from simple texts, e.g., \"a dog\", not for lexically richer texts, e.g., \"a dog is sitting on the top of the airplane\". To address these problems, this paper proposes a novel general framework to boost the 3D GS Initialization for text-to-3D generation upon the lexical richness. Our key idea is to aggregate 3D Gaussians into spatially uniform voxels to represent complex shapes while enabling the spatial interaction among the 3D Gaussians and semantic interaction between Gaussians and texts. Specifically, we first construct a voxelized representation, where each voxel holds a 3D Gaussian with its position, scale, and rotation fixed while setting opacity as the sole factor to determine a position's occupancy. We then design an initialization network mainly consisting of two novel components: 1) Global Information Perception (GIP) block and 2) Gaussians-Text Fusion (GTF) block. Such a design enables each 3D Gaussian to assimilate the spatial information from other areas and semantic information from texts. Extensive experiments show the superiority of our framework of high-quality 3D GS initialization against the existing methods, e.g., Shap-E, by taking lexically simple, medium, and hard texts. 
Also, our framework can be seamlessly plugged into SoTA training frameworks, e.g., LucidDreamer, for semantically consistent text-to-3D generation.", + "arxiv_url": "http://arxiv.org/abs/2408.01269v1", + "pdf_url": "http://arxiv.org/pdf/2408.01269v1", + "published_date": "2024-08-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Reality Fusion: Robust Real-time Immersive Mobile Robot Teleoperation with Volumetric Visual Data Fusion", + "authors": [ + "Ke Li", + "Reinhard Bacher", + "Susanne Schmidt", + "Wim Leemans", + "Frank Steinicke" + ], + "abstract": "We introduce Reality Fusion, a novel robot teleoperation system that localizes, streams, projects, and merges a typical onboard depth sensor with a photorealistic, high resolution, high framerate, and wide field of view (FoV) rendering of the complex remote environment represented as 3D Gaussian splats (3DGS). Our framework enables robust egocentric and exocentric robot teleoperation in immersive VR, with the 3DGS effectively extending spatial information of a depth sensor with limited FoV and balancing the trade-off between data streaming costs and data visual quality. We evaluated our framework through a user study with 24 participants, which revealed that Reality Fusion leads to significantly better user performance, situation awareness, and user preferences. To support further research and development, we provide an open-source implementation with an easy-to-replicate custom-made telepresence robot, a high-performance virtual reality 3DGS renderer, and an immersive robot control package. (Source code: https://github.com/uhhhci/RealityFusion)", + "arxiv_url": "http://arxiv.org/abs/2408.01225v1", + "pdf_url": "http://arxiv.org/pdf/2408.01225v1", + "published_date": "2024-08-02", + "categories": [ + "cs.RO" + ], + "github_url": "https://github.com/uhhhci/RealityFusion", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "IG-SLAM: Instant Gaussian SLAM", + "authors": [ + "F. Aykut Sarikamis", + "A. Aydin Alatan" + ], + "abstract": "3D Gaussian Splatting has recently shown promising results as an alternative scene representation in SLAM systems to neural implicit representations. However, current methods either lack dense depth maps to supervise the mapping process or detailed training designs that consider the scale of the environment. To address these drawbacks, we present IG-SLAM, a dense RGB-only SLAM system that employs robust Dense-SLAM methods for tracking and combines them with Gaussian Splatting. A 3D map of the environment is constructed using accurate pose and dense depth provided by tracking. Additionally, we utilize depth uncertainty in map optimization to improve 3D reconstruction. Our decay strategy in map optimization enhances convergence and allows the system to run at 10 fps in a single process. We demonstrate competitive performance with state-of-the-art RGB-only SLAM systems while achieving faster operation speeds. We present our experiments on the Replica, TUM-RGBD, ScanNet, and EuRoC datasets. 
The system achieves photo-realistic 3D reconstruction in large-scale sequences, particularly in the EuRoC dataset.", + "arxiv_url": "http://arxiv.org/abs/2408.01126v2", + "pdf_url": "http://arxiv.org/pdf/2408.01126v2", + "published_date": "2024-08-02", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head", + "authors": [ + "Qianyun He", + "Xinya Ji", + "Yicheng Gong", + "Yuanxun Lu", + "Zhengyu Diao", + "Linjia Huang", + "Yao Yao", + "Siyu Zhu", + "Zhan Ma", + "Songcen Xu", + "Xiaofei Wu", + "Zixiao Zhang", + "Xun Cao", + "Hao Zhu" + ], + "abstract": "We present a novel approach for synthesizing 3D talking heads with controllable emotion, featuring enhanced lip synchronization and rendering quality. Despite significant progress in the field, prior methods still suffer from multi-view consistency and a lack of emotional expressiveness. To address these issues, we collect EmoTalk3D dataset with calibrated multi-view videos, emotional annotations, and per-frame 3D geometry. By training on the EmoTalk3D dataset, we propose a \\textit{`Speech-to-Geometry-to-Appearance'} mapping framework that first predicts faithful 3D geometry sequence from the audio features, then the appearance of a 3D talking head represented by 4D Gaussians is synthesized from the predicted geometry. The appearance is further disentangled into canonical and dynamic Gaussians, learned from multi-view videos, and fused to render free-view talking head animation. Moreover, our model enables controllable emotion in the generated talking heads and can be rendered in wide-range views. Our method exhibits improved rendering quality and stability in lip motion generation while capturing dynamic facial details such as wrinkles and subtle expressions. Experiments demonstrate the effectiveness of our approach in generating high-fidelity and emotion-controllable 3D talking heads. The code and EmoTalk3D dataset are released at https://nju-3dv.github.io/projects/EmoTalk3D.", + "arxiv_url": "http://arxiv.org/abs/2408.00297v1", + "pdf_url": "http://arxiv.org/pdf/2408.00297v1", + "published_date": "2024-08-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LoopSparseGS: Loop Based Sparse-View Friendly Gaussian Splatting", + "authors": [ + "Zhenyu Bao", + "Guibiao Liao", + "Kaichen Zhou", + "Kanglin Liu", + "Qing Li", + "Guoping Qiu" + ], + "abstract": "Despite the photorealistic novel view synthesis (NVS) performance achieved by the original 3D Gaussian splatting (3DGS), its rendering quality significantly degrades with sparse input views. This performance drop is mainly caused by the limited number of initial points generated from the sparse input, insufficient supervision during the training process, and inadequate regularization of the oversized Gaussian ellipsoids. To handle these issues, we propose the LoopSparseGS, a loop-based 3DGS framework for the sparse novel view synthesis task. In specific, we propose a loop-based Progressive Gaussian Initialization (PGI) strategy that could iteratively densify the initialized point cloud using the rendered pseudo images during the training process. 
Then, the sparse and reliable depth from the Structure from Motion, and the window-based dense monocular depth are leveraged to provide precise geometric supervision via the proposed Depth-alignment Regularization (DAR). Additionally, we introduce a novel Sparse-friendly Sampling (SFS) strategy to handle oversized Gaussian ellipsoids leading to large pixel errors. Comprehensive experiments on four datasets demonstrate that LoopSparseGS outperforms existing state-of-the-art methods for sparse-input novel view synthesis, across indoor, outdoor, and object-level scenes with various image resolutions.", + "arxiv_url": "http://arxiv.org/abs/2408.00254v1", + "pdf_url": "http://arxiv.org/pdf/2408.00254v1", + "published_date": "2024-08-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Localized Gaussian Splatting Editing with Contextual Awareness", + "authors": [ + "Hanyuan Xiao", + "Yingshu Chen", + "Huajian Huang", + "Haolin Xiong", + "Jing Yang", + "Pratusha Prasad", + "Yajie Zhao" + ], + "abstract": "Recent text-guided generation of individual 3D object has achieved great success using diffusion priors. However, these methods are not suitable for object insertion and replacement tasks as they do not consider the background, leading to illumination mismatches within the environment. To bridge the gap, we introduce an illumination-aware 3D scene editing pipeline for 3D Gaussian Splatting (3DGS) representation. Our key observation is that inpainting by the state-of-the-art conditional 2D diffusion model is consistent with background in lighting. To leverage the prior knowledge from the well-trained diffusion models for 3D object generation, our approach employs a coarse-to-fine objection optimization pipeline with inpainted views. In the first coarse step, we achieve image-to-3D lifting given an ideal inpainted view. The process employs 3D-aware diffusion prior from a view-conditioned diffusion model, which preserves illumination present in the conditioning image. To acquire an ideal inpainted image, we introduce an Anchor View Proposal (AVP) algorithm to find a single view that best represents the scene illumination in target region. In the second Texture Enhancement step, we introduce a novel Depth-guided Inpainting Score Distillation Sampling (DI-SDS), which enhances geometry and texture details with the inpainting diffusion prior, beyond the scope of the 3D-aware diffusion prior knowledge in the first coarse step. DI-SDS not only provides fine-grained texture enhancement, but also urges optimization to respect scene lighting. Our approach efficiently achieves local editing with global illumination consistency without explicitly modeling light transport. 
We demonstrate robustness of our method by evaluating editing in real scenes containing explicit highlight and shadows, and compare against the state-of-the-art text-to-3D editing methods.", +   "arxiv_url": "http://arxiv.org/abs/2408.00083v1", +   "pdf_url": "http://arxiv.org/pdf/2408.00083v1", +   "published_date": "2024-07-31", +   "categories": [ +    "cs.CV" +   ], +   "github_url": "", +   "keywords": [ +    "gaussian splatting", +    "3d gaussian" +   ], +   "citations": 0, +   "semantic_url": "" +  }, +  { +   "title": "Expressive Whole-Body 3D Gaussian Avatar", +   "authors": [ +    "Gyeongsik Moon", +    "Takaaki Shiratori", +    "Shunsuke Saito" +   ], +   "abstract": "Facial expression and hand motions are necessary to express our emotions and interact with the world. Nevertheless, most of the 3D human avatars modeled from a casually captured video only support body motions without facial expressions and hand motions. In this work, we present ExAvatar, an expressive whole-body 3D human avatar learned from a short monocular video. We design ExAvatar as a combination of the whole-body parametric mesh model (SMPL-X) and 3D Gaussian Splatting (3DGS). The main challenges are 1) a limited diversity of facial expressions and poses in the video and 2) the absence of 3D observations, such as 3D scans and RGBD images. The limited diversity in the video makes animations with novel facial expressions and poses non-trivial. In addition, the absence of 3D observations could cause significant ambiguity in human parts that are not observed in the video, which can result in noticeable artifacts under novel motions. To address them, we introduce our hybrid representation of the mesh and 3D Gaussians. Our hybrid representation treats each 3D Gaussian as a vertex on the surface with pre-defined connectivity information (i.e., triangle faces) between them following the mesh topology of SMPL-X. It makes our ExAvatar animatable with novel facial expressions by driven by the facial expression space of SMPL-X. In addition, by using connectivity-based regularizers, we significantly reduce artifacts in novel facial expressions and poses.", +   "arxiv_url": "http://arxiv.org/abs/2407.21686v1", +   "pdf_url": "http://arxiv.org/pdf/2407.21686v1", +   "published_date": "2024-07-31", +   "categories": [ +    "cs.CV" +   ], +   "github_url": "", +   "keywords": [ +    "gaussian splatting", +    "3d gaussian" +   ], +   "citations": 0, +   "semantic_url": "" +  }, +  { +   "title": "SceneTeller: Language-to-3D Scene Generation", +   "authors": [ +    "Başak Melis Öcal", +    "Maxim Tatarchenko", +    "Sezer Karaoglu", +    "Theo Gevers" +   ], +   "abstract": "Designing high-quality indoor 3D scenes is important in many practical applications, such as room planning or game development. Conventionally, this has been a time-consuming process which requires both artistic skill and familiarity with professional software, making it hardly accessible for layman users. However, recent advances in generative AI have established solid foundation for democratizing 3D design. In this paper, we propose a pioneering approach for text-based 3D room design. Given a prompt in natural language describing the object placement in the room, our method produces a high-quality 3D scene corresponding to it. With an additional text prompt the users can change the appearance of the entire scene or of individual objects in it. Built using in-context learning, CAD model retrieval and 3D-Gaussian-Splatting-based stylization, our turnkey pipeline produces state-of-the-art 3D scenes, while being easy to use even for novices.
Our project page is available at https://sceneteller.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2407.20727v1", + "pdf_url": "http://arxiv.org/pdf/2407.20727v1", + "published_date": "2024-07-30", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Improving 2D Feature Representations by 3D-Aware Fine-Tuning", + "authors": [ + "Yuanwen Yue", + "Anurag Das", + "Francis Engelmann", + "Siyu Tang", + "Jan Eric Lenssen" + ], + "abstract": "Current visual foundation models are trained purely on unstructured 2D data, limiting their understanding of 3D structure of objects and scenes. In this work, we show that fine-tuning on 3D-aware data improves the quality of emerging semantic features. We design a method to lift semantic 2D features into an efficient 3D Gaussian representation, which allows us to re-render them for arbitrary views. Using the rendered 3D-aware features, we design a fine-tuning strategy to transfer such 3D awareness into a 2D foundation model. We demonstrate that models fine-tuned in that way produce features that readily improve downstream task performance in semantic segmentation and depth estimation through simple linear probing. Notably, though fined-tuned on a single indoor dataset, the improvement is transferable to a variety of indoor datasets and out-of-domain datasets. We hope our study encourages the community to consider injecting 3D awareness when training 2D foundation models. Project page: https://ywyue.github.io/FiT3D.", + "arxiv_url": "http://arxiv.org/abs/2407.20229v1", + "pdf_url": "http://arxiv.org/pdf/2407.20229v1", + "published_date": "2024-07-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Registering Neural 4D Gaussians for Endoscopic Surgery", + "authors": [ + "Yiming Huang", + "Beilei Cui", + "Ikemura Kei", + "Jiekai Zhang", + "Long Bai", + "Hongliang Ren" + ], + "abstract": "The recent advance in neural rendering has enabled the ability to reconstruct high-quality 4D scenes using neural networks. Although 4D neural reconstruction is popular, registration for such representations remains a challenging task, especially for dynamic scene registration in surgical planning and simulation. In this paper, we propose a novel strategy for dynamic surgical neural scene registration. We first utilize 4D Gaussian Splatting to represent the surgical scene and capture both static and dynamic scenes effectively. Then, a spatial aware feature aggregation method, Spatially Weight Cluttering (SWC) is proposed to accurately align the feature between surgical scenes, enabling precise and realistic surgical simulations. Lastly, we present a novel strategy of deformable scene registration to register two dynamic scenes. By incorporating both spatial and temporal information for correspondence matching, our approach achieves superior performance compared to existing registration methods for implicit neural representation. 
The proposed method has the potential to improve surgical planning and training, ultimately leading to better patient outcomes.", + "arxiv_url": "http://arxiv.org/abs/2407.20213v1", + "pdf_url": "http://arxiv.org/pdf/2407.20213v1", + "published_date": "2024-07-29", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Radiance Fields for Robotic Teleoperation", + "authors": [ + "Maximum Wilder-Smith", + "Vaishakh Patil", + "Marco Hutter" + ], + "abstract": "Radiance field methods such as Neural Radiance Fields (NeRFs) or 3D Gaussian Splatting (3DGS), have revolutionized graphics and novel view synthesis. Their ability to synthesize new viewpoints with photo-realistic quality, as well as capture complex volumetric and specular scenes, makes them an ideal visualization for robotic teleoperation setups. Direct camera teleoperation provides high-fidelity operation at the cost of maneuverability, while reconstruction-based approaches offer controllable scenes with lower fidelity. With this in mind, we propose replacing the traditional reconstruction-visualization components of the robotic teleoperation pipeline with online Radiance Fields, offering highly maneuverable scenes with photorealistic quality. As such, there are three main contributions to state of the art: (1) online training of Radiance Fields using live data from multiple cameras, (2) support for a variety of radiance methods including NeRF and 3DGS, (3) visualization suite for these methods including a virtual reality scene. To enable seamless integration with existing setups, these components were tested with multiple robots in multiple configurations and were displayed using traditional tools as well as the VR headset. The results across methods and robots were compared quantitatively to a baseline of mesh reconstruction, and a user study was conducted to compare the different visualization methods. For videos and code, check out https://leggedrobotics.github.io/rffr.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2407.20194v1", + "pdf_url": "http://arxiv.org/pdf/2407.20194v1", + "published_date": "2024-07-29", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting", + "authors": [ + "Shen Chen", + "Jiale Zhou", + "Zhongyu Jiang", + "Tianfang Zhang", + "Zongkai Wu", + "Jenq-Neng Hwang", + "Lei Li" + ], + "abstract": "The creation of high-quality 3D assets is paramount for applications in digital heritage preservation, entertainment, and robotics. Traditionally, this process necessitates skilled professionals and specialized software for the modeling, texturing, and rendering of 3D objects. However, the rising demand for 3D assets in gaming and virtual reality (VR) has led to the creation of accessible image-to-3D technologies, allowing non-professionals to produce 3D content and decreasing dependence on expert input. Existing methods for 3D content generation struggle to simultaneously achieve detailed textures and strong geometric consistency. We introduce a novel 3D content creation framework, ScalingGaussian, which combines 3D and 2D diffusion models to achieve detailed textures and geometric consistency in generated 3D assets. 
Initially, a 3D diffusion model generates point clouds, which are then densified through a process of selecting local regions, introducing Gaussian noise, followed by using local density-weighted selection. To refine the 3D gaussians, we utilize a 2D diffusion model with Score Distillation Sampling (SDS) loss, guiding the 3D Gaussians to clone and split. Finally, the 3D Gaussians are converted into meshes, and the surface textures are optimized using Mean Square Error(MSE) and Gradient Profile Prior(GPP) losses. Our method addresses the common issue of sparse point clouds in 3D diffusion, resulting in improved geometric structure and detailed textures. Experiments on image-to-3D tasks demonstrate that our approach efficiently generates high-quality 3D assets.", + "arxiv_url": "http://arxiv.org/abs/2407.19035v1", + "pdf_url": "http://arxiv.org/pdf/2407.19035v1", + "published_date": "2024-07-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution", + "authors": [ + "Jintong Hu", + "Bin Xia", + "Bin Chen", + "Wenming Yang", + "Lei Zhang" + ], + "abstract": "Implicit neural representations (INRs) have significantly advanced the field of arbitrary-scale super-resolution (ASSR) of images. Most existing INR-based ASSR networks first extract features from the given low-resolution image using an encoder, and then render the super-resolved result via a multi-layer perceptron decoder. Although these approaches have shown promising results, their performance is constrained by the limited representation ability of discrete latent codes in the encoded features. In this paper, we propose a novel ASSR method named GaussianSR that overcomes this limitation through 2D Gaussian Splatting (2DGS). Unlike traditional methods that treat pixels as discrete points, GaussianSR represents each pixel as a continuous Gaussian field. The encoded features are simultaneously refined and upsampled by rendering the mutually stacked Gaussian fields. As a result, long-range dependencies are established to enhance representation ability. In addition, a classifier is developed to dynamically assign Gaussian kernels to all pixels to further improve flexibility. All components of GaussianSR (i.e., encoder, classifier, Gaussian kernels, and decoder) are jointly learned end-to-end. Experiments demonstrate that GaussianSR achieves superior ASSR performance with fewer parameters than existing methods while enjoying interpretable and content-aware feature aggregations.", + "arxiv_url": "http://arxiv.org/abs/2407.18046v1", + "pdf_url": "http://arxiv.org/pdf/2407.18046v1", + "published_date": "2024-07-25", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities", + "authors": [ + "Yanqi Bao", + "Tianyu Ding", + "Jing Huo", + "Yaoli Liu", + "Yuxin Li", + "Wenbin Li", + "Yang Gao", + "Jiebo Luo" + ], + "abstract": "3D Gaussian Splatting (3DGS) has emerged as a prominent technique with the potential to become a mainstream method for 3D representations. It can effectively transform multi-view images into explicit 3D Gaussian representations through efficient training, and achieve real-time rendering of novel views. 
This survey aims to analyze existing 3DGS-related works from multiple intersecting perspectives, including related tasks, technologies, challenges, and opportunities. The primary objective is to provide newcomers with a rapid understanding of the field and to assist researchers in methodically organizing existing technologies and challenges. Specifically, we delve into the optimization, application, and extension of 3DGS, categorizing them based on their focuses or motivations. Additionally, we summarize and classify nine types of technical modules and corresponding improvements identified in existing works. Based on these analyses, we further examine the common challenges and technologies across various tasks, proposing potential research opportunities.", + "arxiv_url": "http://arxiv.org/abs/2407.17418v1", + "pdf_url": "http://arxiv.org/pdf/2407.17418v1", + "published_date": "2024-07-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DHGS: Decoupled Hybrid Gaussian Splatting for Driving Scene", + "authors": [ + "Xi Shi", + "Lingli Chen", + "Peng Wei", + "Xi Wu", + "Tian Jiang", + "Yonggang Luo", + "Lecheng Xie" + ], + "abstract": "Existing Gaussian splatting methods often fall short in achieving satisfactory novel view synthesis in driving scenes, primarily due to the absence of crafty designs and geometric constraints for the involved elements. This paper introduces a novel neural rendering method termed Decoupled Hybrid Gaussian Splatting (DHGS), targeting at promoting the rendering quality of novel view synthesis for static driving scenes. The novelty of this work lies in the decoupled and hybrid pixel-level blender for road and non-road layers, without the conventional unified differentiable rendering logic for the entire scene. Still, consistency and continuity in superimposition are preserved through the proposed depth-ordered hybrid rendering strategy. Additionally, an implicit road representation comprised of a Signed Distance Function (SDF) is trained to supervise the road surface with subtle geometric attributes. Accompanied by the use of auxiliary transmittance loss and consistency loss, novel images with imperceptible boundary and elevated fidelity are ultimately obtained. Substantial experiments on the Waymo dataset prove that DHGS outperforms the state-of-the-art methods. The project page where more video evidences are given is: https://ironbrotherstyle.github.io/dhgs_web.", + "arxiv_url": "http://arxiv.org/abs/2407.16600v3", + "pdf_url": "http://arxiv.org/pdf/2407.16600v3", + "published_date": "2024-07-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images", + "authors": [ + "Shreyas Singh", + "Aryan Garg", + "Kaushik Mitra" + ], + "abstract": "The recent advent of 3D Gaussian Splatting (3DGS) has revolutionized the 3D scene reconstruction space enabling high-fidelity novel view synthesis in real-time. However, with the exception of RawNeRF, all prior 3DGS and NeRF-based methods rely on 8-bit tone-mapped Low Dynamic Range (LDR) images for scene reconstruction. Such methods struggle to achieve accurate reconstructions in scenes that require a higher dynamic range. 
Examples include scenes captured in nighttime or poorly lit indoor spaces having a low signal-to-noise ratio, as well as daylight scenes with shadow regions exhibiting extreme contrast. Our proposed method HDRSplat tailors 3DGS to train directly on 14-bit linear raw images in near darkness which preserves the scenes' full dynamic range and content. Our key contributions are two-fold: Firstly, we propose a linear HDR space-suited loss that effectively extracts scene information from noisy dark regions and nearly saturated bright regions simultaneously, while also handling view-dependent colors without increasing the degree of spherical harmonics. Secondly, through careful rasterization tuning, we implicitly overcome the heavy reliance and sensitivity of 3DGS on point cloud initialization. This is critical for accurate reconstruction in regions of low texture, high depth of field, and low illumination. HDRSplat is the fastest method to date that does 14-bit (HDR) 3D scene reconstruction in $\\le$15 minutes/scene ($\\sim$30x faster than prior state-of-the-art RawNeRF). It also boasts the fastest inference speed at $\\ge$120fps. We further demonstrate the applicability of our HDR scene reconstruction by showcasing various applications like synthetic defocus, dense depth map extraction, and post-capture control of exposure, tone-mapping and view-point.", + "arxiv_url": "http://arxiv.org/abs/2407.16503v1", + "pdf_url": "http://arxiv.org/pdf/2407.16503v1", + "published_date": "2024-07-23", + "categories": [ + "cs.CV", + "eess.IV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Integrating Meshes and 3D Gaussians for Indoor Scene Reconstruction with SAM Mask Guidance", + "authors": [ + "Jiyeop Kim", + "Jongwoo Lim" + ], + "abstract": "We present a novel approach for 3D indoor scene reconstruction that combines 3D Gaussian Splatting (3DGS) with mesh representations. We use meshes for the room layout of the indoor scene, such as walls, ceilings, and floors, while employing 3D Gaussians for other objects. This hybrid approach leverages the strengths of both representations, offering enhanced flexibility and ease of editing. However, joint training of meshes and 3D Gaussians is challenging because it is not clear which primitive should affect which part of the rendered image. Objects close to the room layout often struggle during training, particularly when the room layout is textureless, which can lead to incorrect optimizations and unnecessary 3D Gaussians. To overcome these challenges, we employ Segment Anything Model (SAM) to guide the selection of primitives. The SAM mask loss enforces each instance to be represented by either Gaussians or meshes, ensuring clear separation and stable training. Furthermore, we introduce an additional densification stage without resetting the opacity after the standard densification. 
This stage mitigates the degradation of image quality caused by a limited number of 3D Gaussians after the standard densification.", + "arxiv_url": "http://arxiv.org/abs/2407.16173v1", + "pdf_url": "http://arxiv.org/pdf/2407.16173v1", + "published_date": "2024-07-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model", + "authors": [ + "Matteo Bortolon", + "Theodore Tsesmelis", + "Stuart James", + "Fabio Poiesi", + "Alessio Del Bue" + ], + "abstract": "We propose 6DGS to estimate the camera pose of a target RGB image given a 3D Gaussian Splatting (3DGS) model representing the scene. 6DGS avoids the iterative process typical of analysis-by-synthesis methods (e.g. iNeRF) that also require an initialization of the camera pose in order to converge. Instead, our method estimates a 6DoF pose by inverting the 3DGS rendering process. Starting from the object surface, we define a radiant Ellicell that uniformly generates rays departing from each ellipsoid that parameterize the 3DGS model. Each Ellicell ray is associated with the rendering parameters of each ellipsoid, which in turn is used to obtain the best bindings between the target image pixels and the cast rays. These pixel-ray bindings are then ranked to select the best scoring bundle of rays, which their intersection provides the camera center and, in turn, the camera rotation. The proposed solution obviates the necessity of an \"a priori\" pose for initialization, and it solves 6DoF pose estimation in closed form, without the need for iterations. Moreover, compared to the existing Novel View Synthesis (NVS) baselines for pose estimation, 6DGS can improve the overall average rotational accuracy by 12% and translation accuracy by 22% on real scenes, despite not requiring any initialization pose. At the same time, our method operates near real-time, reaching 15fps on consumer hardware.", + "arxiv_url": "http://arxiv.org/abs/2407.15484v1", + "pdf_url": "http://arxiv.org/pdf/2407.15484v1", + "published_date": "2024-07-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Enhancement of 3D Gaussian Splatting using Raw Mesh for Photorealistic Recreation of Architectures", + "authors": [ + "Ruizhe Wang", + "Chunliang Hua", + "Tomakayev Shingys", + "Mengyuan Niu", + "Qingxin Yang", + "Lizhong Gao", + "Yi Zheng", + "Junyan Yang", + "Qiao Wang" + ], + "abstract": "The photorealistic reconstruction and rendering of architectural scenes have extensive applications in industries such as film, games, and transportation. It also plays an important role in urban planning, architectural design, and the city's promotion, especially in protecting historical and cultural relics. The 3D Gaussian Splatting, due to better performance over NeRF, has become a mainstream technology in 3D reconstruction. Its only input is a set of images but it relies heavily on geometric parameters computed by the SfM process. At the same time, there is an existing abundance of raw 3D models, that could inform the structural perception of certain buildings but cannot be applied. 
In this paper, we propose a straightforward method to harness these raw 3D models to guide 3D Gaussians in capturing the basic shape of the building and improve the visual quality of textures and details when photos are captured non-systematically. This exploration opens up new possibilities for improving the effectiveness of 3D reconstruction techniques in the field of architectural design.", + "arxiv_url": "http://arxiv.org/abs/2407.15435v2", + "pdf_url": "http://arxiv.org/pdf/2407.15435v2", + "published_date": "2024-07-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions", + "authors": [ + "Haiyang Zhou", + "Xinhua Cheng", + "Wangbo Yu", + "Yonghong Tian", + "Li Yuan" + ], + "abstract": "3D scene generation is in high demand across various domains, including virtual reality, gaming, and the film industry. Owing to the powerful generative capabilities of text-to-image diffusion models that provide reliable priors, the creation of 3D scenes using only text prompts has become viable, thereby significantly advancing researches in text-driven 3D scene generation. In order to obtain multiple-view supervision from 2D diffusion models, prevailing methods typically employ the diffusion model to generate an initial local image, followed by iteratively outpainting the local image using diffusion models to gradually generate scenes. Nevertheless, these outpainting-based approaches prone to produce global inconsistent scene generation results without high degree of completeness, restricting their broader applications. To tackle these problems, we introduce HoloDreamer, a framework that first generates high-definition panorama as a holistic initialization of the full 3D scene, then leverage 3D Gaussian Splatting (3D-GS) to quickly reconstruct the 3D scene, thereby facilitating the creation of view-consistent and fully enclosed 3D scenes. Specifically, we propose Stylized Equirectangular Panorama Generation, a pipeline that combines multiple diffusion models to enable stylized and detailed equirectangular panorama generation from complex text prompts. Subsequently, Enhanced Two-Stage Panorama Reconstruction is introduced, conducting a two-stage optimization of 3D-GS to inpaint the missing region and enhance the integrity of the scene. Comprehensive experiments demonstrated that our method outperforms prior works in terms of overall visual consistency and harmony as well as reconstruction quality and rendering robustness when generating fully enclosed scenes.", + "arxiv_url": "http://arxiv.org/abs/2407.15187v1", + "pdf_url": "http://arxiv.org/pdf/2407.15187v1", + "published_date": "2024-07-21", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GPHM: Gaussian Parametric Head Model for Monocular Head Avatar Reconstruction", + "authors": [ + "Yuelang Xu", + "Zhaoqi Su", + "Qingyao Wu", + "Yebin Liu" + ], + "abstract": "Creating high-fidelity 3D human head avatars is crucial for applications in VR/AR, digital human, and film production. Recent advances have leveraged morphable face models to generate animated head avatars from easily accessible data, representing varying identities and expressions within a low-dimensional parametric space. 
However, existing methods often struggle with modeling complex appearance details, e.g., hairstyles, and suffer from low rendering quality and efficiency. In this paper we introduce a novel approach, 3D Gaussian Parametric Head Model, which employs 3D Gaussians to accurately represent the complexities of the human head, allowing precise control over both identity and expression. The Gaussian model can handle intricate details, enabling realistic representations of varying appearances and complex expressions. Furthermore, we presents a well-designed training framework to ensure smooth convergence, providing a robust guarantee for learning the rich content. Our method achieves high-quality, photo-realistic rendering with real-time efficiency, making it a valuable contribution to the field of parametric head models. Finally, we apply the 3D Gaussian Parametric Head Model to monocular video or few-shot head avatar reconstruction tasks, which enables instant reconstruction of high-quality 3D head avatars even when input data is extremely limited, surpassing previous methods in terms of reconstruction quality and training speed.", + "arxiv_url": "http://arxiv.org/abs/2407.15070v2", + "pdf_url": "http://arxiv.org/pdf/2407.15070v2", + "published_date": "2024-07-21", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Realistic Surgical Image Dataset Generation Based On 3D Gaussian Splatting", + "authors": [ + "Tianle Zeng", + "Gerardo Loza Galindo", + "Junlei Hu", + "Pietro Valdastri", + "Dominic Jones" + ], + "abstract": "Computer vision technologies markedly enhance the automation capabilities of robotic-assisted minimally invasive surgery (RAMIS) through advanced tool tracking, detection, and localization. However, the limited availability of comprehensive surgical datasets for training represents a significant challenge in this field. This research introduces a novel method that employs 3D Gaussian Splatting to generate synthetic surgical datasets. We propose a method for extracting and combining 3D Gaussian representations of surgical instruments and background operating environments, transforming and combining them to generate high-fidelity synthetic surgical scenarios. We developed a data recording system capable of acquiring images alongside tool and camera poses in a surgical scene. Using this pose data, we synthetically replicate the scene, thereby enabling direct comparisons of the synthetic image quality (29.592 PSNR). As a further validation, we compared two YOLOv5 models trained on the synthetic and real data, respectively, and assessed their performance in an unseen real-world test dataset. 
Comparing the performances, we observe an improvement in neural network performance, with the synthetic-trained model outperforming the real-world trained model by 12%, testing both on real-world data.", + "arxiv_url": "http://arxiv.org/abs/2407.14846v1", + "pdf_url": "http://arxiv.org/pdf/2407.14846v1", + "published_date": "2024-07-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "A Benchmark for Gaussian Splatting Compression and Quality Assessment Study", + "authors": [ + "Qi Yang", + "Kaifa Yang", + "Yuke Xing", + "Yiling Xu", + "Zhu Li" + ], + "abstract": "To fill the gap of traditional GS compression method, in this paper, we first propose a simple and effective GS data compression anchor called Graph-based GS Compression (GGSC). GGSC is inspired by graph signal processing theory and uses two branches to compress the primitive center and attributes. We split the whole GS sample via KDTree and clip the high-frequency components after the graph Fourier transform. Followed by quantization, G-PCC and adaptive arithmetic coding are used to compress the primitive center and attribute residual matrix to generate the bitrate file. GGSS is the first work to explore traditional GS compression, with advantages that can reveal the GS distortion characteristics corresponding to typical compression operation, such as high-frequency clipping and quantization. Second, based on GGSC, we create a GS Quality Assessment dataset (GSQA) with 120 samples. A subjective experiment is conducted in a laboratory environment to collect subjective scores after rendering GS into Processed Video Sequences (PVS). We analyze the characteristics of different GS distortions based on Mean Opinion Scores (MOS), demonstrating the sensitivity of different attributes distortion to visual quality. The GGSC code and the dataset, including GS samples, MOS, and PVS, are made publicly available at https://github.com/Qi-Yangsjtu/GGSC.", + "arxiv_url": "http://arxiv.org/abs/2407.14197v1", + "pdf_url": "http://arxiv.org/pdf/2407.14197v1", + "published_date": "2024-07-19", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/Qi-Yangsjtu/GGSC", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation", + "authors": [ + "Florian Chabot", + "Nicolas Granger", + "Guillaume Lapouge" + ], + "abstract": "The Bird's-eye View (BeV) representation is widely used for 3D perception from multi-view camera images. It allows to merge features from different cameras into a common space, providing a unified representation of the 3D scene. The key component is the view transformer, which transforms image views into the BeV. However, actual view transformer methods based on geometry or cross-attention do not provide a sufficiently detailed representation of the scene, as they use a sub-sampling of the 3D space that is non-optimal for modeling the fine structures of the environment. In this paper, we propose GaussianBeV, a novel method for transforming image features to BeV by finely representing the scene using a set of 3D gaussians located and oriented in 3D space. This representation is then splattered to produce the BeV feature map by adapting recent advances in 3D representation rendering based on gaussian splatting. 
GaussianBeV is the first approach to use this 3D gaussian modeling and 3D scene rendering process online, i.e. without optimizing it on a specific scene and directly integrated into a single stage model for BeV scene understanding. Experiments show that the proposed representation is highly effective and places GaussianBeV as the new state-of-the-art on the BeV semantic segmentation task on the nuScenes dataset.", +        "arxiv_url": "http://arxiv.org/abs/2407.14108v1", +        "pdf_url": "http://arxiv.org/pdf/2407.14108v1", +        "published_date": "2024-07-19", +        "categories": [ +            "cs.CV" +        ], +        "github_url": "", +        "keywords": [ +            "gaussian splatting", +            "3d gaussian" +        ], +        "citations": 0, +        "semantic_url": "" +    }, +    { +        "title": "DirectL: Efficient Radiance Fields Rendering for 3D Light Field Displays", +        "authors": [ +            "Zongyuan Yang", +            "Baolin Liu", +            "Yingde Song", +            "Yongping Xiong", +            "Lan Yi", +            "Zhaohe Zhang", +            "Xunbo Yu" +        ], +        "abstract": "Autostereoscopic display, despite decades of development, has not achieved extensive application, primarily due to the daunting challenge of 3D content creation for non-specialists. The emergence of Radiance Field as an innovative 3D representation has markedly revolutionized the domains of 3D reconstruction and generation. This technology greatly simplifies 3D content creation for common users, broadening the applicability of Light Field Displays (LFDs). However, the combination of these two fields remains largely unexplored. The standard paradigm to create optimal content for parallax-based light field displays demands rendering at least 45 slightly shifted views preferably at high resolution per frame, a substantial hurdle for real-time rendering. We introduce DirectL, a novel rendering paradigm for Radiance Fields on 3D displays. We thoroughly analyze the interweaved mapping of spatial rays to screen subpixels, precisely determine the light rays entering the human eye, and propose subpixel repurposing to significantly reduce the pixel count required for rendering. Tailored for the two predominant radiance fields--Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS), we propose corresponding optimized rendering pipelines that directly render the light field images instead of multi-view images. Extensive experiments across various displays and user study demonstrate that DirectL accelerates rendering by up to 40 times compared to the standard paradigm without sacrificing visual quality. Its rendering process-only modification allows seamless integration into subsequent radiance field tasks. Finally, we integrate DirectL into diverse applications, showcasing the stunning visual experiences and the synergy between LFDs and Radiance Fields, which unveils tremendous potential for commercialization applications. 
\\href{direct-l.github.io}{\\textbf{Project Homepage}", + "arxiv_url": "http://arxiv.org/abs/2407.14053v1", + "pdf_url": "http://arxiv.org/pdf/2407.14053v1", + "published_date": "2024-07-19", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PlacidDreamer: Advancing Harmony in Text-to-3D Generation", + "authors": [ + "Shuo Huang", + "Shikun Sun", + "Zixuan Wang", + "Xiaoyu Qin", + "Yanmin Xiong", + "Yuan Zhang", + "Pengfei Wan", + "Di Zhang", + "Jia Jia" + ], + "abstract": "Recently, text-to-3D generation has attracted significant attention, resulting in notable performance enhancements. Previous methods utilize end-to-end 3D generation models to initialize 3D Gaussians, multi-view diffusion models to enforce multi-view consistency, and text-to-image diffusion models to refine details with score distillation algorithms. However, these methods exhibit two limitations. Firstly, they encounter conflicts in generation directions since different models aim to produce diverse 3D assets. Secondly, the issue of over-saturation in score distillation has not been thoroughly investigated and solved. To address these limitations, we propose PlacidDreamer, a text-to-3D framework that harmonizes initialization, multi-view generation, and text-conditioned generation with a single multi-view diffusion model, while simultaneously employing a novel score distillation algorithm to achieve balanced saturation. To unify the generation direction, we introduce the Latent-Plane module, a training-friendly plug-in extension that enables multi-view diffusion models to provide fast geometry reconstruction for initialization and enhanced multi-view images to personalize the text-to-image diffusion model. To address the over-saturation problem, we propose to view score distillation as a multi-objective optimization problem and introduce the Balanced Score Distillation algorithm, which offers a Pareto Optimal solution that achieves both rich details and balanced saturation. Extensive experiments validate the outstanding capabilities of our PlacidDreamer. The code is available at \\url{https://github.com/HansenHuang0823/PlacidDreamer}.", + "arxiv_url": "http://arxiv.org/abs/2407.13976v1", + "pdf_url": "http://arxiv.org/pdf/2407.13976v1", + "published_date": "2024-07-19", + "categories": [ + "cs.CV", + "I.4.0" + ], + "github_url": "https://github.com/HansenHuang0823/PlacidDreamer", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation", + "authors": [ + "Zongrui Li", + "Minghui Hu", + "Qian Zheng", + "Xudong Jiang" + ], + "abstract": "Although recent advancements in text-to-3D generation have significantly improved generation quality, issues like limited level of detail and low fidelity still persist, which requires further improvement. To understand the essence of those issues, we thoroughly analyze current score distillation methods by connecting theories of consistency distillation to score distillation. Based on the insights acquired through analysis, we propose an optimization framework, Guided Consistency Sampling (GCS), integrated with 3D Gaussian Splatting (3DGS) to alleviate those issues. Additionally, we have observed the persistent oversaturation in the rendered views of generated 3D assets. 
From experiments, we find that it is caused by unwanted accumulated brightness in 3DGS during optimization. To mitigate this issue, we introduce a Brightness-Equalized Generation (BEG) scheme in 3DGS rendering. Experimental results demonstrate that our approach generates 3D assets with more details and higher fidelity than state-of-the-art methods. The codes are released at https://github.com/LMozart/ECCV2024-GCS-BEG.", + "arxiv_url": "http://arxiv.org/abs/2407.13584v2", + "pdf_url": "http://arxiv.org/pdf/2407.13584v2", + "published_date": "2024-07-18", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/LMozart/ECCV2024-GCS-BEG", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EaDeblur-GS: Event assisted 3D Deblur Reconstruction with Gaussian Splatting", + "authors": [ + "Yuchen Weng", + "Zhengwen Shen", + "Ruofan Chen", + "Qi Wang", + "Jun Wang" + ], + "abstract": "3D deblurring reconstruction techniques have recently seen significant advancements with the development of Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although these techniques can recover relatively clear 3D reconstructions from blurry image inputs, they still face limitations in handling severe blurring and complex camera motion. To address these issues, we propose Event-assisted 3D Deblur Reconstruction with Gaussian Splatting (EaDeblur-GS), which integrates event camera data to enhance the robustness of 3DGS against motion blur. By employing an Adaptive Deviation Estimator (ADE) network to estimate Gaussian center deviations and using novel loss functions, EaDeblur-GS achieves sharp 3D reconstructions in real-time, demonstrating performance comparable to state-of-the-art methods.", + "arxiv_url": "http://arxiv.org/abs/2407.13520v3", + "pdf_url": "http://arxiv.org/pdf/2407.13520v3", + "published_date": "2024-07-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Generalizable Human Gaussians for Sparse View Synthesis", + "authors": [ + "Youngjoong Kwon", + "Baole Fang", + "Yixing Lu", + "Haoye Dong", + "Cheng Zhang", + "Francisco Vicente Carrasco", + "Albert Mosella-Montoro", + "Jianjin Xu", + "Shingo Takagi", + "Daeil Kim", + "Aayush Prakash", + "Fernando De la Torre" + ], + "abstract": "Recent progress in neural rendering has brought forth pioneering methods, such as NeRF and Gaussian Splatting, which revolutionize view rendering across various domains like AR/VR, gaming, and content creation. While these methods excel at interpolating {\\em within the training data}, the challenge of generalizing to new scenes and objects from very sparse views persists. Specifically, modeling 3D humans from sparse views presents formidable hurdles due to the inherent complexity of human geometry, resulting in inaccurate reconstructions of geometry and textures. To tackle this challenge, this paper leverages recent advancements in Gaussian Splatting and introduces a new method to learn generalizable human Gaussians that allows photorealistic and accurate view-rendering of a new human subject from a limited set of sparse views in a feed-forward manner. 
A pivotal innovation of our approach involves reformulating the learning of 3D Gaussian parameters into a regression process defined on the 2D UV space of a human template, which allows leveraging the strong geometry prior and the advantages of 2D convolutions. In addition, a multi-scaffold is proposed to effectively represent the offset details. Our method outperforms recent methods on both within-dataset generalization as well as cross-dataset generalization settings.", + "arxiv_url": "http://arxiv.org/abs/2407.12777v1", + "pdf_url": "http://arxiv.org/pdf/2407.12777v1", + "published_date": "2024-07-17", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections", + "authors": [ + "Congrong Xu", + "Justin Kerr", + "Angjoo Kanazawa" + ], + "abstract": "Novel view synthesis from unconstrained in-the-wild image collections remains a significant yet challenging task due to photometric variations and transient occluders that complicate accurate scene reconstruction. Previous methods have approached these issues by integrating per-image appearance features embeddings in Neural Radiance Fields (NeRFs). Although 3D Gaussian Splatting (3DGS) offers faster training and real-time rendering, adapting it for unconstrained image collections is non-trivial due to the substantially different architecture. In this paper, we introduce Splatfacto-W, an approach that integrates per-Gaussian neural color features and per-image appearance embeddings into the rasterization process, along with a spherical harmonics-based background model to represent varying photometric appearances and better depict backgrounds. Our key contributions include latent appearance modeling, efficient transient object handling, and precise background modeling. Splatfacto-W delivers high-quality, real-time novel view synthesis with improved scene consistency in in-the-wild scenarios. Our method improves the Peak Signal-to-Noise Ratio (PSNR) by an average of 5.3 dB compared to 3DGS, enhances training speed by 150 times compared to NeRF-based methods, and achieves a similar rendering speed to 3DGS. Additional video results and code integrated into Nerfstudio are available at https://kevinxu02.github.io/splatfactow/.", + "arxiv_url": "http://arxiv.org/abs/2407.12306v2", + "pdf_url": "http://arxiv.org/pdf/2407.12306v2", + "published_date": "2024-07-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MVG-Splatting: Multi-View Guided Gaussian Splatting with Adaptive Quantile-Based Geometric Consistency Densification", + "authors": [ + "Zhuoxiao Li", + "Shanliang Yao", + "Yijie Chu", + "Angel F. Garcia-Fernandez", + "Yong Yue", + "Eng Gee Lim", + "Xiaohui Zhu" + ], + "abstract": "In the rapidly evolving field of 3D reconstruction, 3D Gaussian Splatting (3DGS) and 2D Gaussian Splatting (2DGS) represent significant advancements. Although 2DGS compresses 3D Gaussian primitives into 2D Gaussian surfels to effectively enhance mesh extraction quality, this compression can potentially lead to a decrease in rendering quality. 
Additionally, unreliable densification processes and the calculation of depth through the accumulation of opacity can compromise the detail of mesh extraction. To address this issue, we introduce MVG-Splatting, a solution guided by Multi-View considerations. Specifically, we integrate an optimized method for calculating normals, which, combined with image gradients, helps rectify inconsistencies in the original depth computations. Additionally, utilizing projection strategies akin to those in Multi-View Stereo (MVS), we propose an adaptive quantile-based method that dynamically determines the level of additional densification guided by depth maps, from coarse to fine detail. Experimental evidence demonstrates that our method not only resolves the issues of rendering quality degradation caused by depth discrepancies but also facilitates direct mesh extraction from dense Gaussian point clouds using the Marching Cubes algorithm. This approach significantly enhances the overall fidelity and accuracy of the 3D reconstruction process, ensuring that both the geometric details and visual quality.", + "arxiv_url": "http://arxiv.org/abs/2407.11840v1", + "pdf_url": "http://arxiv.org/pdf/2407.11840v1", + "published_date": "2024-07-16", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Click-Gaussian: Interactive Segmentation to Any 3D Gaussians", + "authors": [ + "Seokhun Choi", + "Hyeonseop Song", + "Jaechul Kim", + "Taehyeong Kim", + "Hoseok Do" + ], + "abstract": "Interactive segmentation of 3D Gaussians opens a great opportunity for real-time manipulation of 3D scenes thanks to the real-time rendering capability of 3D Gaussian Splatting. However, the current methods suffer from time-consuming post-processing to deal with noisy segmentation output. Also, they struggle to provide detailed segmentation, which is important for fine-grained manipulation of 3D scenes. In this study, we propose Click-Gaussian, which learns distinguishable feature fields of two-level granularity, facilitating segmentation without time-consuming post-processing. We delve into challenges stemming from inconsistently learned feature fields resulting from 2D segmentation obtained independently from a 3D scene. 3D segmentation accuracy deteriorates when 2D segmentation results across the views, primary cues for 3D segmentation, are in conflict. To overcome these issues, we propose Global Feature-guided Learning (GFL). GFL constructs the clusters of global feature candidates from noisy 2D segments across the views, which smooths out noises when training the features of 3D Gaussians. Our method runs in 10 ms per click, 15 to 130 times as fast as the previous methods, while also significantly improving segmentation accuracy. 
Our project page is available at https://seokhunchoi.github.io/Click-Gaussian", + "arxiv_url": "http://arxiv.org/abs/2407.11793v1", + "pdf_url": "http://arxiv.org/pdf/2407.11793v1", + "published_date": "2024-07-16", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation", + "authors": [ + "Jiwook Kim", + "Seonho Lee", + "Jaeyo Shin", + "Jiho Choi", + "Hyunjung Shim" + ], + "abstract": "Score distillation sampling (SDS) has emerged as an effective framework in text-driven 3D editing tasks, leveraging diffusion models for 3D consistent editing. However, existing SDS-based 3D editing methods suffer from long training times and produce low-quality results. We identify that the root cause of this performance degradation is their conflict with the sampling dynamics of diffusion models. Addressing this conflict allows us to treat SDS as a diffusion reverse process for 3D editing via sampling from data space. In contrast, existing methods naively distill the score function using diffusion models. From these insights, we propose DreamCatalyst, a novel framework that considers these sampling dynamics in the SDS framework. Specifically, we devise the optimization process of our DreamCatalyst to approximate the diffusion reverse process in editing tasks, thereby aligning with diffusion sampling dynamics. As a result, DreamCatalyst successfully reduces training time and improves editing quality. Our method offers two modes: (1) a fast mode that edits Neural Radiance Fields (NeRF) scenes approximately 23 times faster than current state-of-the-art NeRF editing methods, and (2) a high-quality mode that produces superior results about 8 times faster than these methods. Notably, our high-quality mode outperforms current state-of-the-art NeRF editing methods in terms of both speed and quality. DreamCatalyst also surpasses the state-of-the-art 3D Gaussian Splatting (3DGS) editing methods, establishing itself as an effective and model-agnostic 3D editing solution. See more extensive results on our project page: https://dream-catalyst.github.io.", + "arxiv_url": "http://arxiv.org/abs/2407.11394v2", + "pdf_url": "http://arxiv.org/pdf/2407.11394v2", + "published_date": "2024-07-16", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "I$^2$-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM", + "authors": [ + "Gwangtak Bae", + "Changwoon Choi", + "Hyeongjun Heo", + "Sang Min Kim", + "Young Min Kim" + ], + "abstract": "We present an inverse image-formation module that can enhance the robustness of existing visual SLAM pipelines for casually captured scenarios. Casual video captures often suffer from motion blur and varying appearances, which degrade the final quality of coherent 3D visual representation. We propose integrating the physical imaging into the SLAM system, which employs linear HDR radiance maps to collect measurements. Specifically, individual frames aggregate images of multiple poses along the camera trajectory to explain prevalent motion blur in hand-held videos. 
Additionally, we accommodate per-frame appearance variation by dedicating explicit variables for image formation steps, namely white balance, exposure time, and camera response function. Through joint optimization of additional variables, the SLAM pipeline produces high-quality images with more accurate trajectories. Extensive experiments demonstrate that our approach can be incorporated into recent visual SLAM pipelines using various scene representations, such as neural radiance fields or Gaussian splatting.", + "arxiv_url": "http://arxiv.org/abs/2407.11347v1", + "pdf_url": "http://arxiv.org/pdf/2407.11347v1", + "published_date": "2024-07-16", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Ev-GS: Event-based Gaussian splatting for Efficient and Accurate Radiance Field Rendering", + "authors": [ + "Jingqian Wu", + "Shuo Zhu", + "Chutian Wang", + "Edmund Y. Lam" + ], + "abstract": "Computational neuromorphic imaging (CNI) with event cameras offers advantages such as minimal motion blur and enhanced dynamic range, compared to conventional frame-based methods. Existing event-based radiance field rendering methods are built on neural radiance field, which is computationally heavy and slow in reconstruction speed. Motivated by the two aspects, we introduce Ev-GS, the first CNI-informed scheme to infer 3D Gaussian splatting from a monocular event camera, enabling efficient novel view synthesis. Leveraging 3D Gaussians with pure event-based supervision, Ev-GS overcomes challenges such as the detection of fast-moving objects and insufficient lighting. Experimental results show that Ev-GS outperforms the method that takes frame-based signals as input by rendering realistic views with reduced blurring and improved visual quality. Moreover, it demonstrates competitive reconstruction quality and reduced computing occupancy compared to existing methods, which paves the way to a highly efficient CNI approach for signal processing.", + "arxiv_url": "http://arxiv.org/abs/2407.11343v1", + "pdf_url": "http://arxiv.org/pdf/2407.11343v1", + "published_date": "2024-07-16", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Splatting LK", + "authors": [ + "Liuyue Xie", + "Joel Julin", + "Koichiro Niinuma", + "Laszlo A. Jeni" + ], + "abstract": "Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time presents a significant challenge due to the inherent complexity and temporal dynamics involved. While recent advancements in neural implicit models and dynamic Gaussian Splatting have shown promise, limitations persist, particularly in accurately capturing the underlying geometry of highly dynamic scenes. Some approaches address this by incorporating strong semantic and geometric priors through diffusion models. However, we explore a different avenue by investigating the potential of regularizing the native warp field within the dynamic Gaussian Splatting framework. Our method is grounded on the key intuition that an accurate warp field should produce continuous space-time motions. 
While enforcing the motion constraints on warp fields is non-trivial, we show that we can exploit knowledge innate to the forward warp field network to derive an analytical velocity field, then time integrate for scene flows to effectively constrain both the 2D motion and 3D positions of the Gaussians. This derived Lucas-Kanade style analytical regularization enables our method to achieve superior performance in reconstructing highly dynamic scenes, even under minimal camera movement, extending the boundaries of what existing dynamic Gaussian Splatting frameworks can achieve.", +        "arxiv_url": "http://arxiv.org/abs/2407.11309v1", +        "pdf_url": "http://arxiv.org/pdf/2407.11309v1", +        "published_date": "2024-07-16", +        "categories": [ +            "cs.CV", +            "cs.GR", +            "I.3; I.4" +        ], +        "github_url": "", +        "keywords": [ +            "gaussian splatting" +        ], +        "citations": 0, +        "semantic_url": "" +    }, +    { +        "title": "iHuman: Instant Animatable Digital Humans From Monocular Videos", +        "authors": [ +            "Pramish Paudel", +            "Anubhav Khanal", +            "Ajad Chhatkuli", +            "Danda Pani Paudel", +            "Jyoti Tandukar" +        ], +        "abstract": "Personalized 3D avatars require an animatable representation of digital humans. Doing so instantly from monocular videos offers scalability to a broad class of users and wide-scale applications. In this paper, we present a fast, simple, yet effective method for creating animatable 3D digital humans from monocular videos. Our method utilizes the efficiency of Gaussian splatting to model both 3D geometry and appearance. However, we observed that naively optimizing Gaussian splats results in inaccurate geometry, thereby leading to poor animations. This work achieves and illustrates the need for accurate 3D mesh-type modelling of the human body for animatable digitization through Gaussian splats. This is achieved by developing a novel pipeline that benefits from three key aspects: (a) implicit modelling of surface's displacements and the color's spherical harmonics; (b) binding of 3D Gaussians to the respective triangular faces of the body template; (c) a novel technique to render normals followed by their auxiliary supervision. Our exhaustive experiments on three different benchmark datasets demonstrate the state-of-the-art results of our method, in limited time settings. In fact, our method is faster by an order of magnitude (in terms of training time) than its closest competitor. At the same time, we achieve superior rendering and 3D reconstruction performance under the change of poses.", +        "arxiv_url": "http://arxiv.org/abs/2407.11174v1", +        "pdf_url": "http://arxiv.org/pdf/2407.11174v1", +        "published_date": "2024-07-15", +        "categories": [ +            "cs.CV", +            "cs.AI" +        ], +        "github_url": "", +        "keywords": [ +            "gaussian splatting", +            "3d gaussian", +            "3d reconstruction" +        ], +        "citations": 0, +        "semantic_url": "" +    }, +    { +        "title": "Scaling 3D Reasoning with LMMs to Large Robot Mission Environments Using Datagraphs", +        "authors": [ +            "W. J. Meijer", +            "A. C. Kemmeren", +            "E. H. J. Riemens", +            "J. E. Fransman", +            "M. van Bekkum", +            "G. J. Burghouts", +            "J. D. van Mil" +        ], +        "abstract": "This paper addresses the challenge of scaling Large Multimodal Models (LMMs) to expansive 3D environments. Solving this open problem is especially relevant for robot deployment in many first-responder scenarios, such as search-and-rescue missions that cover vast spaces. The use of LMMs in these settings is currently hampered by the strict context windows that limit the LMM's input size. 
We therefore introduce a novel approach that utilizes a datagraph structure, which allows the LMM to iteratively query smaller sections of a large environment. Using the datagraph in conjunction with graph traversal algorithms, we can prioritize the most relevant locations to the query, thereby improving the scalability of 3D scene language tasks. We illustrate the datagraph using 3D scenes, but these can be easily substituted by other dense modalities that represent the environment, such as pointclouds or Gaussian splats. We demonstrate the potential to use the datagraph for two 3D scene language task use cases, in a search-and-rescue mission example.", + "arxiv_url": "http://arxiv.org/abs/2407.10743v1", + "pdf_url": "http://arxiv.org/pdf/2407.10743v1", + "published_date": "2024-07-15", + "categories": [ + "cs.RO", + "cs.AI" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Interactive Rendering of Relightable and Animatable Gaussian Avatars", + "authors": [ + "Youyi Zhan", + "Tianjia Shao", + "He Wang", + "Yin Yang", + "Kun Zhou" + ], + "abstract": "Creating relightable and animatable avatars from multi-view or monocular videos is a challenging task for digital human creation and virtual reality applications. Previous methods rely on neural radiance fields or ray tracing, resulting in slow training and rendering processes. By utilizing Gaussian Splatting, we propose a simple and efficient method to decouple body materials and lighting from sparse-view or monocular avatar videos, so that the avatar can be rendered simultaneously under novel viewpoints, poses, and lightings at interactive frame rates (6.9 fps). Specifically, we first obtain the canonical body mesh using a signed distance function and assign attributes to each mesh vertex. The Gaussians in the canonical space then interpolate from nearby body mesh vertices to obtain the attributes. We subsequently deform the Gaussians to the posed space using forward skinning, and combine the learnable environment light with the Gaussian attributes for shading computation. To achieve fast shadow modeling, we rasterize the posed body mesh from dense viewpoints to obtain the visibility. Our approach is not only simple but also fast enough to allow interactive rendering of avatar animation under environmental light changes. Experiments demonstrate that, compared to previous works, our method can render higher quality results at a faster speed on both synthetic and real datasets.", + "arxiv_url": "http://arxiv.org/abs/2407.10707v1", + "pdf_url": "http://arxiv.org/pdf/2407.10707v1", + "published_date": "2024-07-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Pathformer3D: A 3D Scanpath Transformer for 360° Images", + "authors": [ + "Rong Quan", + "Yantao Lai", + "Mengyu Qiu", + "Dong Liang" + ], + "abstract": "Scanpath prediction in 360{\\deg} images can help realize rapid rendering and better user interaction in Virtual/Augmented Reality applications. However, existing scanpath prediction models for 360{\\deg} images execute scanpath prediction on 2D equirectangular projection plane, which always result in big computation error owing to the 2D plane's distortion and coordinate discontinuity. In this work, we perform scanpath prediction for 360{\\deg} images in 3D spherical coordinate system and proposed a novel 3D scanpath Transformer named Pathformer3D. 
Specifically, a 3D Transformer encoder is first used to extract 3D contextual feature representation for the 360{\\deg} image. Then, the contextual feature representation and historical fixation information are input into a Transformer decoder to output current time step's fixation embedding, where the self-attention module is used to imitate the visual working memory mechanism of human visual system and directly model the time dependencies among the fixations. Finally, a 3D Gaussian distribution is learned from each fixation embedding, from which the fixation position can be sampled. Evaluation on four panoramic eye-tracking datasets demonstrates that Pathformer3D outperforms the current state-of-the-art methods. Code is available at https://github.com/lsztzp/Pathformer3D .", + "arxiv_url": "http://arxiv.org/abs/2407.10563v1", + "pdf_url": "http://arxiv.org/pdf/2407.10563v1", + "published_date": "2024-07-15", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/lsztzp/Pathformer3D", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RecGS: Removing Water Caustic with Recurrent Gaussian Splatting", + "authors": [ + "Tianyi Zhang", + "Weiming Zhi", + "Kaining Huang", + "Joshua Mangelson", + "Corina Barbalata", + "Matthew Johnson-Roberson" + ], + "abstract": "Water caustics are commonly observed in seafloor imaging data from shallow-water areas. Traditional methods that remove caustic patterns from images often rely on 2D filtering or pre-training on an annotated dataset, hindering the performance when generalizing to real-world seafloor data with 3D structures. In this paper, we present a novel method Recurrent Gaussian Splatting (RecGS), which takes advantage of today's photorealistic 3D reconstruction technology, 3DGS, to separate caustics from seafloor imagery. With a sequence of images taken by an underwater robot, we build 3DGS recurrently and decompose the caustic with low-pass filtering in each iteration. In the experiments, we analyze and compare with different methods, including joint optimization, 2D filtering, and deep learning approaches. The results show that our method can effectively separate the caustic from the seafloor, improving the visual appearance, and can be potentially applied on more problems with inconsistent illumination.", + "arxiv_url": "http://arxiv.org/abs/2407.10318v2", + "pdf_url": "http://arxiv.org/pdf/2407.10318v2", + "published_date": "2024-07-14", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DEgo: 3D Editing on the Go!", + "authors": [ + "Umar Khalid", + "Hasan Iqbal", + "Azib Farooq", + "Jing Hua", + "Chen Chen" + ], + "abstract": "We introduce 3DEgo to address a novel problem of directly synthesizing photorealistic 3D scenes from monocular videos guided by textual prompts. Conventional methods construct a text-conditioned 3D scene through a three-stage process, involving pose estimation using Structure-from-Motion (SfM) libraries like COLMAP, initializing the 3D model with unedited images, and iteratively updating the dataset with edited images to achieve a 3D scene with text fidelity. Our framework streamlines the conventional multi-stage 3D editing process into a single-stage workflow by overcoming the reliance on COLMAP and eliminating the cost of model initialization. 
We apply a diffusion model to edit video frames prior to 3D scene creation by incorporating our designed noise blender module for enhancing multi-view editing consistency, a step that does not require additional training or fine-tuning of T2I diffusion models. 3DEgo utilizes 3D Gaussian Splatting to create 3D scenes from the multi-view consistent edited frames, capitalizing on the inherent temporal continuity and explicit point cloud data. 3DEgo demonstrates remarkable editing precision, speed, and adaptability across a variety of video sources, as validated by extensive evaluations on six datasets, including our own prepared GS25 dataset. Project Page: https://3dego.github.io/", + "arxiv_url": "http://arxiv.org/abs/2407.10102v1", + "pdf_url": "http://arxiv.org/pdf/2407.10102v1", + "published_date": "2024-07-14", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SpikeGS: 3D Gaussian Splatting from Spike Streams with High-Speed Camera Motion", + "authors": [ + "Jiyuan Zhang", + "Kang Chen", + "Shiyan Chen", + "Yajing Zheng", + "Tiejun Huang", + "Zhaofei Yu" + ], + "abstract": "Novel View Synthesis plays a crucial role by generating new 2D renderings from multi-view images of 3D scenes. However, capturing high-speed scenes with conventional cameras often leads to motion blur, hindering the effectiveness of 3D reconstruction. To address this challenge, high-frame-rate dense 3D reconstruction emerges as a vital technique, enabling detailed and accurate modeling of real-world objects or scenes in various fields, including Virtual Reality or embodied AI. Spike cameras, a novel type of neuromorphic sensor, continuously record scenes with an ultra-high temporal resolution, showing potential for accurate 3D reconstruction. Despite their promise, existing approaches, such as applying Neural Radiance Fields (NeRF) to spike cameras, encounter challenges due to the time-consuming rendering process. To address this issue, we make the first attempt to introduce the 3D Gaussian Splatting (3DGS) into spike cameras in high-speed capture, providing 3DGS as dense and continuous clues of views, then constructing SpikeGS. Specifically, to train SpikeGS, we establish computational equations between the rendering process of 3DGS and the processes of instantaneous imaging and exposing-like imaging of the continuous spike stream. Besides, we build a very lightweight but effective mapping process from spikes to instant images to support training. Furthermore, we introduced a new spike-based 3D rendering dataset for validation. Extensive experiments have demonstrated our method possesses the high quality of novel view rendering, proving the tremendous potential of spike cameras in modeling 3D scenes.", + "arxiv_url": "http://arxiv.org/abs/2407.10062v1", + "pdf_url": "http://arxiv.org/pdf/2407.10062v1", + "published_date": "2024-07-14", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Textured-GS: Gaussian Splatting with Spatially Defined Color and Opacity", + "authors": [ + "Zhentao Huang", + "Minglun Gong" + ], + "abstract": "In this paper, we introduce Textured-GS, an innovative method for rendering Gaussian splatting that incorporates spatially defined color and opacity variations using Spherical Harmonics (SH). 
This approach enables each Gaussian to exhibit a richer representation by accommodating varying colors and opacities across its surface, significantly enhancing rendering quality compared to traditional methods. To demonstrate the merits of our approach, we have adapted the Mini-Splatting architecture to integrate textured Gaussians without increasing the number of Gaussians. Our experiments across multiple real-world datasets show that Textured-GS consistently outperforms both the baseline Mini-Splatting and standard 3DGS in terms of visual fidelity. The results highlight the potential of Textured-GS to advance Gaussian-based rendering technologies, promising more efficient and high-quality scene reconstructions. Our implementation is available at https://github.com/ZhentaoHuang/Textured-GS.", + "arxiv_url": "http://arxiv.org/abs/2407.09733v3", + "pdf_url": "http://arxiv.org/pdf/2407.09733v3", + "published_date": "2024-07-13", + "categories": [ + "cs.CV", + "I.4.0" + ], + "github_url": "https://github.com/ZhentaoHuang/Textured-GS", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "StyleSplat: 3D Object Style Transfer with Gaussian Splatting", + "authors": [ + "Sahil Jain", + "Avik Kuthiala", + "Prabhdeep Singh Sethi", + "Prakanshul Saxena" + ], + "abstract": "Recent advancements in radiance fields have opened new avenues for creating high-quality 3D assets and scenes. Style transfer can enhance these 3D assets with diverse artistic styles, transforming creative expression. However, existing techniques are often slow or unable to localize style transfer to specific objects. We introduce StyleSplat, a lightweight method for stylizing 3D objects in scenes represented by 3D Gaussians from reference style images. Our approach first learns a photorealistic representation of the scene using 3D Gaussian splatting while jointly segmenting individual 3D objects. We then use a nearest-neighbor feature matching loss to finetune the Gaussians of the selected objects, aligning their spherical harmonic coefficients with the style image to ensure consistency and visual appeal. StyleSplat allows for quick, customizable style transfer and localized stylization of multiple objects within a scene, each with a different style. We demonstrate its effectiveness across various 3D scenes and styles, showcasing enhanced control and customization in 3D creation.", + "arxiv_url": "http://arxiv.org/abs/2407.09473v1", + "pdf_url": "http://arxiv.org/pdf/2407.09473v1", + "published_date": "2024-07-12", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "WildGaussians: 3D Gaussian Splatting in the Wild", + "authors": [ + "Jonas Kulhanek", + "Songyou Peng", + "Zuzana Kukelova", + "Marc Pollefeys", + "Torsten Sattler" + ], + "abstract": "While the field of 3D scene reconstruction is dominated by NeRFs due to their photorealistic quality, 3D Gaussian Splatting (3DGS) has recently emerged, offering similar quality with real-time rendering speeds. However, both methods primarily excel with well-controlled 3D scenes, while in-the-wild data - characterized by occlusions, dynamic objects, and varying illumination - remains challenging. NeRFs can adapt to such conditions easily through per-image embedding vectors, but 3DGS struggles due to its explicit representation and lack of shared parameters. 
To address this, we introduce WildGaussians, a novel approach to handle occlusions and appearance changes with 3DGS. By leveraging robust DINO features and integrating an appearance modeling module within 3DGS, our method achieves state-of-the-art results. We demonstrate that WildGaussians matches the real-time rendering speed of 3DGS while surpassing both 3DGS and NeRF baselines in handling in-the-wild data, all within a simple architectural framework.", +        "arxiv_url": "http://arxiv.org/abs/2407.08447v2", +        "pdf_url": "http://arxiv.org/pdf/2407.08447v2", +        "published_date": "2024-07-11", +        "categories": [ +            "cs.CV" +        ], +        "github_url": "", +        "keywords": [ +            "gaussian splatting", +            "3d gaussian", +            "real-time rendering", +            "nerf" +        ], +        "citations": 0, +        "semantic_url": "" +    }, +    { +        "title": "Survey on Fundamental Deep Learning 3D Reconstruction Techniques", +        "authors": [ +            "Yonge Bai", +            "LikHang Wong", +            "TszYin Twan" +        ], +        "abstract": "This survey aims to investigate fundamental deep learning (DL) based 3D reconstruction techniques that produce photo-realistic 3D models and scenes, highlighting Neural Radiance Fields (NeRFs), Latent Diffusion Models (LDM), and 3D Gaussian Splatting. We dissect the underlying algorithms, evaluate their strengths and tradeoffs, and project future research trajectories in this rapidly evolving field. We provide a comprehensive overview of the fundamentals of DL-driven 3D scene reconstruction, offering insights into their potential applications and limitations.", +        "arxiv_url": "http://arxiv.org/abs/2407.08137v1", +        "pdf_url": "http://arxiv.org/pdf/2407.08137v1", +        "published_date": "2024-07-11", +        "categories": [ +            "cs.CV", +            "cs.GR" +        ], +        "github_url": "", +        "keywords": [ +            "gaussian splatting", +            "3d gaussian", +            "3d reconstruction", +            "nerf" +        ], +        "citations": 0, +        "semantic_url": "" +    }, +    { +        "title": "MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition", +        "authors": [ +            "Aggelina Chatziagapi", +            "Grigorios G. Chrysos", +            "Dimitris Samaras" +        ], +        "abstract": "We introduce MIGS (Multi-Identity Gaussian Splatting), a novel method that learns a single neural representation for multiple identities, using only monocular videos. Recent 3D Gaussian Splatting (3DGS) approaches for human avatars require per-identity optimization. However, learning a multi-identity representation presents advantages in robustly animating humans under arbitrary poses. We propose to construct a high-order tensor that combines all the learnable 3DGS parameters for all the training identities. By assuming a low-rank structure and factorizing the tensor, we model the complex rigid and non-rigid deformations of multiple subjects in a unified network, significantly reducing the total number of parameters. Our proposed approach leverages information from all the training identities and enables robust animation under challenging unseen poses, outperforming existing approaches. It can also be extended to learn unseen identities.", +        "arxiv_url": "http://arxiv.org/abs/2407.07284v2", +        "pdf_url": "http://arxiv.org/pdf/2407.07284v2", +        "published_date": "2024-07-10", +        "categories": [ +            "cs.CV" +        ], +        "github_url": "", +        "keywords": [ +            "gaussian splatting", +            "3d gaussian" +        ], +        "citations": 0, +        "semantic_url": "" +    }, +    { +        "title": "Reference-based Controllable Scene Stylization with Gaussian Splatting", +        "authors": [ +            "Yiqun Mei", +            "Jiacong Xu", +            "Vishal M. 
Patel" + ], + "abstract": "Referenced-based scene stylization that edits the appearance based on a content-aligned reference image is an emerging research area. Starting with a pretrained neural radiance field (NeRF), existing methods typically learn a novel appearance that matches the given style. Despite their effectiveness, they inherently suffer from time-consuming volume rendering, and thus are impractical for many real-time applications. In this work, we propose ReGS, which adapts 3D Gaussian Splatting (3DGS) for reference-based stylization to enable real-time stylized view synthesis. Editing the appearance of a pretrained 3DGS is challenging as it uses discrete Gaussians as 3D representation, which tightly bind appearance with geometry. Simply optimizing the appearance as prior methods do is often insufficient for modeling continuous textures in the given reference image. To address this challenge, we propose a novel texture-guided control mechanism that adaptively adjusts local responsible Gaussians to a new geometric arrangement, serving for desired texture details. The proposed process is guided by texture clues for effective appearance editing, and regularized by scene depth for preserving original geometric structure. With these novel designs, we show ReGs can produce state-of-the-art stylization results that respect the reference texture while embracing real-time rendering speed for free-view navigation.", + "arxiv_url": "http://arxiv.org/abs/2407.07220v1", + "pdf_url": "http://arxiv.org/pdf/2407.07220v1", + "published_date": "2024-07-09", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Gaussian Ray Tracing: Fast Tracing of Particle Scenes", + "authors": [ + "Nicolas Moenne-Loccoz", + "Ashkan Mirzaei", + "Or Perel", + "Riccardo de Lutio", + "Janick Martinez Esturo", + "Gavriel State", + "Sanja Fidler", + "Nicholas Sharp", + "Zan Gojcic" + ], + "abstract": "Particle-based representations of radiance fields such as 3D Gaussian Splatting have found great success for reconstructing and re-rendering of complex scenes. Most existing methods render particles via rasterization, projecting them to screen space tiles for processing in a sorted order. This work instead considers ray tracing the particles, building a bounding volume hierarchy and casting a ray for each pixel using high-performance GPU ray tracing hardware. To efficiently handle large numbers of semi-transparent particles, we describe a specialized rendering algorithm which encapsulates particles with bounding meshes to leverage fast ray-triangle intersections, and shades batches of intersections in depth-order. The benefits of ray tracing are well-known in computer graphics: processing incoherent rays for secondary lighting effects such as shadows and reflections, rendering from highly-distorted cameras common in robotics, stochastically sampling rays, and more. With our renderer, this flexibility comes at little cost compared to rasterization. Experiments demonstrate the speed and accuracy of our approach, as well as several applications in computer graphics and vision. 
We further propose related improvements to the basic Gaussian representation, including a simple use of generalized kernel functions which significantly reduces particle hit counts.", +        "arxiv_url": "http://arxiv.org/abs/2407.07090v3", +        "pdf_url": "http://arxiv.org/pdf/2407.07090v3", +        "published_date": "2024-07-09", +        "categories": [ +            "cs.GR", +            "cs.CV" +        ], +        "github_url": "", +        "keywords": [ +            "gaussian splatting", +            "3d gaussian" +        ], +        "citations": 0, +        "semantic_url": "" +    }, +    { +        "title": "PICA: Physics-Integrated Clothed Avatar", +        "authors": [ +            "Bo Peng", +            "Yunfan Tao", +            "Haoyu Zhan", +            "Yudong Guo", +            "Juyong Zhang" +        ], +        "abstract": "We introduce PICA, a novel representation for high-fidelity animatable clothed human avatars with physics-accurate dynamics, even for loose clothing. Previous neural rendering-based representations of animatable clothed humans typically employ a single model to represent both the clothing and the underlying body. While efficient, these approaches often fail to accurately represent complex garment dynamics, leading to incorrect deformations and noticeable rendering artifacts, especially for sliding or loose garments. Furthermore, previous works represent garment dynamics as pose-dependent deformations and facilitate novel pose animations in a data-driven manner. This often results in outcomes that do not faithfully represent the mechanics of motion and are prone to generating artifacts in out-of-distribution poses. To address these issues, we adopt two individual 3D Gaussian Splatting (3DGS) models with different deformation characteristics, modeling the human body and clothing separately. This distinction allows for better handling of their respective motion characteristics. With this representation, we integrate a graph neural network (GNN)-based clothed body physics simulation module to ensure an accurate representation of clothing dynamics. Our method, through its carefully designed features, achieves high-fidelity rendering of clothed human bodies in complex and novel driving poses, significantly outperforming previous methods under the same settings.", +        "arxiv_url": "http://arxiv.org/abs/2407.05324v1", +        "pdf_url": "http://arxiv.org/pdf/2407.05324v1", +        "published_date": "2024-07-07", +        "categories": [ +            "cs.CV" +        ], +        "github_url": "", +        "keywords": [ +            "gaussian splatting", +            "3d gaussian", +            "neural rendering" +        ], +        "citations": 0, +        "semantic_url": "" +    }, +    { +        "title": "GaussReg: Fast 3D Registration with Gaussian Splatting", +        "authors": [ +            "Jiahao Chang", +            "Yinglin Xu", +            "Yihao Li", +            "Yuantao Chen", +            "Xiaoguang Han" +        ], +        "abstract": "Point cloud registration is a fundamental problem for large-scale 3D scene scanning and reconstruction. With the help of deep learning, registration methods have evolved significantly, reaching a nearly-mature stage. With the introduction of Neural Radiance Fields (NeRF), it has become the most popular 3D scene representation due to its powerful view synthesis capabilities. Regarding NeRF representation, its registration is also required for large-scale scene reconstruction. However, this topic remains largely unexplored. This is due to the inherent challenge of modeling the geometric relationship between two scenes with implicit representations. The existing methods usually convert the implicit representation to explicit representation for further registration. Most recently, Gaussian Splatting (GS) is introduced, employing explicit 3D Gaussians. 
This method significantly enhances rendering speed while maintaining high rendering quality. Given two scenes with explicit GS representations, in this work, we explore the 3D registration task between them. To this end, we propose GaussReg, a novel coarse-to-fine framework, both fast and accurate. The coarse stage follows existing point cloud registration methods and estimates a rough alignment for point clouds from GS. We further present a novel image-guided fine registration approach, which renders images from GS to provide more detailed geometric information for precise alignment. To support comprehensive evaluation, we carefully build a scene-level dataset called ScanNet-GSReg with 1379 scenes obtained from the ScanNet dataset and collect an in-the-wild dataset called GSReg. Experimental results demonstrate that our method achieves state-of-the-art performance on multiple datasets. Our GaussReg is 44 times faster than HLoc (SuperPoint as the feature extractor and SuperGlue as the matcher) with comparable accuracy.", +        "arxiv_url": "http://arxiv.org/abs/2407.05254v1", +        "pdf_url": "http://arxiv.org/pdf/2407.05254v1", +        "published_date": "2024-07-07", +        "categories": [ +            "cs.CV" +        ], +        "github_url": "", +        "keywords": [ +            "gaussian splatting", +            "3d gaussian", +            "nerf" +        ], +        "citations": 0, +        "semantic_url": "" +    }, +    { +        "title": "Panopticon: a telescope for our times", +        "authors": [ +            "Will Saunders", +            "Timothy Chin", +            "Michael Goodwin" +        ], +        "abstract": "We present a design for a wide-field spectroscopic telescope. The only large powered mirror is spherical, the resulting spherical aberration is corrected for each target separately, giving exceptional image quality. The telescope is a transit design, but still allows all-sky coverage. Three simultaneous modes are proposed: (a) natural seeing multi-object spectroscopy with 12m aperture over 3dg FoV with ~25,000 targets; (b) multi-object AO with 12m aperture over 3dg FoV with ~100 AO-corrected Integral Field Units each with 4 arcsec FoV; (c) ground layer AO-corrected integral field spectroscopy with 15m aperture and 13 arcmin FoV. Such a telescope would be uniquely powerful for large-area follow-up of imaging surveys; in each mode, the AOmega and survey speed exceed all existing facilities combined. The expected cost of this design is relatively modest, much closer to $500M than $1000M.", +        "arxiv_url": "http://arxiv.org/abs/2407.05103v2", +        "pdf_url": "http://arxiv.org/pdf/2407.05103v2", +        "published_date": "2024-07-06", +        "categories": [ +            "astro-ph.IM" +        ], +        "github_url": "", +        "keywords": [], +        "citations": 0, +        "semantic_url": "" +    }, +    { +        "title": "SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction", +        "authors": [ +            "Weixing Xie", +            "Junfeng Yao", +            "Xianpeng Cao", +            "Qiqin Lin", +            "Zerui Tang", +            "Xiao Dong", +            "Xiaohu Guo" +        ], +        "abstract": "Dynamic reconstruction of deformable tissues in endoscopic video is a key technology for robot-assisted surgery. Recent reconstruction methods based on neural radiance fields (NeRFs) have achieved remarkable results in the reconstruction of surgical scenes. However, based on implicit representation, NeRFs struggle to capture the intricate details of objects in the scene and cannot achieve real-time rendering. In addition, restricted single view perception and occluded instruments also pose special challenges in surgical scene reconstruction. 
To address these issues, we develop SurgicalGaussian, a deformable 3D Gaussian Splatting method to model dynamic surgical scenes. Our approach models the spatio-temporal features of soft tissues at each time stamp via a forward-mapping deformation MLP and regularization to constrain local 3D Gaussians to comply with consistent movement. With the depth initialization strategy and tool mask-guided training, our method can remove surgical instruments and reconstruct high-fidelity surgical scenes. Through experiments on various surgical videos, our network outperforms existing methods in many aspects, including rendering quality, rendering speed, and GPU usage. The project page can be found at https://surgicalgaussian.github.io.", +        "arxiv_url": "http://arxiv.org/abs/2407.05023v1", +        "pdf_url": "http://arxiv.org/pdf/2407.05023v1", +        "published_date": "2024-07-06", +        "categories": [ +            "cs.CV" +        ], +        "github_url": "", +        "keywords": [ +            "gaussian splatting", +            "3d gaussian", +            "real-time rendering", +            "nerf" +        ], +        "citations": 0, +        "semantic_url": "" +    }, +    { +        "title": "Gaussian Eigen Models for Human Heads", +        "authors": [ +            "Wojciech Zielonka", +            "Timo Bolkart", +            "Thabo Beeler", +            "Justus Thies" +        ], +        "abstract": "We present personalized Gaussian Eigen Models (GEMs) for human heads, a novel method that compresses dynamic 3D Gaussians into low-dimensional linear spaces. Our approach is inspired by the seminal work of Blanz and Vetter, where a mesh-based 3D morphable model (3DMM) is constructed from registered meshes. Based on dynamic 3D Gaussians, we create a lower-dimensional representation of primitives that applies to most 3DGS head avatars. Specifically, we propose a universal method to distill the appearance of a mesh-controlled UNet Gaussian avatar using an ensemble of linear eigenbasis. We replace heavy CNN-based architectures with a single linear layer improving speed and enabling a range of real-time downstream applications. To create a particular facial expression, one simply needs to perform a dot product between the eigen coefficients and the distilled basis. This efficient method removes the requirement for an input mesh during testing, enhancing simplicity and speed in expression generation. This process is highly efficient and supports real-time rendering on everyday devices, leveraging the effectiveness of standard Gaussian Splatting. In addition, we demonstrate how the GEM can be controlled using a ResNet-based regression architecture. We show and compare self-reenactment and cross-person reenactment to state-of-the-art 3D avatar methods, demonstrating higher quality and better control. A real-time demo showcases the applicability of the GEM representation.", +        "arxiv_url": "http://arxiv.org/abs/2407.04545v1", +        "pdf_url": "http://arxiv.org/pdf/2407.04545v1", +        "published_date": "2024-07-05", +        "categories": [ +            "cs.CV" +        ], +        "github_url": "", +        "keywords": [ +            "gaussian splatting", +            "3d gaussian", +            "real-time rendering" +        ], +        "citations": 0, +        "semantic_url": "" +    }, +    { +        "title": "Segment Any 4D Gaussians", +        "authors": [ +            "Shengxiang Ji", +            "Guanjun Wu", +            "Jiemin Fang", +            "Jiazhong Cen", +            "Taoran Yi", +            "Wenyu Liu", +            "Qi Tian", +            "Xinggang Wang" +        ], +        "abstract": "Modeling, understanding, and reconstructing the real world are crucial in XR/VR. Recently, 3D Gaussian Splatting (3D-GS) methods have shown remarkable success in modeling and understanding 3D scenes. 
Similarly, various 4D representations have demonstrated the ability to capture the dynamics of the 4D world. However, there is a dearth of research focusing on segmentation within 4D representations. In this paper, we propose Segment Any 4D Gaussians (SA4D), one of the first frameworks to segment anything in the 4D digital world based on 4D Gaussians. In SA4D, an efficient temporal identity feature field is introduced to handle Gaussian drifting, with the potential to learn precise identity features from noisy and sparse input. Additionally, a 4D segmentation refinement process is proposed to remove artifacts. Our SA4D achieves precise, high-quality segmentation within seconds in 4D Gaussians and shows the ability to remove, recolor, compose, and render high-quality anything masks. More demos are available at: https://jsxzs.github.io/sa4d/.", + "arxiv_url": "http://arxiv.org/abs/2407.04504v2", + "pdf_url": "http://arxiv.org/pdf/2407.04504v2", + "published_date": "2024-07-05", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction", + "authors": [ + "Yuxuan Mu", + "Xinxin Zuo", + "Chuan Guo", + "Yilin Wang", + "Juwei Lu", + "Xiaofeng Wu", + "Songcen Xu", + "Peng Dai", + "Youliang Yan", + "Li Cheng" + ], + "abstract": "We present GSD, a diffusion model approach based on Gaussian Splatting (GS) representation for 3D object reconstruction from a single view. Prior works suffer from inconsistent 3D geometry or mediocre rendering quality due to improper representations. We take a step towards resolving these shortcomings by utilizing the recent state-of-the-art 3D explicit representation, Gaussian Splatting, and an unconditional diffusion model. This model learns to generate 3D objects represented by sets of GS ellipsoids. With these strong generative 3D priors, though learning unconditionally, the diffusion model is ready for view-guided reconstruction without further model fine-tuning. This is achieved by propagating fine-grained 2D features through the efficient yet flexible splatting function and the guided denoising sampling process. In addition, a 2D diffusion model is further employed to enhance rendering fidelity, and improve reconstructed GS quality by polishing and re-using the rendered images. The final reconstructed objects explicitly come with high-quality 3D structure and texture, and can be efficiently rendered in arbitrary views. Experiments on the challenging real-world CO3D dataset demonstrate the superiority of our approach. Project page: https://yxmu.foo/GSD/", + "arxiv_url": "http://arxiv.org/abs/2407.04237v4", + "pdf_url": "http://arxiv.org/pdf/2407.04237v4", + "published_date": "2024-07-05", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CRiM-GS: Continuous Rigid Motion-Aware Gaussian Splatting from Motion Blur Images", + "authors": [ + "Junghe Lee", + "Donghyeong Kim", + "Dogyoon Lee", + "Suhwan Cho", + "Sangyoun Lee" + ], + "abstract": "Neural radiance fields (NeRFs) have received significant attention due to their high-quality novel view rendering ability, prompting research to address various real-world cases. One critical challenge is the camera motion blur caused by camera movement during exposure time, which prevents accurate 3D scene reconstruction. 
In this study, we propose continuous rigid motion-aware gaussian splatting (CRiM-GS) to reconstruct accurate 3D scenes from blurry images with real-time rendering speed. Considering the actual camera motion blurring process, which consists of complex motion patterns, we predict the continuous movement of the camera based on neural ordinary differential equations (ODEs). Specifically, we leverage rigid body transformations to model the camera motion with proper regularization, preserving the shape and size of the object. Furthermore, we introduce a continuous deformable 3D transformation in the \\textit{SE(3)} field to adapt the rigid body transformation to real-world problems by ensuring a higher degree of freedom. By revisiting fundamental camera theory and employing advanced neural network training techniques, we achieve accurate modeling of continuous camera trajectories. We conduct extensive experiments, demonstrating state-of-the-art performance both quantitatively and qualitatively on benchmark datasets.", + "arxiv_url": "http://arxiv.org/abs/2407.03923v1", + "pdf_url": "http://arxiv.org/pdf/2407.03923v1", + "published_date": "2024-07-04", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PFGS: High Fidelity Point Cloud Rendering via Feature Splatting", + "authors": [ + "Jiaxu Wang", + "Ziyi Zhang", + "Junhao He", + "Renjing Xu" + ], + "abstract": "Rendering high-fidelity images from sparse point clouds is still challenging. Existing learning-based approaches suffer from either hole artifacts, missing details, or expensive computations. In this paper, we propose a novel framework to render high-quality images from sparse points. This method first attempts to bridge 3D Gaussian Splatting and point cloud rendering, and includes several cascaded modules. We first use a regressor to estimate Gaussian properties in a point-wise manner; the estimated properties are used to rasterize neural feature descriptors into 2D planes which are extracted from a multiscale extractor. The projected feature volume is gradually decoded toward the final prediction via a multiscale and progressive decoder. The whole pipeline experiences a two-stage training and is driven by our well-designed progressive and multiscale reconstruction loss. Experiments on different benchmarks show the superiority of our method in terms of rendering qualities and the necessities of our main components.", + "arxiv_url": "http://arxiv.org/abs/2407.03857v1", + "pdf_url": "http://arxiv.org/pdf/2407.03857v1", + "published_date": "2024-07-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SpikeGS: Reconstruct 3D scene via fast-moving bio-inspired sensors", + "authors": [ + "Yijia Guo", + "Liwen Hu", + "Lei Ma", + "Tiejun Huang" + ], + "abstract": "3D Gaussian Splatting (3DGS) demonstrates unparalleled performance in 3D scene reconstruction. However, 3DGS heavily relies on sharp images. Fulfilling this requirement can be challenging in real-world scenarios, especially when the camera moves fast, which severely limits the application of 3DGS.
To address these challenges, we propose Spike Gaussian Splatting (SpikeGS), the first framework that integrates spike streams into the 3DGS pipeline to reconstruct 3D scenes via a fast-moving bio-inspired camera. With accumulation rasterization, interval supervision, and a specially designed pipeline, SpikeGS extracts detailed geometry and texture from the high-temporal-resolution but texture-lacking spike stream, and reconstructs 3D scenes captured in 1 second. Extensive experiments on multiple synthetic and real-world datasets demonstrate the superiority of SpikeGS compared with existing spike-based and deblur 3D scene reconstruction methods. Codes and data will be released soon.", + "arxiv_url": "http://arxiv.org/abs/2407.03771v2", + "pdf_url": "http://arxiv.org/pdf/2407.03771v2", + "published_date": "2024-07-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Expressive Gaussian Human Avatars from Monocular RGB Video", + "authors": [ + "Hezhen Hu", + "Zhiwen Fan", + "Tianhao Wu", + "Yihan Xi", + "Seoyoung Lee", + "Georgios Pavlakos", + "Zhangyang Wang" + ], + "abstract": "Nuanced expressiveness, particularly through fine-grained hand and facial expressions, is pivotal for enhancing the realism and vitality of digital human representations. In this work, we focus on investigating the expressiveness of human avatars when learned from monocular RGB video, a setting that introduces new challenges in capturing and animating fine-grained details. To this end, we introduce EVA, a drivable human model that meticulously sculpts fine details based on 3D Gaussians and SMPL-X, an expressive parametric human model. Focused on enhancing expressiveness, our work makes three key contributions. First, we highlight the critical importance of aligning the SMPL-X model with RGB frames for effective avatar learning. Recognizing the limitations of current SMPL-X prediction methods for in-the-wild videos, we introduce a plug-and-play module that significantly ameliorates misalignment issues. Second, we propose a context-aware adaptive density control strategy, which adaptively adjusts the gradient thresholds to accommodate the varied granularity across body parts. Last but not least, we develop a feedback mechanism that predicts per-pixel confidence to better guide the learning of 3D Gaussians. Extensive experiments on two benchmarks demonstrate the superiority of our framework both quantitatively and qualitatively, especially on the fine-grained hand and facial details. See the project website at \\url{https://evahuman.github.io}", + "arxiv_url": "http://arxiv.org/abs/2407.03204v1", + "pdf_url": "http://arxiv.org/pdf/2407.03204v1", + "published_date": "2024-07-03", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors", + "authors": [ + "Sungwon Hwang", + "Min-Jung Kim", + "Taewoong Kang", + "Jayeon Kang", + "Jaegul Choo" + ], + "abstract": "Neural rendering-based urban scene reconstruction methods commonly rely on images collected from driving vehicles with cameras facing and moving forward. Although these methods can successfully synthesize from views similar to the training camera trajectory, directing the novel view outside the training camera distribution does not guarantee on-par performance.
In this paper, we tackle the Extrapolated View Synthesis (EVS) problem by evaluating the reconstructions on views such as looking left, right, or downwards with respect to training camera distributions. To improve rendering quality for EVS, we initialize our model by constructing a dense LiDAR map, and propose to leverage prior scene knowledge such as a surface normal estimator and a large-scale diffusion model. Qualitative and quantitative comparisons demonstrate the effectiveness of our methods on EVS. To the best of our knowledge, we are the first to address the EVS problem in urban scene reconstruction. Link to our project page: https://vegs3d.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2407.02945v3", + "pdf_url": "http://arxiv.org/pdf/2407.02945v3", + "published_date": "2024-07-03", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction", + "authors": [ + "Jiaxin Guo", + "Jiangliu Wang", + "Di Kang", + "Wenzhen Dong", + "Wenting Wang", + "Yun-hui Liu" + ], + "abstract": "Real-time 3D reconstruction of surgical scenes plays a vital role in computer-assisted surgery, holding promise to enhance surgeons' visibility. Recent advancements in 3D Gaussian Splatting (3DGS) have shown great potential for real-time novel view synthesis of general scenes, which relies on accurate poses and point clouds generated by Structure-from-Motion (SfM) for initialization. However, 3DGS with SfM fails to recover accurate camera poses and geometry in surgical scenes due to the challenges of minimal textures and photometric inconsistencies. To tackle this problem, in this paper, we propose the first SfM-free 3DGS-based method for surgical scene reconstruction by jointly optimizing the camera poses and scene representation. Based on the video continuity, the key of our method is to exploit the immediate optical flow priors to guide the projection flow derived from 3D Gaussians. Unlike most previous methods relying on photometric loss only, we formulate the pose estimation problem as minimizing the flow loss between the projection flow and optical flow. A consistency check is further introduced to filter the flow outliers by detecting the rigid and reliable points that satisfy the epipolar geometry. During 3D Gaussian optimization, we randomly sample frames to optimize the scene representations to grow the 3D Gaussians progressively. Experiments on the SCARED dataset demonstrate our superior performance over existing methods in novel view synthesis and pose estimation with high efficiency. Code is available at https://github.com/wrld/Free-SurGS.", + "arxiv_url": "http://arxiv.org/abs/2407.02918v1", + "pdf_url": "http://arxiv.org/pdf/2407.02918v1", + "published_date": "2024-07-03", + "categories": [ + "cs.CV", + "eess.IV" + ], + "github_url": "https://github.com/wrld/Free-SurGS", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Spatially Coherent 3D Distributions of HI and CO in the Milky Way", + "authors": [ + "Laurin Söding", + "Gordian Edenhofer", + "Torsten A. Enßlin", + "Philipp Frank", + "Ralf Kissmann", + "Vo Hong Minh Phan", + "Andrés Ramírez", + "Hanieh Zhandinejad", + "Philipp Mertsch" + ], + "abstract": "The spatial distribution of the gaseous components of the Milky Way is of great importance for a number of different fields, e.g.
Galactic structure, star formation and cosmic rays. However, obtaining distance information to gaseous clouds in the interstellar medium from Doppler-shifted line emission is notoriously difficult given our unique vantage point in the Galaxy. It requires precise knowledge of gas velocities and generally suffers from distance ambiguities. Previous works often assumed the optically thin limit (no absorption), a fixed velocity field, and lack resolution overall. We aim to overcome these issues and improve previous reconstructions of the gaseous constituents of the interstellar medium of the Galaxy. We use 3D Gaussian processes to model correlations in the interstellar medium, including correlations between different lines of sight, and enforce a spatially coherent structure in the prior. For modelling the transport of radiation from the emitting gas to us as observers, we take absorption effects into account. A special numerical grid ensures high resolution nearby. We infer the spatial distributions of HI, CO, their emission line-widths, and the Galactic velocity field in a joint Bayesian inference. We further constrain these fields with complementary data from Galactic masers and young stellar object clusters. Our main result consists of a set of samples that implicitly contain statistical uncertainties. The resulting maps are spatially coherent and reproduce the data with high fidelity. We confirm previous findings regarding the warping and flaring of the Galactic disc. A comparison with 3D dust maps reveals a good agreement on scales larger than approximately 400 pc. While our results are not free of artefacts, they present a big step forward in obtaining high quality 3D maps of the interstellar medium.", + "arxiv_url": "http://arxiv.org/abs/2407.02859v1", + "pdf_url": "http://arxiv.org/pdf/2407.02859v1", + "published_date": "2024-07-03", + "categories": [ + "astro-ph.GA" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction", + "authors": [ + "Mustafa Khan", + "Hamidreza Fazlali", + "Dhruv Sharma", + "Tongtong Cao", + "Dongfeng Bai", + "Yuan Ren", + "Bingbing Liu" + ], + "abstract": "Realistic scene reconstruction and view synthesis are essential for advancing autonomous driving systems by simulating safety-critical scenarios. 3D Gaussian Splatting excels in real-time rendering and static scene reconstructions but struggles with modeling driving scenarios due to complex backgrounds, dynamic objects, and sparse views. We propose AutoSplat, a framework employing Gaussian splatting to achieve highly realistic reconstructions of autonomous driving scenes. By imposing geometric constraints on Gaussians representing the road and sky regions, our method enables multi-view consistent simulation of challenging scenarios including lane changes. Leveraging 3D templates, we introduce a reflected Gaussian consistency constraint to supervise both the visible and unseen side of foreground objects. Moreover, to model the dynamic appearance of foreground objects, we estimate residual spherical harmonics for each foreground Gaussian. Extensive experiments on Pandaset and KITTI demonstrate that AutoSplat outperforms state-of-the-art methods in scene reconstruction and novel view synthesis across diverse driving scenarios. 
Visit our project page at https://autosplat.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2407.02598v2", + "pdf_url": "http://arxiv.org/pdf/2407.02598v2", + "published_date": "2024-07-02", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "TrAME: Trajectory-Anchored Multi-View Editing for Text-Guided 3D Gaussian Splatting Manipulation", + "authors": [ + "Chaofan Luo", + "Donglin Di", + "Xun Yang", + "Yongjia Ma", + "Zhou Xue", + "Chen Wei", + "Yebin Liu" + ], + "abstract": "Despite significant strides in the field of 3D scene editing, current methods encounter substantial challenges, particularly in preserving 3D consistency during the multi-view editing process. To tackle this challenge, we propose a progressive 3D editing strategy that ensures multi-view consistency via a Trajectory-Anchored Scheme (TAS) with a dual-branch editing mechanism. Specifically, TAS facilitates a tightly coupled iterative process between 2D view editing and 3D updating, preventing error accumulation yielded by the text-to-image process. Additionally, we explore the relationship between optimization-based methods and reconstruction-based methods, offering a unified perspective for selecting superior design choices, supporting the rationale behind the designed TAS. We further present a tuning-free View-Consistent Attention Control (VCAC) module that leverages cross-view semantic and geometric reference from the source branch to yield aligned views from the target branch during the editing of 2D views. To validate the effectiveness of our method, we analyze 2D examples to demonstrate the improved consistency with the VCAC module. Further extensive quantitative and qualitative results in text-guided 3D scene editing indicate that our method achieves superior editing quality compared to state-of-the-art methods. We will make the complete codebase publicly available following the conclusion of the review process.", + "arxiv_url": "http://arxiv.org/abs/2407.02034v2", + "pdf_url": "http://arxiv.org/pdf/2407.02034v2", + "published_date": "2024-07-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DRAGON: Drone and Ground Gaussian Splatting for 3D Building Reconstruction", + "authors": [ + "Yujin Ham", + "Mateusz Michalkiewicz", + "Guha Balakrishnan" + ], + "abstract": "3D building reconstruction from imaging data is an important task for many applications ranging from urban planning to reconnaissance. Modern Novel View Synthesis (NVS) methods like NeRF and Gaussian Splatting offer powerful techniques for developing 3D models from natural 2D imagery in an unsupervised fashion. These algorithms generally require input training views surrounding the scene of interest, which, in the case of large buildings, are typically not available across all camera elevations. In particular, the most readily available camera viewpoints at scale across most buildings are at near-ground (e.g., with mobile phones) and aerial (drones) elevations. However, due to the significant difference in viewpoint between drone and ground image sets, camera registration - a necessary step for NVS algorithms - fails. In this work, we propose a method, DRAGON, that can take drone and ground building imagery as input and produce a 3D NVS model.
The key insight of DRAGON is that intermediate elevation imagery may be extrapolated by an NVS algorithm itself in an iterative procedure with perceptual regularization, thereby bridging the visual feature gap between the two elevations and enabling registration. We compiled a semi-synthetic dataset of 9 large building scenes using Google Earth Studio, and quantitatively and qualitatively demonstrate that DRAGON can generate compelling renderings on this dataset compared to baseline strategies.", + "arxiv_url": "http://arxiv.org/abs/2407.01761v1", + "pdf_url": "http://arxiv.org/pdf/2407.01761v1", + "published_date": "2024-07-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianStego: A Generalizable Stenography Pipeline for Generative 3D Gaussians Splatting", + "authors": [ + "Chenxin Li", + "Hengyu Liu", + "Zhiwen Fan", + "Wuyang Li", + "Yifan Liu", + "Panwang Pan", + "Yixuan Yuan" + ], + "abstract": "Recent advancements in large generative models and real-time neural rendering using point-based techniques pave the way for a future of widespread visual data distribution through sharing synthesized 3D assets. However, while standardized methods for embedding proprietary or copyright information, either overtly or subtly, exist for conventional visual content such as images and videos, this issue remains unexplored for emerging generative 3D formats like Gaussian Splatting. We present GaussianStego, a method for embedding steganographic information in the rendering of generated 3D assets. Our approach employs an optimization framework that enables the accurate extraction of hidden information from images rendered using Gaussian assets derived from large models, while maintaining their original visual quality. We conduct preliminary evaluations of our method across several potential deployment scenarios and discuss issues identified through analysis. GaussianStego represents an initial exploration into the novel challenge of embedding customizable, imperceptible, and recoverable information within the renders produced by current 3D generative models, while ensuring minimal impact on the rendered content's quality.", + "arxiv_url": "http://arxiv.org/abs/2407.01301v1", + "pdf_url": "http://arxiv.org/pdf/2407.01301v1", + "published_date": "2024-07-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Fast and Efficient: Mask Neural Fields for 3D Scene Segmentation", + "authors": [ + "Zihan Gao", + "Lingling Li", + "Licheng Jiao", + "Fang Liu", + "Xu Liu", + "Wenping Ma", + "Yuwei Guo", + "Shuyuan Yang" + ], + "abstract": "Understanding 3D scenes is a crucial challenge in computer vision research with applications spanning multiple domains. Recent advancements in distilling 2D vision-language foundation models into neural fields, like NeRF and 3DGS, enable open-vocabulary segmentation of 3D scenes from 2D multi-view images without the need for precise 3D annotations. However, while effective, these methods typically rely on the per-pixel distillation of high-dimensional CLIP features, introducing ambiguity and necessitating complex regularization strategies, which adds inefficiency during training. This paper presents MaskField, which enables efficient 3D open-vocabulary segmentation with neural fields from a novel perspective. 
Unlike previous methods, MaskField decomposes the distillation of mask and semantic features from foundation models by formulating a mask feature field and queries. MaskField overcomes ambiguous object boundaries by naturally introducing SAM segmented object shapes without extra regularization during training. By circumventing the direct handling of dense high-dimensional CLIP features during training, MaskField is particularly compatible with explicit scene representations like 3DGS. Our extensive experiments show that MaskField not only surpasses prior state-of-the-art methods but also achieves remarkably fast convergence. We hope that MaskField will inspire further exploration into how neural fields can be trained to comprehend 3D scenes from 2D models.", + "arxiv_url": "http://arxiv.org/abs/2407.01220v2", + "pdf_url": "http://arxiv.org/pdf/2407.01220v2", + "published_date": "2024-07-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Learning 3D Gaussians for Extremely Sparse-View Cone-Beam CT Reconstruction", + "authors": [ + "Yiqun Lin", + "Hualiang Wang", + "Jixiang Chen", + "Xiaomeng Li" + ], + "abstract": "Cone-Beam Computed Tomography (CBCT) is an indispensable technique in medical imaging, yet the associated radiation exposure raises concerns in clinical practice. To mitigate these risks, sparse-view reconstruction has emerged as an essential research direction, aiming to reduce the radiation dose by utilizing fewer projections for CT reconstruction. Although implicit neural representations have been introduced for sparse-view CBCT reconstruction, existing methods primarily focus on local 2D features queried from sparse projections, which is insufficient to process the more complicated anatomical structures, such as the chest. To this end, we propose a novel reconstruction framework, namely DIF-Gaussian, which leverages 3D Gaussians to represent the feature distribution in the 3D space, offering additional 3D spatial information to facilitate the estimation of attenuation coefficients. Furthermore, we incorporate test-time optimization during inference to further improve the generalization capability of the model. We evaluate DIF-Gaussian on two public datasets, showing significantly superior reconstruction performance than previous state-of-the-art methods.", + "arxiv_url": "http://arxiv.org/abs/2407.01090v2", + "pdf_url": "http://arxiv.org/pdf/2407.01090v2", + "published_date": "2024-07-01", + "categories": [ + "eess.IV", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting", + "authors": [ + "Chenxin Li", + "Brandon Y. Feng", + "Yifan Liu", + "Hengyu Liu", + "Cheng Wang", + "Weihao Yu", + "Yixuan Yuan" + ], + "abstract": "3D reconstruction of biological tissues from a collection of endoscopic images is a key to unlock various important downstream surgical applications with 3D capabilities. Existing methods employ various advanced neural rendering techniques for photorealistic view synthesis, but they often struggle to recover accurate 3D representations when only sparse observations are available, which is usually the case in real-world clinical scenarios. 
To tackle this {sparsity} challenge, we propose a framework leveraging the prior knowledge from multiple foundation models during the reconstruction process, dubbed as \\textit{EndoSparse}. Experimental results indicate that our proposed strategy significantly improves the geometric and appearance quality under challenging sparse-view conditions, including using only three views. In rigorous benchmarking experiments against state-of-the-art methods, \\textit{EndoSparse} achieves superior results in terms of accurate geometry, realistic appearance, and rendering efficiency, confirming the robustness to sparse-view limitations in endoscopic reconstruction. \\textit{EndoSparse} signifies a steady step towards the practical deployment of neural 3D reconstruction in real-world clinical scenarios. Project page: https://endo-sparse.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2407.01029v1", + "pdf_url": "http://arxiv.org/pdf/2407.01029v1", + "published_date": "2024-07-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "neural rendering", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RTGS: Enabling Real-Time Gaussian Splatting on Mobile Devices Using Efficiency-Guided Pruning and Foveated Rendering", + "authors": [ + "Weikai Lin", + "Yu Feng", + "Yuhao Zhu" + ], + "abstract": "Point-Based Neural Rendering (PBNR), i.e., the 3D Gaussian Splatting-family algorithms, emerges as a promising class of rendering techniques, which are permeating all aspects of society, driven by a growing demand for real-time, photorealistic rendering in AR/VR and digital twins. Achieving real-time PBNR on mobile devices is challenging. This paper proposes RTGS, a PBNR system that for the first time delivers real-time neural rendering on mobile devices while maintaining human visual quality. RTGS combines two techniques. First, we present an efficiency-aware pruning technique to optimize rendering speed. Second, we introduce a Foveated Rendering (FR) method for PBNR, leveraging humans' low visual acuity in peripheral regions to relax rendering quality and improve rendering speed. Our system executes in real-time (above 100 FPS) on Nvidia Jetson Xavier board without sacrificing subjective visual quality, as confirmed by a user study. The code is open-sourced at [https://github.com/horizon-research/Fov-3DGS].", + "arxiv_url": "http://arxiv.org/abs/2407.00435v2", + "pdf_url": "http://arxiv.org/pdf/2407.00435v2", + "published_date": "2024-06-29", + "categories": [ + "cs.GR", + "I.3, I.2" + ], + "github_url": "https://github.com/horizon-research/Fov-3DGS", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "OccFusion: Rendering Occluded Humans with Generative Diffusion Priors", + "authors": [ + "Adam Sun", + "Tiange Xiang", + "Scott Delp", + "Li Fei-Fei", + "Ehsan Adeli" + ], + "abstract": "Most existing human rendering methods require every part of the human to be fully visible throughout the input video. However, this assumption does not hold in real-life settings where obstructions are common, resulting in only partial visibility of the human. Considering this, we present OccFusion, an approach that utilizes efficient 3D Gaussian splatting supervised by pretrained 2D diffusion models for efficient and high-fidelity human rendering. We propose a pipeline consisting of three stages. 
In the Initialization stage, complete human masks are generated from partial visibility masks. In the Optimization stage, 3D human Gaussians are optimized with additional supervision by Score-Distillation Sampling (SDS) to create a complete geometry of the human. Finally, in the Refinement stage, in-context inpainting is designed to further improve rendering quality on the less observed human body parts. We evaluate OccFusion on ZJU-MoCap and challenging OcMotion sequences and find that it achieves state-of-the-art performance in the rendering of occluded humans.", + "arxiv_url": "http://arxiv.org/abs/2407.00316v1", + "pdf_url": "http://arxiv.org/pdf/2407.00316v1", + "published_date": "2024-06-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting", + "authors": [ + "Sara Sabour", + "Lily Goli", + "George Kopanas", + "Mark Matthews", + "Dmitry Lagun", + "Leonidas Guibas", + "Alec Jacobson", + "David J. Fleet", + "Andrea Tagliasacchi" + ], + "abstract": "3D Gaussian Splatting (3DGS) is a promising technique for 3D reconstruction, offering efficient training and rendering speeds, making it suitable for real-time applications. However, current methods require highly controlled environments (no moving people or wind-blown elements, and consistent lighting) to meet the inter-view consistency assumption of 3DGS. This makes reconstruction of real-world captures problematic. We present SpotLessSplats, an approach that leverages pre-trained and general-purpose features coupled with robust optimization to effectively ignore transient distractors. Our method achieves state-of-the-art reconstruction quality both visually and quantitatively, on casual captures. Additional results available at: https://spotlesssplats.github.io", + "arxiv_url": "http://arxiv.org/abs/2406.20055v2", + "pdf_url": "http://arxiv.org/pdf/2406.20055v2", + "published_date": "2024-06-28", + "categories": [ + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting", + "authors": [ + "Daiwei Zhang", + "Gengyan Li", + "Jiajie Li", + "Mickaël Bressieux", + "Otmar Hilliges", + "Marc Pollefeys", + "Luc Van Gool", + "Xi Wang" + ], + "abstract": "Human activities are inherently complex, often involving numerous object interactions. To better understand these activities, it is crucial to model their interactions with the environment captured through dynamic changes. The recent availability of affordable head-mounted cameras and egocentric data offers a more accessible and efficient means to understand human-object interactions in 3D environments. However, most existing methods for human activity modeling neglect the dynamic interactions with objects, resulting in only static representations. The few existing solutions often require inputs from multiple sources, including multi-camera setups, depth-sensing cameras, or kinesthetic sensors. To this end, we introduce EgoGaussian, the first method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We leverage the uniquely discrete nature of Gaussian Splatting and segment dynamic interactions from the background, with both having explicit representations. Our approach employs a clip-level online learning pipeline that leverages the dynamic nature of human activities, allowing us to reconstruct the temporal evolution of the scene in chronological order and track rigid object motion. EgoGaussian shows significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art. We also qualitatively demonstrate the high quality of the reconstructed models.", + "arxiv_url": "http://arxiv.org/abs/2406.19811v2", + "pdf_url": "http://arxiv.org/pdf/2406.19811v2", + "published_date": "2024-06-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Lightweight Predictive 3D Gaussian Splats", + "authors": [ + "Junli Cao", + "Vidit Goel", + "Chaoyang Wang", + "Anil Kag", + "Ju Hu", + "Sergei Korolev", + "Chenfanfu Jiang", + "Sergey Tulyakov", + "Jian Ren" + ], + "abstract": "Recent approaches representing 3D objects and scenes using Gaussian splats show increased rendering speed across a variety of platforms and devices. While rendering such representations is indeed extremely efficient, storing and transmitting them is often prohibitively expensive. To represent large-scale scenes, one often needs to store millions of 3D Gaussians, occupying gigabytes of disk space. This poses a very practical limitation, prohibiting widespread adoption. Several solutions have been proposed to strike a balance between disk size and rendering quality, noticeably reducing the visual quality. In this work, we propose a new representation that dramatically reduces the hard drive footprint while featuring similar or improved quality when compared to the standard 3D Gaussian splats. When compared to other compact solutions, ours offers higher quality renderings with significantly reduced storage, being able to efficiently run on a mobile device in real-time. Our key observation is that nearby points in the scene can share similar representations. Hence, only a small ratio of 3D points needs to be stored. We introduce an approach to identify such points, which are called parent points. The discarded points, called children points, along with their attributes can be efficiently predicted by tiny MLPs.", + "arxiv_url": "http://arxiv.org/abs/2406.19434v1", + "pdf_url": "http://arxiv.org/pdf/2406.19434v1", + "published_date": "2024-06-27", + "categories": [ + "cs.GR", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FAGhead: Fully Animate Gaussian Head from Monocular Videos", + "authors": [ + "Yixin Xuan", + "Xinyang Li", + "Gongxin Yao", + "Shiwei Zhou", + "Donghui Sun", + "Xiaoxin Chen", + "Yu Pan" + ], + "abstract": "High-fidelity reconstruction of 3D human avatars has wide applications in virtual reality. In this paper, we introduce FAGhead, a method that enables fully controllable human portraits from monocular videos. We explicitly use the traditional 3D morphable meshes (3DMM) and optimize the neutral 3D Gaussians to reconstruct with complex expressions. Furthermore, we employ a novel Point-based Learnable Representation Field (PLRF) with learnable Gaussian point positions to enhance reconstruction performance.
Meanwhile, to effectively manage the edges of avatars, we introduce alpha rendering to supervise the alpha value of each pixel. Extensive experimental results on open-source datasets and our captured datasets demonstrate that our approach is able to generate high-fidelity 3D head avatars and fully control the expression and pose of the virtual avatars, outperforming existing works.", + "arxiv_url": "http://arxiv.org/abs/2406.19070v2", + "pdf_url": "http://arxiv.org/pdf/2406.19070v2", + "published_date": "2024-06-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos", + "authors": [ + "Colton Stearns", + "Adam Harley", + "Mikaela Uy", + "Florian Dubost", + "Federico Tombari", + "Gordon Wetzstein", + "Leonidas Guibas" + ], + "abstract": "Gaussian splatting has become a popular representation for novel-view synthesis, exhibiting clear strengths in efficiency, photometric quality, and compositional editability. Following its success, many works have extended Gaussians to 4D, showing that dynamic Gaussians maintain these benefits while also tracking scene geometry far better than alternative representations. Yet, these methods assume dense multi-view videos as supervision. In this work, we are interested in extending the capability of Gaussian scene representations to casually captured monocular videos. We show that existing 4D Gaussian methods dramatically fail in this setup because the monocular setting is underconstrained. Building off this finding, we propose a method we call Dynamic Gaussian Marbles, which consists of three core modifications that target the difficulties of the monocular setting. First, we use isotropic Gaussian \"marbles\", reducing the degrees of freedom of each Gaussian. Second, we employ a hierarchical divide-and-conquer learning strategy to efficiently guide the optimization towards solutions with globally coherent motion. Finally, we add image-level and geometry-level priors into the optimization, including a tracking loss that takes advantage of recent progress in point tracking. By constraining the optimization, Dynamic Gaussian Marbles learns Gaussian trajectories that enable novel-view rendering and accurately capture the 3D motion of the scene elements. We evaluate on the Nvidia Dynamic Scenes dataset and the DyCheck iPhone dataset, and show that Gaussian Marbles significantly outperforms other Gaussian baselines in quality, and is on-par with non-Gaussian representations, all while maintaining the efficiency, compositionality, editability, and tracking benefits of Gaussians. Our project page can be found here https://geometry.stanford.edu/projects/dynamic-gaussian-marbles.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2406.18717v2", + "pdf_url": "http://arxiv.org/pdf/2406.18717v2", + "published_date": "2024-06-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "On Scaling Up 3D Gaussian Splatting Training", + "authors": [ + "Hexu Zhao", + "Haoyang Weng", + "Daohan Lu", + "Ang Li", + "Jinyang Li", + "Aurojit Panda", + "Saining Xie" + ], + "abstract": "3D Gaussian Splatting (3DGS) is increasingly popular for 3D reconstruction due to its superior visual quality and rendering speed.
However, 3DGS training currently occurs on a single GPU, limiting its ability to handle high-resolution and large-scale 3D reconstruction tasks due to memory constraints. We introduce Grendel, a distributed system designed to partition 3DGS parameters and parallelize computation across multiple GPUs. As each Gaussian affects a small, dynamic subset of rendered pixels, Grendel employs sparse all-to-all communication to transfer the necessary Gaussians to pixel partitions and performs dynamic load balancing. Unlike existing 3DGS systems that train using one camera view image at a time, Grendel supports batched training with multiple views. We explore various optimization hyperparameter scaling strategies and find that a simple sqrt(batch size) scaling rule is highly effective. Evaluations using large-scale, high-resolution scenes show that Grendel enhances rendering quality by scaling up 3DGS parameters across multiple GPUs. On the Rubble dataset, we achieve a test PSNR of 27.28 by distributing 40.4 million Gaussians across 16 GPUs, compared to a PSNR of 26.28 using 11.2 million Gaussians on a single GPU. Grendel is an open-source project available at: https://github.com/nyu-systems/Grendel-GS", + "arxiv_url": "http://arxiv.org/abs/2406.18533v1", + "pdf_url": "http://arxiv.org/pdf/2406.18533v1", + "published_date": "2024-06-26", + "categories": [ + "cs.CV", + "I.4.5" + ], + "github_url": "https://github.com/nyu-systems/Grendel-GS", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality", + "authors": [ + "Taoran Yi", + "Jiemin Fang", + "Zanwei Zhou", + "Junjie Wang", + "Guanjun Wu", + "Lingxi Xie", + "Xiaopeng Zhang", + "Wenyu Liu", + "Xinggang Wang", + "Qi Tian" + ], + "abstract": "Recently, 3D Gaussian splatting (3D-GS) has achieved great success in reconstructing and rendering real-world scenes. To transfer the high rendering quality to generation tasks, a series of research works attempt to generate 3D-Gaussian assets from text. However, the generated assets have not achieved the same quality as those in reconstruction tasks. We observe that Gaussians tend to grow without control as the generation process may cause indeterminacy. Aiming at highly enhancing the generation quality, we propose a novel framework named GaussianDreamerPro. The main idea is to bind Gaussians to reasonable geometry, which evolves over the whole generation process. Along different stages of our framework, both the geometry and appearance can be enriched progressively. The final output asset is constructed with 3D Gaussians bound to mesh, which shows significantly enhanced details and quality compared with previous methods. Notably, the generated asset can also be seamlessly integrated into downstream manipulation pipelines, e.g. animation, composition, and simulation etc., greatly promoting its potential in wide applications. 
Demos are available at https://taoranyi.com/gaussiandreamerpro/.", + "arxiv_url": "http://arxiv.org/abs/2406.18462v1", + "pdf_url": "http://arxiv.org/pdf/2406.18462v1", + "published_date": "2024-06-26", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning", + "authors": [ + "Muhammad Salman Ali", + "Maryam Qamar", + "Sung-Ho Bae", + "Enzo Tartaglione" + ], + "abstract": "In recent times, the utilization of 3D models has gained traction, owing to the capacity for end-to-end training initially offered by Neural Radiance Fields and more recently by 3D Gaussian Splatting (3DGS) models. The latter holds a significant advantage by inherently easing rapid convergence during training and offering extensive editability. However, despite rapid advancements, the literature still lives in its infancy regarding the scalability of these models. In this study, we take some initial steps in addressing this gap, showing an approach that enables both the memory and computational scalability of such models. Specifically, we propose \"Trimming the fat\", a post-hoc gradient-informed iterative pruning technique to eliminate redundant information encoded in the model. Our experimental findings on widely acknowledged benchmarks attest to the effectiveness of our approach, revealing that up to 75% of the Gaussians can be removed while maintaining or even improving upon baseline performance. Our approach achieves around 50$\\times$ compression while preserving performance similar to the baseline model, and is able to speed-up computation up to 600 FPS.", + "arxiv_url": "http://arxiv.org/abs/2406.18214v2", + "pdf_url": "http://arxiv.org/pdf/2406.18214v2", + "published_date": "2024-06-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-Octree: Octree-based 3D Gaussian Splatting for Robust Object-level 3D Reconstruction Under Strong Lighting", + "authors": [ + "Jiaze Li", + "Zhengyu Wen", + "Luo Zhang", + "Jiangbei Hu", + "Fei Hou", + "Zhebin Zhang", + "Ying He" + ], + "abstract": "The 3D Gaussian Splatting technique has significantly advanced the construction of radiance fields from multi-view images, enabling real-time rendering. While point-based rasterization effectively reduces computational demands for rendering, it often struggles to accurately reconstruct the geometry of the target object, especially under strong lighting. To address this challenge, we introduce a novel approach that combines octree-based implicit surface representations with Gaussian splatting. Our method consists of four stages. Initially, it reconstructs a signed distance field (SDF) and a radiance field through volume rendering, encoding them in a low-resolution octree. The initial SDF represents the coarse geometry of the target object. Subsequently, it introduces 3D Gaussians as additional degrees of freedom, which are guided by the SDF. In the third stage, the optimized Gaussians further improve the accuracy of the SDF, allowing it to recover finer geometric details compared to the initial SDF obtained in the first stage. Finally, it adopts the refined SDF to further optimize the 3D Gaussians via splatting, eliminating those that contribute little to visual appearance. 
Experimental results show that our method, which leverages the distribution of 3D Gaussians with SDFs, reconstructs more accurate geometry, particularly in images with specular highlights caused by strong lighting.", + "arxiv_url": "http://arxiv.org/abs/2406.18199v1", + "pdf_url": "http://arxiv.org/pdf/2406.18199v1", + "published_date": "2024-06-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "VDG: Vision-Only Dynamic Gaussian for Driving Simulation", + "authors": [ + "Hao Li", + "Jingfeng Li", + "Dingwen Zhang", + "Chenming Wu", + "Jieqi Shi", + "Chen Zhao", + "Haocheng Feng", + "Errui Ding", + "Jingdong Wang", + "Junwei Han" + ], + "abstract": "Dynamic Gaussian splatting has led to impressive scene reconstruction and image synthesis advances in novel views. Existing methods, however, heavily rely on pre-computed poses and Gaussian initialization by Structure from Motion (SfM) algorithms or expensive sensors. For the first time, this paper addresses this issue by integrating self-supervised VO into our pose-free dynamic Gaussian method (VDG) to boost pose and depth initialization and static-dynamic decomposition. Moreover, VDG can work with only RGB image input and construct dynamic scenes at a faster speed and larger scenes compared with the pose-free dynamic view-synthesis method. We demonstrate the robustness of our approach via extensive quantitative and qualitative experiments. Our results show favorable performance over the state-of-the-art dynamic view synthesis methods. Additional video and source code will be posted on our project page at https://3d-aigc.github.io/VDG.", + "arxiv_url": "http://arxiv.org/abs/2406.18198v1", + "pdf_url": "http://arxiv.org/pdf/2406.18198v1", + "published_date": "2024-06-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text", + "authors": [ + "Xinyang Li", + "Zhangyu Lai", + "Linning Xu", + "Yansong Qu", + "Liujuan Cao", + "Shengchuan Zhang", + "Bo Dai", + "Rongrong Ji" + ], + "abstract": "Recent advancements in 3D generation have leveraged synthetic datasets with ground truth 3D assets and predefined cameras. However, the potential of adopting real-world datasets, which can produce significantly more realistic 3D scenes, remains largely unexplored. In this work, we delve into the key challenge of the complex and scene-specific camera trajectories found in real-world captures. We introduce Director3D, a robust open-world text-to-3D generation framework, designed to generate both real-world 3D scenes and adaptive camera trajectories. To achieve this, (1) we first utilize a Trajectory Diffusion Transformer, acting as the Cinematographer, to model the distribution of camera trajectories based on textual descriptions. (2) Next, a Gaussian-driven Multi-view Latent Diffusion Model serves as the Decorator, modeling the image sequence distribution given the camera trajectories and texts. This model, fine-tuned from a 2D diffusion model, directly generates pixel-aligned 3D Gaussians as an immediate 3D scene representation for consistent denoising. (3) Lastly, the 3D Gaussians are refined by a novel SDS++ loss as the Detailer, which incorporates the prior of the 2D diffusion model. 
Extensive experiments demonstrate that Director3D outperforms existing methods, offering superior performance in real-world 3D generation.", + "arxiv_url": "http://arxiv.org/abs/2406.17601v1", + "pdf_url": "http://arxiv.org/pdf/2406.17601v1", + "published_date": "2024-06-25", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods", + "authors": [ + "Jonas Kulhanek", + "Torsten Sattler" + ], + "abstract": "Novel view synthesis is an important problem with many applications, including AR/VR, gaming, and simulations for robotics. With the recent rapid development of Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) methods, it is becoming difficult to keep track of the current state of the art (SoTA) due to methods using different evaluation protocols, codebases being difficult to install and use, and methods not generalizing well to novel 3D scenes. Our experiments support this claim by showing that tiny differences in evaluation protocols of various methods can lead to inconsistent reported metrics. To address these issues, we propose a framework called NerfBaselines, which simplifies the installation of various methods, provides consistent benchmarking tools, and ensures reproducibility. We validate our implementation experimentally by reproducing numbers reported in the original papers. To further improve the accessibility, we release a web platform where commonly used methods are compared on standard benchmarks. Web: https://jkulhanek.com/nerfbaselines", + "arxiv_url": "http://arxiv.org/abs/2406.17345v1", + "pdf_url": "http://arxiv.org/pdf/2406.17345v1", + "published_date": "2024-06-25", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Reducing the Memory Footprint of 3D Gaussian Splatting", + "authors": [ + "Panagiotis Papantonakis", + "Georgios Kopanas", + "Bernhard Kerbl", + "Alexandre Lanvin", + "George Drettakis" + ], + "abstract": "3D Gaussian splatting provides excellent visual quality for novel view synthesis, with fast training and real-time rendering; unfortunately, the memory requirements of this method for storage and transmission are unreasonably high. We first analyze the reasons for this, identifying three main areas where storage can be reduced: the number of 3D Gaussian primitives used to represent a scene, the number of coefficients for the spherical harmonics used to represent directional radiance, and the precision required to store Gaussian primitive attributes. We present a solution to each of these issues. First, we propose an efficient, resolution-aware primitive pruning approach, reducing the primitive count by half. Second, we introduce an adaptive adjustment method to choose the number of coefficients used to represent directional radiance for each Gaussian primitive, and finally a codebook-based quantization method, together with a half-float representation for further memory reduction. Taken together, these three components result in a 27x reduction in overall size on disk on the standard datasets we tested, along with a 1.7x speedup in rendering speed.
We demonstrate our method on standard datasets and show how our solution results in significantly reduced download times when using the method on a mobile device.", + "arxiv_url": "http://arxiv.org/abs/2406.17074v1", + "pdf_url": "http://arxiv.org/pdf/2406.17074v1", + "published_date": "2024-06-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "From Perfect to Noisy World Simulation: Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking", + "authors": [ + "Xiaohao Xu", + "Tianyi Zhang", + "Sibo Wang", + "Xiang Li", + "Yongqi Chen", + "Ye Li", + "Bhiksha Raj", + "Matthew Johnson-Roberson", + "Xiaonan Huang" + ], + "abstract": "Embodied agents require robust navigation systems to operate in unstructured environments, making the robustness of Simultaneous Localization and Mapping (SLAM) models critical to embodied agent autonomy. While real-world datasets are invaluable, simulation-based benchmarks offer a scalable approach for robustness evaluations. However, the creation of a challenging and controllable noisy world with diverse perturbations remains under-explored. To this end, we propose a novel, customizable pipeline for noisy data synthesis, aimed at assessing the resilience of multi-modal SLAM models against various perturbations. The pipeline comprises a comprehensive taxonomy of sensor and motion perturbations for embodied multi-modal (specifically RGB-D) sensing, categorized by their sources and propagation order, allowing for procedural composition. We also provide a toolbox for synthesizing these perturbations, enabling the transformation of clean environments into challenging noisy simulations. Utilizing the pipeline, we instantiate the large-scale Noisy-Replica benchmark, which includes diverse perturbation types, to evaluate the risk tolerance of existing advanced RGB-D SLAM models. Our extensive analysis uncovers the susceptibilities of both neural (NeRF and Gaussian Splatting -based) and non-neural SLAM models to disturbances, despite their demonstrated accuracy in standard benchmarks. Our code is publicly available at https://github.com/Xiaohao-Xu/SLAM-under-Perturbation.", + "arxiv_url": "http://arxiv.org/abs/2406.16850v1", + "pdf_url": "http://arxiv.org/pdf/2406.16850v1", + "published_date": "2024-06-24", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "https://github.com/Xiaohao-Xu/SLAM-under-Perturbation", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ClotheDreamer: Text-Guided Garment Generation with 3D Gaussians", + "authors": [ + "Yufei Liu", + "Junshu Tang", + "Chu Zheng", + "Shijie Zhang", + "Jinkun Hao", + "Junwei Zhu", + "Dongjin Huang" + ], + "abstract": "High-fidelity 3D garment synthesis from text is desirable yet challenging for digital avatar creation. Recent diffusion-based approaches via Score Distillation Sampling (SDS) have enabled new possibilities but either intricately couple with human body or struggle to reuse. We introduce ClotheDreamer, a 3D Gaussian-based method for generating wearable, production-ready 3D garment assets from text prompts. We propose a novel representation Disentangled Clothe Gaussian Splatting (DCGS) to enable separate optimization. DCGS represents clothed avatar as one Gaussian model but freezes body Gaussian splats. 
To enhance quality and completeness, we incorporate bidirectional SDS to supervise clothed avatar and garment RGBD renderings respectively with pose conditions and propose a new pruning strategy for loose clothing. Our approach can also support custom clothing templates as input. Benefiting from our design, the synthetic 3D garment can be easily applied to virtual try-on and support physically accurate animation. Extensive experiments showcase our method's superior and competitive performance. Our project page is at https://ggxxii.github.io/clothedreamer.", + "arxiv_url": "http://arxiv.org/abs/2406.16815v1", + "pdf_url": "http://arxiv.org/pdf/2406.16815v1", + "published_date": "2024-06-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LGS: A Light-weight 4D Gaussian Splatting for Efficient Surgical Scene Reconstruction", + "authors": [ + "Hengyu Liu", + "Yifan Liu", + "Chenxin Li", + "Wuyang Li", + "Yixuan Yuan" + ], + "abstract": "The advent of 3D Gaussian Splatting (3D-GS) techniques and their dynamic scene modeling variants, 4D-GS, offers promising prospects for real-time rendering of dynamic surgical scenarios. However, the prerequisite for modeling dynamic scenes by a large number of Gaussian units, the high-dimensional Gaussian attributes and the high-resolution deformation fields, all lead to severe storage issues that hinder real-time rendering in resource-limited surgical equipment. To surmount these limitations, we introduce a Lightweight 4D Gaussian Splatting framework (LGS) that can liberate the efficiency bottlenecks of both rendering and storage for dynamic endoscopic reconstruction. Specifically, to minimize the redundancy of Gaussian quantities, we propose Deformation-Aware Pruning by gauging the impact of each Gaussian on deformation. Concurrently, to reduce the redundancy of Gaussian attributes, we simplify the representation of textures and lighting in non-crucial areas by pruning the dimensions of Gaussian attributes. We further resolve the feature field redundancy caused by the high resolution of 4D neural spatiotemporal encoder for modeling dynamic scenes via a 4D feature field condensation. Experiments on public benchmarks demonstrate efficacy of LGS in terms of a compression rate exceeding 9 times while maintaining the pleasing visual quality and real-time rendering efficiency. LGS confirms a substantial step towards its application in robotic surgical services.", + "arxiv_url": "http://arxiv.org/abs/2406.16073v1", + "pdf_url": "http://arxiv.org/pdf/2406.16073v1", + "published_date": "2024-06-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Taming 3DGS: High-Quality Radiance Fields with Limited Resources", + "authors": [ + "Saswat Subhajyoti Mallick", + "Rahul Goel", + "Bernhard Kerbl", + "Francisco Vicente Carrasco", + "Markus Steinberger", + "Fernando De La Torre" + ], + "abstract": "3D Gaussian Splatting (3DGS) has transformed novel-view synthesis with its fast, interpretable, and high-fidelity rendering. However, its resource requirements limit its usability. Especially on constrained devices, training performance degrades quickly and often cannot complete due to excessive memory consumption of the model. 
The method converges with an indefinite number of Gaussians -- many of them redundant -- making rendering unnecessarily slow and preventing its usage in downstream tasks that expect fixed-size inputs. To address these issues, we tackle the challenges of training and rendering 3DGS models on a budget. We use a guided, purely constructive densification process that steers densification toward Gaussians that raise the reconstruction quality. Model size continuously increases in a controlled manner towards an exact budget, using score-based densification of Gaussians with training-time priors that measure their contribution. We further address training speed obstacles: following a careful analysis of 3DGS' original pipeline, we derive faster, numerically equivalent solutions for gradient computation and attribute updates, including an alternative parallelization for efficient backpropagation. We also propose quality-preserving approximations where suitable to reduce training time even further. Taken together, these enhancements yield a robust, scalable solution with reduced training times, lower compute and memory requirements, and high quality. Our evaluation shows that in a budgeted setting, we obtain competitive quality metrics with 3DGS while achieving a 4--5x reduction in both model size and training time. With more generous budgets, our measured quality surpasses theirs. These advances open the door for novel-view synthesis in constrained environments, e.g., mobile devices.", + "arxiv_url": "http://arxiv.org/abs/2406.15643v1", + "pdf_url": "http://arxiv.org/pdf/2406.15643v1", + "published_date": "2024-06-21", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation", + "authors": [ + "Chubin Zhang", + "Hongliang Song", + "Yi Wei", + "Yu Chen", + "Jiwen Lu", + "Yansong Tang" + ], + "abstract": "In this work, we introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images. This limits these methods to a low-resolution representation and makes it difficult to scale up to the dense views for better quality. GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms to effectively integrate image features into 3D representations. We implement this solution through a two-stage pipeline: initially, a lightweight proposal network generates a sparse set of 3D anchor points from the posed image inputs; subsequently, a specialized reconstruction transformer refines the geometry and retrieves textural details. Extensive experimental results demonstrate that GeoLRM significantly outperforms existing models, especially for dense view inputs. We also demonstrate the practical applicability of our model with 3D generation tasks, showcasing its versatility and potential for broader adoption in real-world applications. 
The project page: https://linshan-bin.github.io/GeoLRM/.", + "arxiv_url": "http://arxiv.org/abs/2406.15333v2", + "pdf_url": "http://arxiv.org/pdf/2406.15333v2", + "published_date": "2024-06-21", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks", + "authors": [ + "Alex Quach", + "Makram Chahine", + "Alexander Amini", + "Ramin Hasani", + "Daniela Rus" + ], + "abstract": "Simulators are powerful tools for autonomous robot learning as they offer scalable data generation, flexible design, and optimization of trajectories. However, transferring behavior learned from simulation data into the real world proves to be difficult, usually mitigated with compute-heavy domain randomization methods or further model fine-tuning. We present a method to improve generalization and robustness to distribution shifts in sim-to-real visual quadrotor navigation tasks. To this end, we first build a simulator by integrating Gaussian Splatting with quadrotor flight dynamics, and then, train robust navigation policies using Liquid neural networks. In this way, we obtain a full-stack imitation learning protocol that combines advances in 3D Gaussian splatting radiance field rendering, crafty programming of expert demonstration training data, and the task understanding capabilities of Liquid networks. Through a series of quantitative flight tests, we demonstrate the robust transfer of navigation skills learned in a single simulation scene directly to the real world. We further show the ability to maintain performance beyond the training environment under drastic distribution and physical environment changes. Our learned Liquid policies, trained on single target manoeuvres curated from a photorealistic simulated indoor flight only, generalize to multi-step hikes onboard a real hardware platform outdoors.", + "arxiv_url": "http://arxiv.org/abs/2406.15149v2", + "pdf_url": "http://arxiv.org/pdf/2406.15149v2", + "published_date": "2024-06-21", + "categories": [ + "cs.RO", + "cs.AI", + "cs.CV", + "68T40, 68U20, 93C85", + "I.2.9; I.2.6" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "E2GS: Event Enhanced Gaussian Splatting", + "authors": [ + "Hiroyuki Deguchi", + "Mana Masuda", + "Takuya Nakabayashi", + "Hideo Saito" + ], + "abstract": "Event cameras, known for their high dynamic range, absence of motion blur, and low energy usage, have recently found a wide range of applications thanks to these attributes. In the past few years, the field of event-based 3D reconstruction saw remarkable progress, with the Neural Radiance Field (NeRF) based approach demonstrating photorealistic view synthesis results. However, the volume rendering paradigm of NeRF necessitates extensive training and rendering times. In this paper, we introduce Event Enhanced Gaussian Splatting (E2GS), a novel method that incorporates event data into Gaussian Splatting, which has recently made significant advances in the field of novel view synthesis. Our E2GS effectively utilizes both blurry images and event data, significantly improving image deblurring and producing high-quality novel view synthesis. Our comprehensive experiments on both synthetic and real-world datasets demonstrate our E2GS can generate visually appealing renderings while offering faster training and rendering speed (140 FPS). 
Our code is available at https://github.com/deguchihiroyuki/E2GS.", + "arxiv_url": "http://arxiv.org/abs/2406.14978v1", + "pdf_url": "http://arxiv.org/pdf/2406.14978v1", + "published_date": "2024-06-21", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/deguchihiroyuki/E2GS", + "keywords": [ + "gaussian splatting", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GIC: Gaussian-Informed Continuum for Physical Property Identification and Simulation", + "authors": [ + "Junhao Cai", + "Yuji Yang", + "Weihao Yuan", + "Yisheng He", + "Zilong Dong", + "Liefeng Bo", + "Hui Cheng", + "Qifeng Chen" + ], + "abstract": "This paper studies the problem of estimating physical properties (system identification) through visual observations. To facilitate geometry-aware guidance in physical property estimation, we introduce a novel hybrid framework that leverages 3D Gaussian representation to not only capture explicit shapes but also enable the simulated continuum to render object masks as 2D shape surrogates during training. We propose a new dynamic 3D Gaussian framework based on motion factorization to recover the object as 3D Gaussian point sets across different time states. Furthermore, we develop a coarse-to-fine filling strategy to generate the density fields of the object from the Gaussian reconstruction, allowing for the extraction of object continuums along with their surfaces and the integration of Gaussian attributes into these continuum. In addition to the extracted object surfaces, the Gaussian-informed continuum also enables the rendering of object masks during simulations, serving as 2D-shape guidance for physical property estimation. Extensive experimental evaluations demonstrate that our pipeline achieves state-of-the-art performance across multiple benchmarks and metrics. Additionally, we illustrate the effectiveness of the proposed method through real-world demonstrations, showcasing its practical utility. Our project page is at https://jukgei.github.io/project/gic.", + "arxiv_url": "http://arxiv.org/abs/2406.14927v3", + "pdf_url": "http://arxiv.org/pdf/2406.14927v3", + "published_date": "2024-06-21", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Splatter a Video: Video Gaussian Representation for Versatile Processing", + "authors": [ + "Yang-Tian Sun", + "Yi-Hua Huang", + "Lin Ma", + "Xiaoyang Lyu", + "Yan-Pei Cao", + "Xiaojuan Qi" + ], + "abstract": "Video representation is a long-standing problem that is crucial for various down-stream tasks, such as tracking,depth prediction,segmentation,view synthesis,and editing. However, current methods either struggle to model complex motions due to the absence of 3D structure or rely on implicit 3D representations that are ill-suited for manipulation tasks. To address these challenges, we introduce a novel explicit 3D representation-video Gaussian representation -- that embeds a video into 3D Gaussians. Our proposed representation models video appearance in a 3D canonical space using explicit Gaussians as proxies and associates each Gaussian with 3D motions for video motion. This approach offers a more intrinsic and explicit representation than layered atlas or volumetric pixel matrices. To obtain such a representation, we distill 2D priors, such as optical flow and depth, from foundation models to regularize learning in this ill-posed setting. 
Extensive applications demonstrate the versatility of our new video representation. It has been proven effective in numerous video processing tasks, including tracking, consistent video depth and feature refinement, motion and appearance editing, and stereoscopic video generation. Project page: https://sunyangtian.github.io/spatter_a_video_web/", + "arxiv_url": "http://arxiv.org/abs/2406.13870v2", + "pdf_url": "http://arxiv.org/pdf/2406.13870v2", + "published_date": "2024-06-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Sampling 3D Gaussian Scenes in Seconds with Latent Diffusion Models", + "authors": [ + "Paul Henderson", + "Melonie de Almeida", + "Daniela Ivanova", + "Titas Anciukevičius" + ], + "abstract": "We present a latent diffusion model over 3D scenes, that can be trained using only 2D image data. To achieve this, we first design an autoencoder that maps multi-view images to 3D Gaussian splats, and simultaneously builds a compressed latent representation of these splats. Then, we train a multi-view diffusion model over the latent space to learn an efficient generative model. This pipeline does not require object masks nor depths, and is suitable for complex scenes with arbitrary camera positions. We conduct careful experiments on two large-scale datasets of complex real-world scenes -- MVImgNet and RealEstate10K. We show that our approach enables generating 3D scenes in as little as 0.2 seconds, either from scratch, from a single input view, or from sparse input views. It produces diverse and high-quality results while running an order of magnitude faster than non-latent diffusion models and earlier NeRF-based generative models", + "arxiv_url": "http://arxiv.org/abs/2406.13099v1", + "pdf_url": "http://arxiv.org/pdf/2406.13099v1", + "published_date": "2024-06-18", + "categories": [ + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors", + "authors": [ + "Panwang Pan", + "Zhuo Su", + "Chenguo Lin", + "Zhen Fan", + "Yongjie Zhang", + "Zeming Li", + "Tingting Shen", + "Yadong Mu", + "Yebin Liu" + ], + "abstract": "Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios. To tackle these issues, we present HumanSplat which predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In particular, HumanSplat comprises a 2D multi-view diffusion model and a latent reconstruction transformer with human structure priors that adeptly integrate geometric priors and semantic features within a unified framework. A hierarchical loss that incorporates human semantic information is further designed to achieve high-fidelity texture modeling and better constrain the estimated multiple views. 
Comprehensive experiments on standard benchmarks and in-the-wild images demonstrate that HumanSplat surpasses existing state-of-the-art methods in achieving photorealistic novel-view synthesis.", + "arxiv_url": "http://arxiv.org/abs/2406.12459v2", + "pdf_url": "http://arxiv.org/pdf/2406.12459v2", + "published_date": "2024-06-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "A Hierarchical 3D Gaussian Representation for Real-Time Rendering of Very Large Datasets", + "authors": [ + "Bernhard Kerbl", + "Andréas Meuleman", + "Georgios Kopanas", + "Michael Wimmer", + "Alexandre Lanvin", + "George Drettakis" + ], + "abstract": "Novel view synthesis has seen major advances in recent years, with 3D Gaussian splatting offering an excellent level of visual quality, fast training and real-time rendering. However, the resources needed for training and rendering inevitably limit the size of the captured scenes that can be represented with good visual quality. We introduce a hierarchy of 3D Gaussians that preserves visual quality for very large scenes, while offering an efficient Level-of-Detail (LOD) solution for efficient rendering of distant content with effective level selection and smooth transitions between levels.We introduce a divide-and-conquer approach that allows us to train very large scenes in independent chunks. We consolidate the chunks into a hierarchy that can be optimized to further improve visual quality of Gaussians merged into intermediate nodes. Very large captures typically have sparse coverage of the scene, presenting many challenges to the original 3D Gaussian splatting training method; we adapt and regularize training to account for these issues. We present a complete solution, that enables real-time rendering of very large scenes and can adapt to available resources thanks to our LOD method. We show results for captured scenes with up to tens of thousands of images with a simple and affordable rig, covering trajectories of up to several kilometers and lasting up to one hour. Project Page: https://repo-sam.inria.fr/fungraph/hierarchical-3d-gaussians/", + "arxiv_url": "http://arxiv.org/abs/2406.12080v1", + "pdf_url": "http://arxiv.org/pdf/2406.12080v1", + "published_date": "2024-06-17", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RetinaGS: Scalable Training for Dense Scene Rendering with Billion-Scale 3D Gaussians", + "authors": [ + "Bingling Li", + "Shengyi Chen", + "Luchao Wang", + "Kaimin Liao", + "Sijie Yan", + "Yuanjun Xiong" + ], + "abstract": "In this work, we explore the possibility of training high-parameter 3D Gaussian splatting (3DGS) models on large-scale, high-resolution datasets. We design a general model parallel training method for 3DGS, named RetinaGS, which uses a proper rendering equation and can be applied to any scene and arbitrary distribution of Gaussian primitives. It enables us to explore the scaling behavior of 3DGS in terms of primitive numbers and training resolutions that were difficult to explore before and surpass previous state-of-the-art reconstruction quality. We observe a clear positive trend of increasing visual quality when increasing primitive numbers with our method. 
We also demonstrate the first attempt at training a 3DGS model with more than one billion primitives on the full MatrixCity dataset that attains a promising visual quality.", + "arxiv_url": "http://arxiv.org/abs/2406.11836v2", + "pdf_url": "http://arxiv.org/pdf/2406.11836v2", + "published_date": "2024-06-17", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Effective Rank Analysis and Regularization for Enhanced 3D Gaussian Splatting", + "authors": [ + "Junha Hyung", + "Susung Hong", + "Sungwon Hwang", + "Jaeseong Lee", + "Jaegul Choo", + "Jin-Hwa Kim" + ], + "abstract": "3D reconstruction from multi-view images is one of the fundamental challenges in computer vision and graphics. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising technique capable of real-time rendering with high-quality 3D reconstruction. This method utilizes 3D Gaussian representation and tile-based splatting techniques, bypassing the expensive neural field querying. Despite its potential, 3DGS encounters challenges, including needle-like artifacts, suboptimal geometries, and inaccurate normals, due to the Gaussians converging into anisotropic Gaussians with one dominant variance. We propose using effective rank analysis to examine the shape statistics of 3D Gaussian primitives, and identify the Gaussians indeed converge into needle-like shapes with the effective rank 1. To address this, we introduce effective rank as a regularization, which constrains the structure of the Gaussians. Our new regularization method enhances normal and geometry reconstruction while reducing needle-like artifacts. The approach can be integrated as an add-on module to other 3DGS variants, improving their quality without compromising visual fidelity.", + "arxiv_url": "http://arxiv.org/abs/2406.11672v2", + "pdf_url": "http://arxiv.org/pdf/2406.11672v2", + "published_date": "2024-06-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Projecting Radiance Fields to Mesh Surfaces", + "authors": [ + "Adrian Xuan Wei Lim", + "Lynnette Hui Xian Ng", + "Nicholas Kyger", + "Tomo Michigami", + "Faraz Baghernezhad" + ], + "abstract": "Radiance fields produce high fidelity images with high rendering speed, but are difficult to manipulate. We effectively perform avatar texture transfer across different appearances by combining benefits from radiance fields and mesh surfaces. We represent the source as a radiance field using 3D Gaussian Splatter, then project the Gaussians on the target mesh. Our pipeline consists of Source Preconditioning, Target Vectorization and Texture Projection. The projection completes in 1.12s in a pure CPU compute, compared to baselines techniques of Per Face Texture Projection and Ray Casting (31s, 4.1min). 
This method lowers the computational requirements, which makes it applicable to a broader range of devices from low-end mobiles to high end computers.", + "arxiv_url": "http://arxiv.org/abs/2406.11570v1", + "pdf_url": "http://arxiv.org/pdf/2406.11570v1", + "published_date": "2024-06-17", + "categories": [ + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DGS.zip: A survey on 3D Gaussian Splatting Compression Methods", + "authors": [ + "Milena T. Bagdasarian", + "Paul Knoll", + "Yi-Hsin Li", + "Florian Barthel", + "Anna Hilsmann", + "Peter Eisert", + "Wieland Morgenstern" + ], + "abstract": "3D Gaussian Splatting (3DGS) has emerged as a cutting-edge technique for real-time radiance field rendering, offering state-of-the-art performance in terms of both quality and speed. 3DGS models a scene as a collection of three-dimensional Gaussians, or splats, with additional attributes optimized to conform to the scene's geometric and visual properties. Despite its advantages in rendering speed and image fidelity, 3DGS is limited by its significant storage and memory demands. These high demands make 3DGS impractical for mobile devices or headsets, reducing its applicability in important areas of computer graphics. To address these challenges and advance the practicality of 3DGS, this survey provides a comprehensive and detailed examination of compression and compaction techniques developed to make 3DGS more efficient. We categorize current approaches into compression techniques, which aim at achieving the highest quality at minimal data size, and compaction techniques, which aim for optimal quality with the fewest Gaussians. We introduce the basic mathematical concepts underlying the analyzed methods, as well as key implementation details and design choices. Our report thoroughly discusses similarities and differences among the methods, as well as their respective advantages and disadvantages. We establish a consistent standard for comparing these methods based on key performance metrics and datasets. Specifically, since these methods have been developed in parallel and over a short period of time, currently, no comprehensive comparison exists. This survey, for the first time, presents a unified standard to evaluate 3DGS compression techniques. To facilitate the continuous monitoring of emerging methodologies, we maintain a dedicated website that will be regularly updated with new techniques and revisions of existing findings https://w-m.github.io/3dgs-compression-survey/ .", + "arxiv_url": "http://arxiv.org/abs/2407.09510v4", + "pdf_url": "http://arxiv.org/pdf/2407.09510v4", + "published_date": "2024-06-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics", + "authors": [ + "Jad Abou-Chakra", + "Krishan Rana", + "Feras Dayoub", + "Niko Sünderhauf" + ], + "abstract": "For robots to robustly understand and interact with the physical world, it is highly beneficial to have a comprehensive representation - modelling geometry, physics, and visual observations - that informs perception, planning, and control algorithms. 
We propose a novel dual Gaussian-Particle representation that models the physical world while (i) enabling predictive simulation of future states and (ii) allowing online correction from visual observations in a dynamic world. Our representation comprises particles that capture the geometrical aspect of objects in the world and can be used alongside a particle-based physics system to anticipate physically plausible future states. Attached to these particles are 3D Gaussians that render images from any viewpoint through a splatting process thus capturing the visual state. By comparing the predicted and observed images, our approach generates visual forces that correct the particle positions while respecting known physical constraints. By integrating predictive physical modelling with continuous visually-derived corrections, our unified representation reasons about the present and future while synchronizing with reality. Our system runs in realtime at 30Hz using only 3 cameras. We validate our approach on 2D and 3D tracking tasks as well as photometric reconstruction quality. Videos are found at https://embodied-gaussians.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2406.10788v1", + "pdf_url": "http://arxiv.org/pdf/2406.10788v1", + "published_date": "2024-06-16", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections", + "authors": [ + "Jiacong Xu", + "Yiqun Mei", + "Vishal M. Patel" + ], + "abstract": "Photographs captured in unstructured tourist environments frequently exhibit variable appearances and transient occlusions, challenging accurate scene reconstruction and inducing artifacts in novel view synthesis. Although prior approaches have integrated the Neural Radiance Field (NeRF) with additional learnable modules to handle the dynamic appearances and eliminate transient objects, their extensive training demands and slow rendering speeds limit practical deployments. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising alternative to NeRF, offering superior training and inference efficiency along with better rendering quality. This paper presents Wild-GS, an innovative adaptation of 3DGS optimized for unconstrained photo collections while preserving its efficiency benefits. Wild-GS determines the appearance of each 3D Gaussian by their inherent material attributes, global illumination and camera properties per image, and point-level local variance of reflectance. Unlike previous methods that model reference features in image space, Wild-GS explicitly aligns the pixel appearance features to the corresponding local Gaussians by sampling the triplane extracted from the reference image. This novel design effectively transfers the high-frequency detailed appearance of the reference view to 3D space and significantly expedites the training process. Furthermore, 2D visibility maps and depth regularization are leveraged to mitigate the transient effects and constrain the geometry, respectively. 
Extensive experiments demonstrate that Wild-GS achieves state-of-the-art rendering performance and the highest efficiency in both training and inference among all the existing techniques.", + "arxiv_url": "http://arxiv.org/abs/2406.10373v1", + "pdf_url": "http://arxiv.org/pdf/2406.10373v1", + "published_date": "2024-06-14", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting", + "authors": [ + "Alex Hanson", + "Allen Tu", + "Vasu Singla", + "Mayuka Jayawardhana", + "Matthias Zwicker", + "Tom Goldstein" + ], + "abstract": "Recent advancements in novel view synthesis have enabled real-time rendering speeds and high reconstruction accuracy. 3D Gaussian Splatting (3D-GS), a foundational point-based parametric 3D scene representation, models scenes as large sets of 3D Gaussians. Complex scenes can comprise of millions of Gaussians, amounting to large storage and memory requirements that limit the viability of 3D-GS on devices with limited resources. Current techniques for compressing these pretrained models by pruning Gaussians rely on combining heuristics to determine which ones to remove. In this paper, we propose a principled spatial sensitivity pruning score that outperforms these approaches. It is computed as a second-order approximation of the reconstruction error on the training views with respect to the spatial parameters of each Gaussian. Additionally, we propose a multi-round prune-refine pipeline that can be applied to any pretrained 3D-GS model without changing the training pipeline. After pruning 88.44% of the Gaussians, we observe that our PUP 3D-GS pipeline increases the average rendering speed of 3D-GS by 2.65$\\times$ while retaining more salient foreground information and achieving higher image quality metrics than previous pruning techniques on scenes from the Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets.", + "arxiv_url": "http://arxiv.org/abs/2406.10219v1", + "pdf_url": "http://arxiv.org/pdf/2406.10219v1", + "published_date": "2024-06-14", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "L4GM: Large 4D Gaussian Reconstruction Model", + "authors": [ + "Jiawei Ren", + "Kevin Xie", + "Ashkan Mirzaei", + "Hanxue Liang", + "Xiaohui Zeng", + "Karsten Kreis", + "Ziwei Liu", + "Antonio Torralba", + "Sanja Fidler", + "Seung Wook Kim", + "Huan Ling" + ], + "abstract": "We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second. Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects from Objaverse. This dataset depicts 44K diverse objects with 110K animations rendered in 48 viewpoints, resulting in 12M videos with a total of 300M frames. We keep our L4GM simple for scalability and build directly on top of LGM, a pretrained 3D Large Reconstruction Model that outputs 3D Gaussian ellipsoids from multiview image input. L4GM outputs a per-frame 3D Gaussian Splatting representation from video frames sampled at a low fps and then upsamples the representation to a higher fps to achieve temporal smoothness. 
We add temporal self-attention layers to the base LGM to help it learn consistency across time, and utilize a per-timestep multiview rendering loss to train the model. The representation is upsampled to a higher framerate by training an interpolation model which produces intermediate 3D Gaussian representations. We showcase that L4GM, trained only on synthetic data, generalizes extremely well on in-the-wild videos, producing high quality animated 3D assets.", + "arxiv_url": "http://arxiv.org/abs/2406.10324v1", + "pdf_url": "http://arxiv.org/pdf/2406.10324v1", + "published_date": "2024-06-14", + "categories": [ + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors", + "authors": [ + "Xiqian Yu", + "Hanxin Zhu", + "Tianyu He", + "Zhibo Chen" + ], + "abstract": "Achieving high-resolution novel view synthesis (HRNVS) from low-resolution input views is a challenging task due to the lack of high-resolution data. Previous methods optimize high-resolution Neural Radiance Field (NeRF) from low-resolution input views but suffer from slow rendering speed. In this work, we base our method on 3D Gaussian Splatting (3DGS) due to its capability of producing high-quality images at a faster rendering speed. To alleviate the shortage of data for higher-resolution synthesis, we propose to leverage off-the-shelf 2D diffusion priors by distilling the 2D knowledge into 3D with Score Distillation Sampling (SDS). Nevertheless, applying SDS directly to Gaussian-based 3D super-resolution leads to undesirable and redundant 3D Gaussian primitives, due to the randomness brought by generative priors. To mitigate this issue, we introduce two simple yet effective techniques to reduce stochastic disturbances introduced by SDS. Specifically, we 1) shrink the range of diffusion timestep in SDS with an annealing strategy; 2) randomly discard redundant Gaussian primitives during densification. Extensive experiments have demonstrated that our proposed GaussianSR can attain high-quality results for HRNVS with only low-resolution inputs on both synthetic and real-world datasets. Project page: https://chchnii.github.io/GaussianSR/", + "arxiv_url": "http://arxiv.org/abs/2406.10111v1", + "pdf_url": "http://arxiv.org/pdf/2406.10111v1", + "published_date": "2024-06-14", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion", + "authors": [ + "Trapoom Ukarapol", + "Kevin Pruvost" + ], + "abstract": "Text-to-3D generation has shown promising results, yet common challenges remain, such as the Multi-face Janus problem and extended generation time for high-quality assets. In this paper, we address these issues by introducing a novel three-stage training pipeline called GradeADreamer. This pipeline is capable of producing high-quality assets with a total generation time of under 30 minutes using only a single RTX 3090 GPU. Our proposed method employs a Multi-view Diffusion Model, MVDream, to generate Gaussian Splats as a prior, followed by refining geometry and texture using StableDiffusion. 
Experimental results demonstrate that our approach significantly mitigates the Multi-face Janus problem and achieves the highest average user preference ranking compared to previous state-of-the-art methods. The project code is available at https://github.com/trapoom555/GradeADreamer.", + "arxiv_url": "http://arxiv.org/abs/2406.09850v1", + "pdf_url": "http://arxiv.org/pdf/2406.09850v1", + "published_date": "2024-06-14", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/trapoom555/GradeADreamer", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Unified Gaussian Primitives for Scene Representation and Rendering", + "authors": [ + "Yang Zhou", + "Songyin Wu", + "Ling-Qi Yan" + ], + "abstract": "Searching for a unified scene representation remains a research challenge in computer graphics. Traditional mesh-based representations are unsuitable for dense, fuzzy elements, and introduce additional complexity for filtering and differentiable rendering. Conversely, voxel-based representations struggle to model hard surfaces and suffer from intensive memory requirement. We propose a general-purpose rendering primitive based on 3D Gaussian distribution for unified scene representation, featuring versatile appearance ranging from glossy surfaces to fuzzy elements, as well as physically based scattering to enable accurate global illumination. We formulate the rendering theory for the primitive based on non-exponential transport and derive efficient rendering operations to be compatible with Monte Carlo path tracing. The new representation can be converted from different sources, including meshes and 3D Gaussian splatting, and further refined via transmittance optimization thanks to its differentiability. We demonstrate the versatility of our representation in various rendering applications such as global illumination and appearance editing, while supporting arbitrary lighting conditions by nature. Additionally, we compare our representation to existing volumetric representations, highlighting its efficiency to reproduce details.", + "arxiv_url": "http://arxiv.org/abs/2406.09733v2", + "pdf_url": "http://arxiv.org/pdf/2406.09733v2", + "published_date": "2024-06-14", + "categories": [ + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Modeling Ambient Scene Dynamics for Free-view Synthesis", + "authors": [ + "Meng-Li Shih", + "Jia-Bin Huang", + "Changil Kim", + "Rajvi Shah", + "Johannes Kopf", + "Chen Gao" + ], + "abstract": "We introduce a novel method for dynamic free-view synthesis of ambient scenes from a monocular capture, bringing an immersive quality to the viewing experience. Our method builds upon the recent advancements in 3D Gaussian Splatting (3DGS) that can faithfully reconstruct complex static scenes. Previous attempts to extend 3DGS to represent dynamics have been confined to bounded scenes or require multi-camera captures, and often fail to generalize to unseen motions, limiting their practical application. Our approach overcomes these constraints by leveraging the periodicity of ambient motions to learn the motion trajectory model, coupled with careful regularization. We also propose important practical strategies to improve the visual quality of the baseline 3DGS static reconstructions and to improve memory efficiency critical for GPU-memory intensive learning. 
We demonstrate high-quality photorealistic novel view synthesis of several ambient natural scenes with intricate textures and fine structural elements.", + "arxiv_url": "http://arxiv.org/abs/2406.09395v1", + "pdf_url": "http://arxiv.org/pdf/2406.09395v1", + "published_date": "2024-06-13", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GGHead: Fast and Generalizable 3D Gaussian Heads", + "authors": [ + "Tobias Kirschstein", + "Simon Giebenhain", + "Jiapeng Tang", + "Markos Georgopoulos", + "Matthias Nießner" + ], + "abstract": "Learning 3D head priors from large 2D image collections is an important step towards high-quality 3D-aware human modeling. A core requirement is an efficient architecture that scales well to large-scale datasets and large image resolutions. Unfortunately, existing 3D GANs struggle to scale to generate samples at high resolutions due to their relatively slow train and render speeds, and typically have to rely on 2D superresolution networks at the expense of global 3D consistency. To address these challenges, we propose Generative Gaussian Heads (GGHead), which adopts the recent 3D Gaussian Splatting representation within a 3D GAN framework. To generate a 3D representation, we employ a powerful 2D CNN generator to predict Gaussian attributes in the UV space of a template head mesh. This way, GGHead exploits the regularity of the template's UV layout, substantially facilitating the challenging task of predicting an unstructured set of 3D Gaussians. We further improve the geometric fidelity of the generated 3D representations with a novel total variation loss on rendered UV coordinates. Intuitively, this regularization encourages that neighboring rendered pixels should stem from neighboring Gaussians in the template's UV space. Taken together, our pipeline can efficiently generate 3D heads trained only from single-view 2D image observations. Our proposed framework matches the quality of existing 3D head GANs on FFHQ while being both substantially faster and fully 3D consistent. As a result, we demonstrate real-time generation and rendering of high-quality 3D-consistent heads at $1024^2$ resolution for the first time. Project Website: https://tobias-kirschstein.github.io/gghead", + "arxiv_url": "http://arxiv.org/abs/2406.09377v2", + "pdf_url": "http://arxiv.org/pdf/2406.09377v2", + "published_date": "2024-06-13", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis", + "authors": [ + "Swapnil Bhosale", + "Haosen Yang", + "Diptesh Kanojia", + "Jiankang Deng", + "Xiatian Zhu" + ], + "abstract": "Novel view acoustic synthesis (NVAS) aims to render binaural audio at any target viewpoint, given a mono audio emitted by a sound source at a 3D scene. Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing binaural audio. However, in addition to low efficiency originating from heavy NeRF rendering, these methods all have a limited ability of characterizing the entire scene environment such as room geometry, material properties, and the spatial relation between the listener and sound source. To address these issues, we propose a novel Audio-Visual Gaussian Splatting (AV-GS) model. 
To obtain a material-aware and geometry-aware condition for audio synthesis, we learn an explicit point-based scene representation with an audio-guidance parameter on locally initialized Gaussian points, taking into account the space relation from the listener and sound source. To make the visual scene model audio adaptive, we propose a point densification and pruning strategy to optimally distribute the Gaussian points, with the per-point contribution in sound propagation (e.g., more points needed for texture-less wall surfaces as they affect sound path diversion). Extensive experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAS and simulation-based SoundSpaces datasets.", + "arxiv_url": "http://arxiv.org/abs/2406.08920v2", + "pdf_url": "http://arxiv.org/pdf/2406.08920v2", + "published_date": "2024-06-13", + "categories": [ + "cs.SD", + "cs.AI", + "eess.AS" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianForest: Hierarchical-Hybrid 3D Gaussian Splatting for Compressed Scene Modeling", + "authors": [ + "Fengyi Zhang", + "Yadan Luo", + "Tianjun Zhang", + "Lin Zhang", + "Zi Huang" + ], + "abstract": "The field of novel-view synthesis has recently witnessed the emergence of 3D Gaussian Splatting, which represents scenes in a point-based manner and renders through rasterization. This methodology, in contrast to Radiance Fields that rely on ray tracing, demonstrates superior rendering quality and speed. However, the explicit and unstructured nature of 3D Gaussians poses a significant storage challenge, impeding its broader application. To address this challenge, we introduce the Gaussian-Forest modeling framework, which hierarchically represents a scene as a forest of hybrid 3D Gaussians. Each hybrid Gaussian retains its unique explicit attributes while sharing implicit ones with its sibling Gaussians, thus optimizing parameterization with significantly fewer variables. Moreover, adaptive growth and pruning strategies are designed, ensuring detailed representation in complex regions and a notable reduction in the number of required Gaussians. Extensive experiments demonstrate that Gaussian-Forest not only maintains comparable speed and quality but also achieves a compression rate surpassing 10 times, marking a significant advancement in efficient scene modeling. Codes will be available at https://github.com/Xian-Bei/GaussianForest.", + "arxiv_url": "http://arxiv.org/abs/2406.08759v2", + "pdf_url": "http://arxiv.org/pdf/2406.08759v2", + "published_date": "2024-06-13", + "categories": [ + "cs.CV", + "cs.MM" + ], + "github_url": "https://github.com/Xian-Bei/GaussianForest", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ICE-G: Image Conditional Editing of 3D Gaussian Splats", + "authors": [ + "Vishnu Jaganathan", + "Hannah Hanyun Huang", + "Muhammad Zubair Irshad", + "Varun Jampani", + "Amit Raj", + "Zsolt Kira" + ], + "abstract": "Recently many techniques have emerged to create high quality 3D assets and scenes. When it comes to editing of these objects, however, existing approaches are either slow, compromise on quality, or do not provide enough customization. We introduce a novel approach to quickly edit a 3D model from a single reference view. 
Our technique first segments the edit image, and then matches semantically corresponding regions across chosen segmented dataset views using DINO features. A color or texture change from a particular region of the edit image can then be applied to other views automatically in a semantically sensible manner. These edited views act as an updated dataset to further train and re-style the 3D scene. The end-result is therefore an edited 3D model. Our framework enables a wide variety of editing tasks such as manual local edits, correspondence based style transfer from any example image, and a combination of different styles from multiple example images. We use Gaussian Splats as our primary 3D representation due to their speed and ease of local editing, but our technique works for other methods such as NeRFs as well. We show through multiple examples that our method produces higher quality results while offering fine-grained control of editing. Project page: ice-gaussian.github.io", + "arxiv_url": "http://arxiv.org/abs/2406.08488v1", + "pdf_url": "http://arxiv.org/pdf/2406.08488v1", + "published_date": "2024-06-12", + "categories": [ + "cs.CV", + "cs.AI", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models", + "authors": [ + "Yuxuan Xue", + "Xianghui Xie", + "Riccardo Marin", + "Gerard Pons-Moll" + ], + "abstract": "Creating realistic avatars from a single RGB image is an attractive yet challenging problem. Due to its ill-posed nature, recent works leverage powerful prior from 2D diffusion models pretrained on large datasets. Although 2D diffusion models demonstrate strong generalization capability, they cannot provide multi-view shape priors with guaranteed 3D consistency. We propose Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion. Our key insight is that 2D multi-view diffusion and 3D reconstruction models provide complementary information for each other, and by coupling them in a tight manner, we can fully leverage the potential of both models. We introduce a novel image-conditioned generative 3D Gaussian Splats reconstruction model that leverages the priors from 2D multi-view diffusion models, and provides an explicit 3D representation, which further guides the 2D reverse sampling process to have better 3D consistency. Experiments show that our proposed framework outperforms state-of-the-art methods and enables the creation of realistic avatars from a single RGB image, achieving high-fidelity in both geometry and appearance. Extensive ablations also validate the efficacy of our design, (1) multi-view 2D priors conditioning in generative 3D reconstruction and (2) consistency refinement of sampling trajectory via the explicit 3D representation. 
Our code and models will be released on https://yuxuan-xue.com/human-3diffusion.", + "arxiv_url": "http://arxiv.org/abs/2406.08475v1", + "pdf_url": "http://arxiv.org/pdf/2406.08475v1", + "published_date": "2024-06-12", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "From Chaos to Clarity: 3DGS in the Dark", + "authors": [ + "Zhihao Li", + "Yufei Wang", + "Alex Kot", + "Bihan Wen" + ], + "abstract": "Novel view synthesis from raw images provides superior high dynamic range (HDR) information compared to reconstructions from low dynamic range RGB images. However, the inherent noise in unprocessed raw images compromises the accuracy of 3D scene representation. Our study reveals that 3D Gaussian Splatting (3DGS) is particularly susceptible to this noise, leading to numerous elongated Gaussian shapes that overfit the noise, thereby significantly degrading reconstruction quality and reducing inference speed, especially in scenarios with limited views. To address these issues, we introduce a novel self-supervised learning framework designed to reconstruct HDR 3DGS from a limited number of noisy raw images. This framework enhances 3DGS by integrating a noise extractor and employing a noise-robust reconstruction loss that leverages a noise distribution prior. Experimental results show that our method outperforms LDR/HDR 3DGS and previous state-of-the-art (SOTA) self-supervised and supervised pre-trained models in both reconstruction quality and inference speed on the RawNeRF dataset across a broad range of training views. Code can be found in \\url{https://lizhihao6.github.io/Raw3DGS}.", + "arxiv_url": "http://arxiv.org/abs/2406.08300v1", + "pdf_url": "http://arxiv.org/pdf/2406.08300v1", + "published_date": "2024-06-12", + "categories": [ + "eess.IV", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Trim 3D Gaussian Splatting for Accurate Geometry Representation", + "authors": [ + "Lue Fan", + "Yuxue Yang", + "Minxing Li", + "Hongsheng Li", + "Zhaoxiang Zhang" + ], + "abstract": "In this paper, we introduce Trim 3D Gaussian Splatting (TrimGS) to reconstruct accurate 3D geometry from images. Previous arts for geometry reconstruction from 3D Gaussians mainly focus on exploring strong geometry regularization. Instead, from a fresh perspective, we propose to obtain accurate 3D geometry of a scene by Gaussian trimming, which selectively removes the inaccurate geometry while preserving accurate structures. To achieve this, we analyze the contributions of individual 3D Gaussians and propose a contribution-based trimming strategy to remove the redundant or inaccurate Gaussians. Furthermore, our experimental and theoretical analyses reveal that a relatively small Gaussian scale is a non-negligible factor in representing and optimizing the intricate details. Therefore the proposed TrimGS maintains relatively small Gaussian scales. In addition, TrimGS is also compatible with the effective geometry regularization strategies in previous arts. When combined with the original 3DGS and the state-of-the-art 2DGS, TrimGS consistently yields more accurate geometry and higher perceptual quality. 
Our project page is https://trimgs.github.io", + "arxiv_url": "http://arxiv.org/abs/2406.07499v1", + "pdf_url": "http://arxiv.org/pdf/2406.07499v1", + "published_date": "2024-06-11", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Cinematic Gaussians: Real-Time HDR Radiance Fields with Depth of Field", + "authors": [ + "Chao Wang", + "Krzysztof Wolski", + "Bernhard Kerbl", + "Ana Serrano", + "Mojtaba Bemana", + "Hans-Peter Seidel", + "Karol Myszkowski", + "Thomas Leimkühler" + ], + "abstract": "Radiance field methods represent the state of the art in reconstructing complex scenes from multi-view photos. However, these reconstructions often suffer from one or both of the following limitations: First, they typically represent scenes in low dynamic range (LDR), which restricts their use to evenly lit environments and hinders immersive viewing experiences. Secondly, their reliance on a pinhole camera model, assuming all scene elements are in focus in the input images, presents practical challenges and complicates refocusing during novel-view synthesis. Addressing these limitations, we present a lightweight method based on 3D Gaussian Splatting that utilizes multi-view LDR images of a scene with varying exposure times, apertures, and focus distances as input to reconstruct a high-dynamic-range (HDR) radiance field. By incorporating analytical convolutions of Gaussians based on a thin-lens camera model as well as a tonemapping module, our reconstructions enable the rendering of HDR content with flexible refocusing capabilities. We demonstrate that our combined treatment of HDR and depth of field facilitates real-time cinematic rendering, outperforming the state of the art.", + "arxiv_url": "http://arxiv.org/abs/2406.07329v4", + "pdf_url": "http://arxiv.org/pdf/2406.07329v4", + "published_date": "2024-06-11", + "categories": [ + "cs.CV", + "eess.IV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianCity: Generative Gaussian Splatting for Unbounded 3D City Generation", + "authors": [ + "Haozhe Xie", + "Zhaoxi Chen", + "Fangzhou Hong", + "Ziwei Liu" + ], + "abstract": "3D city generation with NeRF-based methods shows promising generation results but is computationally inefficient. Recently 3D Gaussian Splatting (3D-GS) has emerged as a highly efficient alternative for object-level 3D generation. However, adapting 3D-GS from finite-scale 3D objects and humans to infinite-scale 3D cities is non-trivial. Unbounded 3D city generation entails significant storage overhead (out-of-memory issues), arising from the need to expand points to billions, often demanding hundreds of Gigabytes of VRAM for a city scene spanning 10km^2. In this paper, we propose GaussianCity, a generative Gaussian Splatting framework dedicated to efficiently synthesizing unbounded 3D cities with a single feed-forward pass. Our key insights are two-fold: 1) Compact 3D Scene Representation: We introduce BEV-Point as a highly compact intermediate representation, ensuring that the growth in VRAM usage for unbounded scenes remains constant, thus enabling unbounded city generation. 
2) Spatial-aware Gaussian Attribute Decoder: We present spatial-aware BEV-Point decoder to produce 3D Gaussian attributes, which leverages Point Serializer to integrate the structural and contextual characteristics of BEV points. Extensive experiments demonstrate that GaussianCity achieves state-of-the-art results in both drone-view and street-view 3D city generation. Notably, compared to CityDreamer, GaussianCity exhibits superior performance with a speedup of 60 times (10.72 FPS v.s. 0.18 FPS).", + "arxiv_url": "http://arxiv.org/abs/2406.06526v2", + "pdf_url": "http://arxiv.org/pdf/2406.06526v2", + "published_date": "2024-06-10", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction", + "authors": [ + "Danpeng Chen", + "Hai Li", + "Weicai Ye", + "Yifan Wang", + "Weijian Xie", + "Shangjin Zhai", + "Nan Wang", + "Haomin Liu", + "Hujun Bao", + "Guofeng Zhang" + ], + "abstract": "Recently, 3D Gaussian Splatting (3DGS) has attracted widespread attention due to its high-quality rendering, and ultra-fast training and rendering speed. However, due to the unstructured and irregular nature of Gaussian point clouds, it is difficult to guarantee geometric reconstruction accuracy and multi-view consistency simply by relying on image reconstruction loss. Although many studies on surface reconstruction based on 3DGS have emerged recently, the quality of their meshes is generally unsatisfactory. To address this problem, we propose a fast planar-based Gaussian splatting reconstruction representation (PGSR) to achieve high-fidelity surface reconstruction while ensuring high-quality rendering. Specifically, we first introduce an unbiased depth rendering method, which directly renders the distance from the camera origin to the Gaussian plane and the corresponding normal map based on the Gaussian distribution of the point cloud, and divides the two to obtain the unbiased depth. We then introduce single-view geometric, multi-view photometric, and geometric regularization to preserve global geometric accuracy. We also propose a camera exposure compensation model to cope with scenes with large illumination variations. Experiments on indoor and outdoor scenes show that our method achieves fast training and rendering while maintaining high-fidelity rendering and geometric reconstruction, outperforming 3DGS-based and NeRF-based methods.", + "arxiv_url": "http://arxiv.org/abs/2406.06521v1", + "pdf_url": "http://arxiv.org/pdf/2406.06521v1", + "published_date": "2024-06-10", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MVGamba: Unify 3D Content Generation as State Space Sequence Modeling", + "authors": [ + "Xuanyu Yi", + "Zike Wu", + "Qiuhong Shen", + "Qingshan Xu", + "Pan Zhou", + "Joo-Hwee Lim", + "Shuicheng Yan", + "Xinchao Wang", + "Hanwang Zhang" + ], + "abstract": "Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as 3D representation for improved visual quality and rendering efficiency. 
However, we observe that existing Gaussian reconstruction models often suffer from multi-view inconsistency and blurred textures. We attribute this to the compromise of multi-view information propagation in favor of adopting powerful yet computationally intensive architectures (e.g., Transformers). To address this issue, we introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor based on the RNN-like State Space Model (SSM). Our Gaussian reconstructor propagates causal context containing multi-view information for cross-view self-refinement while generating a long sequence of Gaussians for fine-detail modeling with linear complexity. With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts. Extensive experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only $0.1\\times$ of the model size.", + "arxiv_url": "http://arxiv.org/abs/2406.06367v2", + "pdf_url": "http://arxiv.org/pdf/2406.06367v2", + "published_date": "2024-06-10", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Lighting Every Darkness with 3DGS: Fast Training and Real-Time Rendering for HDR View Synthesis", + "authors": [ + "Xin Jin", + "Pengyi Jiao", + "Zheng-Peng Duan", + "Xingchao Yang", + "Chun-Le Guo", + "Bo Ren", + "Chongyi Li" + ], + "abstract": "Volumetric rendering based methods, like NeRF, excel in HDR view synthesis from RAW images, especially for nighttime scenes. However, they suffer from long training times and cannot perform real-time rendering due to dense sampling requirements. The advent of 3D Gaussian Splatting (3DGS) enables real-time rendering and faster training. However, implementing RAW image-based view synthesis directly using 3DGS is challenging due to its inherent drawbacks: 1) in nighttime scenes, extremely low SNR leads to poor structure-from-motion (SfM) estimation in distant views; 2) the limited representation capacity of spherical harmonics (SH) function is unsuitable for RAW linear color space; and 3) inaccurate scene structure hampers downstream tasks such as refocusing. To address these issues, we propose LE3D (Lighting Every darkness with 3DGS). Our method proposes Cone Scatter Initialization to enrich the estimation of SfM, and replaces SH with a Color MLP to represent the RAW linear color space. Additionally, we introduce depth distortion and near-far regularizations to improve the accuracy of scene structure for downstream tasks. These designs enable LE3D to perform real-time novel view synthesis, HDR rendering, refocusing, and tone-mapping changes. Compared to previous volumetric rendering based methods, LE3D reduces training time to 1% and improves rendering speed by up to 4,000 times for 2K resolution images in terms of FPS.
Code and viewer can be found in https://github.com/Srameo/LE3D .", + "arxiv_url": "http://arxiv.org/abs/2406.06216v1", + "pdf_url": "http://arxiv.org/pdf/2406.06216v1", + "published_date": "2024-06-10", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/Srameo/LE3D", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Bits-to-Photon: End-to-End Learned Scalable Point Cloud Compression for Direct Rendering", + "authors": [ + "Yueyu Hu", + "Ran Gong", + "Yao Wang" + ], + "abstract": "Point cloud is a promising 3D representation for volumetric streaming in emerging AR/VR applications. Despite recent advances in point cloud compression, decoding and rendering high-quality images from lossy compressed point clouds is still challenging in terms of quality and complexity, making it a major roadblock to achieve real-time 6-Degree-of-Freedom video streaming. In this paper, we address this problem by developing a point cloud compression scheme that generates a bit stream that can be directly decoded to renderable 3D Gaussians. The encoder and decoder are jointly optimized to consider both bit-rates and rendering quality. It significantly improves the rendering quality while substantially reducing decoding and rendering time, compared to existing point cloud compression methods. Furthermore, the proposed scheme generates a scalable bit stream, allowing multiple levels of details at different bit-rate ranges. Our method supports real-time color decoding and rendering of high quality point clouds, thus paving the way for interactive 3D streaming applications with free view points.", + "arxiv_url": "http://arxiv.org/abs/2406.05915v2", + "pdf_url": "http://arxiv.org/pdf/2406.05915v2", + "published_date": "2024-06-09", + "categories": [ + "cs.CV", + "eess.IV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "InfoGaussian: Structure-Aware Dynamic Gaussians through Lightweight Information Shaping", + "authors": [ + "Yunchao Zhang", + "Guandao Yang", + "Leonidas Guibas", + "Yanchao Yang" + ], + "abstract": "3D Gaussians, as a low-level scene representation, typically involve thousands to millions of Gaussians. This makes it difficult to control the scene in ways that reflect the underlying dynamic structure, where the number of independent entities is typically much smaller. In particular, it can be challenging to animate and move objects in the scene, which requires coordination among many Gaussians. To address this issue, we develop a mutual information shaping technique that enforces movement resonance between correlated Gaussians in a motion network. Such correlations can be learned from putative 2D object masks in different views. By approximating the mutual information with the Jacobians of the motions, our method ensures consistent movements of the Gaussians composing different objects under various perturbations. In particular, we develop an efficient contrastive training pipeline with lightweight optimization to shape the motion network, avoiding the need for re-shaping throughout the motion sequence. Notably, our training only touches a small fraction of all Gaussians in the scene yet attains the desired compositional behavior according to the underlying dynamic structure. 
The proposed technique is evaluated on challenging scenes and demonstrates significant performance improvement in promoting consistent movements and 3D object segmentation while inducing low computation and memory requirements.", + "arxiv_url": "http://arxiv.org/abs/2406.05897v1", + "pdf_url": "http://arxiv.org/pdf/2406.05897v1", + "published_date": "2024-06-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Simplicits: Mesh-Free, Geometry-Agnostic, Elastic Simulation", + "authors": [ + "Vismay Modi", + "Nicholas Sharp", + "Or Perel", + "Shinjiro Sueda", + "David I. W. Levin" + ], + "abstract": "The proliferation of 3D representations, from explicit meshes to implicit neural fields and more, motivates the need for simulators agnostic to representation. We present a data-, mesh-, and grid-free solution for elastic simulation for any object in any geometric representation undergoing large, nonlinear deformations. We note that every standard geometric representation can be reduced to an occupancy function queried at any point in space, and we define a simulator atop this common interface. For each object, we fit a small implicit neural network encoding spatially varying weights that act as a reduced deformation basis. These weights are trained to learn physically significant motions in the object via random perturbations. Our loss ensures we find a weight-space basis that best minimizes deformation energy by stochastically evaluating elastic energies through Monte Carlo sampling of the deformation volume. At runtime, we simulate in the reduced basis and sample the deformations back to the original domain. Our experiments demonstrate the versatility, accuracy, and speed of this approach on data including signed distance functions, point clouds, neural primitives, tomography scans, radiance fields, Gaussian splats, surface meshes, and volume meshes, as well as showing a variety of material energies, contact models, and time integration schemes.", + "arxiv_url": "http://arxiv.org/abs/2407.09497v1", + "pdf_url": "http://arxiv.org/pdf/2407.09497v1", + "published_date": "2024-06-09", + "categories": [ + "cs.CV", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RefGaussian: Disentangling Reflections from 3D Gaussian Splatting for Realistic Rendering", + "authors": [ + "Rui Zhang", + "Tianyue Luo", + "Weidong Yang", + "Ben Fei", + "Jingyi Xu", + "Qingyuan Zhou", + "Keyi Liu", + "Ying He" + ], + "abstract": "3D Gaussian Splatting (3D-GS) has made a notable advancement in the field of neural rendering, 3D scene reconstruction, and novel view synthesis. Nevertheless, 3D-GS encounters the main challenge when it comes to accurately representing physical reflections, especially in the case of total reflection and semi-reflection that are commonly found in real-world scenes. This limitation causes reflections to be mistakenly treated as independent elements with physical presence, leading to imprecise reconstructions. Herein, to tackle this challenge, we propose RefGaussian to disentangle reflections from 3D-GS for realistically modeling reflections. Specifically, we propose to split a scene into transmitted and reflected components and represent these components using two Spherical Harmonics (SH). 
Given that this decomposition is not fully determined, we employ local regularization techniques to ensure local smoothness for both the transmitted and reflected components, thereby achieving more plausible decomposition outcomes than 3D-GS. Experimental results demonstrate that our approach achieves superior novel view synthesis and accurate depth estimation outcomes. Furthermore, it enables the utilization of scene editing applications, ensuring both high-quality results and physical coherence.", + "arxiv_url": "http://arxiv.org/abs/2406.05852v1", + "pdf_url": "http://arxiv.org/pdf/2406.05852v1", + "published_date": "2024-06-09", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "VCR-GauS: View Consistent Depth-Normal Regularizer for Gaussian Surface Reconstruction", + "authors": [ + "Hanlin Chen", + "Fangyin Wei", + "Chen Li", + "Tianxin Huang", + "Yunsong Wang", + "Gim Hee Lee" + ], + "abstract": "Although 3D Gaussian Splatting has been widely studied because of its realistic and efficient novel-view synthesis, it is still challenging to extract a high-quality surface from the point-based representation. Previous works improve the surface by incorporating geometric priors from the off-the-shelf normal estimator. However, there are two main limitations: 1) Supervising normals rendered from 3D Gaussians effectively updates the rotation parameter but is less effective for other geometric parameters; 2) The inconsistency of predicted normal maps across multiple views may lead to severe reconstruction artifacts. In this paper, we propose a Depth-Normal regularizer that directly couples normal with other geometric parameters, leading to full updates of the geometric parameters from normal regularization. We further propose a confidence term to mitigate inconsistencies of normal predictions across multiple views. Moreover, we also introduce a densification and splitting strategy to regularize the size and distribution of 3D Gaussians for more accurate surface modeling. Compared with Gaussian-based baselines, experiments show that our approach obtains better reconstruction quality and maintains competitive appearance quality at faster training speed and 100+ FPS rendering.", + "arxiv_url": "http://arxiv.org/abs/2406.05774v2", + "pdf_url": "http://arxiv.org/pdf/2406.05774v2", + "published_date": "2024-06-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image", + "authors": [ + "Stanislaw Szymanowicz", + "Eldar Insafutdinov", + "Chuanxia Zheng", + "Dylan Campbell", + "João F. Henriques", + "Christian Rupprecht", + "Andrea Vedaldi" + ], + "abstract": "In this paper, we propose Flash3D, a method for scene reconstruction and novel view synthesis from a single image which is both very generalisable and efficient. For generalisability, we start from a \"foundation\" model for monocular depth estimation and extend it to a full 3D shape and appearance reconstructor. For efficiency, we base this extension on feed-forward Gaussian Splatting. 
Specifically, we predict a first layer of 3D Gaussians at the predicted depth, and then add additional layers of Gaussians that are offset in space, allowing the model to complete the reconstruction behind occlusions and truncations. Flash3D is very efficient, trainable on a single GPU in a day, and thus accessible to most researchers. It achieves state-of-the-art results when trained and tested on RealEstate10k. When transferred to unseen datasets like NYU it outperforms competitors by a large margin. More impressively, when transferred to KITTI, Flash3D achieves better PSNR than methods trained specifically on that dataset. In some instances, it even outperforms recent methods that use multiple views as input. Code, models, demo, and more results are available at https://www.robots.ox.ac.uk/~vgg/research/flash3d/.", + "arxiv_url": "http://arxiv.org/abs/2406.04343v1", + "pdf_url": "http://arxiv.org/pdf/2406.04343v1", + "published_date": "2024-06-06", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion", + "authors": [ + "Fangfu Liu", + "Hanyang Wang", + "Shunyu Yao", + "Shengjun Zhang", + "Jie Zhou", + "Yueqi Duan" + ], + "abstract": "In recent years, there has been rapid development in 3D generation models, opening up new possibilities for applications such as simulating the dynamic movements of 3D objects and customizing their behaviors. However, current 3D generative models tend to focus only on surface features such as color and shape, neglecting the inherent physical properties that govern the behavior of objects in the real world. To accurately simulate physics-aligned dynamics, it is essential to predict the physical properties of materials and incorporate them into the behavior prediction process. Nonetheless, predicting the diverse materials of real-world objects is still challenging due to the complex nature of their physical attributes. In this paper, we propose \\textbf{Physics3D}, a novel method for learning various physical properties of 3D objects through a video diffusion model. Our approach involves designing a highly generalizable physical simulation system based on a viscoelastic material model, which enables us to simulate a wide range of materials with high-fidelity capabilities. Moreover, we distill the physical priors from a video diffusion model that contains more understanding of realistic object materials. Extensive experiments demonstrate the effectiveness of our method with both elastic and plastic materials. Physics3D shows great potential for bridging the gap between the physical world and virtual neural space, providing a better integration and application of realistic physical principles in virtual environments. Project page: https://liuff19.github.io/Physics3D.", + "arxiv_url": "http://arxiv.org/abs/2406.04338v3", + "pdf_url": "http://arxiv.org/pdf/2406.04338v3", + "published_date": "2024-06-06", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "A Survey on 3D Human Avatar Modeling -- From Reconstruction to Generation", + "authors": [ + "Ruihe Wang", + "Yukang Cao", + "Kai Han", + "Kwan-Yee K. Wong" + ], + "abstract": "3D modeling has long been an important area in computer vision and computer graphics. 
Recently, thanks to the breakthroughs in neural representations and generative models, we witnessed a rapid development of 3D modeling. 3D human modeling, lying at the core of many real-world applications, such as gaming and animation, has attracted significant attention. Over the past few years, a large body of work on creating 3D human avatars has been introduced, forming a new and abundant knowledge base for 3D human modeling. The scale of the literature makes it difficult for individuals to keep track of all the works. This survey aims to provide a comprehensive overview of these emerging techniques for 3D human avatar modeling, from both reconstruction and generation perspectives. Firstly, we review representative methods for 3D human reconstruction, including methods based on pixel-aligned implicit function, neural radiance field, and 3D Gaussian Splatting, etc. We then summarize representative methods for 3D human generation, especially those using large language models like CLIP, diffusion models, and various 3D representations, which demonstrate state-of-the-art performance. Finally, we discuss our reflection on existing methods and open challenges for 3D human avatar modeling, shedding light on future research.", + "arxiv_url": "http://arxiv.org/abs/2406.04253v1", + "pdf_url": "http://arxiv.org/pdf/2406.04253v1", + "published_date": "2024-06-06", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Splatting with Localized Points Management", + "authors": [ + "Haosen Yang", + "Chenhao Zhang", + "Wenqing Wang", + "Marco Volino", + "Adrian Hilton", + "Li Zhang", + "Xiatian Zhu" + ], + "abstract": "Point management is a critical component in optimizing 3D Gaussian Splatting (3DGS) models, as the point initiation (e.g., via structure from motion) is distributionally inappropriate. Typically, the Adaptive Density Control (ADC) algorithm is applied, leveraging view-averaged gradient magnitude thresholding for point densification, opacity thresholding for pruning, and regular all-points opacity reset. However, we reveal that this strategy is limited in tackling intricate/special image regions (e.g., transparent) as it is unable to identify all the 3D zones that require point densification, and lacking an appropriate mechanism to handle the ill-conditioned points with negative impacts (occlusion due to false high opacity). To address these limitations, we propose a Localized Point Management (LPM) strategy, capable of identifying those error-contributing zones in the highest demand for both point addition and geometry calibration. Zone identification is achieved by leveraging the underlying multiview geometry constraints, with the guidance of image rendering errors. We apply point densification in the identified zone, whilst resetting the opacity of those points residing in front of these regions so that a new opportunity is created to correct ill-conditioned points. Serving as a versatile plugin, LPM can be seamlessly integrated into existing 3D Gaussian Splatting models. Experimental evaluation across both static 3D and dynamic 4D scenes validate the efficacy of our LPM strategy in boosting a variety of existing 3DGS models both quantitatively and qualitatively. 
Notably, LPM improves both vanilla 3DGS and SpaceTimeGS to achieve state-of-the-art rendering quality while retaining real-time speeds, outperforming on challenging datasets such as Tanks & Temples and the Neural 3D Video Dataset.", + "arxiv_url": "http://arxiv.org/abs/2406.04251v2", + "pdf_url": "http://arxiv.org/pdf/2406.04251v2", + "published_date": "2024-06-06", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Superpoint Gaussian Splatting for Real-Time High-Fidelity Dynamic Scene Reconstruction", + "authors": [ + "Diwen Wan", + "Ruijie Lu", + "Gang Zeng" + ], + "abstract": "Rendering novel view images in dynamic scenes is a crucial yet challenging task. Current methods mainly utilize NeRF-based methods to represent the static scene and an additional time-variant MLP to model scene deformations, resulting in relatively low rendering quality as well as slow inference speed. To tackle these challenges, we propose a novel framework named Superpoint Gaussian Splatting (SP-GS). Specifically, our framework first employs explicit 3D Gaussians to reconstruct the scene and then clusters Gaussians with similar properties (e.g., rotation, translation, and location) into superpoints. Empowered by these superpoints, our method manages to extend 3D Gaussian splatting to dynamic scenes with only a slight increase in computational expense. Apart from achieving state-of-the-art visual quality and real-time rendering under high resolutions, the superpoint representation provides a stronger manipulation capability. Extensive experiments demonstrate the practicality and effectiveness of our approach on both synthetic and real-world datasets. Please see our project page at https://dnvtmf.github.io/SP_GS.github.io.", + "arxiv_url": "http://arxiv.org/abs/2406.03697v1", + "pdf_url": "http://arxiv.org/pdf/2406.03697v1", + "published_date": "2024-06-06", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Primitives for Deformable Image Registration", + "authors": [ + "Jihe Li", + "Xiang Liu", + "Fabian Zhang", + "Xia Li", + "Xixin Cao", + "Ye Zhang", + "Joachim Buhmann" + ], + "abstract": "Deformable Image Registration (DIR) is essential for aligning medical images that exhibit anatomical variations, facilitating applications such as disease tracking and radiotherapy planning. While classical iterative methods and deep learning approaches have achieved success in DIR, they are often hindered by computational inefficiency or poor generalization. In this paper, we introduce GaussianDIR, a novel, case-specific optimization DIR method inspired by 3D Gaussian splatting. In general, GaussianDIR represents image deformations using a sparse set of mobile and flexible Gaussian primitives, each defined by a center position, covariance, and local rigid transformation. This compact and explicit representation reduces noise and computational overhead while improving interpretability. Furthermore, the movement of individual voxel is derived via blending the local rigid transformation of the neighboring Gaussian primitives. By this, GaussianDIR captures both global smoothness and local rigidity as well as reduces the computational burden. 
To address varying levels of deformation complexity, GaussianDIR also integrates an adaptive density control mechanism that dynamically adjusts the density of Gaussian primitives. Additionally, we employ multi-scale Gaussian primitives to capture both coarse and fine deformations, reducing optimization to local minima. Experimental results on brain MRI, lung CT, and cardiac MRI datasets demonstrate that GaussianDIR outperforms existing DIR methods in both accuracy and efficiency, highlighting its potential for clinical applications. Finally, as a training-free approach, it challenges the stereotype that iterative methods are inherently slow and transcend the limitations of poor generalization.", + "arxiv_url": "http://arxiv.org/abs/2406.03394v2", + "pdf_url": "http://arxiv.org/pdf/2406.03394v2", + "published_date": "2024-06-05", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Dynamic 3D Gaussian Fields for Urban Areas", + "authors": [ + "Tobias Fischer", + "Jonas Kulhanek", + "Samuel Rota Bulò", + "Lorenzo Porzi", + "Marc Pollefeys", + "Peter Kontschieder" + ], + "abstract": "We present an efficient neural 3D scene representation for novel-view synthesis (NVS) in large-scale, dynamic urban areas. Existing works are not well suited for applications like mixed-reality or closed-loop simulation due to their limited visual quality and non-interactive rendering speeds. Recently, rasterization-based approaches have achieved high-quality NVS at impressive speeds. However, these methods are limited to small-scale, homogeneous data, i.e. they cannot handle severe appearance and geometry variations due to weather, season, and lighting and do not scale to larger, dynamic areas with thousands of images. We propose 4DGF, a neural scene representation that scales to large-scale dynamic urban areas, handles heterogeneous input data, and substantially improves rendering speeds. We use 3D Gaussians as an efficient geometry scaffold while relying on neural fields as a compact and flexible appearance model. We integrate scene dynamics via a scene graph at global scale while modeling articulated motions on a local level via deformations. This decomposed approach enables flexible scene composition suitable for real-world applications. In experiments, we surpass the state-of-the-art by over 3 dB in PSNR and more than 200 times in rendering speed.", + "arxiv_url": "http://arxiv.org/abs/2406.03175v2", + "pdf_url": "http://arxiv.org/pdf/2406.03175v2", + "published_date": "2024-06-05", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Event3DGS: Event-Based 3D Gaussian Splatting for High-Speed Robot Egomotion", + "authors": [ + "Tianyi Xiong", + "Jiayi Wu", + "Botao He", + "Cornelia Fermuller", + "Yiannis Aloimonos", + "Heng Huang", + "Christopher A. Metzler" + ], + "abstract": "By combining differentiable rendering with explicit point-based scene representations, 3D Gaussian Splatting (3DGS) has demonstrated breakthrough 3D reconstruction capabilities. However, to date 3DGS has had limited impact on robotics, where high-speed egomotion is pervasive: Egomotion introduces motion blur and leads to artifacts in existing frame-based 3DGS reconstruction methods. To address this challenge, we introduce Event3DGS, an {\\em event-based} 3DGS framework. 
By exploiting the exceptional temporal resolution of event cameras, Event3DGS can reconstruct high-fidelity 3D structure and appearance under high-speed egomotion. Extensive experiments on multiple synthetic and real-world datasets demonstrate the superiority of Event3DGS compared with existing event-based dense 3D scene reconstruction frameworks; Event3DGS substantially improves reconstruction quality (+3dB) while reducing computational costs by 95\\%. Our framework also allows one to incorporate a few motion-blurred frame-based measurements into the reconstruction process to further improve appearance fidelity without loss of structural accuracy.", + "arxiv_url": "http://arxiv.org/abs/2406.02972v4", + "pdf_url": "http://arxiv.org/pdf/2406.02972v4", + "published_date": "2024-06-05", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GSGAN: Adversarial Learning for Hierarchical Generation of 3D Gaussian Splats", + "authors": [ + "Sangeek Hyun", + "Jae-Pil Heo" + ], + "abstract": "Most advances in 3D Generative Adversarial Networks (3D GANs) largely depend on ray casting-based volume rendering, which incurs demanding rendering costs. One promising alternative is rasterization-based 3D Gaussian Splatting (3D-GS), providing a much faster rendering speed and explicit 3D representation. In this paper, we exploit Gaussian as a 3D representation for 3D GANs by leveraging its efficient and explicit characteristics. However, in an adversarial framework, we observe that a na\\\"ive generator architecture suffers from training instability and lacks the capability to adjust the scale of Gaussians. This leads to model divergence and visual artifacts due to the absence of proper guidance for initialized positions of Gaussians and densification to manage their scales adaptively. To address these issues, we introduce a generator architecture with a hierarchical multi-scale Gaussian representation that effectively regularizes the position and scale of generated Gaussians. Specifically, we design a hierarchy of Gaussians where finer-level Gaussians are parameterized by their coarser-level counterparts; the position of finer-level Gaussians would be located near their coarser-level counterparts, and the scale would monotonically decrease as the level becomes finer, modeling both coarse and fine details of the 3D scene. Experimental results demonstrate that ours achieves a significantly faster rendering speed (x100) compared to state-of-the-art 3D consistent GANs with comparable 3D generation capability. Project page: https://hse1032.github.io/gsgan.", + "arxiv_url": "http://arxiv.org/abs/2406.02968v2", + "pdf_url": "http://arxiv.org/pdf/2406.02968v2", + "published_date": "2024-06-05", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D-HGS: 3D Half-Gaussian Splatting", + "authors": [ + "Haolin Li", + "Jinyang Liu", + "Mario Sznaier", + "Octavia Camps" + ], + "abstract": "Photo-realistic 3D Reconstruction is a fundamental problem in 3D computer vision. This domain has seen considerable advancements owing to the advent of recent neural rendering techniques. These techniques predominantly aim to focus on learning volumetric representations of 3D scenes and refining these representations via loss functions derived from rendering. 
Among these, 3D Gaussian Splatting (3D-GS) has emerged as a significant method, surpassing Neural Radiance Fields (NeRFs). 3D-GS uses parameterized 3D Gaussians for modeling both spatial locations and color information, combined with a tile-based fast rendering technique. Despite its superior rendering performance and speed, the use of 3D Gaussian kernels has inherent limitations in accurately representing discontinuous functions, notably at edges and corners for shape discontinuities, and across varying textures for color discontinuities. To address this problem, we propose to employ 3D Half-Gaussian (3D-HGS) kernels, which can be used as a plug-and-play kernel. Our experiments demonstrate their capability to improve the performance of current 3D-GS related methods and achieve state-of-the-art rendering performance on various datasets without compromising rendering speed.", + "arxiv_url": "http://arxiv.org/abs/2406.02720v2", + "pdf_url": "http://arxiv.org/pdf/2406.02720v2", + "published_date": "2024-06-04", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting", + "authors": [ + "Inkyu Shin", + "Qihang Yu", + "Xiaohui Shen", + "In So Kweon", + "Kuk-Jin Yoon", + "Liang-Chieh Chen" + ], + "abstract": "Recent advancements in zero-shot video diffusion models have shown promise for text-driven video editing, but challenges remain in achieving high temporal consistency. To address this, we introduce Video-3DGS, a 3D Gaussian Splatting (3DGS)-based video refiner designed to enhance temporal consistency in zero-shot video editors. Our approach utilizes a two-stage 3D Gaussian optimizing process tailored for editing dynamic monocular videos. In the first stage, Video-3DGS employs an improved version of COLMAP, referred to as MC-COLMAP, which processes original videos using a Masked and Clipped approach. For each video clip, MC-COLMAP generates the point clouds for dynamic foreground objects and complex backgrounds. These point clouds are utilized to initialize two sets of 3D Gaussians (Frg-3DGS and Bkg-3DGS) aiming to represent foreground and background views. Both foreground and background views are then merged with a 2D learnable parameter map to reconstruct full views. In the second stage, we leverage the reconstruction ability developed in the first stage to impose the temporal constraints on the video diffusion model. To demonstrate the efficacy of Video-3DGS on both stages, we conduct extensive experiments across two related tasks: Video Reconstruction and Video Editing. Video-3DGS trained with 3k iterations significantly improves video reconstruction quality (+3 PSNR, +7 PSNR increase) and training efficiency (x1.9, x4.5 times faster) over NeRF-based and 3DGS-based state-of-art methods on DAVIS dataset, respectively. 
Moreover, it enhances video editing by ensuring temporal consistency across 58 dynamic monocular videos.", + "arxiv_url": "http://arxiv.org/abs/2406.02541v3", + "pdf_url": "http://arxiv.org/pdf/2406.02541v3", + "published_date": "2024-06-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SatSplatYOLO: 3D Gaussian Splatting-based Virtual Object Detection Ensembles for Satellite Feature Recognition", + "authors": [ + "Van Minh Nguyen", + "Emma Sandidge", + "Trupti Mahendrakar", + "Ryan T. White" + ], + "abstract": "On-orbit servicing (OOS), inspection of spacecraft, and active debris removal (ADR) missions require precise rendezvous and proximity operations in the vicinity of non-cooperative, possibly unknown, resident space objects. Safety concerns with manned missions and lag times with ground-based control necessitate complete autonomy. In this article, we present an approach for mapping geometries and high-confidence detection of components of unknown, non-cooperative satellites on orbit. We implement accelerated 3D Gaussian splatting to learn a 3D representation of the satellite, render virtual views of the target, and ensemble the YOLOv5 object detector over the virtual views, resulting in reliable, accurate, and precise satellite component detections. The full pipeline is capable of running on-board and stands to enable downstream machine intelligence tasks necessary for autonomous guidance, navigation, and control.", + "arxiv_url": "http://arxiv.org/abs/2406.02533v1", + "pdf_url": "http://arxiv.org/pdf/2406.02533v1", + "published_date": "2024-06-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering", + "authors": [ + "Zhongpai Gao", + "Benjamin Planche", + "Meng Zheng", + "Xiao Chen", + "Terrence Chen", + "Ziyan Wu" + ], + "abstract": "Digitally reconstructed radiographs (DRRs) are simulated 2D X-ray images generated from 3D CT volumes, widely used in preoperative settings but limited in intraoperative applications due to computational bottlenecks, especially for accurate but heavy physics-based Monte Carlo methods. While analytical DRR renderers offer greater efficiency, they overlook anisotropic X-ray image formation phenomena, such as Compton scattering. We present a novel approach that marries realistic physics-inspired X-ray simulation with efficient, differentiable DRR generation using 3D Gaussian splatting (3DGS). Our direction-disentangled 3DGS (DDGS) method separates the radiosity contribution into isotropic and direction-dependent components, approximating complex anisotropic interactions without intricate runtime simulations. Additionally, we adapt the 3DGS initialization to account for tomography data properties, enhancing accuracy and efficiency. Our method outperforms state-of-the-art techniques in image accuracy. 
Furthermore, our DDGS shows promise for intraoperative applications and inverse problems such as pose registration, delivering superior registration accuracy and runtime performance compared to analytical DRR methods.", + "arxiv_url": "http://arxiv.org/abs/2406.02518v1", + "pdf_url": "http://arxiv.org/pdf/2406.02518v1", + "published_date": "2024-06-04", + "categories": [ + "cs.CV", + "eess.IV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections", + "authors": [ + "Yuze Wang", + "Junyi Wang", + "Yue Qi" + ], + "abstract": "Novel View Synthesis (NVS) from unconstrained photo collections is challenging in computer graphics. Recently, 3D Gaussian Splatting (3DGS) has shown promise for photorealistic and real-time NVS of static scenes. Building on 3DGS, we propose an efficient point-based differentiable rendering framework for scene reconstruction from photo collections. Our key innovation is a residual-based spherical harmonic coefficients transfer module that adapts 3DGS to varying lighting conditions and photometric post-processing. This lightweight module can be pre-computed and ensures efficient gradient propagation from rendered images to 3D Gaussian attributes. Additionally, we observe that the appearance encoder and the transient mask predictor, the two most critical parts of NVS from unconstrained photo collections, can be mutually beneficial. We introduce a plug-and-play lightweight spatial attention module to simultaneously predict transient occluders and latent appearance representation for each image. After training and preprocessing, our method aligns with the standard 3DGS format and rendering pipeline, facilitating seamless integration into various 3DGS applications. Extensive experiments on diverse datasets show our approach outperforms existing approaches on the rendering quality of novel view and appearance synthesis with high convergence and rendering speed.", + "arxiv_url": "http://arxiv.org/abs/2406.02407v1", + "pdf_url": "http://arxiv.org/pdf/2406.02407v1", + "published_date": "2024-06-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Query-based Semantic Gaussian Field for Scene Representation in Reinforcement Learning", + "authors": [ + "Jiaxu Wang", + "Ziyi Zhang", + "Qiang Zhang", + "Jia Li", + "Jingkai Sun", + "Mingyuan Sun", + "Junhao He", + "Renjing Xu" + ], + "abstract": "Latent scene representation plays a significant role in training reinforcement learning (RL) agents. To obtain good latent vectors describing the scenes, recent works incorporate the 3D-aware latent-conditioned NeRF pipeline into scene representation learning. However, these NeRF-related methods struggle to perceive 3D structural information due to the inefficient dense sampling in volumetric rendering. Moreover, they lack fine-grained semantic information included in their scene representation vectors because they evenly consider free and occupied spaces. Both of them can destroy the performance of downstream RL tasks. To address the above challenges, we propose a novel framework that adopts the efficient 3D Gaussian Splatting (3DGS) to learn 3D scene representation for the first time. 
In brief, we present the Query-based Generalizable 3DGS to bridge the 3DGS technique and scene representations with more geometrical awareness than those in NeRFs. Moreover, we present the Hierarchical Semantics Encoding to ground the fine-grained semantic features to 3D Gaussians and further distilled to the scene representation vectors. We conduct extensive experiments on two RL platforms including Maniskill2 and Robomimic across 10 different tasks. The results show that our method outperforms the other 5 baselines by a large margin. We achieve the best success rates on 8 tasks and the second-best on the other two tasks.", + "arxiv_url": "http://arxiv.org/abs/2406.02370v4", + "pdf_url": "http://arxiv.org/pdf/2406.02370v4", + "published_date": "2024-06-04", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding", + "authors": [ + "Yanmin Wu", + "Jiarui Meng", + "Haijie Li", + "Chenming Wu", + "Yahao Shi", + "Xinhua Cheng", + "Chen Zhao", + "Haocheng Feng", + "Errui Ding", + "Jingdong Wang", + "Jian Zhang" + ], + "abstract": "This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open vocabulary understanding. Our primary motivation stems from observing that existing 3DGS-based open vocabulary methods mainly focus on 2D pixel-level parsing. These methods struggle with 3D point-level tasks due to weak feature expressiveness and inaccurate 2D-3D feature associations. To ensure robust feature presentation and 3D point-level understanding, we first employ SAM masks without cross-frame associations to train instance features with 3D consistency. These features exhibit both intra-object consistency and inter-object distinction. Then, we propose a two-stage codebook to discretize these features from coarse to fine levels. At the coarse level, we consider the positional information of 3D points to achieve location-based clustering, which is then refined at the fine level. Finally, we introduce an instance-level 3D-2D feature association method that links 3D points to 2D masks, which are further associated with 2D CLIP features. Extensive experiments, including open vocabulary-based 3D object selection, 3D point cloud understanding, click-based 3D object selection, and ablation studies, demonstrate the effectiveness of our proposed method. Project page: https://3d-aigc.github.io/OpenGaussian", + "arxiv_url": "http://arxiv.org/abs/2406.02058v1", + "pdf_url": "http://arxiv.org/pdf/2406.02058v1", + "published_date": "2024-06-04", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FastLGS: Speeding up Language Embedded Gaussians with Feature Grid Mapping", + "authors": [ + "Yuzhou Ji", + "He Zhu", + "Junshu Tang", + "Wuyi Liu", + "Zhizhong Zhang", + "Yuan Xie", + "Xin Tan" + ], + "abstract": "The semantically interactive radiance field has always been an appealing task for its potential to facilitate user-friendly and automated real-world 3D scene understanding applications. However, it is a challenging task to achieve high quality, efficiency and zero-shot ability at the same time with semantics in radiance fields. 
In this work, we present FastLGS, an approach that supports real-time open-vocabulary query within 3D Gaussian Splatting (3DGS) under high resolution. We propose the semantic feature grid to save multi-view CLIP features which are extracted based on Segment Anything Model (SAM) masks, and map the grids to low dimensional features for semantic field training through 3DGS. Once trained, we can restore pixel-aligned CLIP embeddings through feature grids from rendered features for open-vocabulary queries. Comparisons with other state-of-the-art methods prove that FastLGS can achieve the first place performance concerning both speed and accuracy, where FastLGS is 98x faster than LERF and 4x faster than LangSplat. Meanwhile, experiments show that FastLGS is adaptive and compatible with many downstream tasks, such as 3D segmentation and 3D object inpainting, which can be easily applied to other 3D manipulation systems.", + "arxiv_url": "http://arxiv.org/abs/2406.01916v3", + "pdf_url": "http://arxiv.org/pdf/2406.01916v3", + "published_date": "2024-06-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MaGS: Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting", + "authors": [ + "Shaojie Ma", + "Yawei Luo", + "Wei Yang", + "Yi Yang" + ], + "abstract": "3D reconstruction and simulation, although interrelated, have distinct objectives: reconstruction requires a flexible 3D representation that can adapt to diverse scenes, while simulation needs a structured representation to model motion principles effectively. This paper introduces the Mesh-adsorbed Gaussian Splatting (MaGS) method to address this challenge. MaGS constrains 3D Gaussians to roam near the mesh, creating a mutually adsorbed mesh-Gaussian 3D representation. Such representation harnesses both the rendering flexibility of 3D Gaussians and the structured property of meshes. To achieve this, we introduce RMD-Net, a network that learns motion priors from video data to refine mesh deformations, alongside RGD-Net, which models the relative displacement between the mesh and Gaussians to enhance rendering fidelity under mesh constraints. To generalize to novel, user-defined deformations beyond input video without reliance on temporal data, we propose MPE-Net, which leverages inherent mesh information to bootstrap RMD-Net and RGD-Net. Due to the universality of meshes, MaGS is compatible with various deformation priors such as ARAP, SMPL, and soft physics simulation. Extensive experiments on the D-NeRF, DG-Mesh, and PeopleSnapshot datasets demonstrate that MaGS achieves state-of-the-art performance in both reconstruction and simulation.", + "arxiv_url": "http://arxiv.org/abs/2406.01593v2", + "pdf_url": "http://arxiv.org/pdf/2406.01593v2", + "published_date": "2024-06-03", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Tetrahedron Splatting for 3D Generation", + "authors": [ + "Chun Gu", + "Zeyu Yang", + "Zijie Pan", + "Xiatian Zhu", + "Li Zhang" + ], + "abstract": "3D representation is essential to the significant advance of 3D generation with 2D diffusion priors. As a flexible representation, NeRF has been first adopted for 3D representation. 
With density-based volumetric rendering, it however suffers both intensive computational overhead and inaccurate mesh extraction. Using a signed distance field and Marching Tetrahedra, DMTet allows for precise mesh extraction and real-time rendering but is limited in handling large topological changes in meshes, leading to optimization challenges. Alternatively, 3D Gaussian Splatting (3DGS) is favored in both training and rendering efficiency while falling short in mesh extraction. In this work, we introduce a novel 3D representation, Tetrahedron Splatting (TeT-Splatting), that supports easy convergence during optimization, precise mesh extraction, and real-time rendering simultaneously. This is achieved by integrating surface-based volumetric rendering within a structured tetrahedral grid while preserving the desired ability of precise mesh extraction, and a tile-based differentiable tetrahedron rasterizer. Furthermore, we incorporate eikonal and normal consistency regularization terms for the signed distance field to improve generation quality and stability. Critically, our representation can be trained without mesh extraction, making the optimization process easier to converge. Our TeT-Splatting can be readily integrated in existing 3D generation pipelines, along with polygonal mesh for texture optimization. Extensive experiments show that our TeT-Splatting strikes a superior tradeoff among convergence speed, render efficiency, and mesh quality as compared to previous alternatives under varying 3D generation settings.", + "arxiv_url": "http://arxiv.org/abs/2406.01579v2", + "pdf_url": "http://arxiv.org/pdf/2406.01579v2", + "published_date": "2024-06-03", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DreamPhysics: Learning Physical Properties of Dynamic 3D Gaussians with Video Diffusion Priors", + "authors": [ + "Tianyu Huang", + "Haoze Zhang", + "Yihan Zeng", + "Zhilu Zhang", + "Hui Li", + "Wangmeng Zuo", + "Rynson W. H. Lau" + ], + "abstract": "Dynamic 3D interaction has been attracting a lot of attention recently. However, creating such 4D content remains challenging. One solution is to animate 3D scenes with physics-based simulation, which requires manually assigning precise physical properties to the object or the simulated results would become unnatural. Another solution is to learn the deformation of 3D objects with the distillation of video generative models, which, however, tends to produce 3D videos with small and discontinuous motions due to the inappropriate extraction and application of physical prior. In this work, combining the strengths and complementing shortcomings of the above two solutions, we propose to learn the physical properties of a material field with video diffusion priors, and then utilize a physics-based Material-Point-Method (MPM) simulator to generate 4D content with realistic motions. In particular, we propose motion distillation sampling to emphasize video motion information during distillation. Moreover, to facilitate the optimization, we further propose a KAN-based material field with frame boosting. Experimental results demonstrate that our method enjoys more realistic motion than state-of-the-arts. 
Codes are released at: https://github.com/tyhuang0428/DreamPhysics.", + "arxiv_url": "http://arxiv.org/abs/2406.01476v2", + "pdf_url": "http://arxiv.org/pdf/2406.01476v2", + "published_date": "2024-06-03", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/tyhuang0428/DreamPhysics", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RaDe-GS: Rasterizing Depth in Gaussian Splatting", + "authors": [ + "Baowen Zhang", + "Chuan Fang", + "Rakesh Shrestha", + "Yixun Liang", + "Xiaoxiao Long", + "Ping Tan" + ], + "abstract": "Gaussian Splatting (GS) has proven to be highly effective in novel view synthesis, achieving high-quality and real-time rendering. However, its potential for reconstructing detailed 3D shapes has not been fully explored. Existing methods often suffer from limited shape accuracy due to the discrete and unstructured nature of Gaussian splats, which complicates the shape extraction. While recent techniques like 2D GS have attempted to improve shape reconstruction, they often reformulate the Gaussian primitives in ways that reduce both rendering quality and computational efficiency. To address these problems, our work introduces a rasterized approach to render the depth maps and surface normal maps of general 3D Gaussian splats. Our method not only significantly enhances shape reconstruction accuracy but also maintains the computational efficiency intrinsic to Gaussian Splatting. It achieves a Chamfer distance error comparable to NeuraLangelo on the DTU dataset and maintains similar computational efficiency as the original 3D GS methods. Our method is a significant advancement in Gaussian Splatting and can be directly integrated into existing Gaussian Splatting-based methods.", + "arxiv_url": "http://arxiv.org/abs/2406.01467v2", + "pdf_url": "http://arxiv.org/pdf/2406.01467v2", + "published_date": "2024-06-03", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Self-Calibrating 4D Novel View Synthesis from Monocular Videos Using Gaussian Splatting", + "authors": [ + "Fang Li", + "Hao Zhang", + "Narendra Ahuja" + ], + "abstract": "Gaussian Splatting (GS) has significantly elevated scene reconstruction efficiency and novel view synthesis (NVS) accuracy compared to Neural Radiance Fields (NeRF), particularly for dynamic scenes. However, current 4D NVS methods, whether based on GS or NeRF, primarily rely on camera parameters provided by COLMAP and even utilize sparse point clouds generated by COLMAP for initialization, which lack accuracy as well are time-consuming. This sometimes results in poor dynamic scene representation, especially in scenes with large object movements, or extreme camera conditions e.g. small translations combined with large rotations. Some studies simultaneously optimize the estimation of camera parameters and scenes, supervised by additional information like depth, optical flow, etc. obtained from off-the-shelf models. Using this unverified information as ground truth can reduce robustness and accuracy, which does frequently occur for long monocular videos (with e.g. > hundreds of frames). We propose a novel approach that learns a high-fidelity 4D GS scene representation with self-calibration of camera parameters. 
It includes the extraction of 2D point features that robustly represent 3D structure, and their use for subsequent joint optimization of camera parameters and 3D structure towards overall 4D scene optimization. We demonstrate the accuracy and time efficiency of our method through extensive quantitative and qualitative experimental results on several standard benchmarks. The results show significant improvements over state-of-the-art methods for 4D novel view synthesis. The source code will be released soon at https://github.com/fangli333/SC-4DGS.", + "arxiv_url": "http://arxiv.org/abs/2406.01042v2", + "pdf_url": "http://arxiv.org/pdf/2406.01042v2", + "published_date": "2024-06-03", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/fangli333/SC-4DGS", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SuperGaussian: Repurposing Video Models for 3D Super Resolution", + "authors": [ + "Yuan Shen", + "Duygu Ceylan", + "Paul Guerrero", + "Zexiang Xu", + "Niloy J. Mitra", + "Shenlong Wang", + "Anna Frühstück" + ], + "abstract": "We present a simple, modular, and generic method that upsamples coarse 3D models by adding geometric and appearance details. While generative 3D models now exist, they do not yet match the quality of their counterparts in image and video domains. We demonstrate that it is possible to directly repurpose existing (pretrained) video models for 3D super-resolution and thus sidestep the problem of the shortage of large repositories of high-quality 3D training models. We describe how to repurpose video upsampling models, which are not 3D consistent, and combine them with 3D consolidation to produce 3D-consistent results. As output, we produce high quality Gaussian Splat models, which are object centric and effective. Our method is category agnostic and can be easily incorporated into existing 3D workflows. We evaluate our proposed SuperGaussian on a variety of 3D inputs, which are diverse both in terms of complexity and representation (e.g., Gaussian Splats or NeRFs), and demonstrate that our simple method significantly improves the fidelity of the final 3D models. Check our project website for details: supergaussian.github.io", + "arxiv_url": "http://arxiv.org/abs/2406.00609v4", + "pdf_url": "http://arxiv.org/pdf/2406.00609v4", + "published_date": "2024-06-02", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Topo4D: Topology-Preserving Gaussian Splatting for High-Fidelity 4D Head Capture", + "authors": [ + "Xuanchen Li", + "Yuhao Cheng", + "Xingyu Ren", + "Haozhe Jia", + "Di Xu", + "Wenhan Zhu", + "Yichao Yan" + ], + "abstract": "4D head capture aims to generate dynamic topological meshes and corresponding texture maps from videos, which is widely utilized in movies and games for its ability to simulate facial muscle movements and recover dynamic textures in pore-squeezing. The industry often adopts the method involving multi-view stereo and non-rigid alignment. However, this approach is prone to errors and heavily reliant on time-consuming manual processing by artists. To simplify this process, we propose Topo4D, a novel framework for automatic geometry and texture generation, which optimizes densely aligned 4D heads and 8K texture maps directly from calibrated multi-view time-series images. 
Specifically, we first represent the time-series faces as a set of dynamic 3D Gaussians with fixed topology in which the Gaussian centers are bound to the mesh vertices. Afterward, we perform alternating geometry and texture optimization frame-by-frame for high-quality geometry and texture learning while maintaining temporal topology stability. Finally, we can extract dynamic facial meshes in a regular wiring arrangement and high-fidelity textures with pore-level details from the learned Gaussians. Extensive experiments show that our method achieves superior results compared to the current SOTA face reconstruction methods in both mesh and texture quality. Project page: https://xuanchenli.github.io/Topo4D/.", + "arxiv_url": "http://arxiv.org/abs/2406.00440v3", + "pdf_url": "http://arxiv.org/pdf/2406.00440v3", + "published_date": "2024-06-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MoDGS: Dynamic Gaussian Splatting from Casually-captured Monocular Videos", + "authors": [ + "Qingming Liu", + "Yuan Liu", + "Jiepeng Wang", + "Xianqiang Lyv", + "Peng Wang", + "Wenping Wang", + "Junhui Hou" + ], + "abstract": "In this paper, we propose MoDGS, a new pipeline to render novel views of dynamic scenes from a casually captured monocular video. Previous monocular dynamic NeRF or Gaussian Splatting methods strongly rely on the rapid movement of input cameras to construct multiview consistency but struggle to reconstruct dynamic scenes from casually captured input videos whose cameras are either static or move slowly. To address this challenging task, MoDGS adopts recent single-view depth estimation methods to guide the learning of the dynamic scene. Then, a novel 3D-aware initialization method is proposed to learn a reasonable deformation field and a new robust depth loss is proposed to guide the learning of dynamic scene geometry. Comprehensive experiments demonstrate that MoDGS is able to render high-quality novel view images of dynamic scenes from just a casually captured monocular video, which outperforms state-of-the-art methods by a significant margin. The code will be publicly available.", + "arxiv_url": "http://arxiv.org/abs/2406.00434v2", + "pdf_url": "http://arxiv.org/pdf/2406.00434v2", + "published_date": "2024-06-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-Phong: Meta-Learned 3D Gaussians for Relightable Novel View Synthesis", + "authors": [ + "Yumeng He", + "Yunbo Wang", + "Xiaokang Yang" + ], + "abstract": "Decoupling the illumination in 3D scenes is crucial for novel view synthesis and relighting. In this paper, we propose a novel method for representing a scene illuminated by a point light using a set of relightable 3D Gaussian points. Inspired by the Blinn-Phong model, our approach decomposes the scene into ambient, diffuse, and specular components, enabling the synthesis of realistic lighting effects. To facilitate the decomposition of geometric information independent of lighting conditions, we introduce a novel bilevel optimization-based meta-learning framework.
The fundamental idea is to view the rendering tasks under various lighting positions as a multi-task learning problem, which our meta-learning approach effectively addresses by generalizing the learned Gaussian geometries not only across different viewpoints but also across diverse light positions. Experimental results demonstrate the effectiveness of our approach in terms of training efficiency and rendering quality compared to existing methods for free-viewpoint relighting.", + "arxiv_url": "http://arxiv.org/abs/2405.20791v1", + "pdf_url": "http://arxiv.org/pdf/2405.20791v1", + "published_date": "2024-05-31", + "categories": [ + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model", + "authors": [ + "Yufei Wang", + "Zhihao Li", + "Lanqing Guo", + "Wenhan Yang", + "Alex C. Kot", + "Bihan Wen" + ], + "abstract": "Recently, 3D Gaussian Splatting (3DGS) has become a promising framework for novel view synthesis, offering fast rendering speeds and high fidelity. However, the large number of Gaussians and their associated attributes require effective compression techniques. Existing methods primarily compress neural Gaussians individually and independently, i.e., coding all the neural Gaussians at the same time, with little design for their interactions and spatial dependence. Inspired by the effectiveness of the context model in image compression, we propose the first autoregressive model at the anchor level for 3DGS compression in this work. We divide anchors into different levels and the anchors that are not coded yet can be predicted based on the already coded ones in all the coarser levels, leading to more accurate modeling and higher coding efficiency. To further improve the efficiency of entropy coding, e.g., to code the coarsest level with no already coded anchors, we propose to introduce a low-dimensional quantized feature as the hyperprior for each anchor, which can be effectively compressed. Our work pioneers the context model in the anchor level for 3DGS representation, yielding an impressive size reduction of over 100 times compared to vanilla 3DGS and 15 times compared to the most recent state-of-the-art work Scaffold-GS, while achieving comparable or even higher rendering quality.", + "arxiv_url": "http://arxiv.org/abs/2405.20721v1", + "pdf_url": "http://arxiv.org/pdf/2405.20721v1", + "published_date": "2024-05-31", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "R$^2$-Gaussian: Rectifying Radiative Gaussian Splatting for Tomographic Reconstruction", + "authors": [ + "Ruyi Zha", + "Tao Jun Lin", + "Yuanhao Cai", + "Jiwen Cao", + "Yanhao Zhang", + "Hongdong Li" + ], + "abstract": "3D Gaussian splatting (3DGS) has shown promising results in image rendering and surface reconstruction. However, its potential in volumetric reconstruction tasks, such as X-ray computed tomography, remains under-explored. This paper introduces R$^2$-Gaussian, the first 3DGS-based framework for sparse-view tomographic reconstruction. By carefully deriving X-ray rasterization functions, we discover a previously unknown integration bias in the standard 3DGS formulation, which hampers accurate volume retrieval. 
To address this issue, we propose a novel rectification technique via refactoring the projection from 3D to 2D Gaussians. Our new method presents three key innovations: (1) introducing tailored Gaussian kernels, (2) extending rasterization to X-ray imaging, and (3) developing a CUDA-based differentiable voxelizer. Experiments on synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art approaches in accuracy and efficiency. Crucially, it delivers high-quality results in 4 minutes, which is 12$\\times$ faster than NeRF-based methods and on par with traditional algorithms. Code and models are available on the project page https://github.com/Ruyi-Zha/r2_gaussian.", + "arxiv_url": "http://arxiv.org/abs/2405.20693v2", + "pdf_url": "http://arxiv.org/pdf/2405.20693v2", + "published_date": "2024-05-31", + "categories": [ + "eess.IV", + "cs.CV" + ], + "github_url": "https://github.com/Ruyi-Zha/r2_gaussian", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Hybrid Fourier Score Distillation for Efficient One Image to 3D Object Generation", + "authors": [ + "Shuzhou Yang", + "Yu Wang", + "Haijie Li", + "Jiarui Meng", + "Yanmin Wu", + "Xiandong Meng", + "Jian Zhang" + ], + "abstract": "Single image-to-3D generation is pivotal for crafting controllable 3D assets. Given its under-constrained nature, we attempt to leverage 3D geometric priors from a novel view diffusion model and 2D appearance priors from an image generation model to guide the optimization process. We note that there is a disparity between the generation priors of these two diffusion models, leading to their different appearance outputs. Specifically, image generation models tend to deliver more detailed visuals, whereas novel view models produce consistent yet over-smooth results across different views. Directly combining them leads to suboptimal effects due to their appearance conflicts. Hence, we propose a 2D-3D hybrid Fourier Score Distillation objective function, hy-FSD. It optimizes 3D Gaussians using 3D priors in spatial domain to ensure geometric consistency, while exploiting 2D priors in the frequency domain through Fourier transform for better visual quality. hy-FSD can be integrated into existing 3D generation methods and produce significant performance gains. With this technique, we further develop an image-to-3D generation pipeline to create high-quality 3D objects within one minute, named Fourier123. Extensive experiments demonstrate that Fourier123 excels in efficient generation with rapid convergence speed and visually-friendly generation results.", + "arxiv_url": "http://arxiv.org/abs/2405.20669v2", + "pdf_url": "http://arxiv.org/pdf/2405.20669v2", + "published_date": "2024-05-31", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "$\\textit{S}^3$Gaussian: Self-Supervised Street Gaussians for Autonomous Driving", + "authors": [ + "Nan Huang", + "Xiaobao Wei", + "Wenzhao Zheng", + "Pengju An", + "Ming Lu", + "Wei Zhan", + "Masayoshi Tomizuka", + "Kurt Keutzer", + "Shanghang Zhang" + ], + "abstract": "Photorealistic 3D reconstruction of street scenes is a critical technique for developing real-world simulators for autonomous driving. 
Despite the efficacy of Neural Radiance Fields (NeRF) for driving scenes, 3D Gaussian Splatting (3DGS) emerges as a promising direction due to its faster speed and more explicit representation. However, most existing street 3DGS methods require tracked 3D vehicle bounding boxes to decompose the static and dynamic elements for effective reconstruction, limiting their applications for in-the-wild scenarios. To facilitate efficient 3D scene reconstruction without costly annotations, we propose a self-supervised street Gaussian ($\\textit{S}^3$Gaussian) method to decompose dynamic and static elements from 4D consistency. We represent each scene with 3D Gaussians to preserve the explicitness and further accompany them with a spatial-temporal field network to compactly model the 4D dynamics. We conduct extensive experiments on the challenging Waymo-Open dataset to evaluate the effectiveness of our method. Our $\\textit{S}^3$Gaussian demonstrates the ability to decompose static and dynamic scenes and achieves the best performance without using 3D annotations. Code is available at: https://github.com/nnanhuang/S3Gaussian/.", + "arxiv_url": "http://arxiv.org/abs/2405.20323v1", + "pdf_url": "http://arxiv.org/pdf/2405.20323v1", + "published_date": "2024-05-30", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "https://github.com/nnanhuang/S3Gaussian/", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "A Pixel Is Worth More Than One 3D Gaussians in Single-View 3D Reconstruction", + "authors": [ + "Jianghao Shen", + "Nan Xue", + "Tianfu Wu" + ], + "abstract": "Learning 3D scene representation from a single-view image is a long-standing fundamental problem in computer vision, with the inherent ambiguity in predicting contents unseen from the input view. Built on the recently proposed 3D Gaussian Splatting (3DGS), the Splatter Image method has made promising progress on fast single-image novel view synthesis via learning a single 3D Gaussian for each pixel based on the U-Net feature map of an input image. However, it has limited expressive power to represent occluded components that are not observable in the input view. To address this problem, this paper presents a Hierarchical Splatter Image method in which a pixel is worth more than one 3D Gaussians. Specifically, each pixel is represented by a parent 3D Gaussian and a small number of child 3D Gaussians. Parent 3D Gaussians are learned as done in the vanilla Splatter Image. Child 3D Gaussians are learned via a lightweight Multi-Layer Perceptron (MLP) which takes as input the projected image features of a parent 3D Gaussian and the embedding of a target camera view. Both parent and child 3D Gaussians are learned end-to-end in a stage-wise way. The joint condition of input image features from eyes of the parent Gaussians and the target camera position facilitates learning to allocate child Gaussians to ``see the unseen'', recovering the occluded details that are often missed by parent Gaussians. 
In experiments, the proposed method is tested on the ShapeNet-SRN and CO3D datasets with state-of-the-art performance obtained, especially showing promising capabilities of reconstructing occluded contents in the input view.", + "arxiv_url": "http://arxiv.org/abs/2405.20310v3", + "pdf_url": "http://arxiv.org/pdf/2405.20310v3", + "published_date": "2024-05-30", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Object-centric Reconstruction and Tracking of Dynamic Unknown Objects using 3D Gaussian Splatting", + "authors": [ + "Kuldeep R Barad", + "Antoine Richard", + "Jan Dentler", + "Miguel Olivares-Mendez", + "Carol Martinez" + ], + "abstract": "Generalizable perception is one of the pillars of high-level autonomy in space robotics. Estimating the structure and motion of unknown objects in dynamic environments is fundamental for such autonomous systems. Traditionally, the solutions have relied on prior knowledge of target objects, multiple disparate representations, or low-fidelity outputs unsuitable for robotic operations. This work proposes a novel approach to incrementally reconstruct and track a dynamic unknown object using a unified representation -- a set of 3D Gaussian blobs that describe its geometry and appearance. The differentiable 3D Gaussian Splatting framework is adapted to a dynamic object-centric setting. The input to the pipeline is a sequential set of RGB-D images. 3D reconstruction and 6-DoF pose tracking tasks are tackled using first-order gradient-based optimization. The formulation is simple, requires no pre-training, assumes no prior knowledge of the object or its motion, and is suitable for online applications. The proposed approach is validated on a dataset of 10 unknown spacecraft of diverse geometry and texture under arbitrary relative motion. The experiments demonstrate successful 3D reconstruction and accurate 6-DoF tracking of the target object in proximity operations over a short to medium duration. The causes of tracking drift are discussed and potential solutions are outlined.", + "arxiv_url": "http://arxiv.org/abs/2405.20104v2", + "pdf_url": "http://arxiv.org/pdf/2405.20104v2", + "published_date": "2024-05-30", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting", + "authors": [ + "Qiaowei Miao", + "JinSheng Quan", + "Kehan Li", + "Yawei Luo" + ], + "abstract": "Previous text-to-4D methods have leveraged multiple Score Distillation Sampling (SDS) techniques, combining motion priors from video-based diffusion models (DMs) with geometric priors from multiview DMs to implicitly guide 4D renderings. However, differences in these priors result in conflicting gradient directions during optimization, causing trade-offs between motion fidelity and geometry accuracy, and requiring substantial optimization time to reconcile the models. In this paper, we introduce \\textbf{P}ixel-\\textbf{L}evel \\textbf{A}lignment for text-driven \\textbf{4D} Gaussian splatting (PLA4D) to resolve this motion-geometry conflict. PLA4D provides an anchor reference, i.e., text-generated video, to align the rendering process conditioned by different DMs in pixel space. 
For static alignment, our approach introduces a focal alignment method and Gaussian-Mesh contrastive learning to iteratively adjust focal lengths and provide explicit geometric priors at each timestep. At the dynamic level, a motion alignment technique and T-MV refinement method are employed to enforce both pose alignment and motion continuity across unknown viewpoints, ensuring intrinsic geometric consistency across views. With such pixel-level multi-DM alignment, our PLA4D framework is able to generate 4D objects with superior geometric, motion, and semantic consistency. Fully implemented with open-source tools, PLA4D offers an efficient and accessible solution for high-quality 4D digital content creation with significantly reduced generation time.", + "arxiv_url": "http://arxiv.org/abs/2405.19957v4", + "pdf_url": "http://arxiv.org/pdf/2405.19957v4", + "published_date": "2024-05-30", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis", + "authors": [ + "Boming Zhao", + "Yuan Li", + "Ziyu Sun", + "Lin Zeng", + "Yujun Shen", + "Rui Ma", + "Yinda Zhang", + "Hujun Bao", + "Zhaopeng Cui" + ], + "abstract": "Forecasting future scenarios in dynamic environments is essential for intelligent decision-making and navigation, a challenge yet to be fully realized in computer vision and robotics. Traditional approaches like video prediction and novel-view synthesis either lack the ability to forecast from arbitrary viewpoints or to predict temporal dynamics. In this paper, we introduce GaussianPrediction, a novel framework that empowers 3D Gaussian representations with dynamic scene modeling and future scenario synthesis in dynamic environments. GaussianPrediction can forecast future states from any viewpoint, using video observations of dynamic scenes. To this end, we first propose a 3D Gaussian canonical space with deformation modeling to capture the appearance and geometry of dynamic scenes, and integrate the lifecycle property into Gaussians for irreversible deformations. To make the prediction feasible and efficient, a concentric motion distillation approach is developed by distilling the scene motion with key points. Finally, a Graph Convolutional Network is employed to predict the motions of key points, enabling the rendering of photorealistic images of future scenarios. Our framework shows outstanding performance on both synthetic and real-world datasets, demonstrating its efficacy in predicting and rendering future environments.", + "arxiv_url": "http://arxiv.org/abs/2405.19745v1", + "pdf_url": "http://arxiv.org/pdf/2405.19745v1", + "published_date": "2024-05-30", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianRoom: Improving 3D Gaussian Splatting with SDF Guidance and Monocular Cues for Indoor Scene Reconstruction", + "authors": [ + "Haodong Xiang", + "Xinghui Li", + "Xiansong Lai", + "Wanting Zhang", + "Zhichao Liao", + "Kai Cheng", + "Xueping Liu" + ], + "abstract": "Recently, 3D Gaussian Splatting(3DGS) has revolutionized neural rendering with its high-quality rendering and real-time speed. 
However, when it comes to indoor scenes with a significant number of textureless areas, 3DGS yields incomplete and noisy reconstruction results due to the poor initialization of the point cloud and under-constrained optimization. Inspired by the continuity of signed distance field (SDF), which naturally has advantages in modeling surfaces, we present a unified optimizing framework integrating neural SDF with 3DGS. This framework incorporates a learnable neural SDF field to guide the densification and pruning of Gaussians, enabling Gaussians to accurately model scenes even with poor initialized point clouds. At the same time, the geometry represented by Gaussians improves the efficiency of the SDF field by piloting its point sampling. Additionally, we regularize the optimization with normal and edge priors to eliminate geometry ambiguity in textureless areas and improve the details. Extensive experiments in ScanNet and ScanNet++ show that our method achieves state-of-the-art performance in both surface reconstruction and novel view synthesis.", + "arxiv_url": "http://arxiv.org/abs/2405.19671v1", + "pdf_url": "http://arxiv.org/pdf/2405.19671v1", + "published_date": "2024-05-30", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian", + "authors": [ + "Wei Sun", + "Qi Zhang", + "Yanzhao Zhou", + "Qixiang Ye", + "Jianbin Jiao", + "Yuan Li" + ], + "abstract": "3D Gaussian splatting has demonstrated impressive performance in real-time novel view synthesis. However, achieving successful reconstruction from RGB images generally requires multiple input views captured under static conditions. To address the challenge of sparse input views, previous approaches have incorporated depth supervision into the training of 3D Gaussians to mitigate overfitting, using dense predictions from pretrained depth networks as pseudo-ground truth. Nevertheless, depth predictions from monocular depth estimation models inherently exhibit significant uncertainty in specific areas. Relying solely on pixel-wise L2 loss may inadvertently incorporate detrimental noise from these uncertain areas. In this work, we introduce a novel method to supervise the depth distribution of 3D Gaussians, utilizing depth priors with integrated uncertainty estimates. To address these localized errors in depth predictions, we integrate a patch-wise optimal transport strategy to complement traditional L2 loss in depth supervision. Extensive experiments conducted on the LLFF, DTU, and Blender datasets demonstrate that our approach, UGOT, achieves superior novel view synthesis and consistently outperforms state-of-the-art methods.", + "arxiv_url": "http://arxiv.org/abs/2405.19657v1", + "pdf_url": "http://arxiv.org/pdf/2405.19657v1", + "published_date": "2024-05-30", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM", + "authors": [ + "Peifeng Jiang", + "Hong Liu", + "Xia Li", + "Ti Wang", + "Fabian Zhang", + "Joachim M. 
Buhmann" + ], + "abstract": "The limited robustness of 3D Gaussian Splatting (3DGS) to motion blur and camera noise, along with its poor real-time performance, restricts its application in robotic SLAM tasks. Upon analysis, the primary causes of these issues are the density of views with motion blur and the cumulative errors in dense pose estimation from calculating losses based on noisy original images and rendering results, which increase the difficulty of 3DGS rendering convergence. Thus, a cutting-edge 3DGS-based SLAM system is introduced, leveraging the efficiency and flexibility of 3DGS to achieve real-time performance while remaining robust against sensor noise, motion blur, and the challenges posed by long-session SLAM. Central to this approach is the Fusion Bridge module, which seamlessly integrates tracking-centered ORB Visual Odometry with mapping-centered online 3DGS. Precise pose initialization is enabled by this module through joint optimization of re-projection and rendering loss, as well as strategic view selection, enhancing rendering convergence in large-scale scenes. Extensive experiments demonstrate state-of-the-art rendering quality and localization accuracy, positioning this system as a promising solution for real-world robotics applications that require stable, near-real-time performance. Our project is available at https://ZeldaFromHeaven.github.io/TAMBRIDGE/", + "arxiv_url": "http://arxiv.org/abs/2405.19614v1", + "pdf_url": "http://arxiv.org/pdf/2405.19614v1", + "published_date": "2024-05-30", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "NPGA: Neural Parametric Gaussian Avatars", + "authors": [ + "Simon Giebenhain", + "Tobias Kirschstein", + "Martin Rünz", + "Lourdes Agapito", + "Matthias Nießner" + ], + "abstract": "The creation of high-fidelity, digital versions of human heads is an important stepping stone in the process of further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem, due to a high demand for photo-realism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven approach to create high-fidelity, controllable avatars from multi-view video recordings. We build our method around 3D Gaussian splatting for its highly efficient rendering and to inherit the topological flexibility of point clouds. In contrast to previous work, we condition our avatars' dynamics on the rich expression space of neural parametric head models (NPHM), instead of mesh-based 3DMMs. To this end, we distill the backward deformation field of our underlying NPHM into forward deformations which are compatible with rasterization-based rendering. All remaining fine-scale, expression-dependent details are learned from the multi-view videos. For increased representational capacity of our avatars, we propose per-Gaussian latent features that condition each primitives dynamic behavior. To regularize this increased dynamic expressivity, we propose Laplacian terms on the latent features and predicted dynamics. We evaluate our method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by 2.6 PSNR. 
Furthermore, we demonstrate accurate animation capabilities from real-world monocular videos.", + "arxiv_url": "http://arxiv.org/abs/2405.19331v2", + "pdf_url": "http://arxiv.org/pdf/2405.19331v2", + "published_date": "2024-05-29", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DGD: Dynamic 3D Gaussians Distillation", + "authors": [ + "Isaac Labe", + "Noam Issachar", + "Itai Lang", + "Sagie Benaim" + ], + "abstract": "We tackle the task of learning dynamic 3D semantic radiance fields given a single monocular video as input. Our learned semantic radiance field captures per-point semantics as well as color and geometric properties for a dynamic 3D scene, enabling the generation of novel views and their corresponding semantics. This enables the segmentation and tracking of a diverse set of 3D semantic entities, specified using a simple and intuitive interface that includes a user click or a text prompt. To this end, we present DGD, a unified 3D representation for both the appearance and semantics of a dynamic 3D scene, building upon the recently proposed dynamic 3D Gaussians representation. Our representation is optimized over time with both color and semantic information. Key to our method is the joint optimization of the appearance and semantic attributes, which jointly affect the geometric properties of the scene. We evaluate our approach in its ability to enable dense semantic 3D object tracking and demonstrate high-quality results that are fast to render, for a diverse set of scenes. Our project webpage is available on https://isaaclabe.github.io/DGD-Website/", + "arxiv_url": "http://arxiv.org/abs/2405.19321v1", + "pdf_url": "http://arxiv.org/pdf/2405.19321v1", + "published_date": "2024-05-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "$E^{3}$Gen: Efficient, Expressive and Editable Avatars Generation", + "authors": [ + "Weitian Zhang", + "Yichao Yan", + "Yunhui Liu", + "Xingdong Sheng", + "Xiaokang Yang" + ], + "abstract": "This paper aims to introduce 3D Gaussian for efficient, expressive, and editable digital avatar generation. This task faces two major challenges: (1) The unstructured nature of 3D Gaussian makes it incompatible with current generation pipelines; (2) the expressive animation of 3D Gaussian in a generative setting that involves training with multiple subjects remains unexplored. In this paper, we propose a novel avatar generation method named $E^3$Gen, to effectively address these challenges. First, we propose a novel generative UV features plane representation that encodes unstructured 3D Gaussian onto a structured 2D UV space defined by the SMPL-X parametric model. This novel representation not only preserves the representation ability of the original 3D Gaussian but also introduces a shared structure among subjects to enable generative learning of the diffusion model. To tackle the second challenge, we propose a part-aware deformation module to achieve robust and accurate full-body expressive pose control. Extensive experiments demonstrate that our method achieves superior performance in avatar generation and enables expressive full-body pose control and editing. 
Our project page is https://olivia23333.github.io/E3Gen.", + "arxiv_url": "http://arxiv.org/abs/2405.19203v2", + "pdf_url": "http://arxiv.org/pdf/2405.19203v2", + "published_date": "2024-05-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LP-3DGS: Learning to Prune 3D Gaussian Splatting", + "authors": [ + "Zhaoliang Zhang", + "Tianchen Song", + "Yongjae Lee", + "Li Yang", + "Cheng Peng", + "Rama Chellappa", + "Deliang Fan" + ], + "abstract": "Recently, 3D Gaussian Splatting (3DGS) has become one of the mainstream methodologies for novel view synthesis (NVS) due to its high quality and fast rendering speed. However, as a point-based scene representation, 3DGS potentially generates a large number of Gaussians to fit the scene, leading to high memory usage. Improvements that have been proposed require either an empirical, preset pruning ratio or an importance score threshold to prune the point cloud. Such a hyperparameter requires multiple rounds of training to optimize and achieve the maximum pruning ratio, while maintaining the rendering quality for each scene. In this work, we propose learning-to-prune 3DGS (LP-3DGS), where a trainable binary mask is applied to the importance score that can find the optimal pruning ratio automatically. Instead of using the traditional straight-through estimator (STE) method to approximate the binary mask gradient, we redesign the masking function to leverage the Gumbel-Sigmoid method, making it differentiable and compatible with the existing training process of 3DGS. Extensive experiments have shown that LP-3DGS consistently produces a good balance that is both efficient and high quality.", + "arxiv_url": "http://arxiv.org/abs/2405.18784v1", + "pdf_url": "http://arxiv.org/pdf/2405.18784v1", + "published_date": "2024-05-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EvaGaussians: Event Stream Assisted Gaussian Splatting from Blurry Images", + "authors": [ + "Wangbo Yu", + "Chaoran Feng", + "Jiye Tang", + "Jiashu Yang", + "Xu Jia", + "Yuchao Yang", + "Li Yuan", + "Yonghong Tian" + ], + "abstract": "3D Gaussian Splatting (3D-GS) has demonstrated exceptional capabilities in 3D scene reconstruction and novel view synthesis. However, its training heavily depends on high-quality, sharp images and accurate camera poses. Fulfilling these requirements can be challenging in non-ideal real-world scenarios, where motion-blurred images are commonly encountered with high-speed moving cameras or in low-light environments that require long exposure times. To address these challenges, we introduce Event Stream Assisted Gaussian Splatting (EvaGaussians), a novel approach that integrates event streams captured by an event camera to assist in reconstructing high-quality 3D-GS from blurry images. Capitalizing on the high temporal resolution and dynamic range offered by the event camera, we leverage the event streams to explicitly model the formation process of motion-blurred images and guide the deblurring reconstruction of 3D-GS. By jointly optimizing the 3D-GS parameters and recovering camera motion trajectories during the exposure time, our method can robustly facilitate the acquisition of high-fidelity novel views with intricate texture details.
We comprehensively evaluated our method and compared it with previous state-of-the-art deblurring rendering methods. Both qualitative and quantitative comparisons demonstrate that our method surpasses existing techniques in restoring fine details from blurry images and producing high-fidelity novel views.", + "arxiv_url": "http://arxiv.org/abs/2405.20224v2", + "pdf_url": "http://arxiv.org/pdf/2405.20224v2", + "published_date": "2024-05-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GFlow: Recovering 4D World from Monocular Video", + "authors": [ + "Shizun Wang", + "Xingyi Yang", + "Qiuhong Shen", + "Zhenxiang Jiang", + "Xinchao Wang" + ], + "abstract": "Reconstructing 4D scenes from video inputs is a crucial yet challenging task. Conventional methods usually rely on the assumptions of multi-view video inputs, known camera parameters, or static scenes, all of which are typically absent under in-the-wild scenarios. In this paper, we relax all these constraints and tackle a highly ambitious but practical task, which we termed as AnyV4D: we assume only one monocular video is available without any camera parameters as input, and we aim to recover the dynamic 4D world alongside the camera poses. To this end, we introduce GFlow, a new framework that utilizes only 2D priors (depth and optical flow) to lift a video (3D) to a 4D explicit representation, entailing a flow of Gaussian splatting through space and time. GFlow first clusters the scene into still and moving parts, then applies a sequential optimization process that optimizes camera poses and the dynamics of 3D Gaussian points based on 2D priors and scene clustering, ensuring fidelity among neighboring points and smooth movement across frames. Since dynamic scenes always introduce new content, we also propose a new pixel-wise densification strategy for Gaussian points to integrate new visual content. Moreover, GFlow transcends the boundaries of mere 4D reconstruction; it also enables tracking of any points across frames without the need for prior training and segments moving objects from the scene in an unsupervised way. Additionally, the camera poses of each frame can be derived from GFlow, allowing for rendering novel views of a video scene through changing camera pose. By employing the explicit representation, we may readily conduct scene-level or object-level editing as desired, underscoring its versatility and power. Visit our project website at: https://littlepure2333.github.io/GFlow", + "arxiv_url": "http://arxiv.org/abs/2405.18426v1", + "pdf_url": "http://arxiv.org/pdf/2405.18426v1", + "published_date": "2024-05-28", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting", + "authors": [ + "Qihang Zhang", + "Yinghao Xu", + "Chaoyang Wang", + "Hsin-Ying Lee", + "Gordon Wetzstein", + "Bolei Zhou", + "Ceyuan Yang" + ], + "abstract": "Scene image editing is crucial for entertainment, photography, and advertising design. Existing methods solely focus on either 2D individual object or 3D global scene editing. This results in a lack of a unified approach to effectively control and manipulate scenes at the 3D level with different levels of granularity. 
In this work, we propose 3DitScene, a novel and unified scene editing framework leveraging language-guided disentangled Gaussian Splatting that enables seamless editing from 2D to 3D, allowing precise control over scene composition and individual objects. We first incorporate 3D Gaussians that are refined through generative priors and optimization techniques. Language features from CLIP then introduce semantics into 3D geometry for object disentanglement. With the disentangled Gaussians, 3DitScene allows for manipulation at both the global and individual levels, revolutionizing creative expression and empowering control over scenes and objects. Experimental results demonstrate the effectiveness and versatility of 3DitScene in scene image editing. Code and online demo can be found at our project homepage: https://zqh0253.github.io/3DitScene/.", + "arxiv_url": "http://arxiv.org/abs/2405.18424v1", + "pdf_url": "http://arxiv.org/pdf/2405.18424v1", + "published_date": "2024-05-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D StreetUnveiler with Semantic-Aware 2DGS", + "authors": [ + "Jingwei Xu", + "Yikai Wang", + "Yiqun Zhao", + "Yanwei Fu", + "Shenghua Gao" + ], + "abstract": "Unveiling an empty street from crowded observations captured by in-car cameras is crucial for autonomous driving. However, removing all temporarily static objects, such as stopped vehicles and standing pedestrians, presents a significant challenge. Unlike object-centric 3D inpainting, which relies on thorough observation in a small scene, street scene cases involve long trajectories that differ from previous 3D inpainting tasks. The camera-centric moving environment of captured videos further complicates the task due to the limited degree and time duration of object observation. To address these obstacles, we introduce StreetUnveiler to reconstruct an empty street. StreetUnveiler learns a 3D representation of the empty street from crowded observations. Our representation is based on the hard-label semantic 2D Gaussian Splatting (2DGS) for its scalability and ability to identify Gaussians to be removed. We inpaint rendered image after removing unwanted Gaussians to provide pseudo-labels and subsequently re-optimize the 2DGS. Given its temporal continuous movement, we divide the empty street scene into observed, partial-observed, and unobserved regions, which we propose to locate through a rendered alpha map. This decomposition helps us to minimize the regions that need to be inpainted. To enhance the temporal consistency of the inpainting, we introduce a novel time-reversal framework to inpaint frames in reverse order and use later frames as references for earlier frames to fully utilize the long-trajectory observations. Our experiments conducted on the street scene dataset successfully reconstructed a 3D representation of the empty street. The mesh representation of the empty street can be extracted for further applications. 
The project page and more visualizations can be found at: https://streetunveiler.github.io", + "arxiv_url": "http://arxiv.org/abs/2405.18416v2", + "pdf_url": "http://arxiv.org/pdf/2405.18416v2", + "published_date": "2024-05-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "NegGS: Negative Gaussian Splatting", + "authors": [ + "Artur Kasymov", + "Bartosz Czekaj", + "Marcin Mazur", + "Jacek Tabor", + "Przemysław Spurek" + ], + "abstract": "One of the key advantages of 3D rendering is its ability to simulate intricate scenes accurately. One of the most widely used methods for this purpose is Gaussian Splatting, a novel approach that is known for its rapid training and inference capabilities. In essence, Gaussian Splatting involves incorporating data about the 3D objects of interest into a series of Gaussian distributions, each of which can then be depicted in 3D in a manner analogous to traditional meshes. It is regrettable that the use of Gaussians in Gaussian Splatting is currently somewhat restrictive due to their perceived linear nature. In practice, 3D objects are often composed of complex curves and highly nonlinear structures. This issue can, to some extent, be alleviated by employing a multitude of Gaussian components to reflect the complex, nonlinear structures accurately. However, this approach results in a considerable increase in time complexity. This paper introduces the concept of negative Gaussians, which are interpreted as items with negative colors. The rationale behind this approach is based on the density distribution created by dividing the probability density functions (PDFs) of two Gaussians, which we refer to as Diff-Gaussian. Such a distribution can be used to approximate structures such as donut and moon-shaped datasets. Experimental findings indicate that the application of these techniques enhances the modeling of high-frequency elements with rapid color transitions. Additionally, it improves the representation of shadows. To the best of our knowledge, this is the first paper to extend the simple ellipsoid shapes of Gaussian Splatting to more complex nonlinear structures.", + "arxiv_url": "http://arxiv.org/abs/2405.18163v2", + "pdf_url": "http://arxiv.org/pdf/2405.18163v2", + "published_date": "2024-05-28", + "categories": [ + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "A Grid-Free Fluid Solver based on Gaussian Spatial Representation", + "authors": [ + "Jingrui Xing", + "Bin Wang", + "Mengyu Chu", + "Baoquan Chen" + ], + "abstract": "We present a grid-free fluid solver featuring a novel Gaussian representation. Drawing inspiration from the expressive capabilities of 3D Gaussian Splatting in multi-view image reconstruction, we model the continuous flow velocity as a weighted sum of multiple Gaussian functions. Leveraging this representation, we derive differential operators for the field and implement a time-dependent PDE solver using the traditional operator splitting method. Compared to implicit neural representations, another continuous spatial representation receiving increasing attention, our method with flexible 3D Gaussians presents enhanced accuracy in vorticity preservation. Moreover, we apply physics-driven strategies to accelerate the optimization-based time integration of Gaussian functions.
This temporal evolution surpasses previous work based on implicit neural representation with reduced computational time and memory. Although not surpassing the quality of state-of-the-art Eulerian methods in fluid simulation, experiments and ablation studies indicate the potential of our memory-efficient representation. With enriched spatial information, our method exhibits a distinctive perspective combining the advantages of Eulerian and Lagrangian approaches.", + "arxiv_url": "http://arxiv.org/abs/2405.18133v1", + "pdf_url": "http://arxiv.org/pdf/2405.18133v1", + "published_date": "2024-05-28", + "categories": [ + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EG4D: Explicit Generation of 4D Object without Score Distillation", + "authors": [ + "Qi Sun", + "Zhiyang Guo", + "Ziyu Wan", + "Jing Nathan Yan", + "Shengming Yin", + "Wengang Zhou", + "Jing Liao", + "Houqiang Li" + ], + "abstract": "In recent years, the increasing demand for dynamic 3D assets in design and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 4D objects. Previous methods generally rely on score distillation sampling (SDS) algorithm to infer the unseen views and motion of 4D objects, thus leading to unsatisfactory results with defects like over-saturation and Janus problem. Therefore, inspired by recent progress of video diffusion models, we propose to optimize a 4D representation by explicitly generating multi-view videos from one input image. However, it is far from trivial to handle practical challenges faced by such a pipeline, including dramatic temporal inconsistency, inter-frame geometry and texture diversity, and semantic defects brought by video generation results. To address these issues, we propose DG4D, a novel multi-stage framework that generates high-quality and consistent 4D assets without score distillation. Specifically, collaborative techniques and solutions are developed, including an attention injection strategy to synthesize temporal-consistent multi-view videos, a robust and efficient dynamic reconstruction method based on Gaussian Splatting, and a refinement stage with diffusion prior for semantic restoration. The qualitative results and user preference study demonstrate that our framework outperforms the baselines in generation quality by a considerable margin. Code will be released at \\url{https://github.com/jasongzy/EG4D}.", + "arxiv_url": "http://arxiv.org/abs/2405.18132v1", + "pdf_url": "http://arxiv.org/pdf/2405.18132v1", + "published_date": "2024-05-28", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/jasongzy/EG4D", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields", + "authors": [ + "Mihnea-Bogdan Jurca", + "Remco Royen", + "Ion Giosan", + "Adrian Munteanu" + ], + "abstract": "Gaussian Splatting has revolutionized the world of novel view synthesis by achieving high rendering performance in real-time. Recently, studies have focused on enriching these 3D representations with semantic information for downstream tasks. In this paper, we introduce RT-GS2, the first generalizable semantic segmentation method employing Gaussian Splatting. 
While existing Gaussian Splatting-based approaches rely on scene-specific training, RT-GS2 demonstrates the ability to generalize to unseen scenes. Our method adopts a new approach by first extracting view-independent 3D Gaussian features in a self-supervised manner, followed by a novel View-Dependent / View-Independent (VDVI) feature fusion to enhance semantic consistency over different views. Extensive experimentation on three different datasets showcases RT-GS2's superiority over the state-of-the-art methods in semantic segmentation quality, exemplified by an 8.01% increase in mIoU on the Replica dataset. Moreover, our method achieves real-time performance of 27.03 FPS, marking an astonishing 901 times speedup compared to existing approaches. This work represents a significant advancement in the field by introducing, to the best of our knowledge, the first real-time generalizable semantic segmentation method for 3D Gaussian representations of radiance fields.", + "arxiv_url": "http://arxiv.org/abs/2405.18033v2", + "pdf_url": "http://arxiv.org/pdf/2405.18033v2", + "published_date": "2024-05-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes", + "authors": [ + "Yunsong Wang", + "Tianxin Huang", + "Hanlin Chen", + "Gim Hee Lee" + ], + "abstract": "Empowering 3D Gaussian Splatting with generalization ability is appealing. However, existing generalizable 3D Gaussian Splatting methods are largely confined to narrow-range interpolation between stereo images due to their heavy backbones, thus lacking the ability to accurately localize 3D Gaussians and support free-view synthesis across a wide view range. In this paper, we present a novel framework FreeSplat that is capable of reconstructing geometrically consistent 3D scenes from long sequence input towards free-view synthesis. Specifically, we first introduce Low-cost Cross-View Aggregation achieved by constructing adaptive cost volumes among nearby views and aggregating features using a multi-scale structure. Subsequently, we present the Pixel-wise Triplet Fusion to eliminate redundancy of 3D Gaussians in overlapping view regions and to aggregate features observed across multiple views. Additionally, we propose a simple but effective free-view training strategy that ensures robust view synthesis across a broader view range regardless of the number of views. Our empirical results demonstrate state-of-the-art novel view synthesis performance in both rendered color map quality and depth map accuracy across different numbers of input views.
We also show that FreeSplat performs inference more efficiently and can effectively reduce redundant Gaussians, offering the possibility of feed-forward large scene reconstruction without depth priors.", + "arxiv_url": "http://arxiv.org/abs/2405.17958v3", + "pdf_url": "http://arxiv.org/pdf/2405.17958v3", + "published_date": "2024-05-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "A Refined 3D Gaussian Representation for High-Quality Dynamic Scene Reconstruction", + "authors": [ + "Bin Zhang", + "Bi Zeng", + "Zexin Peng" + ], + "abstract": "In recent years, Neural Radiance Fields (NeRF) has revolutionized three-dimensional (3D) reconstruction with its implicit representation. Building upon NeRF, 3D Gaussian Splatting (3D-GS) has departed from the implicit representation of neural networks and instead directly represents scenes as point clouds with Gaussian-shaped distributions. While this shift has notably elevated the rendering quality and speed of radiance fields, it has inevitably led to a significant increase in memory usage. Additionally, effectively rendering dynamic scenes in 3D-GS has emerged as a pressing challenge. To address these concerns, this paper proposes a refined 3D Gaussian representation for high-quality dynamic scene reconstruction. Firstly, we use a deformable multi-layer perceptron (MLP) network to capture the dynamic offset of Gaussian points and express the color features of points through hash encoding and a tiny MLP to reduce storage requirements. Subsequently, we introduce a learnable denoising mask coupled with a denoising loss to eliminate noise points from the scene, thereby further compressing the 3D Gaussian model. Finally, the motion noise of points is mitigated through static constraints and motion consistency constraints. Experimental results demonstrate that our method surpasses existing approaches in rendering quality and speed, while significantly reducing the memory usage associated with 3D-GS, making it highly suitable for various tasks such as novel view synthesis and dynamic mapping.", + "arxiv_url": "http://arxiv.org/abs/2405.17891v1", + "pdf_url": "http://arxiv.org/pdf/2405.17891v1", + "published_date": "2024-05-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HFGS: 4D Gaussian Splatting with Emphasis on Spatial and Temporal High-Frequency Components for Endoscopic Scene Reconstruction", + "authors": [ + "Haoyu Zhao", + "Xingyue Zhao", + "Lingting Zhu", + "Weixi Zheng", + "Yongchao Xu" + ], + "abstract": "Robot-assisted minimally invasive surgery benefits from enhancing dynamic scene reconstruction, as it improves surgical outcomes. While Neural Radiance Fields (NeRF) have been effective in scene reconstruction, their slow inference speeds and lengthy training durations limit their applicability. To overcome these limitations, 3D Gaussian Splatting (3D-GS) based methods have emerged as a recent trend, offering rapid inference capabilities and superior 3D quality. However, these methods still struggle with under-reconstruction in both static and dynamic scenes. In this paper, we propose HFGS, a novel approach for deformable endoscopic reconstruction that addresses these challenges from spatial and temporal frequency perspectives.
Our approach incorporates deformation fields to better handle dynamic scenes and introduces Spatial High-Frequency Emphasis Reconstruction (SHF) to minimize discrepancies in spatial frequency spectra between the rendered image and its ground truth. Additionally, we introduce Temporal High-Frequency Emphasis Reconstruction (THF) to enhance dynamic awareness in neural rendering by leveraging flow priors, focusing optimization on motion-intensive parts. Extensive experiments on two widely used benchmarks demonstrate that HFGS achieves superior rendering quality.", + "arxiv_url": "http://arxiv.org/abs/2405.17872v3", + "pdf_url": "http://arxiv.org/pdf/2405.17872v3", + "published_date": "2024-05-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Deform3DGS: Flexible Deformation for Fast Surgical Scene Reconstruction with Gaussian Splatting", + "authors": [ + "Shuojue Yang", + "Qian Li", + "Daiyun Shen", + "Bingchen Gong", + "Qi Dou", + "Yueming Jin" + ], + "abstract": "Tissue deformation poses a key challenge for accurate surgical scene reconstruction. Despite yielding high reconstruction quality, existing methods suffer from slow rendering speeds and long training times, limiting their intraoperative applicability. Motivated by recent progress in 3D Gaussian Splatting, an emerging technology in real-time 3D rendering, this work presents a novel fast reconstruction framework, termed Deform3DGS, for deformable tissues during endoscopic surgery. Specifically, we introduce 3D GS into surgical scenes by integrating a point cloud initialization to improve reconstruction. Furthermore, we propose a novel flexible deformation modeling scheme (FDM) to learn tissue deformation dynamics at the level of individual Gaussians. Our FDM can model the surface deformation with efficient representations, allowing for real-time rendering performance. More importantly, FDM significantly accelerates surgical scene reconstruction, demonstrating considerable clinical values, particularly in intraoperative settings where time efficiency is crucial. Experiments on DaVinci robotic surgery videos indicate the efficacy of our approach, showcasing superior reconstruction fidelity PSNR: (37.90) and rendering speed (338.8 FPS) while substantially reducing training time to only 1 minute/scene. Our code is available at https://github.com/jinlab-imvr/Deform3DGS.", + "arxiv_url": "http://arxiv.org/abs/2405.17835v3", + "pdf_url": "http://arxiv.org/pdf/2405.17835v3", + "published_date": "2024-05-28", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/jinlab-imvr/Deform3DGS", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh", + "authors": [ + "Xiangjun Gao", + "Xiaoyu Li", + "Yiyu Zhuang", + "Qi Zhang", + "Wenbo Hu", + "Chaopeng Zhang", + "Yao Yao", + "Ying Shan", + "Long Quan" + ], + "abstract": "Neural 3D representations such as Neural Radiance Fields (NeRF), excel at producing photo-realistic rendering results but lack the flexibility for manipulation and editing which is crucial for content creation. Previous works have attempted to address this issue by deforming a NeRF in canonical space or manipulating the radiance field based on an explicit mesh. 
However, manipulating NeRF is not highly controllable and requires a long training and inference time. With the emergence of 3D Gaussian Splatting (3DGS), extremely high-fidelity novel view synthesis can be achieved using an explicit point-based 3D representation with much faster training and rendering speed. However, there is still a lack of effective means to manipulate 3DGS freely while maintaining rendering quality. In this work, we aim to tackle the challenge of achieving manipulable photo-realistic rendering. We propose to utilize a triangular mesh to manipulate 3DGS directly with self-adaptation. This approach reduces the need to design various algorithms for different types of Gaussian manipulation. By utilizing a triangle shape-aware Gaussian binding and adapting method, we can achieve 3DGS manipulation and preserve high-fidelity rendering after manipulation. Our approach is capable of handling large deformations, local manipulations, and soft body simulations while keeping high-quality rendering. Furthermore, we demonstrate that our method is also effective with inaccurate meshes extracted from 3DGS. Experiments conducted demonstrate the effectiveness of our method and its superiority over baseline approaches.", + "arxiv_url": "http://arxiv.org/abs/2405.17811v1", + "pdf_url": "http://arxiv.org/pdf/2405.17811v1", + "published_date": "2024-05-28", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SafeguardGS: 3D Gaussian Primitive Pruning While Avoiding Catastrophic Scene Destruction", + "authors": [ + "Yongjae Lee", + "Zhaoliang Zhang", + "Deliang Fan" + ], + "abstract": "3D Gaussian Splatting (3DGS) has made significant strides in novel view synthesis. However, its suboptimal densification process results in the excessively large number of Gaussian primitives, which impacts frame-per-second and increases memory usage, making it unsuitable for low-end devices. To address this issue, many follow-up studies have proposed various pruning techniques with score functions designed to identify and remove less important primitives. Nonetheless, a comprehensive discussion of their effectiveness and implications across all techniques is missing. In this paper, we are the first to categorize 3DGS pruning techniques into two types: Scene-level pruning and Pixel-level pruning, distinguished by their scope for ranking primitives. Our subsequent experiments reveal that, while scene-level pruning leads to disastrous quality drops under extreme decimation of Gaussian primitives, pixel-level pruning not only sustains relatively high rendering quality with minuscule performance degradation but also provides an inherent boundary of pruning, i.e., a safeguard of Gaussian pruning. Building on this observation, we further propose multiple variations of score functions based on the factors of rendering equations and discover that assessing based on color similarity with blending weight is the most effective method for discriminating insignificant primitives. In our experiments, our SafeguardGS with the optimal score function shows the highest PSNR-per-primitive performance under an extreme pruning setting, retaining only about 10% of the primitives from the original 3DGS scene (i.e., 10x compression ratio). 
We believe our research provides valuable insights for optimizing 3DGS for future works.", + "arxiv_url": "http://arxiv.org/abs/2405.17793v2", + "pdf_url": "http://arxiv.org/pdf/2405.17793v2", + "published_date": "2024-05-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos", + "authors": [ + "Linhan Wang", + "Kai Cheng", + "Shuo Lei", + "Shengkun Wang", + "Wei Yin", + "Chenyang Lei", + "Xiaoxiao Long", + "Chang-Tien Lu" + ], + "abstract": "We present DC-Gaussian, a new method for generating novel views from in-vehicle dash cam videos. While neural rendering techniques have made significant strides in driving scenarios, existing methods are primarily designed for videos collected by autonomous vehicles. However, these videos are limited in both quantity and diversity compared to dash cam videos, which are more widely used across various types of vehicles and capture a broader range of scenarios. Dash cam videos often suffer from severe obstructions such as reflections and occlusions on the windshields, which significantly impede the application of neural rendering techniques. To address this challenge, we develop DC-Gaussian based on the recent real-time neural rendering technique 3D Gaussian Splatting (3DGS). Our approach includes an adaptive image decomposition module to model reflections and occlusions in a unified manner. Additionally, we introduce illumination-aware obstruction modeling to manage reflections and occlusions under varying lighting conditions. Lastly, we employ a geometry-guided Gaussian enhancement strategy to improve rendering details by incorporating additional geometry priors. Experiments on self-captured and public dash cam videos show that our method not only achieves state-of-the-art performance in novel view synthesis, but also accurately reconstructs the captured scenes, free of obstructions. See the project page for code and data: https://linhanwang.github.io/dcgaussian/.", + "arxiv_url": "http://arxiv.org/abs/2405.17705v3", + "pdf_url": "http://arxiv.org/pdf/2405.17705v3", + "published_date": "2024-05-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane", + "authors": [ + "Yansong Qu", + "Shaohui Dai", + "Xinyang Li", + "Jianghang Lin", + "Liujuan Cao", + "Shengchuan Zhang", + "Rongrong Ji" + ], + "abstract": "3D open-vocabulary scene understanding, crucial for advancing augmented reality and robotic applications, involves interpreting and locating specific regions within a 3D space as directed by natural language instructions. To this end, we introduce GOI, a framework that integrates semantic features from 2D vision-language foundation models into 3D Gaussian Splatting (3DGS) and identifies 3D Gaussians of Interest using an Optimizable Semantic-space Hyperplane. Our approach includes an efficient compression method that utilizes scene priors to condense noisy high-dimensional semantic features into compact low-dimensional vectors, which are subsequently embedded in 3DGS. 
During the open-vocabulary querying process, we adopt a distinct approach compared to existing methods, which depend on a manually set fixed empirical threshold to select regions based on their semantic feature distance to the query text embedding. This traditional approach often lacks universal accuracy, leading to challenges in precisely identifying specific target areas. Instead, our method treats the feature selection process as a hyperplane division within the feature space, retaining only those features that are highly relevant to the query. We leverage off-the-shelf 2D Referring Expression Segmentation (RES) models to fine-tune the semantic-space hyperplane, enabling a more precise distinction between target regions and others. This fine-tuning substantially improves the accuracy of open-vocabulary queries, ensuring the precise localization of pertinent 3D Gaussians. Extensive experiments demonstrate GOI's superiority over previous state-of-the-art methods. Our project page is available at https://quyans.github.io/GOI-Hyperplane/ .", + "arxiv_url": "http://arxiv.org/abs/2405.17596v2", + "pdf_url": "http://arxiv.org/pdf/2405.17596v2", + "published_date": "2024-05-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction", + "authors": [ + "Yuanhui Huang", + "Wenzhao Zheng", + "Yunpeng Zhang", + "Jie Zhou", + "Jiwen Lu" + ], + "abstract": "3D semantic occupancy prediction aims to obtain 3D fine-grained geometry and semantics of the surrounding scene and is an important task for the robustness of vision-centric autonomous driving. Most existing methods employ dense grids such as voxels as scene representations, which ignore the sparsity of occupancy and the diversity of object scales and thus lead to unbalanced allocation of resources. To address this, we propose an object-centric representation to describe 3D scenes with sparse 3D semantic Gaussians where each Gaussian represents a flexible region of interest and its semantic features. We aggregate information from images through the attention mechanism and iteratively refine the properties of 3D Gaussians including position, covariance, and semantics. We then propose an efficient Gaussian-to-voxel splatting method to generate 3D occupancy predictions, which only aggregates the neighboring Gaussians for a certain position. We conduct extensive experiments on the widely adopted nuScenes and KITTI-360 datasets. Experimental results demonstrate that GaussianFormer achieves comparable performance with state-of-the-art methods with only 17.8% - 24.8% of their memory consumption. 
Code is available at: https://github.com/huang-yh/GaussianFormer.", + "arxiv_url": "http://arxiv.org/abs/2405.17429v1", + "pdf_url": "http://arxiv.org/pdf/2405.17429v1", + "published_date": "2024-05-27", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "https://github.com/huang-yh/GaussianFormer", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds", + "authors": [ + "Jiahui Lei", + "Yijia Weng", + "Adam Harley", + "Leonidas Guibas", + "Kostas Daniilidis" + ], + "abstract": "We introduce 4D Motion Scaffolds (MoSca), a modern 4D reconstruction system designed to reconstruct and synthesize novel views of dynamic scenes from monocular videos captured casually in the wild. To address such a challenging and ill-posed inverse problem, we leverage prior knowledge from foundational vision models and lift the video data to a novel Motion Scaffold (MoSca) representation, which compactly and smoothly encodes the underlying motions/deformations. The scene geometry and appearance are then disentangled from the deformation field and are encoded by globally fusing the Gaussians anchored onto the MoSca and optimized via Gaussian Splatting. Additionally, camera focal length and poses can be solved using bundle adjustment without the need of any other pose estimation tools. Experiments demonstrate state-of-the-art performance on dynamic rendering benchmarks and its effectiveness on real videos.", + "arxiv_url": "http://arxiv.org/abs/2405.17421v2", + "pdf_url": "http://arxiv.org/pdf/2405.17421v2", + "published_date": "2024-05-27", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Refocusing,Defocus Rendering and Blur Removal", + "authors": [ + "Yujie Wang", + "Praneeth Chakravarthula", + "Baoquan Chen" + ], + "abstract": "3D Gaussian Splatting-based techniques have recently advanced 3D scene reconstruction and novel view synthesis, achieving high-quality real-time rendering. However, these approaches are inherently limited by the underlying pinhole camera assumption in modeling the images and hence only work for All-in-Focus (AiF) sharp image inputs. This severely affects their applicability in real-world scenarios where images often exhibit defocus blur due to the limited depth-of-field (DOF) of imaging devices. Additionally, existing 3D Gaussian Splatting (3DGS) methods also do not support rendering of DOF effects. To address these challenges, we introduce DOF-GS that allows for rendering adjustable DOF effects, removing defocus blur as well as refocusing of 3D scenes, all from multi-view images degraded by defocus blur. To this end, we re-imagine the traditional Gaussian Splatting pipeline by employing a finite aperture camera model coupled with explicit, differentiable defocus rendering guided by the Circle-of-Confusion (CoC). The proposed framework provides for dynamic adjustment of DOF effects by changing the aperture and focal distance of the underlying camera model on-demand. It also enables rendering varying DOF effects of 3D scenes post-optimization, and generating AiF images from defocused training images. Furthermore, we devise a joint optimization strategy to further enhance details in the reconstructed scenes by jointly optimizing rendered defocused and AiF images. 
Our experimental results indicate that DOF-GS produces high-quality sharp all-in-focus renderings conditioned on inputs compromised by defocus blur, with the training process incurring only a modest increase in GPU memory consumption. We further demonstrate the applications of the proposed method for adjustable defocus rendering and refocusing of the 3D scene from input images degraded by defocus blur.", + "arxiv_url": "http://arxiv.org/abs/2405.17351v1", + "pdf_url": "http://arxiv.org/pdf/2405.17351v1", + "published_date": "2024-05-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Memorize What Matters: Emergent Scene Decomposition from Multitraverse", + "authors": [ + "Yiming Li", + "Zehong Wang", + "Yue Wang", + "Zhiding Yu", + "Zan Gojcic", + "Marco Pavone", + "Chen Feng", + "Jose M. Alvarez" + ], + "abstract": "Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation. Our key observation is that the environment remains consistent across traversals, while objects frequently change. This allows us to exploit self-supervision from repeated traversals to achieve environment-object decomposition. More specifically, 3DGM formulates multitraverse environmental mapping as a robust differentiable rendering problem, treating pixels of the environment and objects as inliers and outliers, respectively. Using robust feature distillation, feature residuals mining, and robust optimization, 3DGM jointly performs 2D segmentation and 3D mapping without human intervention. We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and neural rendering. Extensive results verify the effectiveness and potential of our method for self-driving and robotics.", + "arxiv_url": "http://arxiv.org/abs/2405.17187v2", + "pdf_url": "http://arxiv.org/pdf/2405.17187v2", + "published_date": "2024-05-27", + "categories": [ + "cs.CV", + "cs.AI", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "F-3DGS: Factorized Coordinates and Representations for 3D Gaussian Splatting", + "authors": [ + "Xiangyu Sun", + "Joo Chan Lee", + "Daniel Rho", + "Jong Hwan Ko", + "Usman Ali", + "Eunbyung Park" + ], + "abstract": "The neural radiance field (NeRF) has made significant strides in representing 3D scenes and synthesizing novel views. Despite its advancements, the high computational costs of NeRF have posed challenges for its deployment in resource-constrained environments and real-time applications. As an alternative to NeRF-like neural rendering methods, 3D Gaussian Splatting (3DGS) offers rapid rendering speeds while maintaining excellent image quality. 
However, as it represents objects and scenes using a myriad of Gaussians, it requires substantial storage to achieve high-quality representation. To mitigate the storage overhead, we propose Factorized 3D Gaussian Splatting (F-3DGS), a novel approach that drastically reduces storage requirements while preserving image quality. Inspired by classical matrix and tensor factorization techniques, our method represents and approximates dense clusters of Gaussians with significantly fewer Gaussians through efficient factorization. We aim to efficiently represent dense 3D Gaussians by approximating them with a limited amount of information for each axis and their combinations. This method allows us to encode a substantially large number of Gaussians along with their essential attributes -- such as color, scale, and rotation -- necessary for rendering using a relatively small number of elements. Extensive experimental results demonstrate that F-3DGS achieves a significant reduction in storage costs while maintaining comparable quality in rendered images.", + "arxiv_url": "http://arxiv.org/abs/2405.17083v2", + "pdf_url": "http://arxiv.org/pdf/2405.17083v2", + "published_date": "2024-05-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain", + "authors": [ + "Butian Xiong", + "Xiaoyu Ye", + "Tze Ho Elden Tse", + "Kai Han", + "Shuguang Cui", + "Zhen Li" + ], + "abstract": "With the emergence of Gaussian Splats, recent efforts have focused on large-scale scene geometric reconstruction. However, most of these efforts either concentrate on memory reduction or spatial space division, neglecting information in the semantic space. In this paper, we propose a novel method, named SA-GS, for fine-grained 3D geometry reconstruction using semantic-aware 3D Gaussian Splats. Specifically, we leverage prior information stored in large vision models such as SAM and DINO to generate semantic masks. We then introduce a geometric complexity measurement function to serve as soft regularization, guiding the shape of each Gaussian Splat within specific semantic areas. Additionally, we present a method that estimates the expected number of Gaussian Splats in different semantic areas, effectively providing a lower bound for Gaussian Splats in these areas. Subsequently, we extract the point cloud using a novel probability density-based extraction method, transforming Gaussian Splats into a point cloud crucial for downstream tasks. Our method also offers the potential for detailed semantic inquiries while maintaining high image-based reconstruction results. We provide extensive experiments on publicly available large-scale scene reconstruction datasets with highly accurate point clouds as ground truth and our novel dataset. Our results demonstrate the superiority of our method over current state-of-the-art Gaussian Splats reconstruction methods by a significant margin in terms of geometric-based measurement metrics. 
Code and additional results will soon be available on our project page.", + "arxiv_url": "http://arxiv.org/abs/2405.16923v2", + "pdf_url": "http://arxiv.org/pdf/2405.16923v2", + "published_date": "2024-05-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation", + "authors": [ + "Zhoujie Fu", + "Jiacheng Wei", + "Wenhao Shen", + "Chaoyue Song", + "Xiaofeng Yang", + "Fayao Liu", + "Xulei Yang", + "Guosheng Lin" + ], + "abstract": "In this work, we introduce a novel approach for creating controllable dynamics in 3D-generated Gaussians using casually captured reference videos. Our method transfers the motion of objects from reference videos to a variety of generated 3D Gaussians across different categories, ensuring precise and customizable motion transfer. We achieve this by employing blend skinning-based non-parametric shape reconstruction to extract the shape and motion of reference objects. This process involves segmenting the reference objects into motion-related parts based on skinning weights and establishing shape correspondences with generated target shapes. To address shape and temporal inconsistencies prevalent in existing methods, we integrate physical simulation, driving the target shapes with matched motion. This integration is optimized through a displacement loss to ensure reliable and genuine dynamics. Our approach supports diverse reference inputs, including humans, quadrupeds, and articulated objects, and can generate dynamics of arbitrary length, providing enhanced fidelity and applicability. Unlike methods heavily reliant on diffusion video generation models, our technique offers specific and high-quality motion transfer, maintaining both shape integrity and temporal consistency.", + "arxiv_url": "http://arxiv.org/abs/2405.16849v3", + "pdf_url": "http://arxiv.org/pdf/2405.16849v3", + "published_date": "2024-05-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PyGS: Large-scale Scene Representation with Pyramidal 3D Gaussian Splatting", + "authors": [ + "Zipeng Wang", + "Dan Xu" + ], + "abstract": "Neural Radiance Fields (NeRFs) have demonstrated remarkable proficiency in synthesizing photorealistic images of large-scale scenes. However, they are often plagued by a loss of fine details and long rendering durations. 3D Gaussian Splatting has recently been introduced as a potent alternative, achieving both high-fidelity visual results and accelerated rendering performance. Nonetheless, scaling 3D Gaussian Splatting is fraught with challenges. Specifically, large-scale scenes grapple with the integration of objects across multiple scales and disparate viewpoints, which often leads to compromised efficacy as the Gaussians need to balance between detail levels. Furthermore, the generation of initialization points via COLMAP from large-scale datasets is both computationally demanding and prone to incomplete reconstructions. To address these challenges, we present Pyramidal 3D Gaussian Splatting (PyGS) with NeRF Initialization. Our approach represents the scene with a hierarchical assembly of Gaussians arranged in a pyramidal fashion. The top level of the pyramid is composed of a few large Gaussians, while each subsequent layer accommodates a denser collection of smaller Gaussians. 
We effectively initialize these pyramidal Gaussians through sampling a rapidly trained grid-based NeRF at various frequencies. We group these pyramidal Gaussians into clusters and use a compact weighting network to dynamically determine the influence of each pyramid level of each cluster considering camera viewpoint during rendering. Our method achieves a significant performance leap across multiple large-scale datasets and attains a rendering time that is over 400 times faster than current state-of-the-art approaches.", + "arxiv_url": "http://arxiv.org/abs/2405.16829v3", + "pdf_url": "http://arxiv.org/pdf/2405.16829v3", + "published_date": "2024-05-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models", + "authors": [ + "Hanwen Liang", + "Yuyang Yin", + "Dejia Xu", + "Hanxue Liang", + "Zhangyang Wang", + "Konstantinos N. Plataniotis", + "Yao Zhao", + "Yunchao Wei" + ], + "abstract": "The availability of large-scale multimodal datasets and advancements in diffusion models have significantly accelerated progress in 4D content generation. Most prior approaches rely on multiple image or video diffusion models, utilizing score distillation sampling for optimization or generating pseudo novel views for direct supervision. However, these methods are hindered by slow optimization speeds and multi-view inconsistency issues. Spatial and temporal consistency in 4D geometry has been extensively explored respectively in 3D-aware diffusion models and traditional monocular video diffusion models. Building on this foundation, we propose a strategy to migrate the temporal consistency in video diffusion models to the spatial-temporal consistency required for 4D generation. Specifically, we present a novel framework, \\textbf{Diffusion4D}, for efficient and scalable 4D content generation. Leveraging a meticulously curated dynamic 3D dataset, we develop a 4D-aware video diffusion model capable of synthesizing orbital views of dynamic 3D assets. To control the dynamic strength of these assets, we introduce a 3D-to-4D motion magnitude metric as guidance. Additionally, we propose a novel motion magnitude reconstruction loss and 3D-aware classifier-free guidance to refine the learning and generation of motion dynamics. After obtaining orbital views of the 4D asset, we perform explicit 4D construction with Gaussian splatting in a coarse-to-fine manner. The synthesized multi-view consistent 4D image set enables us to swiftly generate high-fidelity and diverse 4D assets within just several minutes. Extensive experiments demonstrate that our method surpasses prior state-of-the-art techniques in terms of generation efficiency and 4D geometry consistency across various prompt modalities.", + "arxiv_url": "http://arxiv.org/abs/2405.16645v1", + "pdf_url": "http://arxiv.org/pdf/2405.16645v1", + "published_date": "2024-05-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians", + "authors": [ + "Erik Sandström", + "Keisuke Tateno", + "Michael Oechsle", + "Michael Niemeyer", + "Luc Van Gool", + "Martin R. 
Oswald", + "Federico Tombari" + ], + "abstract": "3D Gaussian Splatting has emerged as a powerful representation of geometry and appearance for RGB-only dense Simultaneous Localization and Mapping (SLAM), as it provides a compact dense map representation while enabling efficient and high-quality map rendering. However, existing methods show significantly worse reconstruction quality than competing methods using other 3D representations, e.g. neural points clouds, since they either do not employ global map and pose optimization or make use of monocular depth. In response, we propose the first RGB-only SLAM system with a dense 3D Gaussian map representation that utilizes all benefits of globally optimized tracking by adapting dynamically to keyframe pose and depth updates by actively deforming the 3D Gaussian map. Moreover, we find that refining the depth updates in inaccurate areas with a monocular depth estimator further improves the accuracy of the 3D reconstruction. Our experiments on the Replica, TUM-RGBD, and ScanNet datasets indicate the effectiveness of globally optimized 3D Gaussians, as the approach achieves superior or on par performance with existing RGB-only SLAM methods methods in tracking, mapping and rendering accuracy while yielding small map sizes and fast runtimes. The source code is available at https://github.com/eriksandstroem/Splat-SLAM.", + "arxiv_url": "http://arxiv.org/abs/2405.16544v1", + "pdf_url": "http://arxiv.org/pdf/2405.16544v1", + "published_date": "2024-05-26", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/eriksandstroem/Splat-SLAM", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors", + "authors": [ + "Soumava Paul", + "Christopher Wewer", + "Bernt Schiele", + "Jan Eric Lenssen" + ], + "abstract": "We aim to tackle sparse-view reconstruction of a 360 3D scene using priors from latent diffusion models (LDM). The sparse-view setting is ill-posed and underconstrained, especially for scenes where the camera rotates 360 degrees around a point, as no visual information is available beyond some frontal views focused on the central object(s) of interest. In this work, we show that pretrained 2D diffusion models can strongly improve the reconstruction of a scene with low-cost fine-tuning. Specifically, we present SparseSplat360 (Sp2360), a method that employs a cascade of in-painting and artifact removal models to fill in missing details and clean novel views. Due to superior training and rendering speeds, we use an explicit scene representation in the form of 3D Gaussians over NeRF-based implicit representations. We propose an iterative update strategy to fuse generated pseudo novel views with existing 3D Gaussians fitted to the initial sparse inputs. As a result, we obtain a multi-view consistent scene representation with details coherent with the observed inputs. Our evaluation on the challenging Mip-NeRF360 dataset shows that our proposed 2D to 3D distillation algorithm considerably improves the performance of a regularized version of 3DGS adapted to a sparse-view setting and outperforms existing sparse-view reconstruction methods in 360 scene reconstruction. 
Qualitatively, our method generates entire 360 scenes from as few as 9 input views, with a high degree of foreground and background detail.", + "arxiv_url": "http://arxiv.org/abs/2405.16517v2", + "pdf_url": "http://arxiv.org/pdf/2405.16517v2", + "published_date": "2024-05-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Feature Splatting for Better Novel View Synthesis with Low Overlap", + "authors": [ + "T. Berriel Martins", + "Javier Civera" + ], + "abstract": "3D Gaussian Splatting has emerged as a very promising scene representation, achieving state-of-the-art quality in novel view synthesis significantly faster than competing alternatives. However, its use of spherical harmonics to represent scene colors limits the expressivity of 3D Gaussians and, as a consequence, the capability of the representation to generalize as we move away from the training views. In this paper, we propose to encode the color information of 3D Gaussians into per-Gaussian feature vectors, which we denote as Feature Splatting (FeatSplat). To synthesize a novel view, Gaussians are first \"splatted\" into the image plane, then the corresponding feature vectors are alpha-blended, and finally the blended vector is decoded by a small MLP to render the RGB pixel values. To further inform the model, we concatenate a camera embedding to the blended feature vector, to condition the decoding also on the viewpoint information. Our experiments show that this novel model for encoding the radiance considerably improves novel view synthesis for low overlap views that are distant from the training views. Finally, we also show the capacity and convenience of our feature vector representation, demonstrating its capability not only to generate RGB values for novel views, but also their per-pixel semantic labels. Code available at https://github.com/tberriel/FeatSplat . Keywords: Gaussian Splatting, Novel View Synthesis, Feature Splatting", + "arxiv_url": "http://arxiv.org/abs/2405.15518v2", + "pdf_url": "http://arxiv.org/pdf/2405.15518v2", + "published_date": "2024-05-24", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/tberriel/FeatSplat", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GSDeformer: Direct Cage-based Deformation for 3D Gaussian Splatting", + "authors": [ + "Jiajun Huang", + "Hongchuan Yu" + ], + "abstract": "We present GSDeformer, a method that achieves free-form deformation on 3D Gaussian Splatting (3DGS) without requiring any architectural changes. Our method extends cage-based deformation, a traditional mesh deformation method, to 3DGS. This is done by converting 3DGS into a novel proxy point cloud representation, where its deformation can be used to infer the transformations to apply on the 3D Gaussians making up 3DGS. We also propose an automatic cage construction algorithm for 3DGS to minimize manual work. Our method does not modify the underlying architecture of 3DGS. Therefore, any existing trained vanilla 3DGS can be easily edited by our method. 
We compare the deformation capability of our method against other existing methods, demonstrating the ease of use and comparable quality of our method, despite being more direct and thus easier to integrate with other concurrent developments on 3DGS.", + "arxiv_url": "http://arxiv.org/abs/2405.15491v1", + "pdf_url": "http://arxiv.org/pdf/2405.15491v1", + "published_date": "2024-05-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Don't Splat your Gaussians: Volumetric Ray-Traced Primitives for Modeling and Rendering Scattering and Emissive Media", + "authors": [ + "Jorge Condor", + "Sebastien Speierer", + "Lukas Bode", + "Aljaz Bozic", + "Simon Green", + "Piotr Didyk", + "Adrian Jarabo" + ], + "abstract": "Efficient scene representations are essential for many computer graphics applications. A general unified representation that can handle both surfaces and volumes simultaneously, remains a research challenge. Inspired by recent methods for scene reconstruction that leverage mixtures of 3D Gaussians to model radiance fields, we formalize and generalize the modeling of scattering and emissive media using mixtures of simple kernel-based volumetric primitives. We introduce closed-form solutions for transmittance and free-flight distance sampling for different kernels, and propose several optimizations to use our method efficiently within any off-the-shelf volumetric path tracer. We demonstrate our method as a compact and efficient alternative to other forms of volume modeling for forward and inverse rendering of scattering media. Furthermore, we adapt and showcase our method in radiance field optimization and rendering, providing additional flexibility compared to current state of the art given its ray-tracing formulation. We also introduce the Epanechnikov kernel and demonstrate its potential as an efficient alternative to the traditionally-used Gaussian kernel in scene reconstruction tasks. The versatility and physically-based nature of our approach allows us to go beyond radiance fields and bring to kernel-based modeling and rendering any path-tracing enabled functionality such as scattering, relighting and complex camera models.", + "arxiv_url": "http://arxiv.org/abs/2405.15425v2", + "pdf_url": "http://arxiv.org/pdf/2405.15425v2", + "published_date": "2024-05-24", + "categories": [ + "cs.GR", + "cs.CV", + "I.3.2; I.3.3; I.3.6; I.3.5; I.3.7" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DisC-GS: Discontinuity-aware Gaussian Splatting", + "authors": [ + "Haoxuan Qu", + "Zhuoling Li", + "Hossein Rahmani", + "Yujun Cai", + "Jun Liu" + ], + "abstract": "Recently, Gaussian Splatting, a method that represents a 3D scene as a collection of Gaussian distributions, has gained significant attention in addressing the task of novel view synthesis. In this paper, we highlight a fundamental limitation of Gaussian Splatting: its inability to accurately render discontinuities and boundaries in images due to the continuous nature of Gaussian distributions. To address this issue, we propose a novel framework enabling Gaussian Splatting to perform discontinuity-aware image rendering. Additionally, we introduce a B\\'ezier-boundary gradient approximation strategy within our framework to keep the \"differentiability\" of the proposed discontinuity-aware rendering process. 
Extensive experiments demonstrate the efficacy of our framework.", + "arxiv_url": "http://arxiv.org/abs/2405.15196v2", + "pdf_url": "http://arxiv.org/pdf/2405.15196v2", + "published_date": "2024-05-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HDR-GS: Efficient High Dynamic Range Novel View Synthesis at 1000x Speed via Gaussian Splatting", + "authors": [ + "Yuanhao Cai", + "Zihao Xiao", + "Yixun Liang", + "Minghan Qin", + "Yulun Zhang", + "Xiaokang Yang", + "Yaoyao Liu", + "Alan Yuille" + ], + "abstract": "High dynamic range (HDR) novel view synthesis (NVS) aims to create photorealistic images from novel viewpoints using HDR imaging techniques. The rendered HDR images capture a wider range of brightness levels containing more details of the scene than normal low dynamic range (LDR) images. Existing HDR NVS methods are mainly based on NeRF. They suffer from long training time and slow inference speed. In this paper, we propose a new framework, High Dynamic Range Gaussian Splatting (HDR-GS), which can efficiently render novel HDR views and reconstruct LDR images with a user input exposure time. Specifically, we design a Dual Dynamic Range (DDR) Gaussian point cloud model that uses spherical harmonics to fit HDR color and employs an MLP-based tone-mapper to render LDR color. The HDR and LDR colors are then fed into two Parallel Differentiable Rasterization (PDR) processes to reconstruct HDR and LDR views. To establish the data foundation for the research of 3D Gaussian splatting-based methods in HDR NVS, we recalibrate the camera parameters and compute the initial positions for Gaussian point clouds. Experiments demonstrate that our HDR-GS surpasses the state-of-the-art NeRF-based method by 3.84 and 1.91 dB on LDR and HDR NVS while enjoying 1000x inference speed and only requiring 6.3% training time. Code and recalibrated data will be publicly available at https://github.com/caiyuanhao1998/HDR-GS . A brief video introduction of our work is available at https://youtu.be/wtU7Kcwe7ck", + "arxiv_url": "http://arxiv.org/abs/2405.15125v4", + "pdf_url": "http://arxiv.org/pdf/2405.15125v4", + "published_date": "2024-05-24", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/caiyuanhao1998/HDR-GS", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-Hider: Hiding Messages into 3D Gaussian Splatting", + "authors": [ + "Xuanyu Zhang", + "Jiarui Meng", + "Runyi Li", + "Zhipei Xu", + "Yongbing Zhang", + "Jian Zhang" + ], + "abstract": "3D Gaussian Splatting (3DGS) has already become the emerging research focus in the fields of 3D scene reconstruction and novel view synthesis. Given that training a 3DGS requires a significant amount of time and computational cost, it is crucial to protect the copyright, integrity, and privacy of such 3D assets. Steganography, as a crucial technique for encrypted transmission and copyright protection, has been extensively studied. However, it still lacks profound exploration targeted at 3DGS. Unlike its predecessor NeRF, 3DGS possesses two distinct features: 1) explicit 3D representation; and 2) real-time rendering speeds. These characteristics result in the 3DGS point cloud files being public and transparent, with each Gaussian point having a clear physical significance. 
Therefore, ensuring the security and fidelity of the original 3D scene while embedding information into the 3DGS point cloud files is an extremely challenging task. To solve the above-mentioned issue, we first propose a steganography framework for 3DGS, dubbed GS-Hider, which can embed 3D scenes and images into original GS point clouds in an invisible manner and accurately extract the hidden messages. Specifically, we design a coupled secured feature attribute to replace the original 3DGS's spherical harmonics coefficients and then use a scene decoder and a message decoder to disentangle the original RGB scene and the hidden message. Extensive experiments demonstrated that the proposed GS-Hider can effectively conceal multimodal messages without compromising rendering quality and possesses exceptional security, robustness, capacity, and flexibility. Our project is available at: https://xuanyuzhang21.github.io/project/gshider.", + "arxiv_url": "http://arxiv.org/abs/2405.15118v2", + "pdf_url": "http://arxiv.org/pdf/2405.15118v2", + "published_date": "2024-05-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting", + "authors": [ + "Jiaxu Wang", + "Junhao He", + "Ziyi Zhang", + "Mingyuan Sun", + "Jingkai Sun", + "Renjing Xu" + ], + "abstract": "Event cameras offer promising advantages such as high dynamic range and low latency, making them well-suited for challenging lighting conditions and fast-moving scenarios. However, reconstructing 3D scenes from raw event streams is difficult because event data is sparse and does not carry absolute color information. To release its potential in 3D reconstruction, we propose the first event-based generalizable 3D reconstruction framework, called EvGGS, which reconstructs scenes as 3D Gaussians from only event input in a feedforward manner and can generalize to unseen cases without any retraining. This framework includes a depth estimation module, an intensity reconstruction module, and a Gaussian regression module. These submodules connect in a cascading manner, and we collaboratively train them with a designed joint loss to make them mutually promote each other. To facilitate related studies, we build a novel event-based 3D dataset with various material objects and calibrated labels of grayscale images, depth maps, camera poses, and silhouettes. Experiments show that jointly trained models significantly outperform those trained individually. 
Our approach performs better than all baselines in reconstruction quality, and depth/intensity predictions with satisfactory rendering speed.", + "arxiv_url": "http://arxiv.org/abs/2405.14959v3", + "pdf_url": "http://arxiv.org/pdf/2405.14959v3", + "published_date": "2024-05-23", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras", + "authors": [ + "Hanzhang Tu", + "Ruizhi Shao", + "Xue Dong", + "Shunyuan Zheng", + "Hao Zhang", + "Lili Chen", + "Meili Wang", + "Wenyu Li", + "Siyan Ma", + "Shengping Zhang", + "Boyao Zhou", + "Yebin Liu" + ], + "abstract": "In this paper, we present a low-budget and high-authenticity bidirectional telepresence system, Tele-Aloha, targeting peer-to-peer communication scenarios. Compared to previous systems, Tele-Aloha utilizes only four sparse RGB cameras, one consumer-grade GPU, and one autostereoscopic screen to achieve high-resolution (2048x2048), real-time (30 fps), low-latency (less than 150ms) and robust distant communication. As the core of Tele-Aloha, we propose an efficient novel view synthesis algorithm for upper-body. Firstly, we design a cascaded disparity estimator for obtaining a robust geometry cue. Additionally a neural rasterizer via Gaussian Splatting is introduced to project latent features onto target view and to decode them into a reduced resolution. Further, given the high-quality captured data, we leverage weighted blending mechanism to refine the decoded image into the final resolution of 2K. Exploiting world-leading autostereoscopic display and low-latency iris tracking, users are able to experience a strong three-dimensional sense even without any wearable head-mounted display device. Altogether, our telepresence system demonstrates the sense of co-presence in real-life experiments, inspiring the next generation of communication.", + "arxiv_url": "http://arxiv.org/abs/2405.14866v1", + "pdf_url": "http://arxiv.org/pdf/2405.14866v1", + "published_date": "2024-05-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LDM: Large Tensorial SDF Model for Textured Mesh Generation", + "authors": [ + "Rengan Xie", + "Wenting Zheng", + "Kai Huang", + "Yizheng Chen", + "Qi Wang", + "Qi Ye", + "Wei Chen", + "Yuchi Huo" + ], + "abstract": "Previous efforts have managed to generate production-ready 3D assets from text or images. However, these methods primarily employ NeRF or 3D Gaussian representations, which are not adept at producing smooth, high-quality geometries required by modern rendering pipelines. In this paper, we propose LDM, a novel feed-forward framework capable of generating high-fidelity, illumination-decoupled textured mesh from a single image or text prompts. We firstly utilize a multi-view diffusion model to generate sparse multi-view inputs from single images or text prompts, and then a transformer-based model is trained to predict a tensorial SDF field from these sparse multi-view image inputs. Finally, we employ a gradient-based mesh optimization layer to refine this model, enabling it to produce an SDF field from which high-quality textured meshes can be extracted. 
Extensive experiments demonstrate that our method can generate diverse, high-quality 3D mesh assets with corresponding decomposed RGB textures within seconds.", + "arxiv_url": "http://arxiv.org/abs/2405.14580v3", + "pdf_url": "http://arxiv.org/pdf/2405.14580v3", + "published_date": "2024-05-23", + "categories": [ + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes", + "authors": [ + "Ruiyuan Gao", + "Kai Chen", + "Zhihao Li", + "Lanqing Hong", + "Zhenguo Li", + "Qiang Xu" + ], + "abstract": "While controllable generative models for images and videos have achieved remarkable success, high-quality models for 3D scenes, particularly in unbounded scenarios like autonomous driving, remain underdeveloped due to high data acquisition costs. In this paper, we introduce MagicDrive3D, a novel pipeline for controllable 3D street scene generation that supports multi-condition control, including BEV maps, 3D objects, and text descriptions. Unlike previous methods that reconstruct before training the generative models, MagicDrive3D first trains a video generation model and then reconstructs from the generated data. This innovative approach enables easily controllable generation and static scene acquisition, resulting in high-quality scene reconstruction. To address the minor errors in generated content, we propose deformable Gaussian splatting with monocular depth initialization and appearance modeling to manage exposure discrepancies across viewpoints. Validated on the nuScenes dataset, MagicDrive3D generates diverse, high-quality 3D driving scenes that support any-view rendering and enhance downstream tasks like BEV segmentation. Our results demonstrate the framework's superior performance, showcasing its potential for autonomous driving simulation and beyond.", + "arxiv_url": "http://arxiv.org/abs/2405.14475v3", + "pdf_url": "http://arxiv.org/pdf/2405.14475v3", + "published_date": "2024-05-23", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing", + "authors": [ + "Teng Xu", + "Jiamin Chen", + "Peng Chen", + "Youjia Zhang", + "Junqing Yu", + "Wei Yang" + ], + "abstract": "Editing objects within a scene is a critical functionality required across a broad spectrum of applications in computer vision and graphics. As 3D Gaussian Splatting (3DGS) emerges as a frontier in scene representation, the effective modification of 3D Gaussian scenes has become increasingly vital. This process entails accurately retrieving the target objects and subsequently performing modifications based on instructions. Though available in pieces, existing techniques mainly embed sparse semantics into Gaussians for retrieval, and rely on an iterative dataset update paradigm for editing, leading to over-smoothing or inconsistency issues. To this end, this paper proposes a systematic approach, namely TIGER, for coherent text-instructed 3D Gaussian retrieval and editing. In contrast to the top-down language grounding approach for 3D Gaussians, we adopt a bottom-up language aggregation strategy to generate denser language-embedded 3D Gaussians that support open-vocabulary retrieval. 
To overcome the over-smoothing and inconsistency issues in editing, we propose a Coherent Score Distillation (CSD) that aggregates a 2D image editing diffusion model and a multi-view diffusion model for score distillation, producing multi-view consistent editing with much finer details. In various experiments, we demonstrate that our TIGER is able to accomplish more consistent and realistic edits than prior work.", + "arxiv_url": "http://arxiv.org/abs/2405.14455v2", + "pdf_url": "http://arxiv.org/pdf/2405.14455v2", + "published_date": "2024-05-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RoGs: Large Scale Road Surface Reconstruction with Meshgrid Gaussian", + "authors": [ + "Zhiheng Feng", + "Wenhua Wu", + "Tianchen Deng", + "Hesheng Wang" + ], + "abstract": "Road surface reconstruction plays a crucial role in autonomous driving, which can be used for road lane perception and autolabeling. Recently, mesh-based road surface reconstruction algorithms have shown promising reconstruction results. However, these mesh-based methods suffer from slow speed and poor reconstruction quality. To address these limitations, we propose a novel large-scale road surface reconstruction approach with meshgrid Gaussian, named RoGs. Specifically, we model the road surface by placing Gaussian surfels in the vertices of a uniformly distributed square mesh, where each surfel stores color, semantic, and geometric information. This square mesh-based layout covers the entire road with fewer Gaussian surfels and reduces the overlap between Gaussian surfels during training. In addition, because the road surface has no thickness, 2D Gaussian surfel is more consistent with the physical reality of the road surface than 3D Gaussian sphere. Then, unlike previous initialization methods that rely on point clouds, we introduce a vehicle pose-based initialization method to initialize the height and rotation of the Gaussian surfel. Thanks to this meshgrid Gaussian modeling and pose-based initialization, our method achieves significant speedups while improving reconstruction quality. We obtain excellent results in reconstruction of road surfaces in a variety of challenging real-world scenes.", + "arxiv_url": "http://arxiv.org/abs/2405.14342v3", + "pdf_url": "http://arxiv.org/pdf/2405.14342v3", + "published_date": "2024-05-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "D-MiSo: Editing Dynamic 3D Scenes using Multi-Gaussians Soup", + "authors": [ + "Joanna Waczyńska", + "Piotr Borycki", + "Joanna Kaleta", + "Sławomir Tadeja", + "Przemysław Spurek" + ], + "abstract": "Over the past years, we have observed an abundance of approaches for modeling dynamic 3D scenes using Gaussian Splatting (GS). Such solutions use GS to represent the scene's structure and the neural network to model dynamics. Such approaches allow fast rendering and extracting each element of such a dynamic scene. However, modifying such objects over time is challenging. SC-GS (Sparse Controlled Gaussian Splatting) enhanced with Deformed Control Points partially solves this issue. However, this approach necessitates selecting elements that need to be kept fixed, as well as centroids that should be adjusted throughout editing. Moreover, this task poses additional difficulties regarding the re-productivity of such editing. 
To address this, we propose Dynamic Multi-Gaussian Soup (D-MiSo), which allows us to model the mesh-inspired representation of dynamic GS. Additionally, we propose a strategy of linking parameterized Gaussian splats, forming a Triangle Soup with the estimated mesh. Consequently, we can separately construct new trajectories for the 3D objects composing the scene. Thus, we can make the scene's dynamics editable over time or while maintaining partial dynamics.", + "arxiv_url": "http://arxiv.org/abs/2405.14276v2", + "pdf_url": "http://arxiv.org/pdf/2405.14276v2", + "published_date": "2024-05-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "NeuroGauss4D-PCI: 4D Neural Fields and Gaussian Deformation Fields for Point Cloud Interpolation", + "authors": [ + "Chaokang Jiang", + "Dalong Du", + "Jiuming Liu", + "Siting Zhu", + "Zhenqiang Liu", + "Zhuang Ma", + "Zhujin Liang", + "Jie Zhou" + ], + "abstract": "Point Cloud Interpolation confronts challenges from point sparsity, complex spatiotemporal dynamics, and the difficulty of deriving complete 3D point clouds from sparse temporal information. This paper presents NeuroGauss4D-PCI, which excels at modeling complex non-rigid deformations across varied dynamic scenes. The method begins with an iterative Gaussian cloud soft clustering module, offering structured temporal point cloud representations. The proposed temporal radial basis function Gaussian residual utilizes Gaussian parameter interpolation over time, enabling smooth parameter transitions and capturing temporal residuals of Gaussian distributions. Additionally, a 4D Gaussian deformation field tracks the evolution of these parameters, creating continuous spatiotemporal deformation fields. A 4D neural field transforms low-dimensional spatiotemporal coordinates ($x,y,z,t$) into a high-dimensional latent space. Finally, we adaptively and efficiently fuse the latent features from neural fields and the geometric features from Gaussian deformation fields. NeuroGauss4D-PCI outperforms existing methods in point cloud frame interpolation, delivering leading performance on both object-level (DHB) and large-scale autonomous driving datasets (NL-Drive), with scalability to auto-labeling and point cloud densification tasks. The source code is released at https://github.com/jiangchaokang/NeuroGauss4D-PCI.", + "arxiv_url": "http://arxiv.org/abs/2405.14241v1", + "pdf_url": "http://arxiv.org/pdf/2405.14241v1", + "published_date": "2024-05-23", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/jiangchaokang/NeuroGauss4D-PCI", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DOGS: Distributed-Oriented Gaussian Splatting for Large-Scale 3D Reconstruction Via Gaussian Consensus", + "authors": [ + "Yu Chen", + "Gim Hee Lee" + ], + "abstract": "The recent advances in 3D Gaussian Splatting (3DGS) show promising results on the novel view synthesis (NVS) task. With its superior rendering performance and high-fidelity rendering quality, 3DGS surpasses its NeRF counterparts. Most recent 3DGS methods focus either on addressing the instability of rendering efficiency or on reducing the model size. On the other hand, the training efficiency of 3DGS on large-scale scenes has not gained much attention. In this work, we propose DoGaussian, a method that trains 3DGS in a distributed manner. 
Our method first decomposes a scene into K blocks and then introduces the Alternating Direction Method of Multipliers (ADMM) into the training procedure of 3DGS. During training, our DOGS maintains one global 3DGS model on the master node and K local 3DGS models on the slave nodes. The K local 3DGS models are dropped after training and we only query the global 3DGS model during inference. The training time is reduced by scene decomposition, and the training convergence and stability are guaranteed through the consensus on the shared 3D Gaussians. Our method accelerates the training of 3DGS by 6+ times when evaluated on large-scale scenes while concurrently achieving state-of-the-art rendering quality. Our code is publicly available at https://github.com/AIBluefisher/DOGS.", +    "arxiv_url": "http://arxiv.org/abs/2405.13943v2", +    "pdf_url": "http://arxiv.org/pdf/2405.13943v2", +    "published_date": "2024-05-22", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "https://github.com/AIBluefisher/DOGS", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian", +      "nerf" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "Monocular Gaussian SLAM with Language Extended Loop Closure", +    "authors": [ +      "Tian Lan", +      "Qinwei Lin", +      "Haoqian Wang" +    ], +    "abstract": "Recently, 3D Gaussian Splatting has shown great potential in visual Simultaneous Localization And Mapping (SLAM). Existing methods have achieved encouraging results on RGB-D SLAM, but studies of the monocular case are still scarce. Moreover, they also fail to correct drift errors due to the lack of loop closure and global optimization. In this paper, we present MG-SLAM, a monocular Gaussian SLAM with a language-extended loop closure module capable of performing drift-corrected tracking and high-fidelity reconstruction while achieving a high-level understanding of the environment. Our key idea is to represent the global map as 3D Gaussians and use it to guide the estimation of the scene geometry, thus mitigating the effects of missing depth information. Further, an additional language-extended loop closure module based on CLIP features is designed to continually perform global optimization to correct drift errors accumulated as the system runs. Our system shows promising results on multiple challenging datasets in both tracking and mapping and even surpasses some existing RGB-D methods.", +    "arxiv_url": "http://arxiv.org/abs/2405.13748v1", +    "pdf_url": "http://arxiv.org/pdf/2405.13748v1", +    "published_date": "2024-05-22", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "Gaussian Time Machine: A Real-Time Rendering Methodology for Time-Variant Appearances", +    "authors": [ +      "Licheng Shen", +      "Ho Ngai Chow", +      "Lingyun Wang", +      "Tong Zhang", +      "Mengqiu Wang", +      "Yuxing Han" +    ], +    "abstract": "Recent advancements in neural rendering techniques have significantly enhanced the fidelity of 3D reconstruction. Notably, the emergence of 3D Gaussian Splatting (3DGS) has marked a significant milestone by adopting a discrete scene representation, facilitating efficient training and real-time rendering. Several studies have successfully extended the real-time rendering capability of 3DGS to dynamic scenes. However, a challenge arises when training images are captured under vastly differing weather and lighting conditions. This scenario poses a challenge for 3DGS and its variants in achieving accurate reconstructions. 
Although NeRF-based methods (NeRF-W, CLNeRF) have shown promise in handling such challenging conditions, their computational demands hinder real-time rendering capabilities. In this paper, we present Gaussian Time Machine (GTM) which models the time-dependent attributes of Gaussian primitives with discrete time embedding vectors decoded by a lightweight Multi-Layer-Perceptron(MLP). By adjusting the opacity of Gaussian primitives, we can reconstruct visibility changes of objects. We further propose a decomposed color model for improved geometric consistency. GTM achieved state-of-the-art rendering fidelity on 3 datasets and is 100 times faster than NeRF-based counterparts in rendering. Moreover, GTM successfully disentangles the appearance changes and renders smooth appearance interpolation.", + "arxiv_url": "http://arxiv.org/abs/2405.13694v1", + "pdf_url": "http://arxiv.org/pdf/2405.13694v1", + "published_date": "2024-05-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "real-time rendering", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-ROR: 3D Gaussian Splatting for Reflective Object Relighting via SDF Priors", + "authors": [ + "Zuo-Liang Zhu", + "Beibei Wang", + "Jian Yang" + ], + "abstract": "3D Gaussian Splatting (3DGS) has shown a powerful capability for novel view synthesis due to its detailed expressive ability and highly efficient rendering speed. Unfortunately, creating relightable 3D assets with 3DGS is still problematic, particularly for reflective objects, as its discontinuous representation raises difficulties in constraining geometries. Inspired by previous works, the signed distance field (SDF) can serve as an effective way for geometry regularization. However, a direct incorporation between Gaussians and SDF significantly slows training. To this end, we propose GS-ROR for reflective objects relighting with 3DGS aided by SDF priors. At the core of our method is the mutual supervision of the depth and normal between deferred Gaussians and SDF, which avoids the expensive volume rendering of SDF. Thanks to this mutual supervision, the learned deferred Gaussians are well-constrained with a minimal time cost. As the Gaussians are rendered in a deferred shading mode, while the alpha-blended Gaussians are smooth, individual Gaussians may still be outliers, yielding floater artifacts. Therefore, we further introduce an SDF-aware pruning strategy to remove Gaussian outliers, which are located distant from the surface defined by SDF, avoiding the floater issue. Consequently, our method outperforms the existing Gaussian-based inverse rendering methods in terms of relighting quality. 
Our method also exhibits competitive relighting quality compared to NeRF-based methods with at most 25% of training time and allows rendering at 200+ frames per second on an RTX 4090.", +    "arxiv_url": "http://arxiv.org/abs/2406.18544v2", +    "pdf_url": "http://arxiv.org/pdf/2406.18544v2", +    "published_date": "2024-05-22", +    "categories": [ +      "cs.CV", +      "cs.GR" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian", +      "nerf" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video", +    "authors": [ +      "Hongsheng Wang", +      "Xiang Cai", +      "Xi Sun", +      "Jinhong Yue", +      "Zhanyun Tang", +      "Shengyu Zhang", +      "Feng Lin", +      "Fei Wu" +    ], +    "abstract": "Single-view clothed human reconstruction holds a central position in virtual reality applications, especially in contexts involving intricate human motions. It presents notable challenges in achieving realistic clothing deformation. Current methodologies often overlook the influence of motion on surface deformation, resulting in surfaces lacking the constraints imposed by global motion. To overcome these limitations, we introduce an innovative framework, Motion-Based 3D Clothed Humans Synthesis (MOSS), which employs kinematic information to achieve motion-aware Gaussian split on the human surface. Our framework consists of two modules: Kinematic Gaussian Locating Splatting (KGAS) and Surface Deformation Detector (UID). KGAS incorporates matrix-Fisher distribution to propagate global motion across the body surface. The density and rotation factors of this distribution explicitly control the Gaussians, thereby enhancing the realism of the reconstructed surface. Additionally, to address local occlusions in single-view, based on KGAS, UID identifies significant surfaces, and geometric reconstruction is performed to compensate for these deformations. Experimental results demonstrate that MOSS achieves state-of-the-art visual quality in 3D clothed human synthesis from monocular videos. Notably, we improve the Human NeRF and the Gaussian Splatting by 33.94% and 16.75% in LPIPS* respectively. Codes are available at https://wanghongsheng01.github.io/MOSS/.", +    "arxiv_url": "http://arxiv.org/abs/2405.12806v3", +    "pdf_url": "http://arxiv.org/pdf/2405.12806v3", +    "published_date": "2024-05-21", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "nerf" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "LAGA: Layered 3D Avatar Generation and Customization via Gaussian Splatting", +    "authors": [ +      "Jia Gong", +      "Shenyu Ji", +      "Lin Geng Foo", +      "Kang Chen", +      "Hossein Rahmani", +      "Jun Liu" +    ], +    "abstract": "Creating and customizing a 3D clothed avatar from textual descriptions is a critical and challenging task. Traditional methods often treat the human body and clothing as inseparable, limiting users' ability to freely mix and match garments. In response to this limitation, we present LAyered Gaussian Avatar (LAGA), a carefully designed framework enabling the creation of high-fidelity decomposable avatars with diverse garments. By decoupling garments from the avatar, our framework empowers users to conveniently edit avatars at the garment level. Our approach begins by modeling the avatar using a set of Gaussian points organized in a layered structure, where each layer corresponds to a specific garment or the human body itself. 
To generate high-quality garments for each layer, we introduce a coarse-to-fine strategy for diverse garment generation and a novel dual-SDS loss function to maintain coherence between the generated garments and avatar components, including the human body and other garments. Moreover, we introduce three regularization losses to guide the movement of Gaussians for garment transfer, allowing garments to be freely transferred to various avatars. Extensive experimentation demonstrates that our approach surpasses existing methods in the generation of 3D clothed humans.", + "arxiv_url": "http://arxiv.org/abs/2405.12663v1", + "pdf_url": "http://arxiv.org/pdf/2405.12663v1", + "published_date": "2024-05-21", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery", + "authors": [ + "Hongsheng Wang", + "Weiyue Zhang", + "Sihao Liu", + "Xinrui Zhou", + "Jing Li", + "Zhanyun Tang", + "Shengyu Zhang", + "Fei Wu", + "Feng Lin" + ], + "abstract": "Although 3D Gaussian Splatting (3DGS) has recently made progress in 3D human reconstruction, it primarily relies on 2D pixel-level supervision, overlooking the geometric complexity and topological relationships of different body parts. To address this gap, we introduce the Hierarchical Graph Human Gaussian Control (HUGS) framework for achieving high-fidelity 3D human reconstruction. Our approach involves leveraging explicitly semantic priors of body parts to ensure the consistency of geometric topology, thereby enabling the capture of the complex geometrical and topological associations among body parts. Additionally, we disentangle high-frequency features from global human features to refine surface details in body parts. Extensive experiments demonstrate that our method exhibits superior performance in human body reconstruction, particularly in enhancing surface details and accurately reconstructing body part junctions. Codes are available at https://wanghongsheng01.github.io/HUGS/.", + "arxiv_url": "http://arxiv.org/abs/2405.12477v3", + "pdf_url": "http://arxiv.org/pdf/2405.12477v3", + "published_date": "2024-05-21", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GarmentDreamer: 3DGS Guided Garment Synthesis with Diverse Geometry and Texture Details", + "authors": [ + "Boqian Li", + "Xuan Li", + "Ying Jiang", + "Tianyi Xie", + "Feng Gao", + "Huamin Wang", + "Yin Yang", + "Chenfanfu Jiang" + ], + "abstract": "Traditional 3D garment creation is labor-intensive, involving sketching, modeling, UV mapping, and texturing, which are time-consuming and costly. Recent advances in diffusion-based generative models have enabled new possibilities for 3D garment generation from text prompts, images, and videos. However, existing methods either suffer from inconsistencies among multi-view images or require additional processes to separate cloth from the underlying human model. In this paper, we propose GarmentDreamer, a novel method that leverages 3D Gaussian Splatting (GS) as guidance to generate wearable, simulation-ready 3D garment meshes from text prompts. In contrast to using multi-view images directly predicted by generative models as guidance, our 3DGS guidance ensures consistent optimization in both garment deformation and texture synthesis. 
Our method introduces a novel garment augmentation module, guided by normal and RGBA information, and employs implicit Neural Texture Fields (NeTF) combined with Score Distillation Sampling (SDS) to generate diverse geometric and texture details. We validate the effectiveness of our approach through comprehensive qualitative and quantitative experiments, showcasing the superior performance of GarmentDreamer over state-of-the-art alternatives. Our project page is available at: https://xuan-li.github.io/GarmentDreamerDemo/.", + "arxiv_url": "http://arxiv.org/abs/2405.12420v1", + "pdf_url": "http://arxiv.org/pdf/2405.12420v1", + "published_date": "2024-05-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "AtomGS: Atomizing Gaussian Splatting for High-Fidelity Radiance Field", + "authors": [ + "Rong Liu", + "Rui Xu", + "Yue Hu", + "Meida Chen", + "Andrew Feng" + ], + "abstract": "3D Gaussian Splatting (3DGS) has recently advanced radiance field reconstruction by offering superior capabilities for novel view synthesis and real-time rendering speed. However, its strategy of blending optimization and adaptive density control might lead to sub-optimal results; it can sometimes yield noisy geometry and blurry artifacts due to prioritizing optimizing large Gaussians at the cost of adequately densifying smaller ones. To address this, we introduce AtomGS, consisting of Atomized Proliferation and Geometry-Guided Optimization. The Atomized Proliferation constrains ellipsoid Gaussians of various sizes into more uniform-sized Atom Gaussians. The strategy enhances the representation of areas with fine features by placing greater emphasis on densification in accordance with scene details. In addition, we proposed a Geometry-Guided Optimization approach that incorporates an Edge-Aware Normal Loss. This optimization method effectively smooths flat surfaces while preserving intricate details. Our evaluation shows that AtomGS outperforms existing state-of-the-art methods in rendering quality. Additionally, it achieves competitive accuracy in geometry reconstruction and offers a significant improvement in training speed over other SDF-based methods. More interactive demos can be found in our website (https://rongliu-leo.github.io/AtomGS/).", + "arxiv_url": "http://arxiv.org/abs/2405.12369v3", + "pdf_url": "http://arxiv.org/pdf/2405.12369v3", + "published_date": "2024-05-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo", + "authors": [ + "Tianqi Liu", + "Guangcong Wang", + "Shoukang Hu", + "Liao Shen", + "Xinyi Ye", + "Yuhang Zang", + "Zhiguo Cao", + "Wei Li", + "Ziwei Liu" + ], + "abstract": "We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes. Specifically, 1) we leverage MVS to encode geometry-aware Gaussian representations and decode them into Gaussian parameters. 2) To further enhance performance, we propose a hybrid Gaussian rendering that integrates an efficient volume rendering design for novel view synthesis. 
3) To support fast fine-tuning for specific scenes, we introduce a multi-view geometric consistent aggregation strategy to effectively aggregate the point clouds generated by the generalizable model, serving as the initialization for per-scene optimization. Compared with previous generalizable NeRF-based methods, which typically require minutes of fine-tuning and seconds of rendering per image, MVSGaussian achieves real-time rendering with better synthesis quality for each scene. Compared with the vanilla 3D-GS, MVSGaussian achieves better view synthesis with less training computational cost. Extensive experiments on DTU, Real Forward-facing, NeRF Synthetic, and Tanks and Temples datasets validate that MVSGaussian attains state-of-the-art performance with convincing generalizability, real-time rendering speed, and fast per-scene optimization.", + "arxiv_url": "http://arxiv.org/abs/2405.12218v3", + "pdf_url": "http://arxiv.org/pdf/2405.12218v3", + "published_date": "2024-05-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Embracing Radiance Field Rendering in 6G: Over-the-Air Training and Inference with 3D Contents", + "authors": [ + "Guanlin Wu", + "Zhonghao Lyu", + "Juyong Zhang", + "Jie Xu" + ], + "abstract": "The efficient representation, transmission, and reconstruction of three-dimensional (3D) contents are becoming increasingly important for sixth-generation (6G) networks that aim to merge virtual and physical worlds for offering immersive communication experiences. Neural radiance field (NeRF) and 3D Gaussian splatting (3D-GS) have recently emerged as two promising 3D representation techniques based on radiance field rendering, which are able to provide photorealistic rendering results for complex scenes. Therefore, embracing NeRF and 3D-GS in 6G networks is envisioned to be a prominent solution to support emerging 3D applications with enhanced quality of experience. This paper provides a comprehensive overview on the integration of NeRF and 3D-GS in 6G. First, we review the basics of the radiance field rendering techniques, and highlight their applications and implementation challenges over wireless networks. Next, we consider the over-the-air training of NeRF and 3D-GS models over wireless networks by presenting various learning techniques. We particularly focus on the federated learning design over a hierarchical device-edge-cloud architecture, which is suitable for exploiting distributed data and computing resources over 6G networks to train large models representing large-scale scenes. Then, we consider the over-the-air rendering of NeRF and 3D-GS models at wireless network edge. We present three practical rendering architectures, namely local, remote, and co-rendering, respectively, and provide model compression approaches to facilitate the transmission of radiance field models for rendering. We also present rendering acceleration approaches and joint computation and communication designs to enhance the rendering efficiency. 
In a case study, we propose a new semantic communication enabled 3D content transmission design.", + "arxiv_url": "http://arxiv.org/abs/2405.12155v2", + "pdf_url": "http://arxiv.org/pdf/2405.12155v2", + "published_date": "2024-05-20", + "categories": [ + "cs.IT", + "math.IT" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization", + "authors": [ + "Jiawei Zhang", + "Jiahe Li", + "Xiaohan Yu", + "Lei Huang", + "Lin Gu", + "Jin Zheng", + "Xiao Bai" + ], + "abstract": "3D Gaussian Splatting (3DGS) creates a radiance field consisting of 3D Gaussians to represent a scene. With sparse training views, 3DGS easily suffers from overfitting, negatively impacting rendering. This paper introduces a new co-regularization perspective for improving sparse-view 3DGS. When training two 3D Gaussian radiance fields, we observe that the two radiance fields exhibit point disagreement and rendering disagreement that can unsupervisedly predict reconstruction quality, stemming from the randomness of densification implementation. We further quantify the two disagreements and demonstrate the negative correlation between them and accurate reconstruction, which allows us to identify inaccurate reconstruction without accessing ground-truth information. Based on the study, we propose CoR-GS, which identifies and suppresses inaccurate reconstruction based on the two disagreements: (1) Co-pruning considers Gaussians that exhibit high point disagreement in inaccurate positions and prunes them. (2) Pseudo-view co-regularization considers pixels that exhibit high rendering disagreement are inaccurate and suppress the disagreement. Results on LLFF, Mip-NeRF360, DTU, and Blender demonstrate that CoR-GS effectively regularizes the scene geometry, reconstructs the compact representations, and achieves state-of-the-art novel view synthesis quality under sparse training views.", + "arxiv_url": "http://arxiv.org/abs/2405.12110v2", + "pdf_url": "http://arxiv.org/pdf/2405.12110v2", + "published_date": "2024-05-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Head & Shoulders: High Fidelity Neural Upper Body Avatars with Anchor Gaussian Guided Texture Warping", + "authors": [ + "Tianhao Wu", + "Jing Yang", + "Zhilin Guo", + "Jingyi Wan", + "Fangcheng Zhong", + "Cengiz Oztireli" + ], + "abstract": "By equipping the most recent 3D Gaussian Splatting representation with head 3D morphable models (3DMM), existing methods manage to create head avatars with high fidelity. However, most existing methods only reconstruct a head without the body, substantially limiting their application scenarios. We found that naively applying Gaussians to model the clothed chest and shoulders tends to result in blurry reconstruction and noisy floaters under novel poses. This is because of the fundamental limitation of Gaussians and point clouds -- each Gaussian or point can only have a single directional radiance without spatial variance, therefore an unnecessarily large number of them is required to represent complicated spatially varying texture, even for simple geometry. In contrast, we propose to model the body part with a neural texture that consists of coarse and pose-dependent fine colors. 
To properly render the body texture for each view and pose without accurate geometry nor UV mapping, we optimize another sparse set of Gaussians as anchors that constrain the neural warping field that maps image plane coordinates to the texture space. We demonstrate that Gaussian Head & Shoulders can fit the high-frequency details on the clothed upper body with high fidelity and potentially improve the accuracy and fidelity of the head region. We evaluate our method with casual phone-captured and internet videos and show our method archives superior reconstruction quality and robustness in both self and cross reenactment tasks. To fully utilize the efficient rendering speed of Gaussian splatting, we additionally propose an accelerated inference method of our trained model without Multi-Layer Perceptron (MLP) queries and reach a stable rendering speed of around 130 FPS for any subjects.", + "arxiv_url": "http://arxiv.org/abs/2405.12069v2", + "pdf_url": "http://arxiv.org/pdf/2405.12069v2", + "published_date": "2024-05-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections", + "authors": [ + "Jiayue Liu", + "Xiao Tang", + "Freeman Cheng", + "Roy Yang", + "Zhihao Li", + "Jianzhuang Liu", + "Yi Huang", + "Jiaqi Lin", + "Shiyong Liu", + "Xiaofei Wu", + "Songcen Xu", + "Chun Yuan" + ], + "abstract": "3D Gaussian Splatting showcases notable advancements in photo-realistic and real-time novel view synthesis. However, it faces challenges in modeling mirror reflections, which exhibit substantial appearance variations from different viewpoints. To tackle this problem, we present MirrorGaussian, the first method for mirror scene reconstruction with real-time rendering based on 3D Gaussian Splatting. The key insight is grounded on the mirror symmetry between the real-world space and the virtual mirror space. We introduce an intuitive dual-rendering strategy that enables differentiable rasterization of both the real-world 3D Gaussians and the mirrored counterpart obtained by reflecting the former about the mirror plane. All 3D Gaussians are jointly optimized with the mirror plane in an end-to-end framework. MirrorGaussian achieves high-quality and real-time rendering in scenes with mirrors, empowering scene editing like adding new mirrors and objects. Comprehensive experiments on multiple datasets demonstrate that our approach significantly outperforms existing methods, achieving state-of-the-art results. Project page: https://mirror-gaussian.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2405.11921v1", + "pdf_url": "http://arxiv.org/pdf/2405.11921v1", + "published_date": "2024-05-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching", + "authors": [ + "Xingyu Miao", + "Haoran Duan", + "Varun Ojha", + "Jun Song", + "Tejal Shah", + "Yang Long", + "Rajiv Ranjan" + ], + "abstract": "In this work, we propose a novel Trajectory Score Matching (TSM) method that aims to solve the pseudo ground truth inconsistency problem caused by the accumulated error in Interval Score Matching (ISM) when using the Denoising Diffusion Implicit Models (DDIM) inversion process. 
Unlike ISM which adopts the inversion process of DDIM to calculate on a single path, our TSM method leverages the inversion process of DDIM to generate two paths from the same starting point for calculation. Since both paths start from the same starting point, TSM can reduce the accumulated error compared to ISM, thus alleviating the problem of pseudo ground truth inconsistency. TSM enhances the stability and consistency of the model's generated paths during the distillation process. We demonstrate this experimentally and further show that ISM is a special case of TSM. Furthermore, to optimize the current multi-stage optimization process from high-resolution text to 3D generation, we adopt Stable Diffusion XL for guidance. In response to the issues of abnormal replication and splitting caused by unstable gradients during the 3D Gaussian splatting process when using Stable Diffusion XL, we propose a pixel-by-pixel gradient clipping method. Extensive experiments show that our model significantly surpasses the state-of-the-art models in terms of visual quality and performance. Code: \\url{https://github.com/xingy038/Dreamer-XL}.", + "arxiv_url": "http://arxiv.org/abs/2405.11252v1", + "pdf_url": "http://arxiv.org/pdf/2405.11252v1", + "published_date": "2024-05-18", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/xingy038/Dreamer-XL", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MotionGS : Compact Gaussian Splatting SLAM by Motion Filter", + "authors": [ + "Xinli Guo", + "Weidong Zhang", + "Ruonan Liu", + "Peng Han", + "Hongtian Chen" + ], + "abstract": "With their high-fidelity scene representation capability, the attention of SLAM field is deeply attracted by the Neural Radiation Field (NeRF) and 3D Gaussian Splatting (3DGS). Recently, there has been a surge in NeRF-based SLAM, while 3DGS-based SLAM is sparse. A novel 3DGS-based SLAM approach with a fusion of deep visual feature, dual keyframe selection and 3DGS is presented in this paper. Compared with the existing methods, the proposed tracking is achieved by feature extraction and motion filter on each frame. The joint optimization of poses and 3D Gaussians runs through the entire mapping process. Additionally, the coarse-to-fine pose estimation and compact Gaussian scene representation are implemented by dual keyframe selection and novel loss functions. Experimental results demonstrate that the proposed algorithm not only outperforms the existing methods in tracking and mapping, but also has less memory usage.", + "arxiv_url": "http://arxiv.org/abs/2405.11129v2", + "pdf_url": "http://arxiv.org/pdf/2405.11129v2", + "published_date": "2024-05-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Enhanced 3D Urban Scene Reconstruction and Point Cloud Densification using Gaussian Splatting and Google Earth Imagery", + "authors": [ + "Kyle Gao", + "Dening Lu", + "Hongjie He", + "Linlin Xu", + "Jonathan Li" + ], + "abstract": "3D urban scene reconstruction and modelling is a crucial research area in remote sensing with numerous applications in academia, commerce, industry, and administration. Recent advancements in view synthesis models have facilitated photorealistic 3D reconstruction solely from 2D images. 
Leveraging Google Earth imagery, we construct a 3D Gaussian Splatting model of the Waterloo region centered on the University of Waterloo and achieve view-synthesis results that far exceed those of previous methods based on neural radiance fields, as we demonstrate in our benchmark. Additionally, we retrieved the 3D geometry of the scene using the 3D point cloud extracted from the 3D Gaussian Splatting model, which we benchmarked against our Multi-View Stereo dense reconstruction of the scene, thereby reconstructing both the 3D geometry and photorealistic lighting of the large-scale urban scene through 3D Gaussian Splatting.", +    "arxiv_url": "http://arxiv.org/abs/2405.11021v2", +    "pdf_url": "http://arxiv.org/pdf/2405.11021v2", +    "published_date": "2024-05-17", +    "categories": [ +      "cs.CV", +      "I.4; I.3" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian", +      "3d reconstruction" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes Generation", +    "authors": [ +      "Pengzhi Li", +      "Chengshuai Tang", +      "Qinxuan Huang", +      "Zhiheng Li" +    ], +    "abstract": "In this paper, we explore the existing challenges in 3D artistic scene generation by introducing ART3D, a novel framework that combines diffusion models and 3D Gaussian splatting techniques. Our method effectively bridges the gap between artistic and realistic images through an innovative image semantic transfer algorithm. By leveraging depth information and an initial artistic image, we generate a point cloud map, addressing domain differences. Additionally, we propose a depth consistency module to enhance 3D scene consistency. Finally, the 3D scene serves as initial points for optimizing Gaussian splats. Experimental results demonstrate ART3D's superior performance in both content and structural consistency metrics when compared to existing methods. ART3D significantly advances the field of AI in art creation by providing an innovative solution for generating high-quality 3D artistic scenes.", +    "arxiv_url": "http://arxiv.org/abs/2405.10508v1", +    "pdf_url": "http://arxiv.org/pdf/2405.10508v1", +    "published_date": "2024-05-17", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "GS-Planner: A Gaussian-Splatting-based Planning Framework for Active High-Fidelity Reconstruction", +    "authors": [ +      "Rui Jin", +      "Yuman Gao", +      "Yingjian Wang", +      "Haojian Lu", +      "Fei Gao" +    ], +    "abstract": "Active reconstruction techniques enable robots to autonomously collect scene data for full coverage, relieving users from the tedious and time-consuming data capturing process. However, because they are built on unsuitable scene representations, existing methods show unrealistic reconstruction results or cannot evaluate quality online. Due to the recent advancements in explicit radiance field technology, online active high-fidelity reconstruction has become achievable. In this paper, we propose GS-Planner, a planning framework for active high-fidelity reconstruction using 3D Gaussian Splatting. With an improvement on 3DGS to recognize unobserved regions, we evaluate the reconstruction quality and completeness of the 3DGS map online to guide the robot. Then we design a sampling-based active reconstruction strategy to explore the unobserved areas and improve the reconstruction geometric and textural quality. 
To establish a complete robot active reconstruction system, we choose quadrotor as the robotic platform for its high agility. Then we devise a safety constraint with 3DGS to generate executable trajectories for quadrotor navigation in the 3DGS map. To validate the effectiveness of our method, we conduct extensive experiments and ablation studies in highly realistic simulation scenes.", + "arxiv_url": "http://arxiv.org/abs/2405.10142v2", + "pdf_url": "http://arxiv.org/pdf/2405.10142v2", + "published_date": "2024-05-16", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "From NeRFs to Gaussian Splats, and Back", + "authors": [ + "Siming He", + "Zach Osman", + "Pratik Chaudhari" + ], + "abstract": "For robotics applications where there is a limited number of (typically ego-centric) views, parametric representations such as neural radiance fields (NeRFs) generalize better than non-parametric ones such as Gaussian splatting (GS) to views that are very different from those in the training data; GS however can render much faster than NeRFs. We develop a procedure to convert back and forth between the two. Our approach achieves the best of both NeRFs (superior PSNR, SSIM, and LPIPS on dissimilar views, and a compact representation) and GS (real-time rendering and ability for easily modifying the representation); the computational cost of these conversions is minor compared to training the two from scratch.", + "arxiv_url": "http://arxiv.org/abs/2405.09717v3", + "pdf_url": "http://arxiv.org/pdf/2405.09717v3", + "published_date": "2024-05-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting", + "authors": [ + "Haodong Chen", + "Yongle Huang", + "Haojian Huang", + "Xiangsheng Ge", + "Dian Shao" + ], + "abstract": "The increasing prominence of e-commerce has underscored the importance of Virtual Try-On (VTON). However, previous studies predominantly focus on the 2D realm and rely heavily on extensive data for training. Research on 3D VTON primarily centers on garment-body shape compatibility, a topic extensively covered in 2D VTON. Thanks to advances in 3D scene editing, a 2D diffusion model has now been adapted for 3D editing via multi-viewpoint editing. In this work, we propose GaussianVTON, an innovative 3D VTON pipeline integrating Gaussian Splatting (GS) editing with 2D VTON. To facilitate a seamless transition from 2D to 3D VTON, we propose, for the first time, the use of only images as editing prompts for 3D editing. To further address issues, e.g., face blurring, garment inaccuracy, and degraded viewpoint quality during editing, we devise a three-stage refinement strategy to gradually mitigate potential issues. Furthermore, we introduce a new editing strategy termed Edit Recall Reconstruction (ERR) to tackle the limitations of previous editing strategies in leading to complex geometric changes. 
Our comprehensive experiments demonstrate the superiority of GaussianVTON, offering a novel perspective on 3D VTON while also establishing a novel starting point for image-prompting 3D scene editing.", + "arxiv_url": "http://arxiv.org/abs/2405.07472v2", + "pdf_url": "http://arxiv.org/pdf/2405.07472v2", + "published_date": "2024-05-13", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LayGA: Layered Gaussian Avatars for Animatable Clothing Transfer", + "authors": [ + "Siyou Lin", + "Zhe Li", + "Zhaoqi Su", + "Zerong Zheng", + "Hongwen Zhang", + "Yebin Liu" + ], + "abstract": "Animatable clothing transfer, aiming at dressing and animating garments across characters, is a challenging problem. Most human avatar works entangle the representations of the human body and clothing together, which leads to difficulties for virtual try-on across identities. What's worse, the entangled representations usually fail to exactly track the sliding motion of garments. To overcome these limitations, we present Layered Gaussian Avatars (LayGA), a new representation that formulates body and clothing as two separate layers for photorealistic animatable clothing transfer from multi-view videos. Our representation is built upon the Gaussian map-based avatar for its excellent representation power of garment details. However, the Gaussian map produces unstructured 3D Gaussians distributed around the actual surface. The absence of a smooth explicit surface raises challenges in accurate garment tracking and collision handling between body and garments. Therefore, we propose two-stage training involving single-layer reconstruction and multi-layer fitting. In the single-layer reconstruction stage, we propose a series of geometric constraints to reconstruct smooth surfaces and simultaneously obtain the segmentation between body and clothing. Next, in the multi-layer fitting stage, we train two separate models to represent body and clothing and utilize the reconstructed clothing geometries as 3D supervision for more accurate garment tracking. Furthermore, we propose geometry and rendering layers for both high-quality geometric reconstruction and high-fidelity rendering. Overall, the proposed LayGA realizes photorealistic animations and virtual try-on, and outperforms other baseline methods. Our project page is https://jsnln.github.io/layga/index.html.", + "arxiv_url": "http://arxiv.org/abs/2405.07319v1", + "pdf_url": "http://arxiv.org/pdf/2405.07319v1", + "published_date": "2024-05-12", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Direct Learning of Mesh and Appearance via 3D Gaussian Splatting", + "authors": [ + "Ancheng Lin", + "Jun Li" + ], + "abstract": "Accurately reconstructing a 3D scene including explicit geometry information is both attractive and challenging. Geometry reconstruction can benefit from incorporating differentiable appearance models, such as Neural Radiance Fields and 3D Gaussian Splatting (3DGS). However, existing methods encounter efficiency issues due to indirect geometry learning and the paradigm of separately modeling geometry and surface appearance. In this work, we propose a learnable scene model that incorporates 3DGS with an explicit geometry representation, namely a mesh. 
Our model learns the mesh and appearance in an end-to-end manner, where we bind 3D Gaussians to the mesh faces and perform differentiable rendering of 3DGS to obtain photometric supervision. The model creates an effective information pathway to supervise the learning of both 3DGS and mesh. Experimental results demonstrate that the learned scene model not only achieves state-of-the-art efficiency and rendering quality but also supports manipulation using the explicit mesh. In addition, our model has a unique advantage in adapting to scene updates, thanks to the end-to-end learning of both mesh and appearance.", +    "arxiv_url": "http://arxiv.org/abs/2405.06945v2", +    "pdf_url": "http://arxiv.org/pdf/2405.06945v2", +    "published_date": "2024-05-11", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "OneTo3D: One Image to Re-editable Dynamic 3D Model and Video Generation", +    "authors": [ +      "Jinwei Lin" +    ], +    "abstract": "Generating an editable dynamic 3D model and video from a single image is a novel direction in the research area of single-image 3D representation and reconstruction. Gaussian Splatting has demonstrated its advantages in implicit 3D reconstruction compared with the original Neural Radiance Fields. With the rapid development of these technologies, researchers have tried to use Stable Diffusion models to generate targeted models with text instructions. However, with normal implicit machine learning methods it is hard to gain precise control of motions and actions, and it is difficult to generate long, semantically continuous 3D video. To address this issue, we propose OneTo3D, a method and theory that uses a single image to generate an editable 3D model and a targeted, semantically continuous, time-unlimited 3D video. We use a basic Gaussian Splatting model to generate the 3D model from a single image, which requires less video memory and computing power. Subsequently, we design an automatic generation and self-adaptive binding mechanism for the object armature. Combined with the re-editable motion and action analysis and control algorithm we propose, we achieve better performance than SOTA projects in precise control of 3D model motions and actions, and in generating stable, semantically continuous, time-unlimited 3D video from the input text instructions. We analyze the detailed implementation methods and theories, and present comparisons and conclusions. The project code is open source.", +    "arxiv_url": "http://arxiv.org/abs/2405.06547v1", +    "pdf_url": "http://arxiv.org/pdf/2405.06547v1", +    "published_date": "2024-05-10", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d reconstruction" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "I3DGS: Improve 3D Gaussian Splatting from Multiple Dimensions", +    "authors": [ +      "Jinwei Lin" +    ], +    "abstract": "3D Gaussian Splatting is a novel method for 3D view synthesis that can obtain implicit neural rendering results beyond traditional neural rendering technology while keeping high-definition, fast rendering speed. However, it is still difficult to achieve sufficient efficiency with 3D Gaussian Splatting for practical applications. 
To address this issue, we propose I3DS, an evaluation solution and set of experiments for improving synthetic model performance. Across multiple important dimensions of the original 3D Gaussian Splatting, we conducted more than two thousand experiments to test how different selected items and components impact the training efficiency of the 3D Gaussian Splatting model. In this paper, we share abundant and meaningful experiences and methods on how to improve training and performance, and the impacts caused by different items of the model. A simple integer compression in base 95 and a floating-point compression in base 94 with an ASCII encoding and decoding mechanism are presented. Many real and effective experiments and test results are recorded. After a series of reasonable fine-tuning steps, I3DS gains excellent performance improvements over the original. The project code is available as open source.", +    "arxiv_url": "http://arxiv.org/abs/2405.06408v1", +    "pdf_url": "http://arxiv.org/pdf/2405.06408v1", +    "published_date": "2024-05-10", +    "categories": [ +      "cs.CV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian", +      "neural rendering" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization", +    "authors": [ +      "Pengcheng Zhu", +      "Yaoming Zhuang", +      "Baoquan Chen", +      "Li Li", +      "Chengdong Wu", +      "Zhanlin Liu" +    ], +    "abstract": "This letter introduces a novel framework for dense Visual Simultaneous Localization and Mapping (VSLAM) based on Gaussian Splatting. Recently, SLAM based on Gaussian Splatting has shown promising results. However, in monocular scenarios, the reconstructed Gaussian maps lack geometric accuracy and exhibit weaker tracking capability. To address these limitations, we jointly optimize sparse visual odometry tracking and 3D Gaussian Splatting scene representation for the first time. We obtain depth maps on visual odometry keyframe windows using a fast Multi-View Stereo (MVS) network for the geometric supervision of Gaussian maps. Furthermore, we propose a depth smooth loss and Sparse-Dense Adjustment Ring (SDAR) to reduce the negative effect of estimated depth maps and preserve the consistency in scale between the visual odometry and Gaussian maps. We have evaluated our system across various synthetic and real-world datasets. The accuracy of our pose estimation surpasses existing methods and achieves state-of-the-art results. Additionally, it outperforms previous monocular methods in terms of novel view synthesis and geometric reconstruction fidelities.", +    "arxiv_url": "http://arxiv.org/abs/2405.06241v2", +    "pdf_url": "http://arxiv.org/pdf/2405.06241v2", +    "published_date": "2024-05-10", +    "categories": [ +      "cs.CV", +      "cs.RO" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "DragGaussian: Enabling Drag-style Manipulation on 3D Gaussian Representation", +    "authors": [ +      "Sitian Shen", +      "Jing Xu", +      "Yuheng Yuan", +      "Xingyi Yang", +      "Qiuhong Shen", +      "Xinchao Wang" +    ], +    "abstract": "User-friendly 3D object editing is a challenging task that has attracted significant attention recently. The limitations of direct 3D object editing without 2D prior knowledge have prompted increased attention towards utilizing 2D generative models for 3D editing. 
While existing methods like Instruct NeRF-to-NeRF offer a solution, they often lack user-friendliness, particularly due to semantic guided editing. In the realm of 3D representation, 3D Gaussian Splatting emerges as a promising approach for its efficiency and natural explicit property, facilitating precise editing tasks. Building upon these insights, we propose DragGaussian, a 3D object drag-editing framework based on 3D Gaussian Splatting, leveraging diffusion models for interactive image editing with open-vocabulary input. This framework enables users to perform drag-based editing on pre-trained 3D Gaussian object models, producing modified 2D images through multi-view consistent editing. Our contributions include the introduction of a new task, the development of DragGaussian for interactive point-based 3D editing, and comprehensive validation of its effectiveness through qualitative and quantitative experiments.", + "arxiv_url": "http://arxiv.org/abs/2405.05800v1", + "pdf_url": "http://arxiv.org/pdf/2405.05800v1", + "published_date": "2024-05-09", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FastScene: Text-Driven Fast 3D Indoor Scene Generation via Panoramic Gaussian Splatting", + "authors": [ + "Yikun Ma", + "Dandan Zhan", + "Zhi Jin" + ], + "abstract": "Text-driven 3D indoor scene generation holds broad applications, ranging from gaming and smart homes to AR/VR applications. Fast and high-fidelity scene generation is paramount for ensuring user-friendly experiences. However, existing methods are characterized by lengthy generation processes or necessitate the intricate manual specification of motion parameters, which introduces inconvenience for users. Furthermore, these methods often rely on narrow-field viewpoint iterative generations, compromising global consistency and overall scene quality. To address these issues, we propose FastScene, a framework for fast and higher-quality 3D scene generation, while maintaining the scene consistency. Specifically, given a text prompt, we generate a panorama and estimate its depth, since the panorama encompasses information about the entire scene and exhibits explicit geometric constraints. To obtain high-quality novel views, we introduce the Coarse View Synthesis (CVS) and Progressive Novel View Inpainting (PNVI) strategies, ensuring both scene consistency and view quality. Subsequently, we utilize Multi-View Projection (MVP) to form perspective views, and apply 3D Gaussian Splatting (3DGS) for scene reconstruction. Comprehensive experiments demonstrate FastScene surpasses other methods in both generation speed and quality with better scene consistency. 
Notably, guided only by a text prompt, FastScene can generate a 3D scene within a mere 15 minutes, which is at least one hour faster than state-of-the-art methods, making it a paradigm for user-friendly scene generation.", + "arxiv_url": "http://arxiv.org/abs/2405.05768v1", + "pdf_url": "http://arxiv.org/pdf/2405.05768v1", + "published_date": "2024-05-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "NGM-SLAM: Gaussian Splatting SLAM with Radiance Field Submap", + "authors": [ + "Mingrui Li", + "Jingwei Huang", + "Lei Sun", + "Aaron Xuxiang Tian", + "Tianchen Deng", + "Hongyu Wang" + ], + "abstract": "SLAM systems based on Gaussian Splatting have garnered attention due to their capabilities for rapid real-time rendering and high-fidelity mapping. However, current Gaussian Splatting SLAM systems usually struggle with large scene representation and lack effective loop closure detection. To address these issues, we introduce NGM-SLAM, the first 3DGS based SLAM system that utilizes neural radiance field submaps for progressive scene expression, effectively integrating the strengths of neural radiance fields and 3D Gaussian Splatting. We utilize neural radiance field submaps as supervision and achieve high-quality scene expression and online loop closure adjustments through Gaussian rendering of fused submaps. Our results on multiple real-world scenes and large-scale scene datasets demonstrate that our method can achieve accurate hole filling and high-quality scene expression, supporting monocular, stereo, and RGB-D inputs, and achieving state-of-the-art scene reconstruction and tracking performance.", + "arxiv_url": "http://arxiv.org/abs/2405.05702v6", + "pdf_url": "http://arxiv.org/pdf/2405.05702v6", + "published_date": "2024-05-09", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview", + "authors": [ + "Yuhang Ming", + "Xingrui Yang", + "Weihan Wang", + "Zheng Chen", + "Jinglun Feng", + "Yifan Xing", + "Guofeng Zhang" + ], + "abstract": "Neural Radiance Fields (NeRF) have emerged as a powerful paradigm for 3D scene representation, offering high-fidelity renderings and reconstructions from a set of sparse and unstructured sensor data. In the context of autonomous robotics, where perception and understanding of the environment are pivotal, NeRF holds immense promise for improving performance. In this paper, we present a comprehensive survey and analysis of the state-of-the-art techniques for utilizing NeRF to enhance the capabilities of autonomous robots. We especially focus on the perception, localization and navigation, and decision-making modules of autonomous robots and delve into tasks crucial for autonomous operation, including 3D reconstruction, segmentation, pose estimation, simultaneous localization and mapping (SLAM), navigation and planning, and interaction. Our survey meticulously benchmarks existing NeRF-based methods, providing insights into their strengths and limitations. Moreover, we explore promising avenues for future research and development in this domain. 
Notably, we discuss the integration of advanced techniques such as 3D Gaussian splatting (3DGS), large language models (LLM), and generative AIs, envisioning enhanced reconstruction efficiency, scene understanding, and decision-making capabilities. This survey serves as a roadmap for researchers seeking to leverage NeRFs to empower autonomous robots, paving the way for innovative solutions that can navigate and interact seamlessly in complex environments.", +    "arxiv_url": "http://arxiv.org/abs/2405.05526v2", +    "pdf_url": "http://arxiv.org/pdf/2405.05526v2", +    "published_date": "2024-05-09", +    "categories": [ +      "cs.RO" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian", +      "3d reconstruction", +      "nerf" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "GDGS: Gradient Domain Gaussian Splatting for Sparse Representation of Radiance Fields", +    "authors": [ +      "Yuanhao Gong" +    ], +    "abstract": "The 3D Gaussian splatting methods are getting popular. However, they work directly on the signal, leading to a dense representation of the signal. Even with some techniques such as pruning or distillation, the results are still dense. In this paper, we propose to model the gradient of the original signal. The gradients are much sparser than the original signal. Therefore, the gradients use far fewer Gaussian splats, leading to more efficient storage and thus higher computational performance during both training and rendering. Thanks to the sparsity, during the view synthesis, only a small number of pixels are needed, leading to much higher computational performance ($100\sim 1000\times$ faster). The 2D image can then be recovered from the gradients by solving a Poisson equation with linear computational complexity. Several experiments are performed to confirm the sparseness of the gradients and the computational performance of the proposed method. The method can be applied to various applications, such as human body modeling and indoor environment modeling.", +    "arxiv_url": "http://arxiv.org/abs/2405.05446v1", +    "pdf_url": "http://arxiv.org/pdf/2405.05446v1", +    "published_date": "2024-05-08", +    "categories": [ +      "cs.CV", +      "cs.AI", +      "cs.GR", +      "cs.LG", +      "eess.IV" +    ], +    "github_url": "", +    "keywords": [ +      "gaussian splatting", +      "3d gaussian" +    ], +    "citations": 0, +    "semantic_url": "" +  }, +  { +    "title": "Splat-MOVER: Multi-Stage, Open-Vocabulary Robotic Manipulation via Editable Gaussian Splatting", +    "authors": [ +      "Ola Shorinwa", +      "Johnathan Tucker", +      "Aliyah Smith", +      "Aiden Swann", +      "Timothy Chen", +      "Roya Firoozi", +      "Monroe Kennedy III", +      "Mac Schwager" +    ], +    "abstract": "We present Splat-MOVER, a modular robotics stack for open-vocabulary robotic manipulation, which leverages the editability of Gaussian Splatting (GSplat) scene representations to enable multi-stage manipulation tasks. Splat-MOVER consists of: (i) ASK-Splat, a GSplat representation that distills semantic and grasp affordance features into the 3D scene. ASK-Splat enables geometric, semantic, and affordance understanding of 3D scenes, which is critical in many robotics tasks; (ii) SEE-Splat, a real-time scene-editing module using 3D semantic masking and infilling to visualize the motions of objects that result from robot interactions in the real world. 
SEE-Splat creates a \"digital twin\" of the evolving environment throughout the manipulation task; and (iii) Grasp-Splat, a grasp generation module that uses ASK-Splat and SEE-Splat to propose affordance-aligned candidate grasps for open-world objects. ASK-Splat is trained in real-time from RGB images in a brief scanning phase prior to operation, while SEE-Splat and Grasp-Splat run in real-time during operation. We demonstrate the superior performance of Splat-MOVER in hardware experiments on a Kinova robot compared to two recent baselines in four single-stage, open-vocabulary manipulation tasks and in four multi-stage manipulation tasks, using the edited scene to reflect changes due to prior manipulation stages, which is not possible with existing baselines. Video demonstrations and the code for the project are available at https://splatmover.github.io.", + "arxiv_url": "http://arxiv.org/abs/2405.04378v4", + "pdf_url": "http://arxiv.org/pdf/2405.04378v4", + "published_date": "2024-05-07", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose", + "authors": [ + "Kaiwen Jiang", + "Yang Fu", + "Mukund Varma T", + "Yash Belhe", + "Xiaolong Wang", + "Hao Su", + "Ravi Ramamoorthi" + ], + "abstract": "Novel view synthesis from a sparse set of input images is a challenging problem of great practical interest, especially when camera poses are absent or inaccurate. Direct optimization of camera poses and usage of estimated depths in neural radiance field algorithms usually do not produce good results because of the coupling between poses and depths, and inaccuracies in monocular depth estimation. In this paper, we leverage the recent 3D Gaussian splatting method to develop a novel construct-and-optimize method for sparse view synthesis without camera poses. Specifically, we construct a solution progressively by using monocular depth and projecting pixels back into the 3D world. During construction, we optimize the solution by detecting 2D correspondences between training views and the corresponding rendered images. We develop a unified differentiable pipeline for camera registration and adjustment of both camera poses and depths, followed by back-projection. We also introduce a novel notion of an expected surface in Gaussian splatting, which is critical to our optimization. These steps enable a coarse solution, which can then be low-pass filtered and refined using standard optimization methods. We demonstrate results on the Tanks and Temples and Static Hikes datasets with as few as three widely-spaced views, showing significantly better quality than competing methods, including those with approximate camera pose information. Moreover, our results improve with more views and outperform previous InstantNGP and Gaussian Splatting algorithms even when using half the dataset. Project page: https://raymondjiangkw.github.io/cogs.github.io/", + "arxiv_url": "http://arxiv.org/abs/2405.03659v2", + "pdf_url": "http://arxiv.org/pdf/2405.03659v2", + "published_date": "2024-05-06", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review", + "authors": [ + "Anurag Dalal", + "Daniel Hagen", + "Kjell G. 
Robbersmyr", + "Kristian Muri Knausgård" + ], + "abstract": "Image-based 3D reconstruction is a challenging task that involves inferring the 3D shape of an object or scene from a set of input images. Learning-based methods have gained attention for their ability to directly estimate 3D shapes. This review paper focuses on state-of-the-art techniques for 3D reconstruction, including the generation of novel, unseen views. An overview of recent developments in the Gaussian Splatting method is provided, covering input types, model structures, output representations, and training strategies. Unresolved challenges and future directions are also discussed. Given the rapid progress in this domain and the numerous opportunities for enhancing 3D reconstruction methods, a comprehensive examination of algorithms appears essential. Consequently, this study offers a thorough overview of the latest advancements in Gaussian Splatting.", + "arxiv_url": "http://arxiv.org/abs/2405.03417v1", + "pdf_url": "http://arxiv.org/pdf/2405.03417v1", + "published_date": "2024-05-06", + "categories": [ + "cs.CV", + "cs.GR", + "I.2.10; I.3.6; I.3.7; I.3.8; I.4.5; I.4.8; I.4.10" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HoloGS: Instant Depth-based 3D Gaussian Splatting with Microsoft HoloLens 2", + "authors": [ + "Miriam Jäger", + "Theodor Kapler", + "Michael Feßenbecker", + "Felix Birkelbach", + "Markus Hillemann", + "Boris Jutzi" + ], + "abstract": "In the fields of photogrammetry, computer vision and computer graphics, the task of neural 3D scene reconstruction has led to the exploration of various techniques. Among these, 3D Gaussian Splatting stands out for its explicit representation of scenes using 3D Gaussians, making it appealing for tasks like 3D point cloud extraction and surface reconstruction. Motivated by its potential, we address the domain of 3D scene reconstruction, aiming to leverage the capabilities of the Microsoft HoloLens 2 for instant 3D Gaussian Splatting. We present HoloGS, a novel workflow utilizing HoloLens sensor data, which bypasses the need for pre-processing steps like Structure from Motion by instantly accessing the required input data i.e. the images, camera poses and the point cloud from depth sensing. We provide comprehensive investigations, including the training process and the rendering quality, assessed through the Peak Signal-to-Noise Ratio, and the geometric 3D accuracy of the densified point cloud from Gaussian centers, measured by Chamfer Distance. We evaluate our approach on two self-captured scenes: An outdoor scene of a cultural heritage statue and an indoor scene of a fine-structured plant. 
Our results show that the HoloLens data, including RGB images, corresponding camera poses, and depth sensing based point clouds to initialize the Gaussians, are suitable as input for 3D Gaussian Splatting.", + "arxiv_url": "http://arxiv.org/abs/2405.02005v1", + "pdf_url": "http://arxiv.org/pdf/2405.02005v1", + "published_date": "2024-05-03", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SimEndoGS: Efficient Data-driven Scene Simulation using Robotic Surgery Videos via Physics-embedded 3D Gaussians", + "authors": [ + "Zhenya Yang", + "Kai Chen", + "Yonghao Long", + "Qi Dou" + ], + "abstract": "Surgical scene simulation plays a crucial role in surgical education and simulator-based robot learning. Traditional approaches for creating these environments with surgical scene involve a labor-intensive process where designers hand-craft tissues models with textures and geometries for soft body simulations. This manual approach is not only time-consuming but also limited in the scalability and realism. In contrast, data-driven simulation offers a compelling alternative. It has the potential to automatically reconstruct 3D surgical scenes from real-world surgical video data, followed by the application of soft body physics. This area, however, is relatively uncharted. In our research, we introduce 3D Gaussian as a learnable representation for surgical scene, which is learned from stereo endoscopic video. To prevent over-fitting and ensure the geometrical correctness of these scenes, we incorporate depth supervision and anisotropy regularization into the Gaussian learning process. Furthermore, we apply the Material Point Method, which is integrated with physical properties, to the 3D Gaussians to achieve realistic scene deformations. Our method was evaluated on our collected in-house and public surgical videos datasets. Results show that it can reconstruct and simulate surgical scenes from endoscopic videos efficiently-taking only a few minutes to reconstruct the surgical scene-and produce both visually and physically plausible deformations at a speed approaching real-time. The results demonstrate great potential of our proposed method to enhance the efficiency and variety of simulations available for surgical education and robot learning.", + "arxiv_url": "http://arxiv.org/abs/2405.00956v3", + "pdf_url": "http://arxiv.org/pdf/2405.00956v3", + "published_date": "2024-05-02", + "categories": [ + "cs.RO", + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Spectrally Pruned Gaussian Fields with Neural Compensation", + "authors": [ + "Runyi Yang", + "Zhenxin Zhu", + "Zhou Jiang", + "Baijun Ye", + "Xiaoxue Chen", + "Yifei Zhang", + "Yuantao Chen", + "Jian Zhao", + "Hao Zhao" + ], + "abstract": "Recently, 3D Gaussian Splatting, as a novel 3D representation, has garnered attention for its fast rendering speed and high rendering quality. However, this comes with high memory consumption, e.g., a well-trained Gaussian field may utilize three million Gaussian primitives and over 700 MB of memory. We credit this high memory footprint to the lack of consideration for the relationship between primitives. In this paper, we propose a memory-efficient Gaussian field named SUNDAE with spectral pruning and neural compensation. 
On one hand, we construct a graph on the set of Gaussian primitives to model their relationship and design a spectral down-sampling module to prune out primitives while preserving desired signals. On the other hand, to compensate for the quality loss of pruning Gaussians, we exploit a lightweight neural network head to mix splatted features, which effectively compensates for quality losses while capturing the relationship between primitives in its weights. We demonstrate the performance of SUNDAE with extensive results. For example, SUNDAE can achieve 26.80 PSNR at 145 FPS using 104 MB memory while the vanilla Gaussian splatting algorithm achieves 25.60 PSNR at 160 FPS using 523 MB memory, on the Mip-NeRF360 dataset. Codes are publicly available at https://runyiyang.github.io/projects/SUNDAE/.", + "arxiv_url": "http://arxiv.org/abs/2405.00676v1", + "pdf_url": "http://arxiv.org/pdf/2405.00676v1", + "published_date": "2024-05-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting", + "authors": [ + "Zhexi Peng", + "Tianjia Shao", + "Yong Liu", + "Jingke Zhou", + "Yin Yang", + "Jingdong Wang", + "Kun Zhou" + ], + "abstract": "We present Real-time Gaussian SLAM (RTG-SLAM), a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting. The system features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant colors, and transparent ones fitting residual colors. By rendering depth in a different way from color rendering, we let a single opaque Gaussian well fit a local surface region without the need of multiple overlapping Gaussians, hence largely reducing the memory and computation cost. For on-the-fly Gaussian optimization, we explicitly add Gaussians for three types of pixels per frame: newly observed, with large color errors, and with large depth errors. We also categorize all Gaussians into stable and unstable ones, where the stable Gaussians are expected to well fit previously observed RGBD images and otherwise unstable. We only optimize the unstable Gaussians and only render the pixels occupied by unstable Gaussians. In this way, both the number of Gaussians to be optimized and pixels to be rendered are largely reduced, and the optimization can be done in real time. We show real-time reconstructions of a variety of large scenes. 
Compared with the state-of-the-art NeRF-based RGBD SLAM, our system achieves comparable high-quality reconstruction but with around twice the speed and half the memory cost, and shows superior performance in the realism of novel view synthesis and camera tracking accuracy.", + "arxiv_url": "http://arxiv.org/abs/2404.19706v3", + "pdf_url": "http://arxiv.org/pdf/2404.19706v3", + "published_date": "2024-04-30", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting", + "authors": [ + "Kai Zhang", + "Sai Bi", + "Hao Tan", + "Yuanbo Xiangli", + "Nanxuan Zhao", + "Kalyan Sunkavalli", + "Zexiang Xu" + ], + "abstract": "We propose GS-LRM, a scalable large reconstruction model that can predict high-quality 3D Gaussian primitives from 2-4 posed sparse images in 0.23 seconds on single A100 GPU. Our model features a very simple transformer-based architecture; we patchify input posed images, pass the concatenated multi-view image tokens through a sequence of transformer blocks, and decode final per-pixel Gaussian parameters directly from these tokens for differentiable rendering. In contrast to previous LRMs that can only reconstruct objects, by predicting per-pixel Gaussians, GS-LRM naturally handles scenes with large variations in scale and complexity. We show that our model can work on both object and scene captures by training it on Objaverse and RealEstate10K respectively. In both scenarios, the models outperform state-of-the-art baselines by a wide margin. We also demonstrate applications of our model in downstream 3D generation tasks. Our project webpage is available at: https://sai-bi.github.io/project/gs-lrm/ .", + "arxiv_url": "http://arxiv.org/abs/2404.19702v1", + "pdf_url": "http://arxiv.org/pdf/2404.19702v1", + "published_date": "2024-04-30", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MicroDreamer: Efficient 3D Generation in $\\sim$20 Seconds by Score-based Iterative Reconstruction", + "authors": [ + "Luxi Chen", + "Zhengyi Wang", + "Zihan Zhou", + "Tingting Gao", + "Hang Su", + "Jun Zhu", + "Chongxuan Li" + ], + "abstract": "Optimization-based approaches, such as score distillation sampling (SDS), show promise in zero-shot 3D generation but suffer from low efficiency, primarily due to the high number of function evaluations (NFEs) required for each sample and the limitation of optimization confined to latent space. This paper introduces score-based iterative reconstruction (SIR), an efficient and general algorithm mimicking a differentiable 3D reconstruction process to reduce the NFEs and enable optimization in pixel space. Given a single set of images sampled from a multi-view score-based diffusion model, SIR repeatedly optimizes 3D parameters, unlike the single-step optimization in SDS. With other improvements in training, we present an efficient approach called MicroDreamer that generally applies to various 3D representations and 3D generation tasks. 
In particular, MicroDreamer is 5-20 times faster than SDS in generating neural radiance field while retaining a comparable performance and takes about 20 seconds to create meshes from 3D Gaussian splatting on a single A100 GPU, halving the time of the fastest optimization-based baseline DreamGaussian with significantly superior performance compared to the measurement standard deviation. Our code is available at https://github.com/ML-GSAI/MicroDreamer.", + "arxiv_url": "http://arxiv.org/abs/2404.19525v3", + "pdf_url": "http://arxiv.org/pdf/2404.19525v3", + "published_date": "2024-04-30", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/ML-GSAI/MicroDreamer", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Gaussian Blendshapes for Head Avatar Animation", + "authors": [ + "Shengjie Ma", + "Yanlin Weng", + "Tianjia Shao", + "Kun Zhou" + ], + "abstract": "We introduce 3D Gaussian blendshapes for modeling photorealistic head avatars. Taking a monocular video as input, we learn a base head model of neutral expression, along with a group of expression blendshapes, each of which corresponds to a basis expression in classical parametric face models. Both the neutral model and expression blendshapes are represented as 3D Gaussians, which contain a few properties to depict the avatar appearance. The avatar model of an arbitrary expression can be effectively generated by combining the neutral model and expression blendshapes through linear blending of Gaussians with the expression coefficients. High-fidelity head avatar animations can be synthesized in real time using Gaussian splatting. Compared to state-of-the-art methods, our Gaussian blendshape representation better captures high-frequency details exhibited in input video, and achieves superior rendering performance.", + "arxiv_url": "http://arxiv.org/abs/2404.19398v2", + "pdf_url": "http://arxiv.org/pdf/2404.19398v2", + "published_date": "2024-04-30", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SAGS: Structure-Aware 3D Gaussian Splatting", + "authors": [ + "Evangelos Ververas", + "Rolandos Alexandros Potamias", + "Jifei Song", + "Jiankang Deng", + "Stefanos Zafeiriou" + ], + "abstract": "Following the advent of NeRFs, 3D Gaussian Splatting (3D-GS) has paved the way to real-time neural rendering overcoming the computational burden of volumetric methods. Following the pioneering work of 3D-GS, several methods have attempted to achieve compressible and high-fidelity performance alternatives. However, by employing a geometry-agnostic optimization scheme, these methods neglect the inherent 3D structure of the scene, thereby restricting the expressivity and the quality of the representation, resulting in various floating points and artifacts. In this work, we propose a structure-aware Gaussian Splatting method (SAGS) that implicitly encodes the geometry of the scene, which reflects to state-of-the-art rendering performance and reduced storage requirements on benchmark novel-view synthesis datasets. SAGS is founded on a local-global graph representation that facilitates the learning of complex scenes and enforces meaningful point displacements that preserve the scene's geometry. 
Additionally, we introduce a lightweight version of SAGS, using a simple yet effective mid-point interpolation scheme, which showcases a compact representation of the scene with up to 24$\\times$ size reduction without the reliance on any compression strategies. Extensive experiments across multiple benchmark datasets demonstrate the superiority of SAGS compared to state-of-the-art 3D-GS methods under both rendering quality and model size. Besides, we demonstrate that our structure-aware method can effectively mitigate floating artifacts and irregular distortions of previous methods while obtaining precise depth maps. Project page https://eververas.github.io/SAGS/.", + "arxiv_url": "http://arxiv.org/abs/2404.19149v1", + "pdf_url": "http://arxiv.org/pdf/2404.19149v1", + "published_date": "2024-04-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GSTalker: Real-time Audio-Driven Talking Face Generation via Deformable Gaussian Splatting", + "authors": [ + "Bo Chen", + "Shoukang Hu", + "Qi Chen", + "Chenpeng Du", + "Ran Yi", + "Yanmin Qian", + "Xie Chen" + ], + "abstract": "We present GStalker, a 3D audio-driven talking face generation model with Gaussian Splatting for both fast training (40 minutes) and real-time rendering (125 FPS) with a 3$\\sim$5 minute video for training material, in comparison with previous 2D and 3D NeRF-based modeling frameworks which require hours of training and seconds of rendering per frame. Specifically, GSTalker learns an audio-driven Gaussian deformation field to translate and transform 3D Gaussians to synchronize with audio information, in which multi-resolution hashing grid-based tri-plane and temporal smooth module are incorporated to learn accurate deformation for fine-grained facial details. In addition, a pose-conditioned deformation field is designed to model the stabilized torso. To enable efficient optimization of the condition Gaussian deformation field, we initialize 3D Gaussians by learning a coarse static Gaussian representation. Extensive experiments in person-specific videos with audio tracks validate that GSTalker can generate high-fidelity and audio-lips synchronized results with fast training and real-time rendering speed.", + "arxiv_url": "http://arxiv.org/abs/2404.19040v1", + "pdf_url": "http://arxiv.org/pdf/2404.19040v1", + "published_date": "2024-04-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing", + "authors": [ + "Cong Wang", + "Di Kang", + "He-Yi Sun", + "Shen-Han Qian", + "Zi-Xuan Wang", + "Linchao Bao", + "Song-Hai Zhang" + ], + "abstract": "Creating high-fidelity head avatars from multi-view videos is a core issue for many AR/VR applications. However, existing methods usually struggle to obtain high-quality renderings for all different head components simultaneously since they use one single representation to model components with drastically different characteristics (e.g., skin vs. hair). In this paper, we propose a Hybrid Mesh-Gaussian Head Avatar (MeGA) that models different head components with more suitable representations. 
Specifically, we select an enhanced FLAME mesh as our facial representation and predict a UV displacement map to provide per-vertex offsets for improved personalized geometric details. To achieve photorealistic renderings, we obtain facial colors using deferred neural rendering and disentangle neural textures into three meaningful parts. For hair modeling, we first build a static canonical hair using 3D Gaussian Splatting. A rigid transformation and an MLP-based deformation field are further applied to handle complex dynamic expressions. Combined with our occlusion-aware blending, MeGA generates higher-fidelity renderings for the whole head and naturally supports more downstream tasks. Experiments on the NeRSemble dataset demonstrate the effectiveness of our designs, outperforming previous state-of-the-art methods and supporting various editing functionalities, including hairstyle alteration and texture editing.", + "arxiv_url": "http://arxiv.org/abs/2404.19026v1", + "pdf_url": "http://arxiv.org/pdf/2404.19026v1", + "published_date": "2024-04-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing", + "authors": [ + "Minghao Chen", + "Iro Laina", + "Andrea Vedaldi" + ], + "abstract": "We consider the problem of editing 3D objects and scenes based on open-ended language instructions. A common approach to this problem is to use a 2D image generator or editor to guide the 3D editing process, obviating the need for 3D data. However, this process is often inefficient due to the need for iterative updates of costly 3D representations, such as neural radiance fields, either through individual view edits or score distillation sampling. A major disadvantage of this approach is the slow convergence caused by aggregating inconsistent information across views, as the guidance from 2D models is not multi-view consistent. We thus introduce the Direct Gaussian Editor (DGE), a method that addresses these issues in two stages. First, we modify a given high-quality image editor like InstructPix2Pix to be multi-view consistent. To do so, we propose a training-free approach that integrates cues from the 3D geometry of the underlying scene. Second, given a multi-view consistent edited sequence of images, we directly and efficiently optimize the 3D representation, which is based on 3D Gaussian Splatting. Because it avoids incremental and iterative edits, DGE is significantly more accurate and efficient than existing approaches and offers additional benefits, such as enabling selective editing of parts of the scene.", + "arxiv_url": "http://arxiv.org/abs/2404.18929v3", + "pdf_url": "http://arxiv.org/pdf/2404.18929v3", + "published_date": "2024-04-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Bootstrap 3D Reconstructed Scenes from 3D Gaussian Splatting", + "authors": [ + "Yifei Gao", + "Jie Ou", + "Lei Wang", + "Jun Cheng" + ], + "abstract": "Recent developments in neural rendering techniques have greatly enhanced the rendering of photo-realistic 3D scenes across both academic and commercial fields. The latest method, known as 3D Gaussian Splatting (3D-GS), has set new benchmarks for rendering quality and speed. 
Nevertheless, the limitations of 3D-GS become pronounced in synthesizing new viewpoints, especially for views that greatly deviate from those seen during training. Additionally, issues such as dilation and aliasing arise when zooming in or out. These challenges can all be traced back to a single underlying issue: insufficient sampling. In our paper, we present a bootstrapping method that significantly addresses this problem. This approach employs a diffusion model to enhance the rendering of novel views using trained 3D-GS, thereby streamlining the training process. Our results indicate that bootstrapping effectively reduces artifacts, as well as clear enhancements on the evaluation metrics. Furthermore, we show that our method is versatile and can be easily integrated, allowing various 3D reconstruction projects to benefit from our approach.", + "arxiv_url": "http://arxiv.org/abs/2404.18669v2", + "pdf_url": "http://arxiv.org/pdf/2404.18669v2", + "published_date": "2024-04-29", + "categories": [ + "cs.GR", + "cs.AI", + "cs.CV", + "I.4.8" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Gaussian Splatting with Deferred Reflection", + "authors": [ + "Keyang Ye", + "Qiming Hou", + "Kun Zhou" + ], + "abstract": "The advent of neural and Gaussian-based radiance field methods have achieved great success in the field of novel view synthesis. However, specular reflection remains non-trivial, as the high frequency radiance field is notoriously difficult to fit stably and accurately. We present a deferred shading method to effectively render specular reflection with Gaussian splatting. The key challenge comes from the environment map reflection model, which requires accurate surface normal while simultaneously bottlenecks normal estimation with discontinuous gradients. We leverage the per-pixel reflection gradients generated by deferred shading to bridge the optimization process of neighboring Gaussians, allowing nearly correct normal estimations to gradually propagate and eventually spread over all reflective objects. Our method significantly outperforms state-of-the-art techniques and concurrent work in synthesizing high-quality specular reflection effects, demonstrating a consistent improvement of peak signal-to-noise ratio (PSNR) for both synthetic and real-world scenes, while running at a frame rate almost identical to vanilla Gaussian splatting.", + "arxiv_url": "http://arxiv.org/abs/2404.18454v2", + "pdf_url": "http://arxiv.org/pdf/2404.18454v2", + "published_date": "2024-04-29", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Reconstructing Satellites in 3D from Amateur Telescope Images", + "authors": [ + "Zhiming Chang", + "Boyang Liu", + "Yifei Xia", + "Weimin Bai", + "Youming Guo", + "Boxin Shi", + "He Sun" + ], + "abstract": "This paper proposes a framework for the 3D reconstruction of satellites in low-Earth orbit, utilizing videos captured by small amateur telescopes. The video data obtained from these telescopes differ significantly from data for standard 3D reconstruction tasks, characterized by intense motion blur, atmospheric turbulence, pervasive background light pollution, extended focal length and constrained observational perspectives. 
To address these challenges, our approach begins with a comprehensive pre-processing workflow that encompasses deep learning-based image restoration, feature point extraction and camera pose initialization. We apply a customized Structure from Motion (SfM) approach, followed by an improved 3D Gaussian splatting algorithm, to achieve high-fidelity 3D model reconstruction. Our technique supports simultaneous 3D Gaussian training and pose estimation, enabling the robust generation of intricate 3D point clouds from sparse, noisy data. The procedure is further bolstered by a post-editing phase designed to eliminate noise points inconsistent with our prior knowledge of a satellite's geometric constraints. We validate our approach on synthetic datasets and actual observations of China's Space Station and International Space Station, showcasing its significant advantages over existing methods in reconstructing 3D space objects from ground-based observations.", + "arxiv_url": "http://arxiv.org/abs/2404.18394v2", + "pdf_url": "http://arxiv.org/pdf/2404.18394v2", + "published_date": "2024-04-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "High-quality Surface Reconstruction using Gaussian Surfels", + "authors": [ + "Pinxuan Dai", + "Jiamin Xu", + "Wenxiang Xie", + "Xinguo Liu", + "Huamin Wang", + "Weiwei Xu" + ], + "abstract": "We propose a novel point-based representation, Gaussian surfels, to combine the advantages of the flexible optimization procedure in 3D Gaussian points and the surface alignment property of surfels. This is achieved by directly setting the z-scale of 3D Gaussian points to 0, effectively flattening the original 3D ellipsoid into a 2D ellipse. Such a design provides clear guidance to the optimizer. By treating the local z-axis as the normal direction, it greatly improves optimization stability and surface alignment. While the derivatives to the local z-axis computed from the covariance matrix are zero in this setting, we design a self-supervised normal-depth consistency loss to remedy this issue. Monocular normal priors and foreground masks are incorporated to enhance the quality of the reconstruction, mitigating issues related to highlights and background. We propose a volumetric cutting method to aggregate the information of Gaussian surfels so as to remove erroneous points in depth maps generated by alpha blending. Finally, we apply screened Poisson reconstruction method to the fused depth maps to extract the surface mesh. Experimental results show that our method demonstrates superior performance in surface reconstruction compared to state-of-the-art neural volume rendering and point-based rendering methods.", + "arxiv_url": "http://arxiv.org/abs/2404.17774v2", + "pdf_url": "http://arxiv.org/pdf/2404.17774v2", + "published_date": "2024-04-27", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SLAM for Indoor Mapping of Wide Area Construction Environments", + "authors": [ + "Vincent Ress", + "Wei Zhang", + "David Skuddis", + "Norbert Haala", + "Uwe Soergel" + ], + "abstract": "Simultaneous localization and mapping (SLAM), i.e., the reconstruction of the environment represented by a (3D) map and the concurrent pose estimation, has made astonishing progress. 
Meanwhile, large-scale applications aiming at data collection in complex environments like factory halls or construction sites are becoming feasible. However, in contrast to small-scale scenarios with building interiors separated into single rooms, shop floors or construction areas require measurements at larger distances in potentially textureless areas under difficult illumination. Pose estimation is further aggravated since no GNSS measurements are available, as is usual for such indoor applications. In our work, we realize data collection in a large factory hall by a robot system equipped with four stereo cameras as well as a 3D laser scanner. We apply our state-of-the-art LiDAR and visual SLAM approaches and discuss the respective pros and cons of the different sensor types for trajectory estimation and dense map generation in such an environment. Additionally, dense and accurate depth maps are generated by 3D Gaussian splatting, which we plan to use in the context of our project aiming at automatic construction and site monitoring.", + "arxiv_url": "http://arxiv.org/abs/2404.17215v1", + "pdf_url": "http://arxiv.org/pdf/2404.17215v1", + "published_date": "2024-04-26", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Interactive3D: Create What You Want by Interactive 3D Generation", + "authors": [ + "Shaocong Dong", + "Lihe Ding", + "Zhanpeng Huang", + "Zibin Wang", + "Tianfan Xue", + "Dan Xu" + ], + "abstract": "3D object generation has undergone significant advancements, yielding high-quality results. However, current methods fall short of achieving precise user control, often yielding results that do not align with user expectations, thus limiting their applicability. User-envisioning 3D object generation faces significant challenges in realizing its concepts using current generative models due to limited interaction capabilities. Existing methods mainly offer two approaches: (i) interpreting textual instructions with constrained controllability, or (ii) reconstructing 3D objects from 2D images. Both of them limit customization to the confines of the 2D reference and potentially introduce undesirable artifacts during the 3D lifting process, restricting the scope for direct and versatile 3D modifications. In this work, we introduce Interactive3D, an innovative framework for interactive 3D generation that grants users precise control over the generative process through extensive 3D interaction capabilities. Interactive3D is constructed in two cascading stages, utilizing distinct 3D representations. The first stage employs Gaussian Splatting for direct user interaction, allowing modifications and guidance of the generative direction at any intermediate step through (i) Adding and Removing components, (ii) Deformable and Rigid Dragging, (iii) Geometric Transformations, and (iv) Semantic Editing. Subsequently, the Gaussian splats are transformed into InstantNGP. We introduce a novel (v) Interactive Hash Refinement module to further add details and extract the geometry in the second stage. Our experiments demonstrate that Interactive3D markedly improves the controllability and quality of 3D generation.
Our project webpage is available at \\url{https://interactive-3d.github.io/}.", + "arxiv_url": "http://arxiv.org/abs/2404.16510v1", + "pdf_url": "http://arxiv.org/pdf/2404.16510v1", + "published_date": "2024-04-25", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians", + "authors": [ + "Jiamin Wu", + "Kenkun Liu", + "Han Gao", + "Xiaoke Jiang", + "Lei Zhang" + ], + "abstract": "Recently, Gaussian splatting has demonstrated significant success in novel view synthesis. Current methods often regress Gaussians with pixel or point cloud correspondence, linking each Gaussian with a pixel or a 3D point. This leads to the redundancy of Gaussians being used to overfit the correspondence rather than the objects represented by the 3D Gaussians themselves, consequently wasting resources and lacking accurate geometries or textures. In this paper, we introduce LeanGaussian, a novel approach that treats each query in the deformable Transformer as one 3D Gaussian ellipsoid, breaking the pixel or point cloud correspondence constraints. We leverage a deformable decoder to iteratively refine the Gaussians layer-by-layer with the image features as keys and values. Notably, the center of each 3D Gaussian is defined as a 3D reference point, which is then projected onto the image for deformable attention in 2D space. On both the ShapeNet SRN dataset (category level) and the Google Scanned Objects dataset (open-category level, trained with the Objaverse dataset), our approach outperforms prior methods by approximately 6.1\\%, achieving a PSNR of 25.44 and 22.36, respectively. Additionally, our method achieves a 3D reconstruction speed of 7.2 FPS and a rendering speed of 500 FPS. The code will be released at https://github.com/jwubz123/DIG3D.", + "arxiv_url": "http://arxiv.org/abs/2404.16323v2", + "pdf_url": "http://arxiv.org/pdf/2404.16323v2", + "published_date": "2024-04-25", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/jwubz123/DIG3D", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting", + "authors": [ + "Kyusun Cho", + "Joungbin Lee", + "Heeji Yoon", + "Yeobin Hong", + "Jaehoon Ko", + "Sangjun Ahn", + "Seungryong Kim" + ], + "abstract": "We propose GaussianTalker, a novel framework for real-time generation of pose-controllable talking heads. It leverages the fast rendering capabilities of 3D Gaussian Splatting (3DGS) while addressing the challenges of directly controlling 3DGS with speech audio. GaussianTalker constructs a canonical 3DGS representation of the head and deforms it in sync with the audio. A key insight is to encode the 3D Gaussian attributes into a shared implicit feature representation, where it is merged with audio features to manipulate each Gaussian attribute. This design exploits the spatial-aware features and enforces interactions between neighboring points. The feature embeddings are then fed to a spatial-audio attention module, which predicts frame-wise offsets for the attributes of each Gaussian. It is more stable than previous concatenation or multiplication approaches for manipulating the numerous Gaussians and their intricate parameters.
Experimental results showcase GaussianTalker's superiority in facial fidelity, lip synchronization accuracy, and rendering speed compared to previous methods. Specifically, GaussianTalker achieves a remarkable rendering speed up to 120 FPS, surpassing previous benchmarks. Our code is made available at https://github.com/KU-CVLAB/GaussianTalker/ .", + "arxiv_url": "http://arxiv.org/abs/2404.16012v2", + "pdf_url": "http://arxiv.org/pdf/2404.16012v2", + "published_date": "2024-04-24", + "categories": [ + "cs.CV", + "cs.MM" + ], + "github_url": "https://github.com/KU-CVLAB/GaussianTalker", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation", + "authors": [ + "Lizhi Wang", + "Feng Zhou", + "Bo yu", + "Pu Cao", + "Jianqin Yin" + ], + "abstract": "Recent advancements in 3D reconstruction technologies have paved the way for high-quality and real-time rendering of complex 3D scenes. Despite these achievements, a notable challenge persists: it is difficult to precisely reconstruct specific objects from large scenes. Current scene reconstruction techniques frequently result in the loss of object detail textures and are unable to reconstruct object portions that are occluded or unseen in views. To address this challenge, we delve into the meticulous 3D reconstruction of specific objects within large scenes and propose a framework termed OMEGAS: Object Mesh Extraction from Large Scenes Guided by Gaussian Segmentation. Specifically, we proposed a novel 3D target segmentation technique based on 2D Gaussian Splatting, which segments 3D consistent target masks in multi-view scene images and generates a preliminary target model. Moreover, to reconstruct the unseen portions of the target, we propose a novel target replenishment technique driven by large-scale generative diffusion priors. We demonstrate that our method can accurately reconstruct specific targets from large scenes, both quantitatively and qualitatively. Our experiments show that OMEGAS significantly outperforms existing reconstruction methods across various scenarios. Our project page is at: https://github.com/CrystalWlz/OMEGAS", + "arxiv_url": "http://arxiv.org/abs/2404.15891v4", + "pdf_url": "http://arxiv.org/pdf/2404.15891v4", + "published_date": "2024-04-24", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/CrystalWlz/OMEGAS", + "keywords": [ + "gaussian splatting", + "real-time rendering", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting", + "authors": [ + "Jiahe Li", + "Jiawei Zhang", + "Xiao Bai", + "Jin Zheng", + "Xin Ning", + "Jun Zhou", + "Lin Gu" + ], + "abstract": "Radiance fields have demonstrated impressive performance in synthesizing lifelike 3D talking heads. However, due to the difficulty in fitting steep appearance changes, the prevailing paradigm that presents facial motions by directly modifying point appearance may lead to distortions in dynamic regions. To tackle this challenge, we introduce TalkingGaussian, a deformation-based radiance fields framework for high-fidelity talking head synthesis. 
Leveraging the point-based Gaussian Splatting, facial motions can be represented in our method by applying smooth and continuous deformations to persistent Gaussian primitives, without requiring to learn the difficult appearance change like previous methods. Due to this simplification, precise facial motions can be synthesized while keeping a highly intact facial feature. Under such a deformation paradigm, we further identify a face-mouth motion inconsistency that would affect the learning of detailed speaking motions. To address this conflict, we decompose the model into two branches separately for the face and inside mouth areas, therefore simplifying the learning tasks to help reconstruct more accurate motion and structure of the mouth region. Extensive experiments demonstrate that our method renders high-quality lip-synchronized talking head videos, with better facial fidelity and higher efficiency compared with previous methods.", + "arxiv_url": "http://arxiv.org/abs/2404.15264v2", + "pdf_url": "http://arxiv.org/pdf/2404.15264v2", + "published_date": "2024-04-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FlowMap: High-Quality Camera Poses, Intrinsics, and Depth via Gradient Descent", + "authors": [ + "Cameron Smith", + "David Charatan", + "Ayush Tewari", + "Vincent Sitzmann" + ], + "abstract": "This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence. Our method performs per-video gradient-descent minimization of a simple least-squares objective that compares the optical flow induced by depth, intrinsics, and poses against correspondences obtained via off-the-shelf optical flow and point tracking. Alongside the use of point tracks to encourage long-term geometric consistency, we introduce differentiable re-parameterizations of depth, intrinsics, and pose that are amenable to first-order optimization. We empirically show that camera parameters and dense depth recovered by our method enable photo-realistic novel view synthesis on 360-degree trajectories using Gaussian Splatting. Our method not only far outperforms prior gradient-descent based bundle adjustment methods, but surprisingly performs on par with COLMAP, the state-of-the-art SfM method, on the downstream task of 360-degree novel view synthesis (even though our method is purely gradient-descent based, fully differentiable, and presents a complete departure from conventional SfM).", + "arxiv_url": "http://arxiv.org/abs/2404.15259v3", + "pdf_url": "http://arxiv.org/pdf/2404.15259v3", + "published_date": "2024-04-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses", + "authors": [ + "Inhee Lee", + "Byungjun Kim", + "Hanbyul Joo" + ], + "abstract": "In this paper, we present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input. As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting (3D-GS) representation, enabling to conveniently and efficiently compose and render them together. 
In particular, we address the scenarios with severely limited and sparse observations in 3D human reconstruction, a common challenge encountered in the real world. To tackle this challenge, we introduce a novel approach to optimize the 3D-GS representation in a canonical space by fusing the sparse cues in the common space, where we leverage a pre-trained 2D diffusion model to synthesize unseen views while keeping the consistency with the observed 2D appearances. We demonstrate our method can reconstruct high-quality animatable 3D humans in various challenging examples, in the presence of occlusion, image crops, few-shot, and extremely sparse observations. After reconstruction, our method is capable of not only rendering the scene in any novel views at arbitrary time instances, but also editing the 3D scene by removing individual humans or applying different motions for each human. Through various experiments, we demonstrate the quality and efficiency of our methods over alternative existing approaches.", + "arxiv_url": "http://arxiv.org/abs/2404.14410v1", + "pdf_url": "http://arxiv.org/pdf/2404.14410v1", + "published_date": "2024-04-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CLIP-GS: CLIP-Informed Gaussian Splatting for Real-time and View-consistent 3D Semantic Understanding", + "authors": [ + "Guibiao Liao", + "Jiankun Li", + "Zhenyu Bao", + "Xiaoqing Ye", + "Jingdong Wang", + "Qing Li", + "Kanglin Liu" + ], + "abstract": "The recent 3D Gaussian Splatting (GS) exhibits high-quality and real-time synthesis of novel views in 3D scenes. Currently, it primarily focuses on geometry and appearance modeling, while lacking the semantic understanding of scenes. To bridge this gap, we present CLIP-GS, which integrates semantics from Contrastive Language-Image Pre-Training (CLIP) into Gaussian Splatting to efficiently comprehend 3D environments without annotated semantic data. In specific, rather than straightforwardly learning and rendering high-dimensional semantic features of 3D Gaussians, which significantly diminishes the efficiency, we propose a Semantic Attribute Compactness (SAC) approach. SAC exploits the inherent unified semantics within objects to learn compact yet effective semantic representations of 3D Gaussians, enabling highly efficient rendering (>100 FPS). Additionally, to address the semantic ambiguity, caused by utilizing view-inconsistent 2D CLIP semantics to supervise Gaussians, we introduce a 3D Coherent Self-training (3DCS) strategy, resorting to the multi-view consistency originated from the 3D model. 3DCS imposes cross-view semantic consistency constraints by leveraging refined, self-predicted pseudo-labels derived from the trained 3D Gaussian model, thereby enhancing precise and view-consistent segmentation results. Extensive experiments demonstrate that our method remarkably outperforms existing state-of-the-art approaches, achieving improvements of 17.29% and 20.81% in mIoU metric on Replica and ScanNet datasets, respectively, while maintaining real-time rendering speed. 
Furthermore, our approach exhibits superior performance even with sparse input data, verifying the robustness of our method.", + "arxiv_url": "http://arxiv.org/abs/2404.14249v1", + "pdf_url": "http://arxiv.org/pdf/2404.14249v1", + "published_date": "2024-04-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting", + "authors": [ + "Hongyun Yu", + "Zhan Qu", + "Qihang Yu", + "Jianchuan Chen", + "Zhonghua Jiang", + "Zhiwen Chen", + "Shengyu Zhang", + "Jimin Xu", + "Fei Wu", + "Chengfei Lv", + "Gang Yu" + ], + "abstract": "Recent works on audio-driven talking head synthesis using Neural Radiance Fields (NeRF) have achieved impressive results. However, due to inadequate pose and expression control caused by NeRF implicit representation, these methods still have some limitations, such as unsynchronized or unnatural lip movements, and visual jitter and artifacts. In this paper, we propose GaussianTalker, a novel method for audio-driven talking head synthesis based on 3D Gaussian Splatting. With the explicit representation property of 3D Gaussians, intuitive control of the facial motion is achieved by binding Gaussians to 3D facial models. GaussianTalker consists of two modules, Speaker-specific Motion Translator and Dynamic Gaussian Renderer. Speaker-specific Motion Translator achieves accurate lip movements specific to the target speaker through universalized audio feature extraction and customized lip motion generation. Dynamic Gaussian Renderer introduces Speaker-specific BlendShapes to enhance facial detail representation via a latent pose, delivering stable and realistic rendered videos. Extensive experimental results suggest that GaussianTalker outperforms existing state-of-the-art methods in talking head synthesis, delivering precise lip synchronization and exceptional visual quality. Our method achieves rendering speeds of 130 FPS on NVIDIA RTX4090 GPU, significantly exceeding the threshold for real-time rendering performance, and can potentially be deployed on other hardware platforms.", + "arxiv_url": "http://arxiv.org/abs/2404.14037v3", + "pdf_url": "http://arxiv.org/pdf/2404.14037v3", + "published_date": "2024-04-22", + "categories": [ + "cs.CV", + "cs.MM" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GScream: Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal", + "authors": [ + "Yuxin Wang", + "Qianyi Wu", + "Guofeng Zhang", + "Dan Xu" + ], + "abstract": "This paper tackles the intricate challenge of object removal to update the radiance field using the 3D Gaussian Splatting. The main challenges of this task lie in the preservation of geometric consistency and the maintenance of texture coherence in the presence of the substantial discrete nature of Gaussian primitives. We introduce a robust framework specifically designed to overcome these obstacles. The key insight of our approach is the enhancement of information exchange among visible and invisible areas, facilitating content restoration in terms of both geometry and texture. 
Our methodology begins with optimizing the positioning of Gaussian primitives to improve geometric consistency across both removed and visible areas, guided by an online registration process informed by monocular depth estimation. Following this, we employ a novel feature propagation mechanism to bolster texture coherence, leveraging a cross-attention design that bridges sampling Gaussians from both uncertain and certain areas. This innovative approach significantly refines the texture coherence within the final radiance field. Extensive experiments validate that our method not only elevates the quality of novel view synthesis for scenes undergoing object removal but also showcases notable efficiency gains in training and rendering speeds.", + "arxiv_url": "http://arxiv.org/abs/2404.13679v1", + "pdf_url": "http://arxiv.org/pdf/2404.13679v1", + "published_date": "2024-04-21", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Learn2Talk: 3D Talking Face Learns from 2D Talking Face", + "authors": [ + "Yixiang Zhuang", + "Baoping Cheng", + "Yao Cheng", + "Yuntao Jin", + "Renshuai Liu", + "Chengyang Li", + "Xuan Cheng", + "Jing Liao", + "Juncong Lin" + ], + "abstract": "Speech-driven facial animation methods usually fall into two main classes, 3D and 2D talking face, both of which have attracted considerable research attention in recent years. However, to the best of our knowledge, research on 3D talking face has not gone as deep as that on 2D talking face in terms of lip-synchronization (lip-sync) and speech perception. To bridge the gap between the two sub-fields, we propose a learning framework named Learn2Talk, which can construct a better 3D talking face network by exploiting two expertise points from the field of 2D talking face. Firstly, inspired by the audio-video sync network, a 3D sync-lip expert model is devised for the pursuit of lip-sync between audio and 3D facial motion. Secondly, a teacher model selected from 2D talking face methods is used to guide the training of the audio-to-3D motions regression network to yield higher 3D vertex accuracy. Extensive experiments show the advantages of the proposed framework in terms of lip-sync, vertex accuracy and speech perception, compared with the state of the art. Finally, we show two applications of the proposed framework: audio-visual speech recognition and speech-driven 3D Gaussian Splatting-based avatar animation.", + "arxiv_url": "http://arxiv.org/abs/2404.12888v1", + "pdf_url": "http://arxiv.org/pdf/2404.12888v1", + "published_date": "2024-04-19", + "categories": [ + "cs.CV", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Contrastive Gaussian Clustering: Weakly Supervised 3D Scene Segmentation", + "authors": [ + "Myrna C. Silva", + "Mahtab Dahaghin", + "Matteo Toso", + "Alessio Del Bue" + ], + "abstract": "We introduce Contrastive Gaussian Clustering, a novel approach capable of providing segmentation masks from any viewpoint and of enabling 3D segmentation of the scene. Recent works in novel-view synthesis have shown how to model the appearance of a scene via a cloud of 3D Gaussians, and how to generate accurate images from a given viewpoint by projecting on it the Gaussians before $\\alpha$ blending their color.
Following this example, we train a model to include also a segmentation feature vector for each Gaussian. These can then be used for 3D scene segmentation, by clustering Gaussians according to their feature vectors; and to generate 2D segmentation masks, by projecting the Gaussians on a plane and $\\alpha$ blending over their segmentation features. Using a combination of contrastive learning and spatial regularization, our method can be trained on inconsistent 2D segmentation masks, and still learn to generate segmentation masks consistent across all views. Moreover, the resulting model is extremely accurate, improving the IoU accuracy of the predicted masks by $+8\\%$ over the state of the art. Code and trained models will be released soon.", + "arxiv_url": "http://arxiv.org/abs/2404.12784v1", + "pdf_url": "http://arxiv.org/pdf/2404.12784v1", + "published_date": "2024-04-19", + "categories": [ + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EfficientGS: Streamlining Gaussian Splatting for Large-Scale High-Resolution Scene Representation", + "authors": [ + "Wenkai Liu", + "Tao Guan", + "Bin Zhu", + "Lili Ju", + "Zikai Song", + "Dan Li", + "Yuesong Wang", + "Wei Yang" + ], + "abstract": "In the domain of 3D scene representation, 3D Gaussian Splatting (3DGS) has emerged as a pivotal technology. However, its application to large-scale, high-resolution scenes (exceeding 4k$\\times$4k pixels) is hindered by the excessive computational requirements for managing a large number of Gaussians. Addressing this, we introduce 'EfficientGS', an advanced approach that optimizes 3DGS for high-resolution, large-scale scenes. We analyze the densification process in 3DGS and identify areas of Gaussian over-proliferation. We propose a selective strategy, limiting Gaussian increase to key primitives, thereby enhancing the representational efficiency. Additionally, we develop a pruning mechanism to remove redundant Gaussians, those that are merely auxiliary to adjacent ones. For further enhancement, we integrate a sparse order increment for Spherical Harmonics (SH), designed to alleviate storage constraints and reduce training overhead. Our empirical evaluations, conducted on a range of datasets including extensive 4K+ aerial images, demonstrate that 'EfficientGS' not only expedites training and rendering times but also achieves this with a model size approximately tenfold smaller than conventional 3DGS while maintaining high rendering fidelity.", + "arxiv_url": "http://arxiv.org/abs/2404.12777v1", + "pdf_url": "http://arxiv.org/pdf/2404.12777v1", + "published_date": "2024-04-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Evaluating Alternatives to SFM Point Cloud Initialization for Gaussian Splatting", + "authors": [ + "Yalda Foroutan", + "Daniel Rebain", + "Kwang Moo Yi", + "Andrea Tagliasacchi" + ], + "abstract": "3D Gaussian Splatting has recently been embraced as a versatile and effective method for scene reconstruction and novel view synthesis, owing to its high-quality results and compatibility with hardware rasterization. Despite its advantages, Gaussian Splatting's reliance on high-quality point cloud initialization by Structure-from-Motion (SFM) algorithms is a significant limitation to be overcome. 
To this end, we investigate various initialization strategies for Gaussian Splatting and delve into how volumetric reconstructions from Neural Radiance Fields (NeRF) can be utilized to bypass the dependency on SFM data. Our findings demonstrate that random initialization can perform much better if carefully designed and that by employing a combination of improved initialization strategies and structure distillation from low-cost NeRF models, it is possible to achieve equivalent results, or at times even superior, to those obtained from SFM initialization. Source code is available at https://theialab.github.io/nerf-3dgs .", + "arxiv_url": "http://arxiv.org/abs/2404.12547v3", + "pdf_url": "http://arxiv.org/pdf/2404.12547v3", + "published_date": "2024-04-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos", + "authors": [ + "Isabella Liu", + "Hao Su", + "Xiaolong Wang" + ], + "abstract": "Modern 3D engines and graphics pipelines require mesh as a memory-efficient representation, which allows efficient rendering, geometry processing, texture editing, and many other downstream operations. However, it is still highly difficult to obtain high-quality mesh in terms of structure and detail from monocular visual observations. The problem becomes even more challenging for dynamic scenes and objects. To this end, we introduce Dynamic Gaussians Mesh (DG-Mesh), a framework to reconstruct a high-fidelity and time-consistent mesh given a single monocular video. Our work leverages the recent advancement in 3D Gaussian Splatting to construct the mesh sequence with temporal consistency from a video. Building on top of this representation, DG-Mesh recovers high-quality meshes from the Gaussian points and can track the mesh vertices over time, which enables applications such as texture editing on dynamic objects. We introduce the Gaussian-Mesh Anchoring, which encourages evenly distributed Gaussians, resulting better mesh reconstruction through mesh-guided densification and pruning on the deformed Gaussians. By applying cycle-consistent deformation between the canonical and the deformed space, we can project the anchored Gaussian back to the canonical space and optimize Gaussians across all time frames. During the evaluation on different datasets, DG-Mesh provides significantly better mesh reconstruction and rendering than baselines. Project page: https://www.liuisabella.com/DG-Mesh/", + "arxiv_url": "http://arxiv.org/abs/2404.12379v2", + "pdf_url": "http://arxiv.org/pdf/2404.12379v2", + "published_date": "2024-04-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior", + "authors": [ + "Zhiheng Liu", + "Hao Ouyang", + "Qiuyu Wang", + "Ka Leong Cheng", + "Jie Xiao", + "Kai Zhu", + "Nan Xue", + "Yu Liu", + "Yujun Shen", + "Yang Cao" + ], + "abstract": "3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus on the inpainting task, which aims to supplement an incomplete set of 3D Gaussians with additional points for visually harmonious rendering. 
Compared to 2D inpainting, the crux of inpainting 3D Gaussians is to figure out the rendering-relevant properties of the introduced points, whose optimization largely benefits from their initial 3D positions. To this end, we propose to guide the point initialization with an image-conditioned depth completion model, which learns to directly restore the depth map based on the observed image. Such a design allows our model to fill in depth values at an aligned scale with the original depth, and also to harness strong generalizability from largescale diffusion prior. Thanks to the more accurate depth completion, our approach, dubbed InFusion, surpasses existing alternatives with sufficiently better fidelity and efficiency under various complex scenarios. We further demonstrate the effectiveness of InFusion with several practical applications, such as inpainting with user-specific texture or with novel object insertion.", + "arxiv_url": "http://arxiv.org/abs/2404.11613v1", + "pdf_url": "http://arxiv.org/pdf/2404.11613v1", + "published_date": "2024-04-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RainyScape: Unsupervised Rainy Scene Reconstruction using Decoupled Neural Rendering", + "authors": [ + "Xianqiang Lyu", + "Hui Liu", + "Junhui Hou" + ], + "abstract": "We propose RainyScape, an unsupervised framework for reconstructing clean scenes from a collection of multi-view rainy images. RainyScape consists of two main modules: a neural rendering module and a rain-prediction module that incorporates a predictor network and a learnable latent embedding that captures the rain characteristics of the scene. Specifically, based on the spectral bias property of neural networks, we first optimize the neural rendering pipeline to obtain a low-frequency scene representation. Subsequently, we jointly optimize the two modules, driven by the proposed adaptive direction-sensitive gradient-based reconstruction loss, which encourages the network to distinguish between scene details and rain streaks, facilitating the propagation of gradients to the relevant components. Extensive experiments on both the classic neural radiance field and the recently proposed 3D Gaussian splatting demonstrate the superiority of our method in effectively eliminating rain streaks and rendering clean images, achieving state-of-the-art performance. The constructed high-quality dataset and source code will be publicly available.", + "arxiv_url": "http://arxiv.org/abs/2404.11401v1", + "pdf_url": "http://arxiv.org/pdf/2404.11401v1", + "published_date": "2024-04-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DeblurGS: Gaussian Splatting for Camera Motion Blur", + "authors": [ + "Jeongtaek Oh", + "Jaeyoung Chung", + "Dongwoo Lee", + "Kyoung Mu Lee" + ], + "abstract": "Although significant progress has been made in reconstructing sharp 3D scenes from motion-blurred images, a transition to real-world applications remains challenging. The primary obstacle stems from the severe blur which leads to inaccuracies in the acquisition of initial camera poses through Structure-from-Motion, a critical aspect often overlooked by previous approaches. 
To address this challenge, we propose DeblurGS, a method to optimize sharp 3D Gaussian Splatting from motion-blurred images, even with the noisy camera pose initialization. We restore a fine-grained sharp scene by leveraging the remarkable reconstruction capability of 3D Gaussian Splatting. Our approach estimates the 6-Degree-of-Freedom camera motion for each blurry observation and synthesizes corresponding blurry renderings for the optimization process. Furthermore, we propose Gaussian Densification Annealing strategy to prevent the generation of inaccurate Gaussians at erroneous locations during the early training stages when camera motion is still imprecise. Comprehensive experiments demonstrate that our DeblurGS achieves state-of-the-art performance in deblurring and novel view synthesis for real-world and synthetic benchmark datasets, as well as field-captured blurry smartphone videos.", + "arxiv_url": "http://arxiv.org/abs/2404.11358v2", + "pdf_url": "http://arxiv.org/pdf/2404.11358v2", + "published_date": "2024-04-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Application of 3D Gaussian Splatting for Cinematic Anatomy on Consumer Class Devices", + "authors": [ + "Simon Niedermayr", + "Christoph Neuhauser", + "Kaloian Petkov", + "Klaus Engel", + "Rüdiger Westermann" + ], + "abstract": "Interactive photorealistic rendering of 3D anatomy is used in medical education to explain the structure of the human body. It is currently restricted to frontal teaching scenarios, where even with a powerful GPU and high-speed access to a large storage device where the data set is hosted, interactive demonstrations can hardly be achieved. We present the use of novel view synthesis via compressed 3D Gaussian Splatting (3DGS) to overcome this restriction, and to even enable students to perform cinematic anatomy on lightweight and mobile devices. Our proposed pipeline first finds a set of camera poses that captures all potentially seen structures in the data. High-quality images are then generated with path tracing and converted into a compact 3DGS representation, consuming < 70 MB even for data sets of multiple GBs. This allows for real-time photorealistic novel view synthesis that recovers structures up to the voxel resolution and is almost indistinguishable from the path-traced images", + "arxiv_url": "http://arxiv.org/abs/2404.11285v2", + "pdf_url": "http://arxiv.org/pdf/2404.11285v2", + "published_date": "2024-04-17", + "categories": [ + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Opacity Fields: Efficient Adaptive Surface Reconstruction in Unbounded Scenes", + "authors": [ + "Zehao Yu", + "Torsten Sattler", + "Andreas Geiger" + ], + "abstract": "Recently, 3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results, while allowing the rendering of high-resolution images in real-time. However, leveraging 3D Gaussians for surface reconstruction poses significant challenges due to the explicit and disconnected nature of 3D Gaussians. In this work, we present Gaussian Opacity Fields (GOF), a novel approach for efficient, high-quality, and adaptive surface reconstruction in unbounded scenes. 
Our GOF is derived from ray-tracing-based volume rendering of 3D Gaussians, enabling direct geometry extraction from 3D Gaussians by identifying its levelset, without resorting to Poisson reconstruction or TSDF fusion as in previous work. We approximate the surface normal of Gaussians as the normal of the ray-Gaussian intersection plane, enabling the application of regularization that significantly enhances geometry. Furthermore, we develop an efficient geometry extraction method utilizing Marching Tetrahedra, where the tetrahedral grids are induced from 3D Gaussians and thus adapt to the scene's complexity. Our evaluations reveal that GOF surpasses existing 3DGS-based methods in surface reconstruction and novel view synthesis. Further, it compares favorably to or even outperforms, neural implicit methods in both quality and speed.", + "arxiv_url": "http://arxiv.org/abs/2404.10772v2", + "pdf_url": "http://arxiv.org/pdf/2404.10772v2", + "published_date": "2024-04-16", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks", + "authors": [ + "Florian Barthel", + "Arian Beckmann", + "Wieland Morgenstern", + "Anna Hilsmann", + "Peter Eisert" + ], + "abstract": "NeRF-based 3D-aware Generative Adversarial Networks (GANs) like EG3D or GIRAFFE have shown very high rendering quality under large representational variety. However, rendering with Neural Radiance Fields poses challenges for 3D applications: First, the significant computational demands of NeRF rendering preclude its use on low-power devices, such as mobiles and VR/AR headsets. Second, implicit representations based on neural networks are difficult to incorporate into explicit 3D scenes, such as VR environments or video games. 3D Gaussian Splatting (3DGS) overcomes these limitations by providing an explicit 3D representation that can be rendered efficiently at high frame rates. In this work, we present a novel approach that combines the high rendering quality of NeRF-based 3D-aware GANs with the flexibility and computational advantages of 3DGS. By training a decoder that maps implicit NeRF representations to explicit 3D Gaussian Splatting attributes, we can integrate the representational diversity and quality of 3D GANs into the ecosystem of 3D Gaussian Splatting for the first time. Additionally, our approach allows for a high resolution GAN inversion and real-time GAN editing with 3D Gaussian Splatting scenes. Project page: florian-barthel.github.io/gaussian_decoder", + "arxiv_url": "http://arxiv.org/abs/2404.10625v2", + "pdf_url": "http://arxiv.org/pdf/2404.10625v2", + "published_date": "2024-04-16", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "AbsGS: Recovering Fine Details for 3D Gaussian Splatting", + "authors": [ + "Zongxin Ye", + "Wenyu Li", + "Sidun Liu", + "Peng Qiao", + "Yong Dou" + ], + "abstract": "3D Gaussian Splatting (3D-GS) technique couples 3D Gaussian primitives with differentiable rasterization to achieve high-quality novel view synthesis results while providing advanced real-time rendering performance. 
However, due to the flaw of its adaptive density control strategy in 3D-GS, it frequently suffers from over-reconstruction issue in intricate scenes containing high-frequency details, leading to blurry rendered images. The underlying reason for the flaw has still been under-explored. In this work, we present a comprehensive analysis of the cause of aforementioned artifacts, namely gradient collision, which prevents large Gaussians in over-reconstructed regions from splitting. To address this issue, we propose the novel homodirectional view-space positional gradient as the criterion for densification. Our strategy efficiently identifies large Gaussians in over-reconstructed regions, and recovers fine details by splitting. We evaluate our proposed method on various challenging datasets. The experimental results indicate that our approach achieves the best rendering quality with reduced or similar memory consumption. Our method is easy to implement and can be incorporated into a wide variety of most recent Gaussian Splatting-based methods. We will open source our codes upon formal publication. Our project page is available at: https://ty424.github.io/AbsGS.github.io/", + "arxiv_url": "http://arxiv.org/abs/2404.10484v1", + "pdf_url": "http://arxiv.org/pdf/2404.10484v1", + "published_date": "2024-04-16", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SRGS: Super-Resolution 3D Gaussian Splatting", + "authors": [ + "Xiang Feng", + "Yongbo He", + "Yubo Wang", + "Yan Yang", + "Wen Li", + "Yifei Chen", + "Zhenzhong Kuang", + "Jiajun ding", + "Jianping Fan", + "Yu Jun" + ], + "abstract": "Recently, 3D Gaussian Splatting (3DGS) has gained popularity as a novel explicit 3D representation. This approach relies on the representation power of Gaussian primitives to provide a high-quality rendering. However, primitives optimized at low resolution inevitably exhibit sparsity and texture deficiency, posing a challenge for achieving high-resolution novel view synthesis (HRNVS). To address this problem, we propose Super-Resolution 3D Gaussian Splatting (SRGS) to perform the optimization in a high-resolution (HR) space. The sub-pixel constraint is introduced for the increased viewpoints in HR space, exploiting the sub-pixel cross-view information of the multiple low-resolution (LR) views. The gradient accumulated from more viewpoints will facilitate the densification of primitives. Furthermore, a pre-trained 2D super-resolution model is integrated with the sub-pixel constraint, enabling these dense primitives to learn faithful texture features. In general, our method focuses on densification and texture learning to effectively enhance the representation ability of primitives. Experimentally, our method achieves high rendering quality on HRNVS only with LR inputs, outperforming state-of-the-art methods on challenging datasets such as Mip-NeRF 360 and Tanks & Temples. 
Related codes will be released upon acceptance.", + "arxiv_url": "http://arxiv.org/abs/2404.10318v2", + "pdf_url": "http://arxiv.org/pdf/2404.10318v2", + "published_date": "2024-04-16", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives", + "authors": [ + "Jiadi Cui", + "Junming Cao", + "Fuqiang Zhao", + "Zhipeng He", + "Yifan Chen", + "Yuhui Zhong", + "Lan Xu", + "Yujiao Shi", + "Yingliang Zhang", + "Jingyi Yu" + ], + "abstract": "Large garages are ubiquitous yet intricate scenes that present unique challenges due to their monotonous colors, repetitive patterns, reflective surfaces, and transparent vehicle glass. Conventional Structure from Motion (SfM) methods for camera pose estimation and 3D reconstruction often fail in these environments due to poor correspondence construction. To address these challenges, we introduce LetsGo, a LiDAR-assisted Gaussian splatting framework for large-scale garage modeling and rendering. We develop a handheld scanner, Polar, equipped with IMU, LiDAR, and a fisheye camera, to facilitate accurate data acquisition. Using this Polar device, we present the GarageWorld dataset, consisting of eight expansive garage scenes with diverse geometric structures, which will be made publicly available for further research. Our approach demonstrates that LiDAR point clouds collected by the Polar device significantly enhance a suite of 3D Gaussian splatting algorithms for garage scene modeling and rendering. We introduce a novel depth regularizer that effectively eliminates floating artifacts in rendered images. Additionally, we propose a multi-resolution 3D Gaussian representation designed for Level-of-Detail (LOD) rendering. This includes adapted scaling factors for individual levels and a random-resolution-level training scheme to optimize the Gaussians across different resolutions. This representation enables efficient rendering of large-scale garage scenes on lightweight devices via a web-based renderer. Experimental results on our GarageWorld dataset, as well as on ScanNet++ and KITTI-360, demonstrate the superiority of our method in terms of rendering quality and resource efficiency.", + "arxiv_url": "http://arxiv.org/abs/2404.09748v3", + "pdf_url": "http://arxiv.org/pdf/2404.09748v3", + "published_date": "2024-04-15", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Gaussian Splatting as Markov Chain Monte Carlo", + "authors": [ + "Shakiba Kheradmand", + "Daniel Rebain", + "Gopal Sharma", + "Weiwei Sun", + "Jeff Tseng", + "Hossam Isack", + "Abhishek Kar", + "Andrea Tagliasacchi", + "Kwang Moo Yi" + ], + "abstract": "While 3D Gaussian Splatting has recently become popular for neural rendering, current methods rely on carefully engineered cloning and splitting strategies for placing Gaussians, which can lead to poor-quality renderings, and reliance on a good initialization. In this work, we rethink the set of 3D Gaussians as a random sample drawn from an underlying probability distribution describing the physical representation of the scene-in other words, Markov Chain Monte Carlo (MCMC) samples. 
Under this view, we show that the 3D Gaussian updates can be converted as Stochastic Gradient Langevin Dynamics (SGLD) updates by simply introducing noise. We then rewrite the densification and pruning strategies in 3D Gaussian Splatting as simply a deterministic state transition of MCMC samples, removing these heuristics from the framework. To do so, we revise the 'cloning' of Gaussians into a relocalization scheme that approximately preserves sample probability. To encourage efficient use of Gaussians, we introduce a regularizer that promotes the removal of unused Gaussians. On various standard evaluation scenes, we show that our method provides improved rendering quality, easy control over the number of Gaussians, and robustness to initialization.", + "arxiv_url": "http://arxiv.org/abs/2404.09591v2", + "pdf_url": "http://arxiv.org/pdf/2404.09591v2", + "published_date": "2024-04-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CompGS: Efficient 3D Scene Representation via Compressed Gaussian Splatting", + "authors": [ + "Xiangrui Liu", + "Xinju Wu", + "Pingping Zhang", + "Shiqi Wang", + "Zhu Li", + "Sam Kwong" + ], + "abstract": "Gaussian splatting, renowned for its exceptional rendering quality and efficiency, has emerged as a prominent technique in 3D scene representation. However, the substantial data volume of Gaussian splatting impedes its practical utility in real-world applications. Herein, we propose an efficient 3D scene representation, named Compressed Gaussian Splatting (CompGS), which harnesses compact Gaussian primitives for faithful 3D scene modeling with a remarkably reduced data size. To ensure the compactness of Gaussian primitives, we devise a hybrid primitive structure that captures predictive relationships between each other. Then, we exploit a small set of anchor primitives for prediction, allowing the majority of primitives to be encapsulated into highly compact residual forms. Moreover, we develop a rate-constrained optimization scheme to eliminate redundancies within such hybrid primitives, steering our CompGS towards an optimal trade-off between bitrate consumption and representation efficacy. Experimental results show that the proposed CompGS significantly outperforms existing methods, achieving superior compactness in 3D scene representation without compromising model accuracy and rendering quality. Our code will be released on GitHub for further research.", + "arxiv_url": "http://arxiv.org/abs/2404.09458v1", + "pdf_url": "http://arxiv.org/pdf/2404.09458v1", + "published_date": "2024-04-15", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DeferredGS: Decoupled and Editable Gaussian Splatting with Deferred Shading", + "authors": [ + "Tong Wu", + "Jia-Mu Sun", + "Yu-Kun Lai", + "Yuewen Ma", + "Leif Kobbelt", + "Lin Gao" + ], + "abstract": "Reconstructing and editing 3D objects and scenes both play crucial roles in computer graphics and computer vision. Neural radiance fields (NeRFs) can achieve realistic reconstruction and editing results but suffer from inefficiency in rendering. Gaussian splatting significantly accelerates rendering by rasterizing Gaussian ellipsoids. 
However, Gaussian splatting utilizes a single Spherical Harmonic (SH) function to model both texture and lighting, limiting independent editing capabilities of these components. Recently, attempts have been made to decouple texture and lighting with the Gaussian splatting representation but may fail to produce plausible geometry and decomposition results on reflective scenes. Additionally, the forward shading technique they employ introduces noticeable blending artifacts during relighting, as the geometry attributes of Gaussians are optimized under the original illumination and may not be suitable for novel lighting conditions. To address these issues, we introduce DeferredGS, a method for decoupling and editing the Gaussian splatting representation using deferred shading. To achieve successful decoupling, we model the illumination with a learnable environment map and define additional attributes such as texture parameters and normal direction on Gaussians, where the normal is distilled from a jointly trained signed distance function. More importantly, we apply deferred shading, resulting in more realistic relighting effects compared to previous methods. Both qualitative and quantitative experiments demonstrate the superior performance of DeferredGS in novel view synthesis and editing tasks.", + "arxiv_url": "http://arxiv.org/abs/2404.09412v2", + "pdf_url": "http://arxiv.org/pdf/2404.09412v2", + "published_date": "2024-04-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling", + "authors": [ + "Xuening Yuan", + "Hongyu Yang", + "Yueming Zhao", + "Di Huang" + ], + "abstract": "Recent progress in text-to-3D creation has been propelled by integrating the potent prior of Diffusion Models from text-to-image generation into the 3D domain. Nevertheless, generating 3D scenes characterized by multiple instances and intricate arrangements remains challenging. In this study, we present DreamScape, a method for creating highly consistent 3D scenes solely from textual descriptions, leveraging the strong 3D representation capabilities of Gaussian Splatting and the complex arrangement abilities of large language models (LLMs). Our approach involves a 3D Gaussian Guide ($3{DG^2}$) for scene representation, consisting of semantic primitives (objects) and their spatial transformations and relationships derived directly from text prompts using LLMs. This compositional representation allows for local-to-global optimization of the entire scene. A progressive scale control is tailored during local object generation, ensuring that objects of different sizes and densities adapt to the scene, which addresses training instability issue arising from simple blending in the subsequent global optimization stage. To mitigate potential biases of LLM priors, we model collision relationships between objects at the global level, enhancing physical correctness and overall realism. Additionally, to generate pervasive objects like rain and snow distributed extensively across the scene, we introduce a sparse initialization and densification strategy. 
Experiments demonstrate that DreamScape offers high usability and controllability, enabling the generation of high-fidelity 3D scenes from only text prompts and achieving state-of-the-art performance compared to other methods.",
+    "arxiv_url": "http://arxiv.org/abs/2404.09227v2",
+    "pdf_url": "http://arxiv.org/pdf/2404.09227v2",
+    "published_date": "2024-04-14",
+    "categories": [
+      "cs.CV"
+    ],
+    "github_url": "",
+    "keywords": [
+      "gaussian splatting",
+      "3d gaussian"
+    ],
+    "citations": 0,
+    "semantic_url": ""
+  },
+  {
+    "title": "EGGS: Edge Guided Gaussian Splatting for Radiance Fields",
+    "authors": [
+      "Yuanhao Gong"
+    ],
+    "abstract": "The Gaussian splatting methods are getting popular. However, their loss function only contains the $\\ell_1$ norm and the structural similarity between the rendered and input images, without considering the edges in these images. It is well-known that the edges in an image provide important information. Therefore, in this paper, we propose an Edge Guided Gaussian Splatting (EGGS) method that leverages the edges in the input images. More specifically, we give the edge region a higher weight than the flat region. With such edge guidance, the resulting Gaussian particles focus more on the edges instead of the flat regions. Moreover, such edge guidance does not increase the computation cost during the training and rendering stage. The experiments confirm that such a simple edge-weighted loss function indeed improves results by about $1\\sim2$ dB on several different datasets. By simply plugging in the edge guidance, the proposed method can improve all Gaussian splatting methods in different scenarios, such as human head modeling, building 3D reconstruction, etc.",
+    "arxiv_url": "http://arxiv.org/abs/2404.09105v2",
+    "pdf_url": "http://arxiv.org/pdf/2404.09105v2",
+    "published_date": "2024-04-14",
+    "categories": [
+      "cs.CV",
+      "cs.AI",
+      "cs.GR",
+      "eess.IV"
+    ],
+    "github_url": "",
+    "keywords": [
+      "gaussian splatting",
+      "3d reconstruction"
+    ],
+    "citations": 0,
+    "semantic_url": ""
+  },
+  {
+    "title": "LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field",
+    "authors": [
+      "Jiyang Li",
+      "Lechao Cheng",
+      "Zhangye Wang",
+      "Tingting Mu",
+      "Jingxuan He"
+    ],
+    "abstract": "Cinemagraph is a unique form of visual media that combines elements of still photography and subtle motion to create a captivating experience. However, the majority of videos generated by recent works lack depth information and are confined to the constraints of 2D image space. In this paper, inspired by significant progress in the field of novel view synthesis (NVS) achieved by 3D Gaussian Splatting (3D-GS), we propose LoopGaussian to elevate cinemagraph from 2D image space to 3D space using 3D Gaussian modeling. To achieve this, we first employ the 3D-GS method to reconstruct 3D Gaussian point clouds from multi-view images of static scenes, incorporating shape regularization terms to prevent blurring or artifacts caused by object deformation. We then adopt an autoencoder tailored for 3D Gaussian to project it into feature space. To maintain the local continuity of the scene, we devise SuperGaussian for clustering based on the acquired features. By calculating the similarity between clusters and employing a two-stage estimation method, we derive an Eulerian motion field to describe velocities across the entire scene. The 3D Gaussian points then move within the estimated Eulerian motion field.
Through bidirectional animation techniques, we ultimately generate a 3D Cinemagraph that exhibits natural and seamlessly loopable dynamics. Experimental results validate the effectiveness of our approach, demonstrating high-quality and visually appealing scene generation. The project is available at https://pokerlishao.github.io/LoopGaussian/.",
+    "arxiv_url": "http://arxiv.org/abs/2404.08966v2",
+    "pdf_url": "http://arxiv.org/pdf/2404.08966v2",
+    "published_date": "2024-04-13",
+    "categories": [
+      "cs.CV"
+    ],
+    "github_url": "",
+    "keywords": [
+      "gaussian splatting",
+      "3d gaussian"
+    ],
+    "citations": 0,
+    "semantic_url": ""
+  },
+  {
+    "title": "OccGaussian: 3D Gaussian Splatting for Occluded Human Rendering",
+    "authors": [
+      "Jingrui Ye",
+      "Zongkai Zhang",
+      "Yujiao Jiang",
+      "Qingmin Liao",
+      "Wenming Yang",
+      "Zongqing Lu"
+    ],
+    "abstract": "Rendering dynamic 3D humans from monocular videos is crucial for various applications such as virtual reality and digital entertainment. Most methods assume the person is in an unobstructed scene, while various objects may occlude body parts in real-life scenarios. Previous methods utilize NeRF for surface rendering to recover the occluded areas, but they require more than one day to train and several seconds to render, failing to meet the requirements of real-time interactive applications. To address these issues, we propose OccGaussian based on 3D Gaussian Splatting, which can be trained within 6 minutes and produces high-quality human renderings up to 160 FPS with occluded input. OccGaussian initializes 3D Gaussian distributions in the canonical space, and we perform occlusion feature queries at occluded regions; the aggregated pixel-aligned feature is extracted to compensate for the missing information. Then we use a Gaussian Feature MLP to further process the feature, along with occlusion-aware loss functions, to better perceive the occluded area. Extensive experiments on both simulated and real-world occlusions demonstrate that our method achieves comparable or even superior performance compared to the state-of-the-art method, while improving training and inference speeds by 250x and 800x, respectively. Our code will be available for research purposes.",
+    "arxiv_url": "http://arxiv.org/abs/2404.08449v2",
+    "pdf_url": "http://arxiv.org/pdf/2404.08449v2",
+    "published_date": "2024-04-12",
+    "categories": [
+      "cs.CV"
+    ],
+    "github_url": "",
+    "keywords": [
+      "gaussian splatting",
+      "3d gaussian",
+      "nerf"
+    ],
+    "citations": 0,
+    "semantic_url": ""
+  },
+  {
+    "title": "GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh",
+    "authors": [
+      "Jing Wen",
+      "Xiaoming Zhao",
+      "Zhongzheng Ren",
+      "Alexander G. Schwing",
+      "Shenlong Wang"
+    ],
+    "abstract": "We introduce GoMAvatar, a novel approach for real-time, memory-efficient, high-quality animatable human modeling. GoMAvatar takes as input a single monocular video to create a digital avatar capable of re-articulation in new poses and real-time rendering from novel viewpoints, while seamlessly integrating with rasterization-based graphics pipelines. Central to our method is the Gaussians-on-Mesh representation, a hybrid 3D model combining rendering quality and speed of Gaussian splatting with geometry modeling and compatibility of deformable meshes. We assess GoMAvatar on ZJU-MoCap data and various YouTube videos.
GoMAvatar matches or surpasses current monocular human modeling algorithms in rendering quality and significantly outperforms them in computational efficiency (43 FPS) while being memory-efficient (3.63 MB per subject).", + "arxiv_url": "http://arxiv.org/abs/2404.07991v1", + "pdf_url": "http://arxiv.org/pdf/2404.07991v1", + "published_date": "2024-04-11", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion", + "authors": [ + "Jaidev Shriram", + "Alex Trevithick", + "Lingjie Liu", + "Ravi Ramamoorthi" + ], + "abstract": "We introduce RealmDreamer, a technique for generation of general forward-facing 3D scenes from text descriptions. Our technique optimizes a 3D Gaussian Splatting representation to match complex text prompts. We initialize these splats by utilizing the state-of-the-art text-to-image generators, lifting their samples into 3D, and computing the occlusion volume. We then optimize this representation across multiple views as a 3D inpainting task with image-conditional diffusion models. To learn correct geometric structure, we incorporate a depth diffusion model by conditioning on the samples from the inpainting model, giving rich geometric structure. Finally, we finetune the model using sharpened samples from image generators. Notably, our technique does not require video or multi-view data and can synthesize a variety of high-quality 3D scenes in different styles, consisting of multiple objects. Its generality additionally allows 3D synthesis from a single image.", + "arxiv_url": "http://arxiv.org/abs/2404.07199v1", + "pdf_url": "http://arxiv.org/pdf/2404.07199v1", + "published_date": "2024-04-10", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian-LIC: Real-Time Photo-Realistic SLAM with Gaussian Splatting and LiDAR-Inertial-Camera Fusion", + "authors": [ + "Xiaolei Lang", + "Laijian Li", + "Chenming Wu", + "Chen Zhao", + "Lina Liu", + "Yong Liu", + "Jiajun Lv", + "Xingxing Zuo" + ], + "abstract": "In this paper, we present a real-time photo-realistic SLAM method based on marrying Gaussian Splatting with LiDAR-Inertial-Camera SLAM. Most existing radiance-field-based SLAM systems mainly focus on bounded indoor environments, equipped with RGB-D or RGB sensors. However, they are prone to decline when expanding to unbounded scenes or encountering adverse conditions, such as violent motions and changing illumination. In contrast, oriented to general scenarios, our approach additionally tightly fuses LiDAR, IMU, and camera for robust pose estimation and photo-realistic online mapping. To compensate for regions unobserved by the LiDAR, we propose to integrate both the triangulated visual points from images and LiDAR points for initializing 3D Gaussians. In addition, the modeling of the sky and varying camera exposure have been realized for high-quality rendering. Notably, we implement our system purely with C++ and CUDA, and meticulously design a series of strategies to accelerate the online optimization of the Gaussian-based scene representation. Extensive experiments demonstrate that our method outperforms its counterparts while maintaining real-time capability. 
Impressively, regarding photo-realistic mapping, our method with our estimated poses even surpasses all the compared approaches that utilize privileged ground-truth poses for mapping. Our code will be released on project page https://xingxingzuo.github.io/gaussian_lic.", + "arxiv_url": "http://arxiv.org/abs/2404.06926v2", + "pdf_url": "http://arxiv.org/pdf/2404.06926v2", + "published_date": "2024-04-10", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting", + "authors": [ + "Shijie Zhou", + "Zhiwen Fan", + "Dejia Xu", + "Haoran Chang", + "Pradyumna Chari", + "Tejas Bharadwaj", + "Suya You", + "Zhangyang Wang", + "Achuta Kadambi" + ], + "abstract": "The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360$^{\\circ}$ scene generation pipeline that facilitates the creation of comprehensive 360$^{\\circ}$ scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement to create a high-quality and globally coherent panoramic image. This image acts as a preliminary \"flat\" (2D) scene representation. Subsequently, it is lifted into 3D Gaussians, employing splatting techniques to enable real-time exploration. To produce consistent 3D geometry, our pipeline constructs a spatially coherent structure by aligning the 2D monocular depth into a globally optimized point cloud. This point cloud serves as the initial state for the centroids of 3D Gaussians. In order to address invisible issues inherent in single-view inputs, we impose semantic and geometric constraints on both synthesized and input camera views as regularizations. These guide the optimization of Gaussians, aiding in the reconstruction of unseen regions. In summary, our method offers a globally consistent 3D scene within a 360$^{\\circ}$ perspective, providing an enhanced immersive experience over existing techniques. Project website at: http://dreamscene360.github.io/", + "arxiv_url": "http://arxiv.org/abs/2404.06903v2", + "pdf_url": "http://arxiv.org/pdf/2404.06903v2", + "published_date": "2024-04-10", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SplatPose & Detect: Pose-Agnostic 3D Anomaly Detection", + "authors": [ + "Mathis Kruse", + "Marco Rudolph", + "Dominik Woiwode", + "Bodo Rosenhahn" + ], + "abstract": "Detecting anomalies in images has become a well-explored problem in both academia and industry. State-of-the-art algorithms are able to detect defects in increasingly difficult settings and data modalities. However, most current methods are not suited to address 3D objects captured from differing poses. While solutions using Neural Radiance Fields (NeRFs) have been proposed, they suffer from excessive computation requirements, which hinder real-world usability. For this reason, we propose the novel 3D Gaussian splatting-based framework SplatPose which, given multi-view images of a 3D object, accurately estimates the pose of unseen views in a differentiable manner, and detects anomalies in them. 
We achieve state-of-the-art results in training and inference speed as well as detection performance, even when using less training data than competing methods. We thoroughly evaluate our framework using the recently proposed Pose-agnostic Anomaly Detection benchmark and its multi-pose anomaly detection (MAD) data set.",
+    "arxiv_url": "http://arxiv.org/abs/2404.06832v1",
+    "pdf_url": "http://arxiv.org/pdf/2404.06832v1",
+    "published_date": "2024-04-10",
+    "categories": [
+      "cs.CV",
+      "cs.LG"
+    ],
+    "github_url": "",
+    "keywords": [
+      "gaussian splatting",
+      "3d gaussian",
+      "nerf"
+    ],
+    "citations": 0,
+    "semantic_url": ""
+  },
+  {
+    "title": "Zero-shot Point Cloud Completion Via 2D Priors",
+    "authors": [
+      "Tianxin Huang",
+      "Zhiwen Yan",
+      "Yuyang Zhao",
+      "Gim Hee Lee"
+    ],
+    "abstract": "3D point cloud completion is designed to recover complete shapes from partially observed point clouds. Conventional completion methods typically depend on extensive point cloud data for training, with their effectiveness often constrained to object categories similar to those seen during training. In contrast, we propose a zero-shot framework aimed at completing partially observed point clouds across any unseen categories. Leveraging point rendering via Gaussian Splatting, we develop techniques of Point Cloud Colorization and Zero-shot Fractal Completion that utilize 2D priors from pre-trained diffusion models to infer missing regions. Experimental results on both synthetic and real-world scanned point clouds demonstrate that our approach outperforms existing methods in completing a variety of objects without any requirement for specific training data.",
+    "arxiv_url": "http://arxiv.org/abs/2404.06814v1",
+    "pdf_url": "http://arxiv.org/pdf/2404.06814v1",
+    "published_date": "2024-04-10",
+    "categories": [
+      "cs.CV"
+    ],
+    "github_url": "",
+    "keywords": [
+      "gaussian splatting"
+    ],
+    "citations": 0,
+    "semantic_url": ""
+  },
+  {
+    "title": "SpikeNVS: Enhancing Novel View Synthesis from Blurry Images via Spike Camera",
+    "authors": [
+      "Gaole Dai",
+      "Zhenyu Wang",
+      "Qinwen Xu",
+      "Ming Lu",
+      "Wen Chen",
+      "Boxin Shi",
+      "Shanghang Zhang",
+      "Tiejun Huang"
+    ],
+    "abstract": "One of the most critical factors in achieving sharp Novel View Synthesis (NVS) using neural field methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) is the quality of the training images. However, conventional RGB cameras are susceptible to motion blur. In contrast, neuromorphic cameras like event and spike cameras inherently capture more comprehensive temporal information, which can provide a sharp representation of the scene as additional training data. Recent methods have explored the integration of event cameras to improve the quality of NVS. The event-RGB approaches have some limitations, such as high training costs and the inability to work effectively in the background. Instead, our study introduces a new method that uses the spike camera to overcome these limitations. By considering texture reconstruction from spike streams as ground truth, we design the Texture from Spike (TfS) loss. Since the spike camera relies on temporal integration instead of temporal differentiation used by event cameras, our proposed TfS loss maintains manageable training costs. It handles foreground objects with backgrounds simultaneously. We also provide a real-world dataset captured with our spike-RGB camera system to facilitate future research endeavors.
We conduct extensive experiments using synthetic and real-world datasets to demonstrate that our design can enhance novel view synthesis across NeRF and 3DGS. The code and dataset will be made available for public access.", + "arxiv_url": "http://arxiv.org/abs/2404.06710v3", + "pdf_url": "http://arxiv.org/pdf/2404.06710v3", + "published_date": "2024-04-10", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "End-to-End Rate-Distortion Optimized 3D Gaussian Representation", + "authors": [ + "Henan Wang", + "Hanxin Zhu", + "Tianyu He", + "Runsen Feng", + "Jiajun Deng", + "Jiang Bian", + "Zhibo Chen" + ], + "abstract": "3D Gaussian Splatting (3DGS) has become an emerging technique with remarkable potential in 3D representation and image rendering. However, the substantial storage overhead of 3DGS significantly impedes its practical applications. In this work, we formulate the compact 3D Gaussian learning as an end-to-end Rate-Distortion Optimization (RDO) problem and propose RDO-Gaussian that can achieve flexible and continuous rate control. RDO-Gaussian addresses two main issues that exist in current schemes: 1) Different from prior endeavors that minimize the rate under the fixed distortion, we introduce dynamic pruning and entropy-constrained vector quantization (ECVQ) that optimize the rate and distortion at the same time. 2) Previous works treat the colors of each Gaussian equally, while we model the colors of different regions and materials with learnable numbers of parameters. We verify our method on both real and synthetic scenes, showcasing that RDO-Gaussian greatly reduces the size of 3D Gaussian over 40x, and surpasses existing methods in rate-distortion performance.", + "arxiv_url": "http://arxiv.org/abs/2406.01597v2", + "pdf_url": "http://arxiv.org/pdf/2406.01597v2", + "published_date": "2024-04-09", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Geometry-aware Deformable Gaussian Splatting for Dynamic View Synthesis", + "authors": [ + "Zhicheng Lu", + "Xiang Guo", + "Le Hui", + "Tianrui Chen", + "Min Yang", + "Xiao Tang", + "Feng Zhu", + "Yuchao Dai" + ], + "abstract": "In this paper, we propose a 3D geometry-aware deformable Gaussian Splatting method for dynamic view synthesis. Existing neural radiance fields (NeRF) based solutions learn the deformation in an implicit manner, which cannot incorporate 3D scene geometry. Therefore, the learned deformation is not necessarily geometrically coherent, which results in unsatisfactory dynamic view synthesis and 3D dynamic reconstruction. Recently, 3D Gaussian Splatting provides a new representation of the 3D scene, building upon which the 3D geometry could be exploited in learning the complex 3D deformation. Specifically, the scenes are represented as a collection of 3D Gaussian, where each 3D Gaussian is optimized to move and rotate over time to model the deformation. To enforce the 3D scene geometry constraint during deformation, we explicitly extract 3D geometry features and integrate them in learning the 3D deformation. In this way, our solution achieves 3D geometry-aware deformation modeling, which enables improved dynamic view synthesis and 3D dynamic reconstruction. 
Extensive experimental results on both synthetic and real datasets prove the superiority of our solution, which achieves new state-of-the-art performance. The project is available at https://npucvr.github.io/GaGS/", + "arxiv_url": "http://arxiv.org/abs/2404.06270v2", + "pdf_url": "http://arxiv.org/pdf/2404.06270v2", + "published_date": "2024-04-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Pancakes: Geometrically-Regularized 3D Gaussian Splatting for Realistic Endoscopic Reconstruction", + "authors": [ + "Sierra Bonilla", + "Shuai Zhang", + "Dimitrios Psychogyios", + "Danail Stoyanov", + "Francisco Vasconcelos", + "Sophia Bano" + ], + "abstract": "Within colorectal cancer diagnostics, conventional colonoscopy techniques face critical limitations, including a limited field of view and a lack of depth information, which can impede the detection of precancerous lesions. Current methods struggle to provide comprehensive and accurate 3D reconstructions of the colonic surface which can help minimize the missing regions and reinspection for pre-cancerous polyps. Addressing this, we introduce 'Gaussian Pancakes', a method that leverages 3D Gaussian Splatting (3D GS) combined with a Recurrent Neural Network-based Simultaneous Localization and Mapping (RNNSLAM) system. By introducing geometric and depth regularization into the 3D GS framework, our approach ensures more accurate alignment of Gaussians with the colon surface, resulting in smoother 3D reconstructions with novel viewing of detailed textures and structures. Evaluations across three diverse datasets show that Gaussian Pancakes enhances novel view synthesis quality, surpassing current leading methods with a 18% boost in PSNR and a 16% improvement in SSIM. It also delivers over 100X faster rendering and more than 10X shorter training times, making it a practical tool for real-time applications. Hence, this holds promise for achieving clinical translation for better detection and diagnosis of colorectal cancer.", + "arxiv_url": "http://arxiv.org/abs/2404.06128v2", + "pdf_url": "http://arxiv.org/pdf/2404.06128v2", + "published_date": "2024-04-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Revising Densification in Gaussian Splatting", + "authors": [ + "Samuel Rota Bulò", + "Lorenzo Porzi", + "Peter Kontschieder" + ], + "abstract": "In this paper, we address the limitations of Adaptive Density Control (ADC) in 3D Gaussian Splatting (3DGS), a scene representation method achieving high-quality, photorealistic results for novel view synthesis. ADC has been introduced for automatic 3D point primitive management, controlling densification and pruning, however, with certain limitations in the densification logic. Our main contribution is a more principled, pixel-error driven formulation for density control in 3DGS, leveraging an auxiliary, per-pixel error function as the criterion for densification. We further introduce a mechanism to control the total number of primitives generated per scene and correct a bias in the current opacity handling strategy of ADC during cloning operations. 
Our approach leads to consistent quality improvements across a variety of benchmark scenes, without sacrificing the method's efficiency.", + "arxiv_url": "http://arxiv.org/abs/2404.06109v1", + "pdf_url": "http://arxiv.org/pdf/2404.06109v1", + "published_date": "2024-04-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Hash3D: Training-free Acceleration for 3D Generation", + "authors": [ + "Xingyi Yang", + "Xinchao Wang" + ], + "abstract": "The evolution of 3D generative modeling has been notably propelled by the adoption of 2D diffusion models. Despite this progress, the cumbersome optimization process per se presents a critical hurdle to efficiency. In this paper, we introduce Hash3D, a universal acceleration for 3D generation without model training. Central to Hash3D is the insight that feature-map redundancy is prevalent in images rendered from camera positions and diffusion time-steps in close proximity. By effectively hashing and reusing these feature maps across neighboring timesteps and camera angles, Hash3D substantially prevents redundant calculations, thus accelerating the diffusion model's inference in 3D generation tasks. We achieve this through an adaptive grid-based hashing. Surprisingly, this feature-sharing mechanism not only speed up the generation but also enhances the smoothness and view consistency of the synthesized 3D objects. Our experiments covering 5 text-to-3D and 3 image-to-3D models, demonstrate Hash3D's versatility to speed up optimization, enhancing efficiency by 1.3 to 4 times. Additionally, Hash3D's integration with 3D Gaussian splatting largely speeds up 3D model creation, reducing text-to-3D processing to about 10 minutes and image-to-3D conversion to roughly 30 seconds. The project page is at https://adamdad.github.io/hash3D/.", + "arxiv_url": "http://arxiv.org/abs/2404.06091v1", + "pdf_url": "http://arxiv.org/pdf/2404.06091v1", + "published_date": "2024-04-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "StylizedGS: Controllable Stylization for 3D Gaussian Splatting", + "authors": [ + "Dingxi Zhang", + "Yu-Jie Yuan", + "Zhuoxun Chen", + "Fang-Lue Zhang", + "Zhenliang He", + "Shiguang Shan", + "Lin Gao" + ], + "abstract": "As XR technology continues to advance rapidly, 3D generation and editing are increasingly crucial. Among these, stylization plays a key role in enhancing the appearance of 3D models. By utilizing stylization, users can achieve consistent artistic effects in 3D editing using a single reference style image, making it a user-friendly editing method. However, recent NeRF-based 3D stylization methods encounter efficiency issues that impact the user experience, and their implicit nature limits their ability to accurately transfer geometric pattern styles. Additionally, the ability for artists to apply flexible control over stylized scenes is considered highly desirable to foster an environment conducive to creative exploration. To address the above issues, we introduce StylizedGS, an efficient 3D neural style transfer framework with adaptable control over perceptual factors based on 3D Gaussian Splatting (3DGS) representation. We propose a filter-based refinement to eliminate floaters that affect the stylization effects in the scene reconstruction process. 
The nearest neighbor-based style loss is introduced to achieve stylization by fine-tuning the geometry and color parameters of 3DGS, while a depth preservation loss with other regularizations is proposed to prevent tampering with the geometry content. Moreover, facilitated by specially designed losses, StylizedGS enables users to control color, stylized scale, and regions during the stylization, providing customization capabilities. Our method achieves high-quality stylization results characterized by faithful brushstrokes and geometric consistency with flexible controls. Extensive experiments across various scenes and styles demonstrate the effectiveness and efficiency of our method concerning both stylization quality and inference speed.",
+    "arxiv_url": "http://arxiv.org/abs/2404.05220v2",
+    "pdf_url": "http://arxiv.org/pdf/2404.05220v2",
+    "published_date": "2024-04-08",
+    "categories": [
+      "cs.CV"
+    ],
+    "github_url": "",
+    "keywords": [
+      "gaussian splatting",
+      "3d gaussian",
+      "nerf"
+    ],
+    "citations": 0,
+    "semantic_url": ""
+  },
+  {
+    "title": "Dual-Camera Smooth Zoom on Mobile Phones",
+    "authors": [
+      "Renlong Wu",
+      "Zhilu Zhang",
+      "Yu Yang",
+      "Wangmeng Zuo"
+    ],
+    "abstract": "When zooming between dual cameras on a mobile phone, noticeable jumps in geometric content and image color occur in the preview, inevitably affecting the user's zoom experience. In this work, we introduce a new task, i.e., dual-camera smooth zoom (DCSZ), to achieve a smooth zoom preview. The frame interpolation (FI) technique is a potential solution but struggles with ground-truth collection. To address the issue, we suggest a data factory solution where continuous virtual cameras are assembled to generate DCSZ data by rendering reconstructed 3D models of the scene. In particular, we propose a novel dual-camera smooth zoom Gaussian Splatting (ZoomGS), where a camera-specific encoding is introduced to construct a specific 3D model for each virtual camera. With the proposed data factory, we construct a synthetic dataset for DCSZ, and we utilize it to fine-tune FI models. In addition, we collect real-world dual-zoom images without ground-truth for evaluation. Extensive experiments are conducted with multiple FI methods. The results show that the fine-tuned FI models achieve a significant performance improvement over the original ones on the DCSZ task. The datasets, codes, and pre-trained models are available at https://github.com/ZcsrenlongZ/ZoomGS.",
+    "arxiv_url": "http://arxiv.org/abs/2404.04908v2",
+    "pdf_url": "http://arxiv.org/pdf/2404.04908v2",
+    "published_date": "2024-04-07",
+    "categories": [
+      "cs.CV"
+    ],
+    "github_url": "https://github.com/ZcsrenlongZ/ZoomGS",
+    "keywords": [
+      "gaussian splatting"
+    ],
+    "citations": 0,
+    "semantic_url": ""
+  },
+  {
+    "title": "GauU-Scene V2: Assessing the Reliability of Image-Based Metrics with Expansive Lidar Image Dataset Using 3DGS and NeRF",
+    "authors": [
+      "Butian Xiong",
+      "Nanjun Zheng",
+      "Junhua Liu",
+      "Zhen Li"
+    ],
+    "abstract": "We introduce a novel, multimodal large-scale scene reconstruction benchmark that utilizes newly developed 3D representation approaches: Gaussian Splatting and Neural Radiance Fields (NeRF). Our expansive U-Scene dataset surpasses any previously existing real large-scale outdoor LiDAR and image dataset in both area and point count. GauU-Scene encompasses over 6.5 square kilometers and features a comprehensive RGB dataset coupled with LiDAR ground truth.
Additionally, we are the first to propose a LiDAR and image alignment method for a drone-based dataset. Our assessment of GauU-Scene includes a detailed analysis across various novel viewpoints, employing image-based metrics such as SSIM, LPIPS, and PSNR on NeRF and Gaussian Splatting based methods. This analysis reveals contradictory results when applying geometric-based metrics like Chamfer distance. The experimental results on our multimodal dataset highlight the unreliability of current image-based metrics and reveal significant drawbacks in geometric reconstruction using the current Gaussian Splatting-based method, further illustrating the necessity of our dataset for assessing geometry reconstruction tasks. We also provide detailed supplementary information on data collection protocols and make the dataset available on the following anonymous project page", + "arxiv_url": "http://arxiv.org/abs/2404.04880v2", + "pdf_url": "http://arxiv.org/pdf/2404.04880v2", + "published_date": "2024-04-07", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Z-Splat: Z-Axis Gaussian Splatting for Camera-Sonar Fusion", + "authors": [ + "Ziyuan Qu", + "Omkar Vengurlekar", + "Mohamad Qadri", + "Kevin Zhang", + "Michael Kaess", + "Christopher Metzler", + "Suren Jayasuriya", + "Adithya Pediredla" + ], + "abstract": "Differentiable 3D-Gaussian splatting (GS) is emerging as a prominent technique in computer vision and graphics for reconstructing 3D scenes. GS represents a scene as a set of 3D Gaussians with varying opacities and employs a computationally efficient splatting operation along with analytical derivatives to compute the 3D Gaussian parameters given scene images captured from various viewpoints. Unfortunately, capturing surround view ($360^{\\circ}$ viewpoint) images is impossible or impractical in many real-world imaging scenarios, including underwater imaging, rooms inside a building, and autonomous navigation. In these restricted baseline imaging scenarios, the GS algorithm suffers from a well-known 'missing cone' problem, which results in poor reconstruction along the depth axis. In this manuscript, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis. We extend the Gaussian splatting algorithms for two commonly used sonars and propose fusion algorithms that simultaneously utilize RGB camera data and sonar data. 
Through simulations, emulations, and hardware experiments across various imaging scenarios, we show that the proposed fusion algorithms lead to significantly better novel view synthesis (5 dB improvement in PSNR) and 3D geometry reconstruction (60% lower Chamfer distance).", + "arxiv_url": "http://arxiv.org/abs/2404.04687v2", + "pdf_url": "http://arxiv.org/pdf/2404.04687v2", + "published_date": "2024-04-06", + "categories": [ + "cs.CV", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations", + "authors": [ + "Yang Zheng", + "Qingqing Zhao", + "Guandao Yang", + "Wang Yifan", + "Donglai Xiang", + "Florian Dubost", + "Dmitry Lagun", + "Thabo Beeler", + "Federico Tombari", + "Leonidas Guibas", + "Gordon Wetzstein" + ], + "abstract": "Modeling and rendering photorealistic avatars is of crucial importance in many applications. Existing methods that build a 3D avatar from visual observations, however, struggle to reconstruct clothed humans. We introduce PhysAvatar, a novel framework that combines inverse rendering with inverse physics to automatically estimate the shape and appearance of a human from multi-view video data along with the physical parameters of the fabric of their clothes. For this purpose, we adopt a mesh-aligned 4D Gaussian technique for spatio-temporal mesh tracking as well as a physically based inverse renderer to estimate the intrinsic material properties. PhysAvatar integrates a physics simulator to estimate the physical parameters of the garments using gradient-based optimization in a principled manner. These novel capabilities enable PhysAvatar to create high-quality novel-view renderings of avatars dressed in loose-fitting clothes under motions and lighting conditions not seen in the training data. This marks a significant advancement towards modeling photorealistic digital humans using physically based inverse rendering with physics in the loop. Our project website is at: https://qingqing-zhao.github.io/PhysAvatar", + "arxiv_url": "http://arxiv.org/abs/2404.04421v2", + "pdf_url": "http://arxiv.org/pdf/2404.04421v2", + "published_date": "2024-04-05", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Robust Gaussian Splatting", + "authors": [ + "François Darmon", + "Lorenzo Porzi", + "Samuel Rota-Bulò", + "Peter Kontschieder" + ], + "abstract": "In this paper, we address common error sources for 3D Gaussian Splatting (3DGS) including blur, imperfect camera poses, and color inconsistencies, with the goal of improving its robustness for practical applications like reconstructions from handheld phone captures. Our main contribution involves modeling motion blur as a Gaussian distribution over camera poses, allowing us to address both camera pose refinement and motion blur correction in a unified way. Additionally, we propose mechanisms for defocus blur compensation and for addressing color inconsistencies caused by ambient light, shadows, or due to camera-related factors like varying white balancing settings. Our proposed solutions integrate in a seamless way with the 3DGS formulation while maintaining its benefits in terms of training efficiency and rendering speed.
We experimentally validate our contributions on relevant benchmark datasets including Scannet++ and Deblur-NeRF, obtaining state-of-the-art results and thus consistent improvements over relevant baselines.", + "arxiv_url": "http://arxiv.org/abs/2404.04211v1", + "pdf_url": "http://arxiv.org/pdf/2404.04211v1", + "published_date": "2024-04-05", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MM-Gaussian: 3D Gaussian-based Multi-modal Fusion for Localization and Reconstruction in Unbounded Scenes", + "authors": [ + "Chenyang Wu", + "Yifan Duan", + "Xinran Zhang", + "Yu Sheng", + "Jianmin Ji", + "Yanyong Zhang" + ], + "abstract": "Localization and mapping are critical tasks for various applications such as autonomous vehicles and robotics. The challenges posed by outdoor environments present particular complexities due to their unbounded characteristics. In this work, we present MM-Gaussian, a LiDAR-camera multi-modal fusion system for localization and mapping in unbounded scenes. Our approach is inspired by the recently developed 3D Gaussians, which demonstrate remarkable capabilities in achieving high rendering quality and fast rendering speed. Specifically, our system fully utilizes the geometric structure information provided by solid-state LiDAR to address the problem of inaccurate depth encountered when relying solely on visual solutions in unbounded, outdoor scenarios. Additionally, we utilize 3D Gaussian point clouds, with the assistance of pixel-level gradient descent, to fully exploit the color information in photos, thereby achieving realistic rendering effects. To further bolster the robustness of our system, we designed a relocalization module, which assists in returning to the correct trajectory in the event of a localization failure. Experiments conducted in multiple scenarios demonstrate the effectiveness of our method.", + "arxiv_url": "http://arxiv.org/abs/2404.04026v1", + "pdf_url": "http://arxiv.org/pdf/2404.04026v1", + "published_date": "2024-04-05", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer", + "authors": [ + "Zijie Wu", + "Chaohui Yu", + "Yanqin Jiang", + "Chenjie Cao", + "Fan Wang", + "Xiang Bai" + ], + "abstract": "Recent advances in 2D/3D generative models enable the generation of dynamic 3D objects from a single-view video. Existing approaches utilize score distillation sampling to form the dynamic scene as dynamic NeRF or dense 3D Gaussians. However, these methods struggle to strike a balance among reference view alignment, spatio-temporal consistency, and motion fidelity under single-view conditions due to the implicit nature of NeRF or the intricate dense Gaussian motion prediction. To address these issues, this paper proposes an efficient, sparse-controlled video-to-4D framework named SC4D, that decouples motion and appearance to achieve superior video-to-4D generation. Moreover, we introduce Adaptive Gaussian (AG) initialization and Gaussian Alignment (GA) loss to mitigate shape degeneration issue, ensuring the fidelity of the learned motion and shape. Comprehensive experimental results demonstrate that our method surpasses existing methods in both quality and efficiency. 
In addition, facilitated by the disentangled modeling of motion and appearance of SC4D, we devise a novel application that seamlessly transfers the learned motion onto a diverse array of 4D entities according to textual descriptions.", + "arxiv_url": "http://arxiv.org/abs/2404.03736v2", + "pdf_url": "http://arxiv.org/pdf/2404.03736v2", + "published_date": "2024-04-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting", + "authors": [ + "Jeongmin Bae", + "Seoha Kim", + "Youngsik Yun", + "Hahyun Lee", + "Gun Bang", + "Youngjung Uh" + ], + "abstract": "As 3D Gaussian Splatting (3DGS) provides fast and high-quality novel view synthesis, it is a natural extension to deform a canonical 3DGS to multiple frames for representing a dynamic scene. However, previous works fail to accurately reconstruct complex dynamic scenes. We attribute the failure to the design of the deformation field, which is built as a coordinate-based function. This approach is problematic because 3DGS is a mixture of multiple fields centered at the Gaussians, not just a single coordinate-based framework. To resolve this problem, we define the deformation as a function of per-Gaussian embeddings and temporal embeddings. Moreover, we decompose deformations as coarse and fine deformations to model slow and fast movements, respectively. Also, we introduce a local smoothness regularization for per-Gaussian embedding to improve the details in dynamic regions. Project page: https://jeongminb.github.io/e-d3dgs/", + "arxiv_url": "http://arxiv.org/abs/2404.03613v5", + "pdf_url": "http://arxiv.org/pdf/2404.03613v5", + "published_date": "2024-04-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling", + "authors": [ + "Haoran Li", + "Haolin Shi", + "Wenli Zhang", + "Wenjun Wu", + "Yong Liao", + "Lin Wang", + "Lik-hang Lee", + "Pengyuan Zhou" + ], + "abstract": "Text-to-3D scene generation holds immense potential for the gaming, film, and architecture sectors. Despite significant progress, existing methods struggle with maintaining high quality, consistency, and editing flexibility. In this paper, we propose DreamScene, a 3D Gaussian-based novel text-to-3D scene generation framework, to tackle the aforementioned three challenges mainly via two strategies. First, DreamScene employs Formation Pattern Sampling (FPS), a multi-timestep sampling strategy guided by the formation patterns of 3D objects, to form fast, semantically rich, and high-quality representations. FPS uses 3D Gaussian filtering for optimization stability, and leverages reconstruction techniques to generate plausible textures. Second, DreamScene employs a progressive three-stage camera sampling strategy, specifically designed for both indoor and outdoor settings, to effectively ensure object-environment integration and scene-wide 3D consistency. Last, DreamScene enhances scene editing flexibility by integrating objects and environments, enabling targeted adjustments. Extensive experiments validate DreamScene's superiority over current state-of-the-art techniques, heralding its wide-ranging potential for diverse applications. 
Code and demos will be released at https://dreamscene-project.github.io .", + "arxiv_url": "http://arxiv.org/abs/2404.03575v2", + "pdf_url": "http://arxiv.org/pdf/2404.03575v2", + "published_date": "2024-04-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "OmniGS: Fast Radiance Field Reconstruction using Omnidirectional Gaussian Splatting", + "authors": [ + "Longwei Li", + "Huajian Huang", + "Sai-Kit Yeung", + "Hui Cheng" + ], + "abstract": "Photorealistic reconstruction relying on 3D Gaussian Splatting has shown promising potential in various domains. However, the current 3D Gaussian Splatting system only supports radiance field reconstruction using undistorted perspective images. In this paper, we present OmniGS, a novel omnidirectional Gaussian splatting system, to take advantage of omnidirectional images for fast radiance field reconstruction. Specifically, we conduct a theoretical analysis of spherical camera model derivatives in 3D Gaussian Splatting. According to the derivatives, we then implement a new GPU-accelerated omnidirectional rasterizer that directly splats 3D Gaussians onto the equirectangular screen space for omnidirectional image rendering. We realize differentiable optimization of the omnidirectional radiance field without the requirement of cube-map rectification or tangent-plane approximation. Extensive experiments conducted in egocentric and roaming scenarios demonstrate that our method achieves state-of-the-art reconstruction quality and high rendering speed using omnidirectional images. The code will be publicly available.", + "arxiv_url": "http://arxiv.org/abs/2404.03202v5", + "pdf_url": "http://arxiv.org/pdf/2404.03202v5", + "published_date": "2024-04-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaSpCT: Gaussian Splatting for Novel CT Projection View Synthesis", + "authors": [ + "Emmanouil Nikolakakis", + "Utkarsh Gupta", + "Jonathan Vengosh", + "Justin Bui", + "Razvan Marinescu" + ], + "abstract": "We present GaSpCT, a novel view synthesis and 3D scene representation method used to generate novel projection views for Computer Tomography (CT) scans. We adapt the Gaussian Splatting framework to enable novel view synthesis in CT based on limited sets of 2D image projections and without the need for Structure from Motion (SfM) methodologies. Therefore, we reduce the total scanning duration and the amount of radiation dose the patient receives during the scan. We adapted the loss function to our use-case by encouraging a stronger background and foreground distinction using two sparsity promoting regularizers: a beta loss and a total variation (TV) loss. Finally, we initialize the Gaussian locations across the 3D space using a uniform prior distribution of where the brain's positioning would be expected to be within the field of view. We evaluate the performance of our model using brain CT scans from the Parkinson's Progression Markers Initiative (PPMI) dataset and demonstrate that the rendered novel views closely match the original projection views of the simulated scan, and have better performance than other implicit 3D scene representations methodologies. Furthermore, we empirically observe reduced training time compared to neural network based image synthesis for sparse-view CT image reconstruction. 
Finally, the memory requirements of the Gaussian Splatting representations are reduced by 17% compared to the equivalent voxel grid image representations.", + "arxiv_url": "http://arxiv.org/abs/2404.03126v1", + "pdf_url": "http://arxiv.org/pdf/2404.03126v1", + "published_date": "2024-04-04", + "categories": [ + "eess.IV", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving", + "authors": [ + "Cheng Zhao", + "Su Sun", + "Ruoyu Wang", + "Yuliang Guo", + "Jun-Jun Wan", + "Zhou Huang", + "Xinyu Huang", + "Yingjie Victor Chen", + "Liu Ren" + ], + "abstract": "Most 3D Gaussian Splatting (3D-GS) based methods for urban scenes initialize 3D Gaussians directly with 3D LiDAR points, which not only underutilizes LiDAR data capabilities but also overlooks the potential advantages of fusing LiDAR with camera data. In this paper, we design a novel tightly coupled LiDAR-Camera Gaussian Splatting (TCLC-GS) to fully leverage the combined strengths of both LiDAR and camera sensors, enabling rapid, high-quality 3D reconstruction and novel view RGB/depth synthesis. TCLC-GS designs a hybrid explicit (colorized 3D mesh) and implicit (hierarchical octree feature) 3D representation derived from LiDAR-camera data, to enrich the properties of 3D Gaussians for splatting. 3D Gaussian's properties are not only initialized in alignment with the 3D mesh which provides more complete 3D shape and color information, but are also endowed with broader contextual information through retrieved octree implicit features. During the Gaussian Splatting optimization process, the 3D mesh offers dense depth information as supervision, which enhances the training process by learning of a robust geometry. Comprehensive evaluations conducted on the Waymo Open Dataset and nuScenes Dataset validate our method's state-of-the-art (SOTA) performance. Utilizing a single NVIDIA RTX 3090 Ti, our method demonstrates fast training and achieves real-time RGB and depth rendering at 90 FPS in resolution of 1920x1280 (Waymo), and 120 FPS in resolution of 1600x900 (nuScenes) in urban scenarios.", + "arxiv_url": "http://arxiv.org/abs/2404.02410v2", + "pdf_url": "http://arxiv.org/pdf/2404.02410v2", + "published_date": "2024-04-03", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation", + "authors": [ + "Wangguandong Zheng", + "Haifeng Xia", + "Rui Chen", + "Ming Shao", + "Siyu Xia", + "Zhengming Ding" + ], + "abstract": "Recently, image-to-3D approaches have achieved significant results with a natural image as input. However, it is not always possible to access these enriched color input samples in practical applications, where only sketches are available. Existing sketch-to-3D researches suffer from limitations in broad applications due to the challenges of lacking color information and multi-view content. To overcome them, this paper proposes a novel generation paradigm Sketch3D to generate realistic 3D assets with shape aligned with the input sketch and color matching the textual description. Concretely, Sketch3D first instantiates the given sketch in the reference image through the shape-preserving generation process.
Second, the reference image is leveraged to deduce a coarse 3D Gaussian prior, and multi-view style-consistent guidance images are generated based on the renderings of the 3D Gaussians. Finally, three strategies are designed to optimize 3D Gaussians, i.e., structural optimization via a distribution transfer mechanism, color optimization with a straightforward MSE loss and sketch similarity optimization with a CLIP-based geometric similarity loss. Extensive visual comparisons and quantitative analysis illustrate the advantage of our Sketch3D in generating realistic 3D assets while preserving consistency with the input.", + "arxiv_url": "http://arxiv.org/abs/2404.01843v2", + "pdf_url": "http://arxiv.org/pdf/2404.01843v2", + "published_date": "2024-04-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views", + "authors": [ + "Yaniv Wolf", + "Amit Bracha", + "Ron Kimmel" + ], + "abstract": "Recently, 3D Gaussian Splatting (3DGS) has emerged as an efficient approach for accurately representing scenes. However, despite its superior novel view synthesis capabilities, extracting the geometry of the scene directly from the Gaussian properties remains a challenge, as those are optimized based on a photometric loss. While some concurrent models have tried adding geometric constraints during the Gaussian optimization process, they still produce noisy, unrealistic surfaces. We propose a novel approach for bridging the gap between the noisy 3DGS representation and the smooth 3D mesh representation, by injecting real-world knowledge into the depth extraction process. Instead of extracting the geometry of the scene directly from the Gaussian properties, we instead extract the geometry through a pre-trained stereo-matching model. We render stereo-aligned pairs of images corresponding to the original training poses, feed the pairs into a stereo model to get a depth profile, and finally fuse all of the profiles together to get a single mesh. The resulting reconstruction is smoother, more accurate and shows more intricate details compared to other methods for surface reconstruction from Gaussian Splatting, while only requiring a small overhead on top of the fairly short 3DGS optimization process. We performed extensive testing of the proposed method on in-the-wild scenes, obtained using a smartphone, showcasing its superior reconstruction abilities. Additionally, we tested the method on the Tanks and Temples and DTU benchmarks, achieving state-of-the-art results.", + "arxiv_url": "http://arxiv.org/abs/2404.01810v2", + "pdf_url": "http://arxiv.org/pdf/2404.01810v2", + "published_date": "2024-04-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing", + "authors": [ + "Ri-Zhao Qiu", + "Ge Yang", + "Weijia Zeng", + "Xiaolong Wang" + ], + "abstract": "Scene representations using 3D Gaussian primitives have produced excellent results in modeling the appearance of static and dynamic 3D scenes. Many graphics applications, however, demand the ability to manipulate both the appearance and the physical properties of objects. 
We introduce Feature Splatting, an approach that unifies physics-based dynamic scene synthesis with rich semantics from vision language foundation models that are grounded by natural language. Our first contribution is a way to distill high-quality, object-centric vision-language features into 3D Gaussians, that enables semi-automatic scene decomposition using text queries. Our second contribution is a way to synthesize physics-based dynamics from an otherwise static scene using a particle-based simulator, in which material properties are assigned automatically via text queries. We ablate key techniques used in this pipeline, to illustrate the challenge and opportunities in using feature-carrying 3D Gaussians as a unified format for appearance, geometry, material properties and semantics grounded on natural language. Project website: https://feature-splatting.github.io/", + "arxiv_url": "http://arxiv.org/abs/2404.01223v1", + "pdf_url": "http://arxiv.org/pdf/2404.01223v1", + "published_date": "2024-04-01", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting", + "authors": [ + "Jiarui Meng", + "Haijie Li", + "Yanmin Wu", + "Qiankun Gao", + "Shuzhou Yang", + "Jian Zhang", + "Siwei Ma" + ], + "abstract": "3D Gaussian Splatting (3DGS) has marked a significant breakthrough in the realm of 3D scene reconstruction and novel view synthesis. However, 3DGS, much like its predecessor Neural Radiance Fields (NeRF), struggles to accurately model physical reflections, particularly in mirrors that are ubiquitous in real-world scenes. This oversight mistakenly perceives reflections as separate entities that physically exist, resulting in inaccurate reconstructions and inconsistent reflective properties across varied viewpoints. To address this pivotal challenge, we introduce Mirror-3DGS, an innovative rendering framework devised to master the intricacies of mirror geometries and reflections, paving the way for the generation of realistically depicted mirror reflections. By ingeniously incorporating mirror attributes into the 3DGS and leveraging the principle of plane mirror imaging, Mirror-3DGS crafts a mirrored viewpoint to observe from behind the mirror, enriching the realism of scene renderings. Extensive assessments, spanning both synthetic and real-world scenes, showcase our method's ability to render novel views with enhanced fidelity in real-time, surpassing the state-of-the-art Mirror-NeRF specifically within the challenging mirror regions. Our code will be made publicly available for reproducible research.", + "arxiv_url": "http://arxiv.org/abs/2404.01168v1", + "pdf_url": "http://arxiv.org/pdf/2404.01168v1", + "published_date": "2024-04-01", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians", + "authors": [ + "Yang Liu", + "He Guan", + "Chuanchen Luo", + "Lue Fan", + "Naiyan Wang", + "Junran Peng", + "Zhaoxiang Zhang" + ], + "abstract": "The advancement of real-time 3D scene reconstruction and novel view synthesis has been significantly propelled by 3D Gaussian Splatting (3DGS). 
However, effectively training large-scale 3DGS and rendering it in real-time across various scales remains challenging. This paper introduces CityGaussian (CityGS), which employs a novel divide-and-conquer training approach and Level-of-Detail (LoD) strategy for efficient large-scale 3DGS training and rendering. Specifically, the global scene prior and adaptive training data selection enables efficient training and seamless fusion. Based on fused Gaussian primitives, we generate different detail levels through compression, and realize fast rendering across various scales through the proposed block-wise detail levels selection and aggregation strategy. Extensive experimental results on large-scale scenes demonstrate that our approach attains state-of-the-art rendering quality, enabling consistent real-time rendering of large-scale scenes across vastly different scales. Our project page is available at https://dekuliutesla.github.io/citygs/.", + "arxiv_url": "http://arxiv.org/abs/2404.01133v3", + "pdf_url": "http://arxiv.org/pdf/2404.01133v3", + "published_date": "2024-04-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior", + "authors": [ + "David Svitov", + "Pietro Morerio", + "Lourdes Agapito", + "Alessio Del Bue" + ], + "abstract": "We present HAHA - a novel approach for animatable human avatar generation from monocular input videos. The proposed method relies on learning the trade-off between the use of Gaussian splatting and a textured mesh for efficient and high fidelity rendering. We demonstrate its efficiency to animate and render full-body human avatars controlled via the SMPL-X parametric model. Our model learns to apply Gaussian splatting only in areas of the SMPL-X mesh where it is necessary, like hair and out-of-mesh clothing. This results in a minimal number of Gaussians being used to represent the full avatar, and reduced rendering artifacts. This allows us to handle the animation of small body parts such as fingers that are traditionally disregarded. We demonstrate the effectiveness of our approach on two open datasets: SnapshotPeople and X-Humans. Our method demonstrates on par reconstruction quality to the state-of-the-art on SnapshotPeople, while using less than a third of Gaussians. HAHA outperforms previous state-of-the-art on novel poses from X-Humans both quantitatively and qualitatively.", + "arxiv_url": "http://arxiv.org/abs/2404.01053v2", + "pdf_url": "http://arxiv.org/pdf/2404.01053v2", + "published_date": "2024-04-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements", + "authors": [ + "Lisong C. Sun", + "Neel P. Bhatt", + "Jonathan C. Liu", + "Zhiwen Fan", + "Zhangyang Wang", + "Todd E. Humphreys", + "Ufuk Topcu" + ], + "abstract": "Simultaneous localization and mapping is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM.
Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking. Our framework enables keyframe-based mapping and tracking utilizing loss functions that incorporate relative pose transformations from pre-integrated inertial measurements, depth estimates, and measures of photometric rendering quality. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit. Experimental evaluation on several scenes from the dataset shows that MM3DGS achieves 3x improvement in tracking and 5% improvement in photometric rendering quality compared to the current 3DGS SLAM state-of-the-art, while allowing real-time rendering of a high-resolution dense 3D map. Project Webpage: https://vita-group.github.io/MM3DGS-SLAM", + "arxiv_url": "http://arxiv.org/abs/2404.00923v1", + "pdf_url": "http://arxiv.org/pdf/2404.00923v1", + "published_date": "2024-04-01", + "categories": [ + "cs.CV", + "cs.AI", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DGSR: Implicit Surface Reconstruction with 3D Gaussian Splatting", + "authors": [ + "Xiaoyang Lyu", + "Yang-Tian Sun", + "Yi-Hua Huang", + "Xiuzhe Wu", + "Ziyi Yang", + "Yilun Chen", + "Jiangmiao Pang", + "Xiaojuan Qi" + ], + "abstract": "In this paper, we present an implicit surface reconstruction method with 3D Gaussian Splatting (3DGS), namely 3DGSR, that allows for accurate 3D reconstruction with intricate details while inheriting the high efficiency and rendering quality of 3DGS. The key insight is incorporating an implicit signed distance field (SDF) within 3D Gaussians to enable them to be aligned and jointly optimized. First, we introduce a differentiable SDF-to-opacity transformation function that converts SDF values into corresponding Gaussians' opacities. This function connects the SDF and 3D Gaussians, allowing for unified optimization and enforcing surface constraints on the 3D Gaussians. During learning, optimizing the 3D Gaussians provides supervisory signals for SDF learning, enabling the reconstruction of intricate details. However, this only provides sparse supervisory signals to the SDF at locations occupied by Gaussians, which is insufficient for learning a continuous SDF. Then, to address this limitation, we incorporate volumetric rendering and align the rendered geometric attributes (depth, normal) with those derived from 3D Gaussians. This consistency regularization introduces supervisory signals to locations not covered by discrete 3D Gaussians, effectively eliminating redundant surfaces outside the Gaussian sampling range. Our extensive experimental results demonstrate that our 3DGSR method enables high-quality 3D surface reconstruction while preserving the efficiency and rendering quality of 3DGS. Besides, our method competes favorably with leading surface reconstruction techniques while offering a more efficient learning process and much better rendering qualities. 
The code will be available at https://github.com/CVMI-Lab/3DGSR.", + "arxiv_url": "http://arxiv.org/abs/2404.00409v1", + "pdf_url": "http://arxiv.org/pdf/2404.00409v1", + "published_date": "2024-03-30", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "https://github.com/CVMI-Lab/3DGSR", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds", + "authors": [ + "Zhiwen Fan", + "Wenyan Cong", + "Kairun Wen", + "Kevin Wang", + "Jian Zhang", + "Xinghao Ding", + "Danfei Xu", + "Boris Ivanovic", + "Marco Pavone", + "Georgios Pavlakos", + "Zhangyang Wang", + "Yue Wang" + ], + "abstract": "While novel view synthesis (NVS) from a sparse set of images has advanced significantly in 3D computer vision, it relies on precise initial estimation of camera parameters using Structure-from-Motion (SfM). For instance, the recently developed Gaussian Splatting depends heavily on the accuracy of SfM-derived points and poses. However, SfM processes are time-consuming and often prove unreliable in sparse-view scenarios, where matched features are scarce, leading to accumulated errors and limited generalization capability across datasets. In this study, we introduce a novel and efficient framework to enhance robust NVS from sparse-view images. Our framework, InstantSplat, integrates multi-view stereo (MVS) predictions with point-based representations to construct 3D Gaussians of large-scale scenes from sparse-view data within seconds, addressing the aforementioned performance and efficiency issues by SfM. Specifically, InstantSplat generates densely populated surface points across all training views and determines the initial camera parameters using pixel-alignment. Nonetheless, the MVS points are not globally accurate, and the pixel-wise prediction from all views results in an excessive Gaussian number, yielding an overparameterized scene representation that compromises both training speed and accuracy. To address this issue, we employ a grid-based, confidence-aware Farthest Point Sampling to strategically position point primitives at representative locations in parallel. Next, we enhance pose accuracy and tune scene parameters through a gradient-based joint optimization framework from self-supervision. By employing this simplified framework, InstantSplat achieves a substantial reduction in training time, from hours to mere seconds, and demonstrates robust performance across various numbers of views in diverse datasets.", + "arxiv_url": "http://arxiv.org/abs/2403.20309v3", + "pdf_url": "http://arxiv.org/pdf/2403.20309v3", + "published_date": "2024-03-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Snap-it, Tap-it, Splat-it: Tactile-Informed 3D Gaussian Splatting for Reconstructing Challenging Surfaces", + "authors": [ + "Mauro Comi", + "Alessio Tonioni", + "Max Yang", + "Jonathan Tremblay", + "Valts Blukis", + "Yijiong Lin", + "Nathan F. Lepora", + "Laurence Aitchison" + ], + "abstract": "Touch and vision go hand in hand, mutually enhancing our ability to understand the world. From a research perspective, the problem of mixing touch and vision is underexplored and presents interesting challenges.
To this end, we propose Tactile-Informed 3DGS, a novel approach that incorporates touch data (local depth maps) with multi-view vision data to achieve surface reconstruction and novel view synthesis. Our method optimises 3D Gaussian primitives to accurately model the object's geometry at points of contact. By creating a framework that decreases the transmittance at touch locations, we achieve a refined surface reconstruction, ensuring a uniformly smooth depth map. Touch is particularly useful when considering non-Lambertian objects (e.g. shiny or reflective surfaces) since contemporary methods tend to fail to reconstruct with fidelity specular highlights. By combining vision and tactile sensing, we achieve more accurate geometry reconstructions with fewer images than prior methods. We conduct evaluation on objects with glossy and reflective surfaces and demonstrate the effectiveness of our approach, offering significant improvements in reconstruction quality.", + "arxiv_url": "http://arxiv.org/abs/2403.20275v1", + "pdf_url": "http://arxiv.org/pdf/2403.20275v1", + "published_date": "2024-03-29", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HGS-Mapping: Online Dense Mapping Using Hybrid Gaussian Representation in Urban Scenes", + "authors": [ + "Ke Wu", + "Kaizhao Zhang", + "Zhiwei Zhang", + "Shanshuai Yuan", + "Muer Tie", + "Julong Wei", + "Zijun Xu", + "Jieru Zhao", + "Zhongxue Gan", + "Wenchao Ding" + ], + "abstract": "Online dense mapping of urban scenes forms a fundamental cornerstone for scene understanding and navigation of autonomous vehicles. Recent advancements in mapping methods are mainly based on NeRF, whose rendering speed is too slow to meet online requirements. 3D Gaussian Splatting (3DGS), with its rendering speed hundreds of times faster than NeRF, holds greater potential in online dense mapping. However, integrating 3DGS into a street-view dense mapping framework still faces two challenges, including incomplete reconstruction due to the absence of geometric information beyond the LiDAR coverage area and extensive computation for reconstruction in large urban scenes. To this end, we propose HGS-Mapping, an online dense mapping framework in unbounded large-scale scenes. To attain complete construction, our framework introduces Hybrid Gaussian Representation, which models different parts of the entire scene using Gaussians with distinct properties. Furthermore, we employ a hybrid Gaussian initialization mechanism and an adaptive update method to achieve high-fidelity and rapid reconstruction. To the best of our knowledge, we are the first to integrate Gaussian representation into online dense mapping of urban scenes. 
Our approach achieves SOTA reconstruction accuracy while employing only 66% of the Gaussians, leading to 20% faster reconstruction speed.", + "arxiv_url": "http://arxiv.org/abs/2403.20159v1", + "pdf_url": "http://arxiv.org/pdf/2403.20159v1", + "published_date": "2024-03-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior", + "authors": [ + "Zhongrui Yu", + "Haoran Wang", + "Jinze Yang", + "Hanzhang Wang", + "Zeke Xie", + "Yunfeng Cai", + "Jiale Cao", + "Zhong Ji", + "Mingming Sun" + ], + "abstract": "Novel View Synthesis (NVS) for street scenes plays a critical role in the autonomous driving simulation. The current mainstream technique to achieve it is neural rendering, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Although thrilling progress has been made, when handling street scenes, current methods struggle to maintain rendering quality at the viewpoint that deviates significantly from the training viewpoints. This issue stems from the sparse training views captured by a fixed camera on a moving vehicle. To tackle this problem, we propose a novel approach that enhances the capacity of 3DGS by leveraging prior from a Diffusion Model along with complementary multi-modal data. Specifically, we first fine-tune a Diffusion Model by adding images from adjacent frames as condition, meanwhile exploiting depth data from LiDAR point clouds to supply additional spatial information. Then we apply the Diffusion Model to regularize the 3DGS at unseen views during training. Experimental results validate the effectiveness of our method compared with current state-of-the-art models, and demonstrate its advance in rendering images from broader views.", + "arxiv_url": "http://arxiv.org/abs/2403.20079v1", + "pdf_url": "http://arxiv.org/pdf/2403.20079v1", + "published_date": "2024-03-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes", + "authors": [ + "Zhuopeng Li", + "Yilin Zhang", + "Chenming Wu", + "Jianke Zhu", + "Liangjun Zhang" + ], + "abstract": "The rapid growth of 3D Gaussian Splatting (3DGS) has revolutionized neural rendering, enabling real-time production of high-quality renderings. However, the previous 3DGS-based methods have limitations in urban scenes due to reliance on initial Structure-from-Motion (SfM) points and difficulties in rendering distant, sky and low-texture areas. To overcome these challenges, we propose a hybrid optimization method named HO-Gaussian, which combines a grid-based volume with the 3DGS pipeline. HO-Gaussian eliminates the dependency on SfM point initialization, allowing for rendering of urban scenes, and incorporates the Point Densitification to enhance rendering quality in problematic regions during training. Furthermore, we introduce Gaussian Direction Encoding as an alternative for spherical harmonics in the rendering pipeline, which enables view-dependent color representation. To account for multi-camera systems, we introduce neural warping to enhance object consistency across different cameras.
Experimental results on widely used autonomous driving datasets demonstrate that HO-Gaussian achieves photo-realistic rendering in real-time on multi-camera urban datasets.", + "arxiv_url": "http://arxiv.org/abs/2403.20032v1", + "pdf_url": "http://arxiv.org/pdf/2403.20032v1", + "published_date": "2024-03-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond", + "authors": [ + "Chongjie Ye", + "Yinyu Nie", + "Jiahao Chang", + "Yuantao Chen", + "Yihao Zhi", + "Xiaoguang Han" + ], + "abstract": "We present GauStudio, a novel modular framework for modeling 3D Gaussian Splatting (3DGS) to provide standardized, plug-and-play components for users to easily customize and implement a 3DGS pipeline. Supported by our framework, we propose a hybrid Gaussian representation with foreground and skyball background models. Experiments demonstrate this representation reduces artifacts in unbounded outdoor scenes and improves novel view synthesis. Finally, we propose Gaussian Splatting Surface Reconstruction (GauS), a novel render-then-fuse approach for high-fidelity mesh reconstruction from 3DGS inputs without fine-tuning. Overall, our GauStudio framework, hybrid representation, and GauS approach enhance 3DGS modeling and rendering capabilities, enabling higher-quality novel view synthesis and surface reconstruction.", + "arxiv_url": "http://arxiv.org/abs/2403.19632v1", + "pdf_url": "http://arxiv.org/pdf/2403.19632v1", + "published_date": "2024-03-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing", + "authors": [ + "Xiaowei Song", + "Jv Zheng", + "Shiran Yuan", + "Huan-ang Gao", + "Jingwei Zhao", + "Xiang He", + "Weihao Gu", + "Hao Zhao" + ], + "abstract": "In this paper, we present a Scale-adaptive method for Anti-aliasing Gaussian Splatting (SA-GS). While the state-of-the-art method Mip-Splatting needs modifying the training procedure of Gaussian splatting, our method functions at test-time and is training-free. Specifically, SA-GS can be applied to any pretrained Gaussian splatting field as a plugin to significantly improve the field's anti-aliasing performance. The core technique is to apply 2D scale-adaptive filters to each Gaussian during test time. As pointed out by Mip-Splatting, observing Gaussians at different frequencies leads to mismatches between the Gaussian scales during training and testing. Mip-Splatting resolves this issue using 3D smoothing and 2D Mip filters, which are unfortunately not aware of testing frequency. In this work, we show that a 2D scale-adaptive filter that is informed of testing frequency can effectively match the Gaussian scale, thus making the Gaussian primitive distribution remain consistent across different testing frequencies. When scale inconsistency is eliminated, sampling rates smaller than the scene frequency result in conventional jaggedness, and we propose to integrate the projected 2D Gaussian within each pixel during testing. This integration is actually a limiting case of super-sampling, which significantly improves anti-aliasing performance over vanilla Gaussian Splatting.
Through extensive experiments using various settings and both bounded and unbounded scenes, we show SA-GS performs comparably with or better than Mip-Splatting. Note that super-sampling and integration are only effective when our scale-adaptive filtering is activated. Our codes, data and models are available at https://github.com/zsy1987/SA-GS.", + "arxiv_url": "http://arxiv.org/abs/2403.19615v1", + "pdf_url": "http://arxiv.org/pdf/2403.19615v1", + "published_date": "2024-03-28", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/zsy1987/SA-GS", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering", + "authors": [ + "Shuai Zhang", + "Huangxuan Zhao", + "Zhenghong Zhou", + "Guanjun Wu", + "Chuansheng Zheng", + "Xinggang Wang", + "Wenyu Liu" + ], + "abstract": "Four-dimensional Digital Subtraction Angiography (4D DSA) is a medical imaging technique that provides a series of 2D images captured at different stages and angles during the process of contrast agent filling blood vessels. It plays a significant role in the diagnosis of cerebrovascular diseases. Improving the rendering quality and speed under sparse sampling is important for observing the status and location of lesions. The current methods exhibit inadequate rendering quality in sparse views and suffer from slow rendering speed. To overcome these limitations, we propose TOGS, a Gaussian splatting method with opacity offset over time, which can effectively improve the rendering quality and speed of 4D DSA. We introduce an opacity offset table for each Gaussian to model the opacity offsets of the Gaussian, using these opacity-varying Gaussians to model the temporal variations in the radiance of the contrast agent. By interpolating the opacity offset table, the opacity variation of the Gaussian at different time points can be determined. This enables us to render the 2D DSA image at that specific moment. Additionally, we introduced a Smooth loss term in the loss function to mitigate overfitting issues that may arise in the model when dealing with sparse view scenarios. During the training phase, we randomly prune Gaussians, thereby reducing the storage overhead of the model. The experimental results demonstrate that compared to previous methods, this model achieves state-of-the-art render quality under the same number of training views. Additionally, it enables real-time rendering while maintaining low storage overhead. The code is available at https://github.com/hustvl/TOGS.", + "arxiv_url": "http://arxiv.org/abs/2403.19586v2", + "pdf_url": "http://arxiv.org/pdf/2403.19586v2", + "published_date": "2024-03-28", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "https://github.com/hustvl/TOGS", + "keywords": [ + "gaussian splatting", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians", + "authors": [ + "Avinash Paliwal", + "Wei Ye", + "Jinhui Xiong", + "Dmytro Kotovenko", + "Rakesh Ranjan", + "Vikas Chandra", + "Nima Khademi Kalantari" + ], + "abstract": "The field of 3D reconstruction from images has rapidly evolved in the past few years, first with the introduction of Neural Radiance Field (NeRF) and more recently with 3D Gaussian Splatting (3DGS). 
The latter provides a significant edge over NeRF in terms of the training and inference speed, as well as the reconstruction quality. Although 3DGS works well for dense input images, the unstructured point-cloud like representation quickly overfits to the more challenging setup of extremely sparse input images (e.g., 3 images), creating a representation that appears as a jumble of needles from novel views. To address this issue, we propose regularized optimization and depth-based initialization. Our key idea is to introduce a structured Gaussian representation that can be controlled in 2D image space. We then constrain the Gaussians, in particular their position, and prevent them from moving independently during optimization. Specifically, we introduce single and multiview constraints through an implicit convolutional decoder and a total variation loss, respectively. With the coherency introduced to the Gaussians, we further constrain the optimization through a flow-based loss function. To support our regularized optimization, we propose an approach to initialize the Gaussians using monocular depth estimates at each input view. We demonstrate significant improvements compared to the state-of-the-art sparse-view NeRF-based approaches on a variety of scenes.", + "arxiv_url": "http://arxiv.org/abs/2403.19495v1", + "pdf_url": "http://arxiv.org/pdf/2403.19495v1", + "published_date": "2024-03-28", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction", + "authors": [ + "Qiuhong Shen", + "Zike Wu", + "Xuanyu Yi", + "Pan Zhou", + "Hanwang Zhang", + "Shuicheng Yan", + "Xinchao Wang" + ], + "abstract": "We tackle the challenge of efficiently reconstructing a 3D asset from a single image at millisecond speed. Existing methods for single-image 3D reconstruction are primarily based on Score Distillation Sampling (SDS) with Neural 3D representations. Despite promising results, these approaches encounter practical limitations due to lengthy optimizations and significant memory consumption. In this work, we introduce Gamba, an end-to-end 3D reconstruction model from a single-view image, emphasizing two main insights: (1) Efficient Backbone Design: introducing a Mamba-based GambaFormer network to model 3D Gaussian Splatting (3DGS) reconstruction as sequential prediction with linear scalability of token length, thereby accommodating a substantial number of Gaussians; (2) Robust Gaussian Constraints: deriving radial mask constraints from multi-view masks to eliminate the need for warmup supervision of 3D point clouds in training. We trained Gamba on Objaverse and assessed it against existing optimization-based and feed-forward 3D reconstruction approaches on the GSO Dataset, among which Gamba is the only end-to-end trained single-view reconstruction model with 3DGS. Experimental results demonstrate its competitive generation capabilities both qualitatively and quantitatively and highlight its remarkable speed: Gamba completes reconstruction within 0.05 seconds on a single NVIDIA A100 GPU, which is about $1,000\\times$ faster than optimization-based methods.
Please see our project page at https://florinshen.github.io/gamba-project.", + "arxiv_url": "http://arxiv.org/abs/2403.18795v3", + "pdf_url": "http://arxiv.org/pdf/2403.18795v3", + "published_date": "2024-03-27", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SplatFace: Gaussian Splat Face Reconstruction Leveraging an Optimizable Surface", + "authors": [ + "Jiahao Luo", + "Jing Liu", + "James Davis" + ], + "abstract": "We present SplatFace, a novel Gaussian splatting framework designed for 3D human face reconstruction without reliance on accurate pre-determined geometry. Our method is designed to simultaneously deliver both high-quality novel view rendering and accurate 3D mesh reconstructions. We incorporate a generic 3D Morphable Model (3DMM) to provide a surface geometric structure, making it possible to reconstruct faces with a limited set of input images. We introduce a joint optimization strategy that refines both the Gaussians and the morphable surface through a synergistic non-rigid alignment process. A novel distance metric, splat-to-surface, is proposed to improve alignment by considering both the Gaussian position and covariance. The surface information is also utilized to incorporate a world-space densification process, resulting in superior reconstruction quality. Our experimental analysis demonstrates that the proposed method is competitive with both other Gaussian splatting techniques in novel view synthesis and other 3D reconstruction methods in producing 3D face meshes with high geometric precision.", + "arxiv_url": "http://arxiv.org/abs/2403.18784v3", + "pdf_url": "http://arxiv.org/pdf/2403.18784v3", + "published_date": "2024-03-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Modeling uncertainty for Gaussian Splatting", + "authors": [ + "Luca Savant", + "Diego Valsesia", + "Enrico Magli" + ], + "abstract": "We present Stochastic Gaussian Splatting (SGS): the first framework for uncertainty estimation using Gaussian Splatting (GS). GS recently advanced the novel-view synthesis field by achieving impressive reconstruction quality at a fraction of the computational cost of Neural Radiance Fields (NeRF). However, contrary to the latter, it still lacks the ability to provide information about the confidence associated with their outputs. To address this limitation, in this paper, we introduce a Variational Inference-based approach that seamlessly integrates uncertainty prediction into the common rendering pipeline of GS. Additionally, we introduce the Area Under Sparsification Error (AUSE) as a new term in the loss function, enabling optimization of uncertainty estimation alongside image reconstruction. Experimental results on the LLFF dataset demonstrate that our method outperforms existing approaches in terms of both image rendering quality and uncertainty estimation accuracy. 
Overall, our framework equips practitioners with valuable insights into the reliability of synthesized views, facilitating safer decision-making in real-world applications.", + "arxiv_url": "http://arxiv.org/abs/2403.18476v1", + "pdf_url": "http://arxiv.org/pdf/2403.18476v1", + "published_date": "2024-03-27", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EgoLifter: Open-world 3D Segmentation for Egocentric Perception", + "authors": [ + "Qiao Gu", + "Zhaoyang Lv", + "Duncan Frost", + "Simon Green", + "Julian Straub", + "Chris Sweeney" + ], + "abstract": "In this paper we present EgoLifter, a novel system that can automatically segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects. The system is specifically designed for egocentric data where scenes contain hundreds of objects captured from natural (non-scanning) motion. EgoLifter adopts 3D Gaussians as the underlying representation of 3D scenes and objects and uses segmentation masks from the Segment Anything Model (SAM) as weak supervision to learn flexible and promptable definitions of object instances free of any specific object taxonomy. To handle the challenge of dynamic objects in ego-centric videos, we design a transient prediction module that learns to filter out dynamic objects in the 3D reconstruction. The result is a fully automatic pipeline that is able to reconstruct 3D object instances as collections of 3D Gaussians that collectively compose the entire scene. We created a new benchmark on the Aria Digital Twin dataset that quantitatively demonstrates its state-of-the-art performance in open-world 3D segmentation from natural egocentric input. We run EgoLifter on various egocentric activity datasets which shows the promise of the method for 3D egocentric perception at scale.", + "arxiv_url": "http://arxiv.org/abs/2403.18118v2", + "pdf_url": "http://arxiv.org/pdf/2403.18118v2", + "published_date": "2024-03-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians", + "authors": [ + "Kerui Ren", + "Lihan Jiang", + "Tao Lu", + "Mulin Yu", + "Linning Xu", + "Zhangkai Ni", + "Bo Dai" + ], + "abstract": "The recent 3D Gaussian splatting (3D-GS) has shown remarkable rendering fidelity and efficiency compared to NeRF-based neural scene representations. While demonstrating the potential for real-time rendering, 3D-GS encounters rendering bottlenecks in large scenes with complex details due to an excessive number of Gaussian primitives located within the viewing frustum. This limitation is particularly noticeable in zoom-out views and can lead to inconsistent rendering speeds in scenes with varying details. Moreover, it often struggles to capture the corresponding level of details at different scales with its heuristic density control operation. Inspired by the Level-of-Detail (LOD) techniques, we introduce Octree-GS, featuring an LOD-structured 3D Gaussian approach supporting level-of-detail decomposition for scene representation that contributes to the final rendering results. 
Our model dynamically selects the appropriate level from the set of multi-resolution anchor points, ensuring consistent rendering performance with adaptive LOD adjustments while maintaining high-fidelity rendering results.", + "arxiv_url": "http://arxiv.org/abs/2403.17898v2", + "pdf_url": "http://arxiv.org/pdf/2403.17898v2", + "published_date": "2024-03-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "2D Gaussian Splatting for Geometrically Accurate Radiance Fields", + "authors": [ + "Binbin Huang", + "Zehao Yu", + "Anpei Chen", + "Andreas Geiger", + "Shenghua Gao" + ], + "abstract": "3D Gaussian Splatting (3DGS) has recently revolutionized radiance field reconstruction, achieving high quality novel view synthesis and fast rendering speed without baking. However, 3DGS fails to accurately represent surfaces due to the multi-view inconsistent nature of 3D Gaussians. We present 2D Gaussian Splatting (2DGS), a novel approach to model and reconstruct geometrically accurate radiance fields from multi-view images. Our key idea is to collapse the 3D volume into a set of 2D oriented planar Gaussian disks. Unlike 3D Gaussians, 2D Gaussians provide view-consistent geometry while modeling surfaces intrinsically. To accurately recover thin surfaces and achieve stable optimization, we introduce a perspective-correct 2D splatting process utilizing ray-splat intersection and rasterization. Additionally, we incorporate depth distortion and normal consistency terms to further enhance the quality of the reconstructions. We demonstrate that our differentiable renderer allows for noise-free and detailed geometry reconstruction while maintaining competitive appearance quality, fast training speed, and real-time rendering.", + "arxiv_url": "http://arxiv.org/abs/2403.17888v2", + "pdf_url": "http://arxiv.org/pdf/2403.17888v2", + "published_date": "2024-03-26", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DN-Splatter: Depth and Normal Priors for Gaussian Splatting and Meshing", + "authors": [ + "Matias Turkulainen", + "Xuqian Ren", + "Iaroslav Melekhov", + "Otto Seiskari", + "Esa Rahtu", + "Juho Kannala" + ], + "abstract": "High-fidelity 3D reconstruction of common indoor scenes is crucial for VR and AR applications. 3D Gaussian splatting, a novel differentiable rendering technique, has achieved state-of-the-art novel view synthesis results with high rendering speeds and relatively low training times. However, its performance on scenes commonly seen in indoor datasets is poor due to the lack of geometric constraints during optimization. In this work, we explore the use of readily accessible geometric cues to enhance Gaussian splatting optimization in challenging, ill-posed, and textureless scenes. We extend 3D Gaussian splatting with depth and normal cues to tackle challenging indoor datasets and showcase techniques for efficient mesh extraction. Specifically, we regularize the optimization procedure with depth information, enforce local smoothness of nearby Gaussians, and use off-the-shelf monocular networks to achieve better alignment with the true scene geometry. 
We propose an adaptive depth loss based on the gradient of color images, improving depth estimation and novel view synthesis results over various baselines. Our simple yet effective regularization technique enables direct mesh extraction from the Gaussian representation, yielding more physically accurate reconstructions of indoor scenes.", + "arxiv_url": "http://arxiv.org/abs/2403.17822v3", + "pdf_url": "http://arxiv.org/pdf/2403.17822v3", + "published_date": "2024-03-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DreamPolisher: Towards High-Quality Text-to-3D Generation via Geometric Diffusion", + "authors": [ + "Yuanze Lin", + "Ronald Clark", + "Philip Torr" + ], + "abstract": "We present DreamPolisher, a novel Gaussian Splatting based method with geometric guidance, tailored to learn cross-view consistency and intricate detail from textual descriptions. While recent progress on text-to-3D generation methods has been promising, prevailing methods often fail to ensure view-consistency and textural richness. This problem becomes particularly noticeable for methods that work with text input alone. To address this, we propose a two-stage Gaussian Splatting based approach that enforces geometric consistency among views. Initially, a coarse 3D generation undergoes refinement via geometric optimization. Subsequently, we use a ControlNet driven refiner coupled with the geometric consistency term to improve both texture fidelity and overall consistency of the generated 3D asset. Empirical evaluations across diverse textual prompts spanning various object categories demonstrate the efficacy of DreamPolisher in generating consistent and realistic 3D objects, aligning closely with the semantics of the textual instructions.", + "arxiv_url": "http://arxiv.org/abs/2403.17237v1", + "pdf_url": "http://arxiv.org/pdf/2403.17237v1", + "published_date": "2024-03-25", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction", + "authors": [ + "Mulin Yu", + "Tao Lu", + "Linning Xu", + "Lihan Jiang", + "Yuanbo Xiangli", + "Bo Dai" + ], + "abstract": "Presenting a 3D scene from multiview images remains a core and long-standing challenge in computer vision and computer graphics. Two main requirements lie in rendering and reconstruction. Notably, SOTA rendering quality is usually achieved with neural volumetric rendering techniques, which rely on aggregated point/primitive-wise color and neglect the underlying scene geometry. Learning of neural implicit surfaces is sparked from the success of neural rendering. Current works either constrain the distribution of density fields or the shape of primitives, resulting in degraded rendering quality and flaws on the learned scene surfaces. The efficacy of such methods is limited by the inherent constraints of the chosen neural representation, which struggles to capture fine surface details, especially for larger, more intricate scenes. To address these issues, we introduce GSDF, a novel dual-branch architecture that combines the benefits of a flexible and efficient 3D Gaussian Splatting (3DGS) representation with neural Signed Distance Fields (SDF).
The core idea is to leverage and enhance the strengths of each branch while alleviating their limitations through mutual guidance and joint supervision. We show on diverse scenes that our design unlocks the potential for more accurate and detailed surface reconstructions, and at the same time benefits 3DGS rendering with structures that are more aligned with the underlying geometry.", + "arxiv_url": "http://arxiv.org/abs/2403.16964v2", + "pdf_url": "http://arxiv.org/pdf/2403.16964v2", + "published_date": "2024-03-25", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction", + "authors": [ + "Christopher Wewer", + "Kevin Raj", + "Eddy Ilg", + "Bernt Schiele", + "Jan Eric Lenssen" + ], + "abstract": "We present latentSplat, a method to predict semantic Gaussians in a 3D latent space that can be splatted and decoded by a light-weight generative 2D architecture. Existing methods for generalizable 3D reconstruction either do not scale to large scenes and resolutions, or are limited to interpolation of close input views. latentSplat combines the strengths of regression-based and generative approaches while being trained purely on readily available real video data. The core of our method is variational 3D Gaussians, a representation that efficiently encodes varying uncertainty within a latent space consisting of 3D feature Gaussians. From these Gaussians, specific instances can be sampled and rendered via efficient splatting and a fast, generative decoder. We show that latentSplat outperforms previous works in reconstruction quality and generalization, while being fast and scalable to high-resolution data.", + "arxiv_url": "http://arxiv.org/abs/2403.16292v2", + "pdf_url": "http://arxiv.org/pdf/2403.16292v2", + "published_date": "2024-03-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field", + "authors": [ + "Jiarui Hu", + "Xianhao Chen", + "Boyin Feng", + "Guanglin Li", + "Liangjing Yang", + "Hujun Bao", + "Guofeng Zhang", + "Zhaopeng Cui" + ], + "abstract": "Recently neural radiance fields (NeRF) have been widely exploited as 3D representations for dense simultaneous localization and mapping (SLAM). Despite their notable successes in surface modeling and novel view synthesis, existing NeRF-based methods are hindered by their computationally intensive and time-consuming volume rendering pipeline. This paper presents an efficient dense RGB-D SLAM system, i.e., CG-SLAM, based on a novel uncertainty-aware 3D Gaussian field with high consistency and geometric stability. Through an in-depth analysis of Gaussian Splatting, we propose several techniques to construct a consistent and stable 3D Gaussian field suitable for tracking and mapping. Additionally, a novel depth uncertainty model is proposed to ensure the selection of valuable Gaussian primitives during optimization, thereby improving tracking efficiency and accuracy. Experiments on various datasets demonstrate that CG-SLAM achieves superior tracking and mapping performance with a notable tracking speed of up to 15 Hz. We will make our source code publicly available.
Project page: https://zju3dv.github.io/cg-slam.", + "arxiv_url": "http://arxiv.org/abs/2403.16095v1", + "pdf_url": "http://arxiv.org/pdf/2403.16095v1", + "published_date": "2024-03-24", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections", + "authors": [ + "Dongbin Zhang", + "Chuming Wang", + "Weitao Wang", + "Peihao Li", + "Minghan Qin", + "Haoqian Wang" + ], + "abstract": "Novel view synthesis from unconstrained in-the-wild images remains a meaningful but challenging task. The photometric variation and transient occluders in those unconstrained images make it difficult to reconstruct the original scene accurately. Previous approaches tackle the problem by introducing a global appearance feature in Neural Radiance Fields (NeRF). However, in the real world, the unique appearance of each tiny point in a scene is determined by its independent intrinsic material attributes and the varying environmental impacts it receives. Inspired by this fact, we propose Gaussian in the wild (GS-W), a method that uses 3D Gaussian points to reconstruct the scene and introduces separated intrinsic and dynamic appearance feature for each point, capturing the unchanged scene appearance along with dynamic variation like illumination and weather. Additionally, an adaptive sampling strategy is presented to allow each Gaussian point to focus on the local and detailed information more effectively. We also reduce the impact of transient occluders using a 2D visibility map. More experiments have demonstrated better reconstruction quality and details of GS-W compared to NeRF-based methods, with a faster rendering speed. Video results and code are available at https://eastbeanzhang.github.io/GS-W/.", + "arxiv_url": "http://arxiv.org/abs/2403.15704v2", + "pdf_url": "http://arxiv.org/pdf/2403.15704v2", + "published_date": "2024-03-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting", + "authors": [ + "Jun Guo", + "Xiaojian Ma", + "Yue Fan", + "Huaping Liu", + "Qing Li" + ], + "abstract": "Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, with wide-ranging applications in embodied agents and augmented reality systems. Existing methods adopt neural rendering methods as 3D representations and jointly optimize color and semantic features to achieve rendering and scene understanding simultaneously. In this paper, we introduce Semantic Gaussians, a novel open-vocabulary scene understanding approach based on 3D Gaussian Splatting. Our key idea is to distill knowledge from 2D pre-trained models to 3D Gaussians. Unlike existing methods, we design a versatile projection approach that maps various 2D semantic features from pre-trained image encoders into a novel semantic component of 3D Gaussians, which is based on spatial relationships and needs no additional training. We further build a 3D semantic network that directly predicts the semantic component from raw 3D Gaussians for fast inference. The quantitative results on ScanNet segmentation and LERF object localization demonstrate the superior performance of our method.
Additionally, we explore several applications of Semantic Gaussians including object part segmentation, instance segmentation, scene editing, and spatiotemporal segmentation with better qualitative results over 2D and 3D baselines, highlighting its versatility and effectiveness on supporting diverse downstream tasks.", + "arxiv_url": "http://arxiv.org/abs/2403.15624v2", + "pdf_url": "http://arxiv.org/pdf/2403.15624v2", + "published_date": "2024-03-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting", + "authors": [ + "Zheng Zhang", + "Wenbo Hu", + "Yixing Lao", + "Tong He", + "Hengshuang Zhao" + ], + "abstract": "3D Gaussian Splatting (3DGS) has demonstrated impressive novel view synthesis results while advancing real-time rendering performance. However, it relies heavily on the quality of the initial point cloud, resulting in blurring and needle-like artifacts in areas with insufficient initializing points. This is mainly attributed to the point cloud growth condition in 3DGS that only considers the average gradient magnitude of points from observable views, thereby failing to grow for large Gaussians that are observable for many viewpoints while many of them are only covered in the boundaries. To this end, we propose a novel method, named Pixel-GS, to take into account the number of pixels covered by the Gaussian in each view during the computation of the growth condition. We regard the covered pixel numbers as the weights to dynamically average the gradients from different views, such that the growth of large Gaussians can be prompted. As a result, points within the areas with insufficient initializing points can be grown more effectively, leading to a more accurate and detailed reconstruction. In addition, we propose a simple yet effective strategy to scale the gradient field according to the distance to the camera, to suppress the growth of floaters near the camera. Extensive experiments both qualitatively and quantitatively demonstrate that our method achieves state-of-the-art rendering quality while maintaining real-time rendering speed, on the challenging Mip-NeRF 360 and Tanks & Temples datasets.", + "arxiv_url": "http://arxiv.org/abs/2403.15530v1", + "pdf_url": "http://arxiv.org/pdf/2403.15530v1", + "published_date": "2024-03-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EndoGSLAM: Real-Time Dense Reconstruction and Tracking in Endoscopic Surgeries using Gaussian Splatting", + "authors": [ + "Kailing Wang", + "Chen Yang", + "Yuehao Wang", + "Sikuang Li", + "Yan Wang", + "Qi Dou", + "Xiaokang Yang", + "Wei Shen" + ], + "abstract": "Precise camera tracking, high-fidelity 3D tissue reconstruction, and real-time online visualization are critical for intrabody medical imaging devices such as endoscopes and capsule robots. However, existing SLAM (Simultaneous Localization and Mapping) methods often struggle to achieve both complete high-quality surgical field reconstruction and efficient computation, restricting their intraoperative applications among endoscopic surgeries. 
In this paper, we introduce EndoGSLAM, an efficient SLAM approach for endoscopic surgeries, which integrates streamlined Gaussian representation and differentiable rasterization to facilitate over 100 fps rendering speed during online camera tracking and tissue reconstructing. Extensive experiments show that EndoGSLAM achieves a better trade-off between intraoperative availability and reconstruction quality than traditional or neural SLAM approaches, showing tremendous potential for endoscopic surgeries. The project page is at https://EndoGSLAM.loping151.com", + "arxiv_url": "http://arxiv.org/abs/2403.15124v1", + "pdf_url": "http://arxiv.org/pdf/2403.15124v1", + "published_date": "2024-03-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians", + "authors": [ + "Yifei Zeng", + "Yanqin Jiang", + "Siyu Zhu", + "Yuanxun Lu", + "Youtian Lin", + "Hao Zhu", + "Weiming Hu", + "Xun Cao", + "Yao Yao" + ], + "abstract": "Recent progress in pre-trained diffusion models and 3D generation has spurred interest in 4D content creation. However, achieving high-fidelity 4D generation with spatial-temporal consistency remains a challenge. In this work, we propose STAG4D, a novel framework that combines pre-trained diffusion models with dynamic 3D Gaussian splatting for high-fidelity 4D generation. Drawing inspiration from 3D generation techniques, we utilize a multi-view diffusion model to initialize multi-view images anchoring on the input video frames, where the video can be either real-world captured or generated by a video diffusion model. To ensure the temporal consistency of the multi-view sequence initialization, we introduce a simple yet effective fusion strategy to leverage the first frame as a temporal anchor in the self-attention computation. With the almost consistent multi-view sequences, we then apply the score distillation sampling to optimize the 4D Gaussian point cloud. The 4D Gaussian splatting is specially crafted for the generation task, where an adaptive densification strategy is proposed to mitigate the unstable Gaussian gradient for robust optimization. Notably, the proposed pipeline does not require any pre-training or fine-tuning of diffusion networks, offering a more accessible and practical solution for the 4D generation task. Extensive experiments demonstrate that our method outperforms prior 4D generation works in rendering quality, spatial-temporal consistency, and generation robustness, setting a new state-of-the-art for 4D generation from diverse inputs, including text, image, and video.", + "arxiv_url": "http://arxiv.org/abs/2403.14939v1", + "pdf_url": "http://arxiv.org/pdf/2403.14939v1", + "published_date": "2024-03-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images", + "authors": [ + "Yuedong Chen", + "Haofei Xu", + "Chuanxia Zheng", + "Bohan Zhuang", + "Marc Pollefeys", + "Andreas Geiger", + "Tat-Jen Cham", + "Jianfei Cai" + ], + "abstract": "We introduce MVSplat, an efficient model that, given sparse multi-view images as input, predicts clean feed-forward 3D Gaussians.
To accurately localize the Gaussian centers, we build a cost volume representation via plane sweeping, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We also learn other Gaussian primitives' parameters jointly with the Gaussian centers while only relying on photometric supervision. We demonstrate the importance of the cost volume representation in learning feed-forward Gaussians via extensive experimental evaluations. On the large-scale RealEstate10K and ACID benchmarks, MVSplat achieves state-of-the-art performance with the fastest feed-forward inference speed (22~fps). More impressively, compared to the latest state-of-the-art method pixelSplat, MVSplat uses $10\\times$ fewer parameters and infers more than $2\\times$ faster while providing higher appearance and geometry quality as well as better cross-dataset generalization.", + "arxiv_url": "http://arxiv.org/abs/2403.14627v2", + "pdf_url": "http://arxiv.org/pdf/2403.14627v2", + "published_date": "2024-03-21", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation", + "authors": [ + "Yinghao Xu", + "Zifan Shi", + "Wang Yifan", + "Hansheng Chen", + "Ceyuan Yang", + "Sida Peng", + "Yujun Shen", + "Gordon Wetzstein" + ], + "abstract": "We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information to translate the input pixels into pixel-aligned Gaussians, which are unprojected to create a set of densely distributed 3D Gaussians representing a scene. Together, our transformer architecture and the use of 3D Gaussians unlock a scalable and efficient reconstruction framework. Extensive experimental results demonstrate the superiority of our method over alternatives regarding both reconstruction quality and efficiency. We also showcase the potential of GRM in generative tasks, i.e., text-to-3D and image-to-3D, by integrating it with existing multi-view diffusion models. Our project website is at: https://justimyhxu.github.io/projects/grm/.", + "arxiv_url": "http://arxiv.org/abs/2403.14621v1", + "pdf_url": "http://arxiv.org/pdf/2403.14621v1", + "published_date": "2024-03-21", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering", + "authors": [ + "Antoine Guédon", + "Vincent Lepetit" + ], + "abstract": "We propose Gaussian Frosting, a novel mesh-based representation for high-quality rendering and editing of complex 3D effects in real-time. Our approach builds on the recent 3D Gaussian Splatting framework, which optimizes a set of 3D Gaussians to approximate a radiance field from images. We propose first extracting a base mesh from Gaussians during optimization, then building and refining an adaptive layer of Gaussians with a variable thickness around the mesh to better capture the fine details and volumetric effects near the surface, such as hair or grass. We call this layer Gaussian Frosting, as it resembles a coating of frosting on a cake. The fuzzier the material, the thicker the frosting. 
We also introduce a parameterization of the Gaussians to enforce them to stay inside the frosting layer and automatically adjust their parameters when deforming, rescaling, editing or animating the mesh. Our representation allows for efficient rendering using Gaussian splatting, as well as editing and animation by modifying the base mesh. We demonstrate the effectiveness of our method on various synthetic and real scenes, and show that it outperforms existing surface-based approaches. We will release our code and a web-based viewer as additional contributions. Our project page is the following: https://anttwo.github.io/frosting/", + "arxiv_url": "http://arxiv.org/abs/2403.14554v1", + "pdf_url": "http://arxiv.org/pdf/2403.14554v1", + "published_date": "2024-03-21", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression", + "authors": [ + "Yihang Chen", + "Qianyi Wu", + "Weiyao Lin", + "Mehrtash Harandi", + "Jianfei Cai" + ], + "abstract": "3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity. However, the substantial Gaussians and their associated attributes necessitate effective compression techniques. Nevertheless, the sparse and unorganized nature of the point cloud of Gaussians (or anchors in our paper) presents challenges for compression. To address this, we make use of the relations between the unorganized anchors and the structured hash grid, leveraging their mutual information for context modeling, and propose a Hash-grid Assisted Context (HAC) framework for highly compact 3DGS representation. Our approach introduces a binary hash grid to establish continuous spatial consistencies, allowing us to unveil the inherent spatial relations of anchors through a carefully designed context model. To facilitate entropy coding, we utilize Gaussian distributions to accurately estimate the probability of each quantized attribute, where an adaptive quantization module is proposed to enable high-precision quantization of these attributes for improved fidelity restoration. Additionally, we incorporate an adaptive masking strategy to eliminate invalid Gaussians and anchors. Importantly, our work is the pioneer to explore context-based compression for 3DGS representation, resulting in a remarkable size reduction of over $75\\times$ compared to vanilla 3DGS, while simultaneously improving fidelity, and achieving over $11\\times$ size reduction over SOTA 3DGS compression approach Scaffold-GS. Our code is available here: https://github.com/YihangChen-ee/HAC", + "arxiv_url": "http://arxiv.org/abs/2403.14530v3", + "pdf_url": "http://arxiv.org/pdf/2403.14530v3", + "published_date": "2024-03-21", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/YihangChen-ee/HAC", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SyncTweedies: A General Generative Framework Based on Synchronized Diffusions", + "authors": [ + "Jaihoon Kim", + "Juil Koo", + "Kyeongmin Yeo", + "Minhyuk Sung" + ], + "abstract": "We introduce a general framework for generating diverse visual content, including ambiguous images, panorama images, mesh textures, and Gaussian splat textures, by synchronizing multiple diffusion processes. 
We present an exhaustive investigation into all possible scenarios for synchronizing multiple diffusion processes through a canonical space and analyze their characteristics across applications. In doing so, we reveal a previously unexplored case: averaging the outputs of Tweedie's formula while conducting denoising in multiple instance spaces. This case also provides the best quality with the widest applicability to downstream tasks. We name this case SyncTweedies. In our experiments generating the aforementioned visual content, we demonstrate the superior quality of generation by SyncTweedies compared to other synchronization methods, optimization-based and iterative-update-based methods.", + "arxiv_url": "http://arxiv.org/abs/2403.14370v4", + "pdf_url": "http://arxiv.org/pdf/2403.14370v4", + "published_date": "2024-03-21", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Isotropic Gaussian Splatting for Real-Time Radiance Field Rendering", + "authors": [ + "Yuanhao Gong", + "Lantao Yu", + "Guanghui Yue" + ], + "abstract": "The 3D Gaussian splatting method has drawn a lot of attention, thanks to its high performance in training and high quality of the rendered image. However, it uses anisotropic Gaussian kernels to represent the scene. Although such anisotropic kernels have advantages in representing the geometry, they lead to difficulties in terms of computation, such as splitting or merging two kernels. In this paper, we propose to use isotropic Gaussian kernels to avoid such difficulties in the computation, leading to a higher performance method. The experiments confirm that the proposed method is about {\\bf 100X} faster without losing the geometry representation accuracy. The proposed method can be applied in a large range of applications where the radiance field is needed, such as 3D reconstruction, view synthesis, and dynamic object modeling.", + "arxiv_url": "http://arxiv.org/abs/2403.14244v1", + "pdf_url": "http://arxiv.org/pdf/2403.14244v1", + "published_date": "2024-03-21", + "categories": [ + "cs.CV", + "cs.AI", + "cs.LG", + "eess.IV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians", + "authors": [ + "Guangchi Fang", + "Bing Wang" + ], + "abstract": "In this study, we explore the challenge of efficiently representing scenes with a constrained number of Gaussians. Our analysis shifts from traditional graphics and 2D computer vision to the perspective of point clouds, highlighting the inefficient spatial distribution of Gaussian representation as a key limitation in model performance. To address this, we introduce strategies for densification including blur split and depth reinitialization, and simplification through intersection preserving and sampling. These techniques reorganize the spatial positions of the Gaussians, resulting in significant improvements across various datasets and benchmarks in terms of rendering quality, resource consumption, and storage compression. Our Mini-Splatting integrates seamlessly with the original rasterization pipeline, providing a strong baseline for future research in Gaussian-Splatting-based works.
\\href{https://github.com/fatPeter/mini-splatting}{Code is available}.", + "arxiv_url": "http://arxiv.org/abs/2403.14166v3", + "pdf_url": "http://arxiv.org/pdf/2403.14166v3", + "published_date": "2024-03-21", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/fatPeter/mini-splatting", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RadSplat: Radiance Field-Informed Gaussian Splatting for Robust Real-Time Rendering with 900+ FPS", + "authors": [ + "Michael Niemeyer", + "Fabian Manhardt", + "Marie-Julie Rakotosaona", + "Michael Oechsle", + "Daniel Duckworth", + "Rama Gosula", + "Keisuke Tateno", + "John Bates", + "Dominik Kaeser", + "Federico Tombari" + ], + "abstract": "Recent advances in view synthesis and real-time rendering have achieved photorealistic quality at impressive rendering speeds. While Radiance Field-based methods achieve state-of-the-art quality in challenging scenarios such as in-the-wild captures and large-scale scenes, they often suffer from excessively high compute requirements linked to volumetric rendering. Gaussian Splatting-based methods, on the other hand, rely on rasterization and naturally achieve real-time rendering but suffer from brittle optimization heuristics that underperform on more challenging scenes. In this work, we present RadSplat, a lightweight method for robust real-time rendering of complex scenes. Our main contributions are threefold. First, we use radiance fields as a prior and supervision signal for optimizing point-based scene representations, leading to improved quality and more robust optimization. Next, we develop a novel pruning technique reducing the overall point count while maintaining high quality, leading to smaller and more compact scene representations with faster inference speeds. Finally, we propose a novel test-time filtering approach that further accelerates rendering and allows to scale to larger, house-sized scenes. We find that our method enables state-of-the-art synthesis of complex captures at 900+ FPS.", + "arxiv_url": "http://arxiv.org/abs/2403.13806v1", + "pdf_url": "http://arxiv.org/pdf/2403.13806v1", + "published_date": "2024-03-20", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion", + "authors": [ + "Otto Seiskari", + "Jerry Ylilammi", + "Valtteri Kaatrasalo", + "Pekka Rantalankila", + "Matias Turkulainen", + "Juho Kannala", + "Esa Rahtu", + "Arno Solin" + ], + "abstract": "High-quality scene reconstruction and novel view synthesis based on Gaussian Splatting (3DGS) typically require steady, high-quality photographs, often impractical to capture with handheld cameras. We present a method that adapts to camera motion and allows high-quality scene reconstruction with handheld video data suffering from motion blur and rolling shutter distortion. Our approach is based on detailed modelling of the physical image formation process and utilizes velocities estimated using visual-inertial odometry (VIO). Camera poses are considered non-static during the exposure time of a single image frame and camera poses are further optimized in the reconstruction process. We formulate a differentiable rendering pipeline that leverages screen space approximation to efficiently incorporate rolling-shutter and motion blur effects into the 3DGS framework. 
Our results with both synthetic and real data demonstrate superior performance in mitigating camera motion over existing methods, thereby advancing 3DGS in naturalistic settings.", + "arxiv_url": "http://arxiv.org/abs/2403.13327v3", + "pdf_url": "http://arxiv.org/pdf/2403.13327v3", + "published_date": "2024-03-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GVGEN: Text-to-3D Generation with Volumetric Representation", + "authors": [ + "Xianglong He", + "Junyi Chen", + "Sida Peng", + "Di Huang", + "Yangguang Li", + "Xiaoshui Huang", + "Chun Yuan", + "Wanli Ouyang", + "Tong He" + ], + "abstract": "In recent years, 3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast and high-quality rendering capabilities. To address these shortcomings, this paper introduces a novel diffusion-based framework, GVGEN, designed to efficiently generate 3D Gaussian representations from text input. We propose two innovative techniques:(1) Structured Volumetric Representation. We first arrange disorganized 3D Gaussian points as a structured form GaussianVolume. This transformation allows the capture of intricate texture details within a volume composed of a fixed number of Gaussians. To better optimize the representation of these details, we propose a unique pruning and densifying method named the Candidate Pool Strategy, enhancing detail fidelity through selective optimization. (2) Coarse-to-fine Generation Pipeline. To simplify the generation of GaussianVolume and empower the model to generate instances with detailed 3D geometry, we propose a coarse-to-fine pipeline. It initially constructs a basic geometric structure, followed by the prediction of complete Gaussian attributes. Our framework, GVGEN, demonstrates superior performance in qualitative and quantitative assessments compared to existing 3D generation methods. Simultaneously, it maintains a fast generation speed ($\\sim$7 seconds), effectively striking a balance between quality and efficiency. Our project page is: https://gvgen.github.io/", + "arxiv_url": "http://arxiv.org/abs/2403.12957v2", + "pdf_url": "http://arxiv.org/pdf/2403.12957v2", + "published_date": "2024-03-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting", + "authors": [ + "Hongyu Zhou", + "Jiahao Shao", + "Lu Xu", + "Dongfeng Bai", + "Weichao Qiu", + "Bingbing Liu", + "Yue Wang", + "Andreas Geiger", + "Yiyi Liao" + ], + "abstract": "Holistic understanding of urban scenes based on RGB images is a challenging yet important problem. It encompasses understanding both the geometry and appearance to enable novel view synthesis, parsing semantic labels, and tracking moving objects. Despite considerable progress, existing approaches often focus on specific aspects of this task and require additional inputs such as LiDAR scans or manually annotated 3D bounding boxes. In this paper, we introduce a novel pipeline that utilizes 3D Gaussian Splatting for holistic urban scene understanding. Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians, where moving object poses are regularized via physical constraints. 
Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy, and reconstruct dynamic scenes, even in scenarios where 3D bounding box detections are highly noisy. Experimental results on KITTI, KITTI-360, and Virtual KITTI 2 demonstrate the effectiveness of our approach.", + "arxiv_url": "http://arxiv.org/abs/2403.12722v1", + "pdf_url": "http://arxiv.org/pdf/2403.12722v1", + "published_date": "2024-03-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "RGBD GS-ICP SLAM", + "authors": [ + "Seongbo Ha", + "Jiung Yeon", + "Hyeonwoo Yu" + ], + "abstract": "Simultaneous Localization and Mapping (SLAM) with dense representation plays a key role in robotics, Virtual Reality (VR), and Augmented Reality (AR) applications. Recent advancements in dense representation SLAM have highlighted the potential of leveraging neural scene representation and 3D Gaussian representation for high-fidelity spatial representation. In this paper, we propose a novel dense representation SLAM approach with a fusion of Generalized Iterative Closest Point (G-ICP) and 3D Gaussian Splatting (3DGS). In contrast to existing methods, we utilize a single Gaussian map for both tracking and mapping, resulting in mutual benefits. Through the exchange of covariances between tracking and mapping processes with scale alignment techniques, we minimize redundant computations and achieve an efficient system. Additionally, we enhance tracking accuracy and mapping quality through our keyframe selection methods. Experimental results demonstrate the effectiveness of our approach, showing an incredibly fast speed of up to 107 FPS (for the entire system) and superior quality of the reconstructed map.", + "arxiv_url": "http://arxiv.org/abs/2403.12550v2", + "pdf_url": "http://arxiv.org/pdf/2403.12550v2", + "published_date": "2024-03-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "High-Fidelity SLAM Using Gaussian Splatting with Rendering-Guided Densification and Regularized Optimization", + "authors": [ + "Shuo Sun", + "Malcolm Mielle", + "Achim J. Lilienthal", + "Martin Magnusson" + ], + "abstract": "We propose a dense RGBD SLAM system based on 3D Gaussian Splatting that provides metrically accurate pose tracking and visually realistic reconstruction. To this end, we first propose a Gaussian densification strategy based on the rendering loss to map unobserved areas and refine reobserved areas. Second, we introduce extra regularization parameters to alleviate the forgetting problem in the continuous mapping problem, where parameters tend to overfit the latest frame and result in decreasing rendering quality for previous frames. Both mapping and tracking are performed with Gaussian parameters by minimizing re-rendering loss in a differentiable way.
Compared to recent neural and concurrently developed Gaussian splatting RGBD SLAM baselines, our method achieves state-of-the-art results on the synthetic dataset Replica and competitive results on the real-world dataset TUM.", + "arxiv_url": "http://arxiv.org/abs/2403.12535v2", + "pdf_url": "http://arxiv.org/pdf/2403.12535v2", + "published_date": "2024-03-19", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation", + "authors": [ + "Quankai Gao", + "Qiangeng Xu", + "Zhe Cao", + "Ben Mildenhall", + "Wenchao Ma", + "Le Chen", + "Danhang Tang", + "Ulrich Neumann" + ], + "abstract": "Creating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians and pixel velocities between consecutive frames. The Gaussian flow can be efficiently obtained by splatting Gaussian dynamics into the image space. This differentiable process enables direct dynamic supervision from optical flow. Our method significantly benefits 4D dynamic content generation and 4D novel view synthesis with Gaussian Splatting, especially for contents with rich motions that are hard to be handled by existing methods. The common color drifting issue that happens in 4D generation is also resolved with improved Gaussian dynamics. Superior visual quality on extensive experiments demonstrates our method's effectiveness. Quantitative and qualitative evaluations show that our method achieves state-of-the-art results on both tasks of 4D generation and 4D novel view synthesis. Project page: https://zerg-overmind.github.io/GaussianFlow.github.io/", + "arxiv_url": "http://arxiv.org/abs/2403.12365v2", + "pdf_url": "http://arxiv.org/pdf/2403.12365v2", + "published_date": "2024-03-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "VideoMV: Consistent Multi-View Generation Based on Large Video Generative Model", + "authors": [ + "Qi Zuo", + "Xiaodong Gu", + "Lingteng Qiu", + "Yuan Dong", + "Zhengyi Zhao", + "Weihao Yuan", + "Rui Peng", + "Siyu Zhu", + "Zilong Dong", + "Liefeng Bo", + "Qixing Huang" + ], + "abstract": "Generating multi-view images based on text or single-image prompts is a critical capability for the creation of 3D content. Two fundamental questions on this topic are what data we use for training and how to ensure multi-view consistency. This paper introduces a novel framework that makes fundamental contributions to both questions. Unlike leveraging images from 2D diffusion models for training, we propose a dense consistent multi-view generation model that is fine-tuned from off-the-shelf video generative models. Images from video generative models are more suitable for multi-view generation because the underlying network architecture that generates them employs a temporal module to enforce frame consistency. Moreover, the video data sets used to train these models are abundant and diverse, leading to a reduced train-finetuning domain gap.
To enhance multi-view consistency, we introduce a 3D-Aware Denoising Sampling, which first employs a feed-forward reconstruction module to get an explicit global 3D model, and then adopts a sampling strategy that effectively involves images rendered from the global 3D model into the denoising sampling loop to improve the multi-view consistency of the final images. As a by-product, this module also provides a fast way to create 3D assets represented by 3D Gaussians within a few seconds. Our approach can generate 24 dense views and converges much faster in training than state-of-the-art approaches (4 GPU hours versus many thousand GPU hours) with comparable visual quality and consistency. By further fine-tuning, our approach outperforms existing state-of-the-art methods in both quantitative metrics and visual effects. Our project page is aigc3d.github.io/VideoMV.", + "arxiv_url": "http://arxiv.org/abs/2403.12010v1", + "pdf_url": "http://arxiv.org/pdf/2403.12010v1", + "published_date": "2024-03-18", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Reinforcement Learning with Generalizable Gaussian Splatting", + "authors": [ + "Jiaxu Wang", + "Qiang Zhang", + "Jingkai Sun", + "Jiahang Cao", + "Gang Han", + "Wen Zhao", + "Weining Zhang", + "Yecheng Shao", + "Yijie Guo", + "Renjing Xu" + ], + "abstract": "An excellent representation is crucial for reinforcement learning (RL) performance, especially in vision-based reinforcement learning tasks. The quality of the environment representation directly influences the achievement of the learning task. Previous vision-based RL typically uses explicit or implicit ways to represent environments, such as images, points, voxels, and neural radiance fields. However, these representations contain several drawbacks. They either cannot describe complex local geometries, cannot generalize well to unseen scenes, or require precise foreground masks. Moreover, these implicit neural representations are akin to a ``black box\", significantly hindering interpretability. 3D Gaussian Splatting (3DGS), with its explicit scene representation and differentiable rendering nature, is considered a revolutionary change for reconstruction and representation methods. In this paper, we propose a novel Generalizable Gaussian Splatting framework to be the representation of RL tasks, called GSRL. Through validation in the RoboMimic environment, our method achieves better results than other baselines in multiple tasks, improving the performance by 10%, 44%, and 15% compared with baselines on the hardest task. This work is the first attempt to leverage generalizable 3DGS as a representation for RL.", + "arxiv_url": "http://arxiv.org/abs/2404.07950v3", + "pdf_url": "http://arxiv.org/pdf/2404.07950v3", + "published_date": "2024-03-18", + "categories": [ + "cs.CV", + "cs.AI", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "View-Consistent 3D Editing with Gaussian Splatting", + "authors": [ + "Yuxuan Wang", + "Xuanyu Yi", + "Zike Wu", + "Na Zhao", + "Long Chen", + "Hanwang Zhang" + ], + "abstract": "The advent of 3D Gaussian Splatting (3DGS) has revolutionized 3D editing, offering efficient, high-fidelity rendering and enabling precise local manipulations.
Currently, diffusion-based 2D editing models are harnessed to modify multi-view rendered images, which then guide the editing of 3DGS models. However, this approach faces a critical issue of multi-view inconsistency, where the guidance images exhibit significant discrepancies across views, leading to mode collapse and visual artifacts of 3DGS. To this end, we introduce View-consistent Editing (VcEdit), a novel framework that seamlessly incorporates 3DGS into image editing processes, ensuring multi-view consistency in edited guidance images and effectively mitigating mode collapse issues. VcEdit employs two innovative consistency modules: the Cross-attention Consistency Module and the Editing Consistency Module, both designed to reduce inconsistencies in edited images. By incorporating these consistency modules into an iterative pattern, VcEdit proficiently resolves the issue of multi-view inconsistency, facilitating high-quality 3DGS editing across a diverse range of scenes. Further video results are shown in http://vcedit.github.io.", + "arxiv_url": "http://arxiv.org/abs/2403.11868v9", + "pdf_url": "http://arxiv.org/pdf/2403.11868v9", + "published_date": "2024-03-18", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting", + "authors": [ + "Lingzhe Zhao", + "Peng Wang", + "Peidong Liu" + ], + "abstract": "While neural rendering has demonstrated impressive capabilities in 3D scene reconstruction and novel view synthesis, it heavily relies on high-quality sharp images and accurate camera poses. Numerous approaches have been proposed to train Neural Radiance Fields (NeRF) with motion-blurred images, commonly encountered in real-world scenarios such as low-light or long-exposure conditions. However, the implicit representation of NeRF struggles to accurately recover intricate details from severely motion-blurred images and cannot achieve real-time rendering. In contrast, recent advancements in 3D Gaussian Splatting achieve high-quality 3D scene reconstruction and real-time rendering by explicitly optimizing point clouds as Gaussian spheres. In this paper, we introduce a novel approach, named BAD-Gaussians (Bundle Adjusted Deblur Gaussian Splatting), which leverages explicit Gaussian representation and handles severe motion-blurred images with inaccurate camera poses to achieve high-quality scene reconstruction. Our method models the physical image formation process of motion-blurred images and jointly learns the parameters of Gaussians while recovering camera motion trajectories during exposure time. In our experiments, we demonstrate that BAD-Gaussians not only achieves superior rendering quality compared to previous state-of-the-art deblur neural rendering methods on both synthetic and real datasets but also enables real-time rendering capabilities. 
Our project page and source code is available at https://lingzhezhao.github.io/BAD-Gaussians/", + "arxiv_url": "http://arxiv.org/abs/2403.11831v2", + "pdf_url": "http://arxiv.org/pdf/2403.11831v2", + "published_date": "2024-03-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "NEDS-SLAM: A Neural Explicit Dense Semantic SLAM Framework using 3D Gaussian Splatting", + "authors": [ + "Yiming Ji", + "Yang Liu", + "Guanghu Xie", + "Boyu Ma", + "Zongwu Xie" + ], + "abstract": "We propose NEDS-SLAM, a dense semantic SLAM system based on 3D Gaussian representation, that enables robust 3D semantic mapping, accurate camera tracking, and high-quality rendering in real-time. In the system, we propose a Spatially Consistent Feature Fusion model to reduce the effect of erroneous estimates from pre-trained segmentation head on semantic reconstruction, achieving robust 3D semantic Gaussian mapping. Additionally, we employ a lightweight encoder-decoder to compress the high-dimensional semantic features into a compact 3D Gaussian representation, mitigating the burden of excessive memory consumption. Furthermore, we leverage the advantage of 3D Gaussian splatting, which enables efficient and differentiable novel view rendering, and propose a Virtual Camera View Pruning method to eliminate outlier gaussians, thereby effectively enhancing the quality of scene representations. Our NEDS-SLAM method demonstrates competitive performance over existing dense semantic SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in 3D dense semantic mapping.", + "arxiv_url": "http://arxiv.org/abs/2403.11679v3", + "pdf_url": "http://arxiv.org/pdf/2403.11679v3", + "published_date": "2024-03-18", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussNav: Gaussian Splatting for Visual Navigation", + "authors": [ + "Xiaohan Lei", + "Min Wang", + "Wengang Zhou", + "Houqiang Li" + ], + "abstract": "In embodied vision, Instance ImageGoal Navigation (IIN) requires an agent to locate a specific object depicted in a goal image within an unexplored environment. The primary difficulty of IIN stems from the necessity of recognizing the target object across varying viewpoints and rejecting potential distractors. Existing map-based navigation methods largely adopt the representation form of Bird's Eye View (BEV) maps, which, however, lack the representation of detailed textures in a scene. To address the above issues, we propose a new Gaussian Splatting Navigation (abbreviated as GaussNav) framework for IIN task, which constructs a novel map representation based on 3D Gaussian Splatting (3DGS). The proposed framework enables the agent to not only memorize the geometry and semantic information of the scene, but also retain the textural features of objects. Our GaussNav framework demonstrates a significant leap in performance, evidenced by an increase in Success weighted by Path Length (SPL) from 0.252 to 0.578 on the challenging Habitat-Matterport 3D (HM3D) dataset. 
Our code will be made publicly available.", + "arxiv_url": "http://arxiv.org/abs/2403.11625v2", + "pdf_url": "http://arxiv.org/pdf/2403.11625v2", + "published_date": "2024-03-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "UV Gaussians: Joint Learning of Mesh Deformation and Gaussian Textures for Human Avatar Modeling", + "authors": [ + "Yujiao Jiang", + "Qingmin Liao", + "Xiaoyu Li", + "Li Ma", + "Qi Zhang", + "Chaopeng Zhang", + "Zongqing Lu", + "Ying Shan" + ], + "abstract": "Reconstructing photo-realistic drivable human avatars from multi-view image sequences has been a popular and challenging topic in the field of computer vision and graphics. While existing NeRF-based methods can achieve high-quality novel view rendering of human models, both training and inference processes are time-consuming. Recent approaches have utilized 3D Gaussians to represent the human body, enabling faster training and rendering. However, they undermine the importance of the mesh guidance and directly predict Gaussians in 3D space with coarse mesh guidance. This hinders the learning procedure of the Gaussians and tends to produce blurry textures. Therefore, we propose UV Gaussians, which models the 3D human body by jointly learning mesh deformations and 2D UV-space Gaussian textures. We utilize the embedding of UV map to learn Gaussian textures in 2D space, leveraging the capabilities of powerful 2D networks to extract features. Additionally, through an independent Mesh network, we optimize pose-dependent geometric deformations, thereby guiding Gaussian rendering and significantly enhancing rendering quality. We collect and process a new dataset of human motion, which includes multi-view images, scanned models, parametric model registration, and corresponding texture maps. Experimental results demonstrate that our method achieves state-of-the-art synthesis of novel view and novel pose. The code and data will be made available on the homepage https://alex-jyj.github.io/UV-Gaussians/ once the paper is accepted.", + "arxiv_url": "http://arxiv.org/abs/2403.11589v1", + "pdf_url": "http://arxiv.org/pdf/2403.11589v1", + "published_date": "2024-03-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DGS-Calib: 3D Gaussian Splatting for Multimodal SpatioTemporal Calibration", + "authors": [ + "Quentin Herau", + "Moussab Bennehar", + "Arthur Moreau", + "Nathan Piasco", + "Luis Roldao", + "Dzmitry Tsishkou", + "Cyrille Migniot", + "Pascal Vasseur", + "Cédric Demonceaux" + ], + "abstract": "Reliable multimodal sensor fusion algorithms require accurate spatiotemporal calibration. Recently, targetless calibration techniques based on implicit neural representations have proven to provide precise and robust results. Nevertheless, such methods are inherently slow to train given the high computational overhead caused by the large number of sampled points required for volume rendering. With the recent introduction of 3D Gaussian Splatting as a faster alternative to implicit representation methods, we propose to leverage this new rendering approach to achieve faster multi-sensor calibration. 
We introduce 3DGS-Calib, a new calibration method that relies on the speed and rendering accuracy of 3D Gaussian Splatting to achieve multimodal spatiotemporal calibration that is accurate, robust, and with a substantial speed-up compared to methods relying on implicit neural representations. We demonstrate the superiority of our proposal with experimental results on sequences from KITTI-360, a widely used driving dataset.", + "arxiv_url": "http://arxiv.org/abs/2403.11577v2", + "pdf_url": "http://arxiv.org/pdf/2403.11577v2", + "published_date": "2024-03-18", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Fed3DGS: Scalable 3D Gaussian Splatting with Federated Learning", + "authors": [ + "Teppei Suzuki" + ], + "abstract": "In this work, we present Fed3DGS, a scalable 3D reconstruction framework based on 3D Gaussian splatting (3DGS) with federated learning. Existing city-scale reconstruction methods typically adopt a centralized approach, which gathers all data in a central server and reconstructs scenes. The approach hampers scalability because it places a heavy load on the server and demands extensive data storage when reconstructing scenes on a scale beyond city-scale. In pursuit of a more scalable 3D reconstruction, we propose a federated learning framework with 3DGS, which is a decentralized framework and can potentially use distributed computational resources across millions of clients. We tailor a distillation-based model update scheme for 3DGS and introduce appearance modeling for handling non-IID data in the scenario of 3D reconstruction with federated learning. We simulate our method on several large-scale benchmarks, and our method demonstrates rendered image quality comparable to centralized approaches. In addition, we also simulate our method with data collected in different seasons, demonstrating that our framework can reflect changes in the scenes and our appearance modeling captures changes due to seasonal variations.", + "arxiv_url": "http://arxiv.org/abs/2403.11460v1", + "pdf_url": "http://arxiv.org/pdf/2403.11460v1", + "published_date": "2024-03-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Bridging 3D Gaussian and Mesh for Freeview Video Rendering", + "authors": [ + "Yuting Xiao", + "Xuan Wang", + "Jiafei Li", + "Hongrui Cai", + "Yanbo Fan", + "Nan Xue", + "Minghui Yang", + "Yujun Shen", + "Shenghua Gao" + ], + "abstract": "This is only a preview version of GauMesh. Recently, primitive-based rendering has been proven to achieve convincing results in solving the problem of modeling and rendering the 3D dynamic scene from 2D images. Despite this, in the context of novel view synthesis, each type of primitive has its inherent defects in terms of representation ability. It is difficult to exploit the mesh to depict the fuzzy geometry. Meanwhile, the point-based splatting (e.g. the 3D Gaussian Splatting) method usually produces artifacts or blurry pixels in the area with smooth geometry and sharp textures. As a result, it is difficult, even not impossible, to represent the complex and dynamic scene with a single type of primitive. To this end, we propose a novel approach, GauMesh, to bridge the 3D Gaussian and Mesh for modeling and rendering the dynamic scenes. 
Given a sequence of tracked mesh as initialization, our goal is to simultaneously optimize the mesh geometry, color texture, opacity maps, a set of 3D Gaussians, and the deformation field. At a specific time, we perform $\\alpha$-blending on the RGB and opacity values based on the merged and re-ordered z-buffers from mesh and 3D Gaussian rasterizations. This produces the final rendering, which is supervised by the ground-truth image. Experiments demonstrate that our approach adapts the appropriate type of primitives to represent the different parts of the dynamic scene and outperforms all the baseline methods in both quantitative and qualitative comparisons without losing render speed.", + "arxiv_url": "http://arxiv.org/abs/2403.11453v1", + "pdf_url": "http://arxiv.org/pdf/2403.11453v1", + "published_date": "2024-03-18", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Motion-aware 3D Gaussian Splatting for Efficient Dynamic Scene Reconstruction", + "authors": [ + "Zhiyang Guo", + "Wengang Zhou", + "Li Li", + "Min Wang", + "Houqiang Li" + ], + "abstract": "3D Gaussian Splatting (3DGS) has become an emerging tool for dynamic scene reconstruction. However, existing methods focus mainly on extending static 3DGS into a time-variant representation, while overlooking the rich motion information carried by 2D observations, thus suffering from performance degradation and model redundancy. To address the above problem, we propose a novel motion-aware enhancement framework for dynamic scene reconstruction, which mines useful motion cues from optical flow to improve different paradigms of dynamic 3DGS. Specifically, we first establish a correspondence between 3D Gaussian movements and pixel-level flow. Then a novel flow augmentation method is introduced with additional insights into uncertainty and loss collaboration. Moreover, for the prevalent deformation-based paradigm that presents a harder optimization problem, a transient-aware deformation auxiliary module is proposed. We conduct extensive experiments on both multi-view and monocular scenes to verify the merits of our work. Compared with the baselines, our method shows significant superiority in both rendering quality and efficiency.", + "arxiv_url": "http://arxiv.org/abs/2403.11447v1", + "pdf_url": "http://arxiv.org/pdf/2403.11447v1", + "published_date": "2024-03-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors", + "authors": [ + "Tingyang Zhang", + "Qingzhe Gao", + "Weiyu Li", + "Libin Liu", + "Baoquan Chen" + ], + "abstract": "Animatable 3D reconstruction has significant applications across various fields, primarily relying on artists' handcraft creation. Recently, some studies have successfully constructed animatable 3D models from monocular videos. However, these approaches require sufficient view coverage of the object within the input video and typically necessitate significant time and computational costs for training and rendering. This limitation restricts the practical applications. In this work, we propose a method to build animatable 3D Gaussian Splatting from monocular video with diffusion priors. 
The 3D Gaussian representations significantly accelerate the training and rendering process, and the diffusion priors allow the method to learn 3D models with limited viewpoints. We also present the rigid regularization to enhance the utilization of the priors. We perform an extensive evaluation across various real-world videos, demonstrating its superior performance compared to the current state-of-the-art methods.", + "arxiv_url": "http://arxiv.org/abs/2403.11427v1", + "pdf_url": "http://arxiv.org/pdf/2403.11427v1", + "published_date": "2024-03-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Beyond Uncertainty: Risk-Aware Active View Acquisition for Safe Robot Navigation and 3D Scene Understanding with FisherRF", + "authors": [ + "Guangyi Liu", + "Wen Jiang", + "Boshu Lei", + "Vivek Pandey", + "Kostas Daniilidis", + "Nader Motee" + ], + "abstract": "This work proposes a novel approach to bolster both the robot's risk assessment and safety measures while deepening its understanding of 3D scenes, which is achieved by leveraging Radiance Field (RF) models and 3D Gaussian Splatting. To further enhance these capabilities, we incorporate additional sampled views from the environment with the RF model. One of our key contributions is the introduction of Risk-aware Environment Masking (RaEM), which prioritizes crucial information by selecting the next-best-view that maximizes the expected information gain. This targeted approach aims to minimize uncertainties surrounding the robot's path and enhance the safety of its navigation. Our method offers a dual benefit: improved robot safety and increased efficiency in risk-aware 3D scene reconstruction and understanding. Extensive experiments in real-world scenarios demonstrate the effectiveness of our proposed approach, highlighting its potential to establish a robust and safety-focused framework for active robot exploration and 3D scene understanding.", + "arxiv_url": "http://arxiv.org/abs/2403.11396v1", + "pdf_url": "http://arxiv.org/pdf/2403.11396v1", + "published_date": "2024-03-18", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DGS-ReLoc: 3D Gaussian Splatting for Map Representation and Visual ReLocalization", + "authors": [ + "Peng Jiang", + "Gaurav Pandey", + "Srikanth Saripalli" + ], + "abstract": "This paper presents a novel system designed for 3D mapping and visual relocalization using 3D Gaussian Splatting. Our proposed method uses LiDAR and camera data to create accurate and visually plausible representations of the environment. By leveraging LiDAR data to initiate the training of the 3D Gaussian Splatting map, our system constructs maps that are both detailed and geometrically accurate. To mitigate excessive GPU memory usage and facilitate rapid spatial queries, we employ a combination of a 2D voxel map and a KD-tree. This preparation makes our method well-suited for visual localization tasks, enabling efficient identification of correspondences between the query image and the rendered image from the Gaussian Splatting map via normalized cross-correlation (NCC). Additionally, we refine the camera pose of the query image using feature-based matching and the Perspective-n-Point (PnP) technique. 
The effectiveness, adaptability, and precision of our system are demonstrated through extensive evaluation on the KITTI360 dataset.", + "arxiv_url": "http://arxiv.org/abs/2403.11367v1", + "pdf_url": "http://arxiv.org/pdf/2403.11367v1", + "published_date": "2024-03-17", + "categories": [ + "cs.CV", + "cs.GR", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Creating Seamless 3D Maps Using Radiance Fields", + "authors": [ + "Sai Tarun Sathyan", + "Thomas B. Kinsman" + ], + "abstract": "It is desirable to create 3D object models and 3D maps from 2D input images for applications such as navigation, virtual tourism, and urban planning. The traditional methods of creating 3D maps, (such as photogrammetry), require a large number of images and odometry. Additionally, traditional methods have difficulty with reflective surfaces and specular reflections; windows and chrome in the scene can be problematic. Google Road View is a familiar application, which uses traditional methods to fuse a collection of 2D input images into the illusion of a 3D map. However, Google Road View does not create an actual 3D object model, only a collection of views. The objective of this work is to create an actual 3D object model using updated techniques. Neural Radiance Fields (NeRF[1]) has emerged as a potential solution, offering the capability to produce more precise and intricate 3D maps. Gaussian Splatting[4] is another contemporary technique. This investigation compares Neural Radiance Fields to Gaussian Splatting, and describes some of their inner workings. Our primary contribution is a method for improving the results of the 3D reconstructed models. Our results indicate that Gaussian Splatting was superior to the NeRF technique.", + "arxiv_url": "http://arxiv.org/abs/2403.11364v1", + "pdf_url": "http://arxiv.org/pdf/2403.11364v1", + "published_date": "2024-03-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering", + "authors": [ + "Yanyan Li", + "Chenyu Lyu", + "Yan Di", + "Guangyao Zhai", + "Gim Hee Lee", + "Federico Tombari" + ], + "abstract": "During the Gaussian Splatting optimization process, the scene's geometry can gradually deteriorate if its structure is not deliberately preserved, especially in non-textured regions such as walls, ceilings, and furniture surfaces. This degradation significantly affects the rendering quality of novel views that deviate significantly from the viewpoints in the training data. To mitigate this issue, we propose a novel approach called GeoGaussian. Based on the smoothly connected areas observed from point clouds, this method introduces a novel pipeline to initialize thin Gaussians aligned with the surfaces, where the characteristic can be transferred to new generations through a carefully designed densification strategy. Finally, the pipeline ensures that the scene's geometry and texture are maintained through constrained optimization processes with explicit geometry constraints. Benefiting from the proposed architecture, the generative ability of 3D Gaussians is enhanced, especially in structured regions. 
Our proposed pipeline achieves state-of-the-art performance in novel view synthesis and geometric reconstruction, as evaluated qualitatively and quantitatively on public datasets.", + "arxiv_url": "http://arxiv.org/abs/2403.11324v2", + "pdf_url": "http://arxiv.org/pdf/2403.11324v2", + "published_date": "2024-03-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis", + "authors": [ + "Lutao Jiang", + "Xu Zheng", + "Yuanhuiyi Lyu", + "Jiazhou Zhou", + "Lin Wang" + ], + "abstract": "Text-to-3D synthesis has recently seen intriguing advances by combining the text-to-image priors with 3D representation methods, e.g., 3D Gaussian Splatting (3D GS), via Score Distillation Sampling (SDS). However, a hurdle of existing methods is the low efficiency, per-prompt optimization for a single 3D object. Therefore, it is imperative for a paradigm shift from per-prompt optimization to feed-forward generation for any unseen text prompts, which yet remains challenging. An obstacle is how to directly generate a set of millions of 3D Gaussians to represent a 3D object. This paper presents BrightDreamer, an end-to-end feed-forward approach that can achieve generalizable and fast (77 ms) text-to-3D generation. Our key idea is to formulate the generation process as estimating the 3D deformation from an anchor shape with predefined positions. For this, we first propose a Text-guided Shape Deformation (TSD) network to predict the deformed shape and its new positions, used as the centers (one attribute) of 3D Gaussians. To estimate the other four attributes (i.e., scaling, rotation, opacity, and SH), we then design a novel Text-guided Triplane Generator (TTG) to generate a triplane representation for a 3D object. The center of each Gaussian enables us to transform the spatial feature into the four attributes. The generated 3D Gaussians can be finally rendered at 705 frames per second. Extensive experiments demonstrate the superiority of our method over existing methods. Also, BrightDreamer possesses a strong semantic understanding capability even for complex text prompts. The code is available in the project page.", + "arxiv_url": "http://arxiv.org/abs/2403.11273v2", + "pdf_url": "http://arxiv.org/pdf/2403.11273v2", + "published_date": "2024-03-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Compact 3D Gaussian Splatting For Dense Visual SLAM", + "authors": [ + "Tianchen Deng", + "Yaohui Chen", + "Leyan Zhang", + "Jianfei Yang", + "Shenghai Yuan", + "Jiuming Liu", + "Danwei Wang", + "Hesheng Wang", + "Weidong Chen" + ], + "abstract": "Recent work has shown that 3D Gaussian-based SLAM enables high-quality reconstruction, accurate pose estimation, and real-time rendering of scenes. However, these approaches are built on a tremendous number of redundant 3D Gaussian ellipsoids, leading to high memory and storage costs, and slow training speed. To address the limitation, we propose a compact 3D Gaussian Splatting SLAM system that reduces the number and the parameter size of Gaussian ellipsoids. A sliding window-based masking strategy is first proposed to reduce the redundant ellipsoids. 
Then we observe that the covariance matrix (geometry) of most 3D Gaussian ellipsoids are extremely similar, which motivates a novel geometry codebook to compress 3D Gaussian geometric attributes, i.e., the parameters. Robust and accurate pose estimation is achieved by a global bundle adjustment method with reprojection loss. Extensive experiments demonstrate that our method achieves faster training and rendering speed while maintaining the state-of-the-art (SOTA) quality of the scene representation.", + "arxiv_url": "http://arxiv.org/abs/2403.11247v2", + "pdf_url": "http://arxiv.org/pdf/2403.11247v2", + "published_date": "2024-03-17", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Recent Advances in 3D Gaussian Splatting", + "authors": [ + "Tong Wu", + "Yu-Jie Yuan", + "Ling-Xiao Zhang", + "Jie Yang", + "Yan-Pei Cao", + "Ling-Qi Yan", + "Lin Gao" + ], + "abstract": "The emergence of 3D Gaussian Splatting (3DGS) has greatly accelerated the rendering speed of novel view synthesis. Unlike neural implicit representations like Neural Radiance Fields (NeRF) that represent a 3D scene with position and viewpoint-conditioned neural networks, 3D Gaussian Splatting utilizes a set of Gaussian ellipsoids to model the scene so that efficient rendering can be accomplished by rasterizing Gaussian ellipsoids into images. Apart from the fast rendering speed, the explicit representation of 3D Gaussian Splatting facilitates editing tasks like dynamic reconstruction, geometry editing, and physical simulation. Considering the rapid change and growing number of works in this field, we present a literature review of recent 3D Gaussian Splatting methods, which can be roughly classified into 3D reconstruction, 3D editing, and other downstream applications by functionality. Traditional point-based rendering methods and the rendering formulation of 3D Gaussian Splatting are also illustrated for a better understanding of this technique. This survey aims to help beginners get into this field quickly and provide experienced researchers with a comprehensive overview, which can stimulate the future development of the 3D Gaussian Splatting representation.", + "arxiv_url": "http://arxiv.org/abs/2403.11134v2", + "pdf_url": "http://arxiv.org/pdf/2403.11134v2", + "published_date": "2024-03-17", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration", + "authors": [ + "Zhihao Liang", + "Qi Zhang", + "Wenbo Hu", + "Ying Feng", + "Lei Zhu", + "Kui Jia" + ], + "abstract": "The 3D Gaussian Splatting (3DGS) gained its popularity recently by combining the advantages of both primitive-based and volumetric 3D representations, resulting in improved quality and efficiency for 3D scene rendering. However, 3DGS is not alias-free, and its rendering at varying resolutions could produce severe blurring or jaggies. This is because 3DGS treats each pixel as an isolated, single point rather than as an area, causing insensitivity to changes in the footprints of pixels. Consequently, this discrete sampling scheme inevitably results in aliasing, owing to the restricted sampling bandwidth. 
In this paper, we derive an analytical solution to address this issue. More specifically, we use a conditioned logistic function as the analytic approximation of the cumulative distribution function (CDF) in a one-dimensional Gaussian signal and calculate the Gaussian integral by subtracting the CDFs. We then introduce this approximation in the two-dimensional pixel shading, and present Analytic-Splatting, which analytically approximates the Gaussian integral within the 2D-pixel window area to better capture the intensity response of each pixel. Moreover, we use the approximated response of the pixel window integral area to participate in the transmittance calculation of volume rendering, making Analytic-Splatting sensitive to the changes in pixel footprint at different resolutions. Experiments on various datasets validate that our approach has better anti-aliasing capability that gives more details and better fidelity.", + "arxiv_url": "http://arxiv.org/abs/2403.11056v2", + "pdf_url": "http://arxiv.org/pdf/2403.11056v2", + "published_date": "2024-03-17", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DarkGS: Learning Neural Illumination and 3D Gaussians Relighting for Robotic Exploration in the Dark", + "authors": [ + "Tianyi Zhang", + "Kaining Huang", + "Weiming Zhi", + "Matthew Johnson-Roberson" + ], + "abstract": "Humans have the remarkable ability to construct consistent mental models of an environment, even under limited or varying levels of illumination. We wish to endow robots with this same capability. In this paper, we tackle the challenge of constructing a photorealistic scene representation under poorly illuminated conditions and with a moving light source. We approach the task of modeling illumination as a learning problem, and utilize the developed illumination model to aid in scene reconstruction. We introduce an innovative framework that uses a data-driven approach, Neural Light Simulators (NeLiS), to model and calibrate the camera-light system. Furthermore, we present DarkGS, a method that applies NeLiS to create a relightable 3D Gaussian scene model capable of real-time, photorealistic rendering from novel viewpoints. We show the applicability and robustness of our proposed simulator and system in a variety of real-world environments.", + "arxiv_url": "http://arxiv.org/abs/2403.10814v2", + "pdf_url": "http://arxiv.org/pdf/2403.10814v2", + "published_date": "2024-03-16", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-Pose: Generalizable Segmentation-based 6D Object Pose Estimation with 3D Gaussian Splatting", + "authors": [ + "Dingding Cai", + "Janne Heikkilä", + "Esa Rahtu" + ], + "abstract": "This paper introduces GS-Pose, a unified framework for localizing and estimating the 6D pose of novel objects. GS-Pose begins with a set of posed RGB images of a previously unseen object and builds three distinct representations stored in a database. At inference, GS-Pose operates sequentially by locating the object in the input image, estimating its initial 6D pose using a retrieval approach, and refining the pose with a render-and-compare method. The key insight is the application of the appropriate object representation at each stage of the process. 
In particular, for the refinement step, we leverage 3D Gaussian splatting, a novel differentiable rendering technique that offers high rendering speed and relatively low optimization time. Off-the-shelf toolchains and commodity hardware, such as mobile phones, can be used to capture new objects to be added to the database. Extensive evaluations on the LINEMOD and OnePose-LowTexture datasets demonstrate excellent performance, establishing the new state-of-the-art. Project page: https://dingdingcai.github.io/gs-pose.", + "arxiv_url": "http://arxiv.org/abs/2403.10683v2", + "pdf_url": "http://arxiv.org/pdf/2403.10683v2", + "published_date": "2024-03-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians", + "authors": [ + "Hiba Dahmani", + "Moussab Bennehar", + "Nathan Piasco", + "Luis Roldao", + "Dzmitry Tsishkou" + ], + "abstract": "Implicit neural representation methods have shown impressive advancements in learning 3D scenes from unstructured in-the-wild photo collections but are still limited by the large computational cost of volumetric rendering. More recently, 3D Gaussian Splatting emerged as a much faster alternative with superior rendering quality and training efficiency, especially for small-scale and object-centric scenarios. Nevertheless, this technique suffers from poor performance on unstructured in-the-wild data. To tackle this, we extend over 3D Gaussian Splatting to handle unstructured image collections. We achieve this by modeling appearance to seize photometric variations in the rendered images. Additionally, we introduce a new mechanism to train transient Gaussians to handle the presence of scene occluders in an unsupervised manner. Experiments on diverse photo collection scenes and multi-pass acquisition of outdoor landmarks show the effectiveness of our method over prior works achieving state-of-the-art results with improved efficiency.", + "arxiv_url": "http://arxiv.org/abs/2403.10427v2", + "pdf_url": "http://arxiv.org/pdf/2403.10427v2", + "published_date": "2024-03-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting", + "authors": [ + "Qijun Feng", + "Zhen Xing", + "Zuxuan Wu", + "Yu-Gang Jiang" + ], + "abstract": "We introduce GeoGS3D, a novel two-stage framework for reconstructing detailed 3D objects from single-view images. Inspired by the success of pre-trained 2D diffusion models, our method incorporates an orthogonal plane decomposition mechanism to extract 3D geometric features from the 2D input, facilitating the generation of multi-view consistent images. During the following Gaussian Splatting, these images are fused with epipolar attention, fully utilizing the geometric correlations across views. Moreover, we propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization, significantly accelerating the reconstruction process. 
Extensive experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects, both qualitatively and quantitatively.", + "arxiv_url": "http://arxiv.org/abs/2403.10242v2", + "pdf_url": "http://arxiv.org/pdf/2403.10242v2", + "published_date": "2024-03-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GGRt: Towards Pose-free Generalizable 3D Gaussian Splatting in Real-time", + "authors": [ + "Hao Li", + "Yuanyuan Gao", + "Chenming Wu", + "Dingwen Zhang", + "Yalun Dai", + "Chen Zhao", + "Haocheng Feng", + "Errui Ding", + "Jingdong Wang", + "Junwei Han" + ], + "abstract": "This paper presents GGRt, a novel approach to generalizable novel view synthesis that alleviates the need for real camera poses, complexity in processing high-resolution images, and lengthy optimization processes, thus facilitating stronger applicability of 3D Gaussian Splatting (3D-GS) in real-world scenarios. Specifically, we design a novel joint learning framework that consists of an Iterative Pose Optimization Network (IPO-Net) and a Generalizable 3D-Gaussians (G-3DG) model. With the joint learning mechanism, the proposed framework can inherently estimate robust relative pose information from the image observations and thus primarily alleviate the requirement of real camera poses. Moreover, we implement a deferred back-propagation mechanism that enables high-resolution training and inference, overcoming the resolution constraints of previous methods. To enhance the speed and efficiency, we further introduce a progressive Gaussian cache module that dynamically adjusts during training and inference. As the first pose-free generalizable 3D-GS framework, GGRt achieves inference at $\\ge$ 5 FPS and real-time rendering at $\\ge$ 100 FPS. Through extensive experimentation, we demonstrate that our method outperforms existing NeRF-based pose-free techniques in terms of inference speed and effectiveness. It can also approach the real pose-based 3D-GS methods. Our contributions provide a significant leap forward for the integration of computer vision and computer graphics into practical applications, offering state-of-the-art results on LLFF, KITTI, and Waymo Open datasets and enabling real-time rendering for immersive experiences.", + "arxiv_url": "http://arxiv.org/abs/2403.10147v2", + "pdf_url": "http://arxiv.org/pdf/2403.10147v2", + "published_date": "2024-03-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing", + "authors": [ + "Tian-Xing Xu", + "Wenbo Hu", + "Yu-Kun Lai", + "Ying Shan", + "Song-Hai Zhang" + ], + "abstract": "3D Gaussian splatting, emerging as a groundbreaking approach, has drawn increasing attention for its capabilities of high-fidelity reconstruction and real-time rendering. However, it couples the appearance and geometry of the scene within the Gaussian attributes, which hinders the flexibility of editing operations, such as texture swapping. To address this issue, we propose a novel approach, namely Texture-GS, to disentangle the appearance from the geometry by representing it as a 2D texture mapped onto the 3D surface, thereby facilitating appearance editing. 
Technically, the disentanglement is achieved by our proposed texture mapping module, which consists of a UV mapping MLP to learn the UV coordinates for the 3D Gaussian centers, a local Taylor expansion of the MLP to efficiently approximate the UV coordinates for the ray-Gaussian intersections, and a learnable texture to capture the fine-grained appearance. Extensive experiments on the DTU dataset demonstrate that our method not only facilitates high-fidelity appearance editing but also achieves real-time rendering on consumer-level devices, e.g. a single RTX 2080 Ti GPU.", + "arxiv_url": "http://arxiv.org/abs/2403.10050v1", + "pdf_url": "http://arxiv.org/pdf/2403.10050v1", + "published_date": "2024-03-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting", + "authors": [ + "Zhiqi Li", + "Yiming Chen", + "Lingzhe Zhao", + "Peidong Liu" + ], + "abstract": "While text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which we mainly focus on in this work. To address this task, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in the introduction of a conditioning module that controls the base diffusion model using both local and global embeddings, which are computed from the input condition images and camera poses. Once trained, MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. And, 2) we propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and score distillation algorithm. Building upon our MVControl architecture, we employ a unique hybrid diffusion guidance method to direct the optimization process. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This approach alleviates the issue of poor geometry in 3D Gaussians and enables the direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content.", + "arxiv_url": "http://arxiv.org/abs/2403.09981v2", + "pdf_url": "http://arxiv.org/pdf/2403.09981v2", + "published_date": "2024-03-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Den-SOFT: Dense Space-Oriented Light Field DataseT for 6-DOF Immersive Experience", + "authors": [ + "Xiaohang Yu", + "Zhengxian Yang", + "Shi Pan", + "Yuqi Han", + "Haoxiang Wang", + "Jun Zhang", + "Shi Yan", + "Borong Lin", + "Lei Yang", + "Tao Yu", + "Lu Fang" + ], + "abstract": "We have built a custom mobile multi-camera large-space dense light field capture system, which provides a series of high-quality and sufficiently dense light field images for various scenarios. 
Our aim is to contribute to the development of popular 3D scene reconstruction algorithms such as IBRnet, NeRF, and 3D Gaussian Splatting. More importantly, the collected dataset, which is much denser than existing datasets, may also inspire space-oriented light field reconstruction, which is potentially different from object-centric 3D reconstruction, for immersive VR/AR experiences. We utilized a total of 40 GoPro 10 cameras, capturing images of 5k resolution. The number of photos captured for each scene is no less than 1000, and the average density (view number within a unit sphere) is 134.68. It is also worth noting that our system is capable of efficiently capturing large outdoor scenes. Addressing the current lack of large-space and dense light field datasets, we made efforts to include elements such as sky, reflections, lights and shadows that are of interest to researchers in the field of 3D reconstruction during the data capture process. Finally, we validated the effectiveness of our provided dataset on three popular algorithms and also integrated the reconstructed 3DGS results into the Unity engine, demonstrating the potential of utilizing our datasets to enhance the realism of virtual reality (VR) and create feasible interactive spaces. The dataset is available at our project website.", + "arxiv_url": "http://arxiv.org/abs/2403.09973v1", + "pdf_url": "http://arxiv.org/pdf/2403.09973v1", + "published_date": "2024-03-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Touch-GS: Visual-Tactile Supervised 3D Gaussian Splatting", + "authors": [ + "Aiden Swann", + "Matthew Strong", + "Won Kyung Do", + "Gadiel Sznaier Camps", + "Mac Schwager", + "Monroe Kennedy III" + ], + "abstract": "In this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors have become widespread in their use in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable to directly supervise a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, combining many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two stage process, coarsely aligning with a depth camera and then finely adjusting to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Utilizing this additional information, we propose a new loss function, variance weighted depth supervised loss, for training the 3DGS scene model. We leverage the DenseTact optical tactile sensor and RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone in a few-view scene syntheses on opaque as well as on reflective and transparent objects.
Please see our project page at http://armlabstanford.github.io/touch-gs", + "arxiv_url": "http://arxiv.org/abs/2403.09875v3", + "pdf_url": "http://arxiv.org/pdf/2403.09875v3", + "published_date": "2024-03-14", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping", + "authors": [ + "Yuhang Zheng", + "Xiangyu Chen", + "Yupeng Zheng", + "Songen Gu", + "Runyi Yang", + "Bu Jin", + "Pengfei Li", + "Chengliang Zhong", + "Zengmao Wang", + "Lina Liu", + "Chao Yang", + "Dawei Wang", + "Zhen Chen", + "Xiaoxiao Long", + "Meiqing Wang" + ], + "abstract": "Constructing a 3D scene capable of accommodating open-ended language queries, is a pivotal pursuit, particularly within the domain of robotics. Such technology facilitates robots in executing object manipulations based on human language directives. To tackle this challenge, some research efforts have been dedicated to the development of language-embedded implicit fields. However, implicit fields (e.g. NeRF) encounter limitations due to the necessity of processing a large number of input views for reconstruction, coupled with their inherent inefficiencies in inference. Thus, we present the GaussianGrasper, which utilizes 3D Gaussian Splatting to explicitly represent the scene as a collection of Gaussian primitives. Our approach takes a limited set of RGB-D views and employs a tile-based splatting technique to create a feature field. In particular, we propose an Efficient Feature Distillation (EFD) module that employs contrastive learning to efficiently and accurately distill language embeddings derived from foundational models. With the reconstructed geometry of the Gaussian field, our method enables the pre-trained grasping model to generate collision-free grasp pose candidates. Furthermore, we propose a normal-guided grasp module to select the best grasp pose. Through comprehensive real-world experiments, we demonstrate that GaussianGrasper enables robots to accurately query and grasp objects with language instructions, providing a new solution for language-guided manipulation tasks. Data and codes can be available at https://github.com/MrSecant/GaussianGrasper.", + "arxiv_url": "http://arxiv.org/abs/2403.09637v1", + "pdf_url": "http://arxiv.org/pdf/2403.09637v1", + "published_date": "2024-03-14", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "https://github.com/MrSecant/GaussianGrasper", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians", + "authors": [ + "Licheng Zhong", + "Hong-Xing Yu", + "Jiajun Wu", + "Yunzhu Li" + ], + "abstract": "Reconstructing and simulating elastic objects from visual observations is crucial for applications in computer vision and robotics. Existing methods, such as 3D Gaussians, model 3D appearance and geometry, but lack the ability to estimate physical properties for objects and simulate them. The core challenge lies in integrating an expressive yet efficient physical dynamics model. We propose Spring-Gaus, a 3D physical object representation for reconstructing and simulating elastic objects from videos of the object from multiple viewpoints. 
In particular, we develop and integrate a 3D Spring-Mass model into 3D Gaussian kernels, enabling the reconstruction of the visual appearance, shape, and physical dynamics of the object. Our approach enables future prediction and simulation under various initial states and environmental properties. We evaluate Spring-Gaus on both synthetic and real-world datasets, demonstrating accurate reconstruction and simulation of elastic objects. Project page: https://zlicheng.com/spring_gaus/.", + "arxiv_url": "http://arxiv.org/abs/2403.09434v3", + "pdf_url": "http://arxiv.org/pdf/2403.09434v3", + "published_date": "2024-03-14", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting", + "authors": [ + "Jaewoo Jung", + "Jisang Han", + "Honggyu An", + "Jiwon Kang", + "Seonghoon Park", + "Seungryong Kim" + ], + "abstract": "3D Gaussian splatting (3DGS) has recently demonstrated impressive capabilities in real-time novel view synthesis and 3D reconstruction. However, 3DGS heavily depends on the accurate initialization derived from Structure-from-Motion (SfM) methods. When the quality of the initial point cloud deteriorates, such as in the presence of noise or when using randomly initialized point cloud, 3DGS often undergoes large performance drops. To address this limitation, we propose a novel optimization strategy dubbed RAIN-GS (Relaxing Accurate Initialization Constraint for 3D Gaussian Splatting). Our approach is based on an in-depth analysis of the original 3DGS optimization scheme and the analysis of the SfM initialization in the frequency domain. Leveraging simple modifications based on our analyses, RAIN-GS successfully trains 3D Gaussians from sub-optimal point cloud (e.g., randomly initialized point cloud), effectively relaxing the need for accurate initialization. We demonstrate the efficacy of our strategy through quantitative and qualitative comparisons on multiple datasets, where RAIN-GS trained with random point cloud achieves performance on-par with or even better than 3DGS trained with accurate SfM point cloud. Our project page and code can be found at https://ku-cvlab.github.io/RAIN-GS.", + "arxiv_url": "http://arxiv.org/abs/2403.09413v2", + "pdf_url": "http://arxiv.org/pdf/2403.09413v2", + "published_date": "2024-03-14", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph", + "authors": [ + "Donglin Di", + "Jiahui Yang", + "Chaofan Luo", + "Zhou Xue", + "Wei Chen", + "Xun Yang", + "Yue Gao" + ], + "abstract": "Text-to-3D generation represents an exciting field that has seen rapid advancements, facilitating the transformation of textual descriptions into detailed 3D models. However, current progress often neglects the intricate high-order correlation of geometry and texture within 3D objects, leading to challenges such as over-smoothness, over-saturation and the Janus problem. In this work, we propose a method named ``3D Gaussian Generation via Hypergraph (Hyper-3DG)'', designed to capture the sophisticated high-order correlations present within 3D objects. Our framework is anchored by a well-established mainflow and an essential module, named ``Geometry and Texture Hypergraph Refiner (HGRefiner)''.
This module not only refines the representation of 3D Gaussians but also accelerates the update process of these 3D Gaussians by conducting the Patch-3DGS Hypergraph Learning on both explicit attributes and latent visual features. Our framework allows for the production of finely generated 3D objects within a cohesive optimization, effectively circumventing degradation. Extensive experimentation has shown that our proposed method significantly enhances the quality of 3D generation while incurring no additional computational overhead for the underlying framework. (Project code: https://github.com/yjhboy/Hyper3DG)", + "arxiv_url": "http://arxiv.org/abs/2403.09236v1", + "pdf_url": "http://arxiv.org/pdf/2403.09236v1", + "published_date": "2024-03-14", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/yjhboy/Hyper3DG", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "A New Split Algorithm for 3D Gaussian Splatting", + "authors": [ + "Qiyuan Feng", + "Gengchen Cao", + "Haoxiang Chen", + "Tai-Jiang Mu", + "Ralph R. Martin", + "Shi-Min Hu" + ], + "abstract": "3D Gaussian splatting models, as a novel explicit 3D representation, have been applied in many domains recently, such as explicit geometric editing and geometry generation. Progress has been rapid. However, due to their mixed scales and cluttered shapes, 3D Gaussian splatting models can produce a blurred or needle-like effect near the surface. At the same time, 3D Gaussian splatting models tend to flatten large untextured regions, yielding a very sparse point cloud. These problems are caused by the non-uniform nature of 3D Gaussian splatting models, so in this paper, we propose a new 3D Gaussian splitting algorithm, which can produce a more uniform and surface-bounded 3D Gaussian splatting model. Our algorithm splits an $N$-dimensional Gaussian into two N-dimensional Gaussians. It ensures consistency of mathematical characteristics and similarity of appearance, allowing resulting 3D Gaussian splatting models to be more uniform and a better fit to the underlying surface, and thus more suitable for explicit editing, point cloud extraction and other tasks. Meanwhile, our 3D Gaussian splitting approach has a very simple closed-form solution, making it readily applicable to any 3D Gaussian model.", + "arxiv_url": "http://arxiv.org/abs/2403.09143v1", + "pdf_url": "http://arxiv.org/pdf/2403.09143v1", + "published_date": "2024-03-14", + "categories": [ + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing", + "authors": [ + "Jing Wu", + "Jia-Wang Bian", + "Xinghui Li", + "Guangrun Wang", + "Ian Reid", + "Philip Torr", + "Victor Adrian Prisacariu" + ], + "abstract": "We propose GaussCtrl, a text-driven method to edit a 3D scene reconstructed by the 3D Gaussian Splatting (3DGS). Our method first renders a collection of images by using the 3DGS and edits them by using a pre-trained 2D diffusion model (ControlNet) based on the input prompt, which is then used to optimise the 3D model. Our key contribution is multi-view consistent editing, which enables editing all images together instead of iteratively editing one image while updating the 3D model as in previous works. It leads to faster editing as well as higher visual quality. 
This is achieved by the two terms: (a) depth-conditioned editing that enforces geometric consistency across multi-view images by leveraging naturally consistent depth maps. (b) attention-based latent code alignment that unifies the appearance of edited images by conditioning their editing to several reference views through self and cross-view attention between images' latent representations. Experiments demonstrate that our method achieves faster editing and better visual results than previous state-of-the-art methods.", + "arxiv_url": "http://arxiv.org/abs/2403.08733v4", + "pdf_url": "http://arxiv.org/pdf/2403.08733v4", + "published_date": "2024-03-13", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting", + "authors": [ + "Xinjie Zhang", + "Xingtong Ge", + "Tongda Xu", + "Dailan He", + "Yan Wang", + "Hongwei Qin", + "Guo Lu", + "Jing Geng", + "Jun Zhang" + ], + "abstract": "Implicit neural representations (INRs) recently achieved great success in image representation and compression, offering high visual quality and fast rendering speeds with 10-1000 FPS, assuming sufficient GPU resources are available. However, this requirement often hinders their use on low-end devices with limited memory. In response, we propose a groundbreaking paradigm of image representation and compression by 2D Gaussian Splatting, named GaussianImage. We first introduce 2D Gaussian to represent the image, where each Gaussian has 8 parameters including position, covariance and color. Subsequently, we unveil a novel rendering algorithm based on accumulated summation. Remarkably, our method with a minimum of 3$\\times$ lower GPU memory usage and 5$\\times$ faster fitting time not only rivals INRs (e.g., WIRE, I-NGP) in representation performance, but also delivers a faster rendering speed of 1500-2000 FPS regardless of parameter size. Furthermore, we integrate existing vector quantization technique to build an image codec. Experimental results demonstrate that our codec attains rate-distortion performance comparable to compression-based INRs such as COIN and COIN++, while facilitating decoding speeds of approximately 2000 FPS. Additionally, preliminary proof of concept shows that our codec surpasses COIN and COIN++ in performance when using partial bits-back coding. Code is available at https://github.com/Xinjie-Q/GaussianImage.", + "arxiv_url": "http://arxiv.org/abs/2403.08551v5", + "pdf_url": "http://arxiv.org/pdf/2403.08551v5", + "published_date": "2024-03-13", + "categories": [ + "eess.IV", + "cs.AI", + "cs.CV", + "cs.MM" + ], + "github_url": "https://github.com/Xinjie-Q/GaussianImage", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Splatting in Style", + "authors": [ + "Abhishek Saroha", + "Mariia Gladkova", + "Cecilia Curreli", + "Dominik Muhle", + "Tarun Yenamandra", + "Daniel Cremers" + ], + "abstract": "3D scene stylization extends the work of neural style transfer to 3D. A vital challenge in this problem is to maintain the uniformity of the stylized appearance across multiple views. A vast majority of the previous works achieve this by training a 3D model for every stylized image and a set of multi-view images. 
In contrast, we propose a novel architecture trained on a collection of style images that, at test time, produces real time high-quality stylized novel views. We choose the underlying 3D scene representation for our model as 3D Gaussian splatting. We take the 3D Gaussians and process them using a multi-resolution hash grid and a tiny MLP to obtain stylized views. The MLP is conditioned on different style codes for generalization to different styles during test time. The explicit nature of 3D Gaussians gives us inherent advantages over NeRF-based methods, including geometric consistency and a fast training and rendering regime. This enables our method to be useful for various practical use cases, such as augmented or virtual reality. We demonstrate that our method achieves state-of-the-art performance with superior visual quality on various indoor and outdoor real-world data.", + "arxiv_url": "http://arxiv.org/abs/2403.08498v2", + "pdf_url": "http://arxiv.org/pdf/2403.08498v2", + "published_date": "2024-03-13", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation", + "authors": [ + "Guanxing Lu", + "Shiyi Zhang", + "Ziwei Wang", + "Changliu Liu", + "Jiwen Lu", + "Yansong Tang" + ], + "abstract": "Performing language-conditioned robotic manipulation tasks in unstructured environments is highly demanded for general intelligent robots. Conventional robotic manipulation methods usually learn semantic representation of the observation for action prediction, which ignores the scene-level spatiotemporal dynamics for human goal completion. In this paper, we propose a dynamic Gaussian Splatting method named ManiGaussian for multi-task robotic manipulation, which mines scene dynamics via future scene reconstruction. Specifically, we first formulate the dynamic Gaussian Splatting framework that infers the semantics propagation in the Gaussian embedding space, where the semantic representation is leveraged to predict the optimal robot action. Then, we build a Gaussian world model to parameterize the distribution in our dynamic Gaussian Splatting framework, which provides informative supervision in the interactive environment via future scene reconstruction. We evaluate our ManiGaussian on 10 RLBench tasks with 166 variations, and the results demonstrate our framework can outperform the state-of-the-art methods by 13.1\\% in average success rate. Project page: https://guanxinglu.github.io/ManiGaussian/.", + "arxiv_url": "http://arxiv.org/abs/2403.08321v2", + "pdf_url": "http://arxiv.org/pdf/2403.08321v2", + "published_date": "2024-03-13", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting", + "authors": [ + "Kunhao Liu", + "Fangneng Zhan", + "Muyu Xu", + "Christian Theobalt", + "Ling Shao", + "Shijian Lu" + ], + "abstract": "We introduce StyleGaussian, a novel 3D style transfer technique that allows instant transfer of any image's style to a 3D scene at 10 frames per second (fps). Leveraging 3D Gaussian Splatting (3DGS), StyleGaussian achieves style transfer without compromising its real-time rendering ability and multi-view consistency. 
It achieves instant style transfer with three steps: embedding, transfer, and decoding. Initially, 2D VGG scene features are embedded into reconstructed 3D Gaussians. Next, the embedded features are transformed according to a reference style image. Finally, the transformed features are decoded into the stylized RGB. StyleGaussian has two novel designs. The first is an efficient feature rendering strategy that first renders low-dimensional features and then maps them into high-dimensional features while embedding VGG features. It cuts the memory consumption significantly and enables 3DGS to render the high-dimensional memory-intensive features. The second is a K-nearest-neighbor-based 3D CNN. Working as the decoder for the stylized features, it eliminates the 2D CNN operations that compromise strict multi-view consistency. Extensive experiments show that StyleGaussian achieves instant 3D stylization with superior stylization quality while preserving real-time rendering and strict multi-view consistency. Project page: https://kunhao-liu.github.io/StyleGaussian/", + "arxiv_url": "http://arxiv.org/abs/2403.07807v1", + "pdf_url": "http://arxiv.org/pdf/2403.07807v1", + "published_date": "2024-03-12", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SemGauss-SLAM: Dense Semantic Gaussian Splatting SLAM", + "authors": [ + "Siting Zhu", + "Renjie Qin", + "Guangming Wang", + "Jiuming Liu", + "Hesheng Wang" + ], + "abstract": "We propose SemGauss-SLAM, a dense semantic SLAM system utilizing 3D Gaussian representation, that enables accurate 3D semantic mapping, robust camera tracking, and high-quality rendering simultaneously. In this system, we incorporate semantic feature embedding into 3D Gaussian representation, which effectively encodes semantic information within the spatial layout of the environment for precise semantic scene representation. Furthermore, we propose feature-level loss for updating 3D Gaussian representation, enabling higher-level guidance for 3D Gaussian optimization. In addition, to reduce cumulative drift in tracking and improve semantic reconstruction accuracy, we introduce semantic-informed bundle adjustment leveraging multi-frame semantic associations for joint optimization of 3D Gaussian representation and camera poses, leading to low-drift tracking and accurate mapping. Our SemGauss-SLAM method demonstrates superior performance over existing radiance field-based SLAM methods in terms of mapping and tracking accuracy on Replica and ScanNet datasets, while also showing excellent capabilities in high-precision semantic segmentation and dense semantic mapping.", + "arxiv_url": "http://arxiv.org/abs/2403.07494v3", + "pdf_url": "http://arxiv.org/pdf/2403.07494v3", + "published_date": "2024-03-12", + "categories": [ + "cs.RO", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization", + "authors": [ + "Jiahe Li", + "Jiawei Zhang", + "Xiao Bai", + "Jin Zheng", + "Xin Ning", + "Jun Zhou", + "Lin Gu" + ], + "abstract": "Radiance fields have demonstrated impressive performance in synthesizing novel views from sparse input views, yet prevailing methods suffer from high training costs and slow inference speed. 
This paper introduces DNGaussian, a depth-regularized framework based on 3D Gaussian radiance fields, offering real-time and high-quality few-shot novel view synthesis at low costs. Our motivation stems from the highly efficient representation and surprising quality of the recent 3D Gaussian Splatting, even though it encounters a geometry degradation when input views decrease. In the Gaussian radiance fields, we find that this degradation in scene geometry is primarily linked to the positioning of Gaussian primitives and can be mitigated by a depth constraint. Consequently, we propose a Hard and Soft Depth Regularization to restore accurate scene geometry under coarse monocular depth supervision while maintaining a fine-grained color appearance. To further refine detailed geometry reshaping, we introduce Global-Local Depth Normalization, enhancing the focus on small local depth changes. Extensive experiments on LLFF, DTU, and Blender datasets demonstrate that DNGaussian outperforms state-of-the-art methods, achieving comparable or better results with significantly reduced memory cost, a $25 \\times$ reduction in training time, and over $3000 \\times$ faster rendering speed.", + "arxiv_url": "http://arxiv.org/abs/2403.06912v3", + "pdf_url": "http://arxiv.org/pdf/2403.06912v3", + "published_date": "2024-03-11", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization", + "authors": [ + "Jiahui Zhang", + "Fangneng Zhan", + "Muyu Xu", + "Shijian Lu", + "Eric Xing" + ], + "abstract": "3D Gaussian splatting has achieved very impressive performance in real-time novel view synthesis. However, it often suffers from over-reconstruction during Gaussian densification where high-variance image regions are covered by a few large Gaussians only, leading to blur and artifacts in the rendered images. We design a progressive frequency regularization (FreGS) technique to tackle the over-reconstruction issue within the frequency space. Specifically, FreGS performs coarse-to-fine Gaussian densification by exploiting low-to-high frequency components that can be easily extracted with low-pass and high-pass filters in the Fourier space. By minimizing the discrepancy between the frequency spectrum of the rendered image and the corresponding ground truth, it achieves high-quality Gaussian densification and alleviates the over-reconstruction of Gaussian splatting effectively. Experiments over multiple widely adopted benchmarks (e.g., Mip-NeRF360, Tanks-and-Temples and Deep Blending) show that FreGS achieves superior novel view synthesis and outperforms the state-of-the-art consistently.", + "arxiv_url": "http://arxiv.org/abs/2403.06908v2", + "pdf_url": "http://arxiv.org/pdf/2403.06908v2", + "published_date": "2024-03-11", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "V3D: Video Diffusion Models are Effective 3D Generators", + "authors": [ + "Zilong Chen", + "Yikai Wang", + "Feng Wang", + "Zhengyi Wang", + "Huaping Liu" + ], + "abstract": "Automatic 3D generation has recently attracted widespread attention. Recent methods have greatly accelerated the generation speed, but usually produce less-detailed objects due to limited model capacity or 3D data. 
Motivated by recent advancements in video diffusion models, we introduce V3D, which leverages the world simulation capacity of pre-trained video diffusion models to facilitate 3D generation. To fully unleash the potential of video diffusion to perceive the 3D world, we further introduce a geometrical consistency prior and extend the video diffusion model to a multi-view consistent 3D generator. Benefiting from this, the state-of-the-art video diffusion model could be fine-tuned to generate 360-degree orbit frames surrounding an object given a single image. With our tailored reconstruction pipelines, we can generate high-quality meshes or 3D Gaussians within 3 minutes. Furthermore, our method can be extended to scene-level novel view synthesis, achieving precise control over the camera path with sparse input views. Extensive experiments demonstrate the superior performance of the proposed approach, especially in terms of generation quality and multi-view consistency. Our code is available at https://github.com/heheyas/V3D", + "arxiv_url": "http://arxiv.org/abs/2403.06738v1", + "pdf_url": "http://arxiv.org/pdf/2403.06738v1", + "published_date": "2024-03-11", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/heheyas/V3D", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GSEdit: Efficient Text-Guided Editing of 3D Objects via Gaussian Splatting", + "authors": [ + "Francesco Palandra", + "Andrea Sanchietti", + "Daniele Baieri", + "Emanuele Rodolà" + ], + "abstract": "We present GSEdit, a pipeline for text-guided 3D object editing based on Gaussian Splatting models. Our method enables the editing of the style and appearance of 3D objects without altering their main details, all in a matter of minutes on consumer hardware. We tackle the problem by leveraging Gaussian splatting to represent 3D scenes, and we optimize the model while progressively varying the image supervision by means of a pretrained image-based diffusion model. The input object may be given as a 3D triangular mesh, or directly provided as Gaussians from a generative model such as DreamGaussian. GSEdit ensures consistency across different viewpoints, maintaining the integrity of the original object's information. Compared to previously proposed methods relying on NeRF-like MLP models, GSEdit stands out for its efficiency, making 3D editing tasks much faster. Our editing process is refined via the application of the SDS loss, ensuring that our edits are both precise and accurate. Our comprehensive evaluation demonstrates that GSEdit effectively alters object shape and appearance following the given textual instructions while preserving their coherence and detail.", + "arxiv_url": "http://arxiv.org/abs/2403.05154v2", + "pdf_url": "http://arxiv.org/pdf/2403.05154v2", + "published_date": "2024-03-08", + "categories": [ + "cs.CV", + "cs.GR", + "68T45", + "I.2.10; I.3.8" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting", + "authors": [ + "Zhijing Shao", + "Zhaolong Wang", + "Zhuang Li", + "Duotun Wang", + "Xiangru Lin", + "Yu Zhang", + "Mingming Fan", + "Zeyu Wang" + ], + "abstract": "We present SplattingAvatar, a hybrid 3D representation of photorealistic human avatars with Gaussian Splatting embedded on a triangle mesh, which renders over 300 FPS on a modern GPU and 30 FPS on a mobile device. 
We disentangle the motion and appearance of a virtual human with explicit mesh geometry and implicit appearance modeling with Gaussian Splatting. The Gaussians are defined by barycentric coordinates and displacement on a triangle mesh as Phong surfaces. We extend lifted optimization to simultaneously optimize the parameters of the Gaussians while walking on the triangle mesh. SplattingAvatar is a hybrid representation of virtual humans where the mesh represents low-frequency motion and surface deformation, while the Gaussians take over the high-frequency geometry and detailed appearance. Unlike existing deformation methods that rely on an MLP-based linear blend skinning (LBS) field for motion, we control the rotation and translation of the Gaussians directly by the mesh, which empowers its compatibility with various animation techniques, e.g., skeletal animation, blend shapes, and mesh editing. Trainable from monocular videos for both full-body and head avatars, SplattingAvatar shows state-of-the-art rendering quality across multiple datasets.", + "arxiv_url": "http://arxiv.org/abs/2403.05087v1", + "pdf_url": "http://arxiv.org/pdf/2403.05087v1", + "published_date": "2024-03-08", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling", + "authors": [ + "Cheng Peng", + "Yutao Tang", + "Yifan Zhou", + "Nengyu Wang", + "Xijun Liu", + "Deming Li", + "Rama Chellappa" + ], + "abstract": "Recent efforts in using 3D Gaussians for scene reconstruction and novel view synthesis can achieve impressive results on curated benchmarks; however, images captured in real life are often blurry. In this work, we analyze the robustness of Gaussian-Splatting-based methods against various types of image blur, such as motion blur, defocus blur, downscaling blur, etc. Under these degradations, Gaussian-Splatting-based methods tend to overfit and produce worse results than Neural-Radiance-Field-based methods. To address this issue, we propose Blur Agnostic Gaussian Splatting (BAGS). BAGS introduces additional 2D modeling capacities such that a 3D-consistent and high-quality scene can be reconstructed despite image-wise blur. Specifically, we model blur by estimating per-pixel convolution kernels from a Blur Proposal Network (BPN). BPN is designed to consider spatial, color, and depth variations of the scene to maximize modeling capacity. Additionally, BPN also proposes a quality-assessing mask, which indicates regions where blur occurs. Finally, we introduce a coarse-to-fine kernel optimization scheme; this optimization scheme is fast and avoids sub-optimal solutions due to a sparse point cloud initialization, which often occurs when we apply Structure-from-Motion on blurry images. 
We demonstrate that BAGS achieves photorealistic renderings under various challenging blur conditions and imaging geometry, while significantly improving upon existing approaches.", + "arxiv_url": "http://arxiv.org/abs/2403.04926v2", + "pdf_url": "http://arxiv.org/pdf/2403.04926v2", + "published_date": "2024-03-07", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis", + "authors": [ + "Yuanhao Cai", + "Yixun Liang", + "Jiahao Wang", + "Angtian Wang", + "Yulun Zhang", + "Xiaokang Yang", + "Zongwei Zhou", + "Alan Yuille" + ], + "abstract": "X-ray is widely applied for transmission imaging due to its stronger penetration than natural light. When rendering novel view X-ray projections, existing methods mainly based on NeRF suffer from long training time and slow inference speed. In this paper, we propose a 3D Gaussian splatting-based framework, namely X-Gaussian, for X-ray novel view synthesis. Firstly, we redesign a radiative Gaussian point cloud model inspired by the isotropic nature of X-ray imaging. Our model excludes the influence of view direction when learning to predict the radiation intensity of 3D points. Based on this model, we develop a Differentiable Radiative Rasterization (DRR) with CUDA implementation. Secondly, we customize an Angle-pose Cuboid Uniform Initialization (ACUI) strategy that directly uses the parameters of the X-ray scanner to compute the camera information and then uniformly samples point positions within a cuboid enclosing the scanned object. Experiments show that our X-Gaussian outperforms state-of-the-art methods by 6.5 dB while enjoying less than 15% training time and over 73x inference speed. The application on sparse-view CT reconstruction also reveals the practical values of our method. Code is publicly available at https://github.com/caiyuanhao1998/X-Gaussian . A video demo of the training process visualization is at https://www.youtube.com/watch?v=gDVf_Ngeghg .", + "arxiv_url": "http://arxiv.org/abs/2403.04116v3", + "pdf_url": "http://arxiv.org/pdf/2403.04116v3", + "published_date": "2024-03-07", + "categories": [ + "eess.IV", + "cs.CV" + ], + "github_url": "https://github.com/caiyuanhao1998/X-Gaussian", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Online Photon Guiding with 3D Gaussians for Caustics Rendering", + "authors": [ + "Jiawei Huang", + "Hajime Tanaka", + "Taku Komura", + "Yoshifumi Kitamura" + ], + "abstract": "In production rendering systems, caustics are typically rendered via photon mapping and gathering, a process often hindered by insufficient photon density. In this paper, we propose a novel photon guiding method to improve the photon density and overall quality for caustic rendering. The key insight of our approach is the application of a global 3D Gaussian mixture model, used in conjunction with an adaptive light sampler. This combination effectively guides photon emission in expansive 3D scenes with multiple light sources. By employing a global 3D Gaussian mixture, our method precisely models the distribution of the points of interest. To sample emission directions from the distribution at any observation point, we introduce a novel directional transform of the 3D Gaussian, which ensures accurate photon emission guiding. 
Furthermore, our method integrates a global light cluster tree, which models the contribution distribution of light sources to the image, facilitating effective light source selection. We conduct experiments demonstrating that our approach robustly outperforms existing photon guiding techniques across a variety of scenarios, significantly advancing the quality of caustic rendering.", + "arxiv_url": "http://arxiv.org/abs/2403.03641v2", + "pdf_url": "http://arxiv.org/pdf/2403.03641v2", + "published_date": "2024-03-06", + "categories": [ + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps", + "authors": [ + "Timothy Chen", + "Ola Shorinwa", + "Joseph Bruno", + "Javier Yu", + "Weijia Zeng", + "Keiko Nagami", + "Philip Dames", + "Mac Schwager" + ], + "abstract": "We present Splat-Nav, a real-time navigation pipeline designed to work with environment representations generated by Gaussian Splatting (GSplat), a popular emerging 3D scene representation from computer vision. Splat-Nav consists of two components: 1) Splat-Plan, a safe planning module, and 2) Splat-Loc, a robust pose estimation module. Splat-Plan builds a safe-by-construction polytope corridor through the map based on mathematically rigorous collision constraints and then constructs a B\\'ezier curve trajectory through this corridor. Splat-Loc provides a robust state estimation module, leveraging the point-cloud representation inherent in GSplat scenes for global pose initialization, in the absence of prior knowledge, and recursive real-time pose localization, given only RGB images. The most compute-intensive procedures in our navigation pipeline, such as the computation of the B\\'ezier trajectories and the pose optimization problem run primarily on the CPU, freeing up GPU resources for GPU-intensive tasks, such as online training of Gaussian Splats. We demonstrate the safety and robustness of our pipeline in both simulation and hardware experiments, where we show online re-planning at 5 Hz and pose estimation at about 25 Hz, an order of magnitude faster than Neural Radiance Field (NeRF)-based navigation methods, thereby enabling real-time navigation.", + "arxiv_url": "http://arxiv.org/abs/2403.02751v2", + "pdf_url": "http://arxiv.org/pdf/2403.02751v2", + "published_date": "2024-03-05", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos", + "authors": [ + "Jiakai Sun", + "Han Jiao", + "Guangyuan Li", + "Zhanjie Zhang", + "Lei Zhao", + "Wei Xing" + ], + "abstract": "Constructing photo-realistic Free-Viewpoint Videos (FVVs) of dynamic scenes from multi-view videos remains a challenging endeavor. Despite the remarkable advancements achieved by current neural rendering techniques, these methods generally require complete video sequences for offline training and are not capable of real-time rendering. To address these constraints, we introduce 3DGStream, a method designed for efficient FVV streaming of real-world dynamic scenes. Our method achieves fast on-the-fly per-frame reconstruction within 12 seconds and real-time rendering at 200 FPS. Specifically, we utilize 3D Gaussians (3DGs) to represent the scene. 
Instead of the na\\\"ive approach of directly optimizing 3DGs per-frame, we employ a compact Neural Transformation Cache (NTC) to model the translations and rotations of 3DGs, markedly reducing the training time and storage required for each FVV frame. Furthermore, we propose an adaptive 3DG addition strategy to handle emerging objects in dynamic scenes. Experiments demonstrate that 3DGStream achieves competitive performance in terms of rendering speed, image quality, training time, and model storage when compared with state-of-the-art methods.", + "arxiv_url": "http://arxiv.org/abs/2403.01444v4", + "pdf_url": "http://arxiv.org/pdf/2403.01444v4", + "published_date": "2024-03-03", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "neural rendering", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Gaussian Model for Animation and Texturing", + "authors": [ + "Xiangzhi Eric Wang", + "Zackary P. T. Sin" + ], + "abstract": "3D Gaussian Splatting has made a marked impact on neural rendering by achieving impressive fidelity and performance. Despite this achievement, however, it is not readily applicable to developing interactive applications. Real-time applications like XR apps and games require functions such as animation, UV-mapping, and model editing simultaneously manipulated through the usage of a 3D model. We propose a modeling that is analogous to typical 3D models, which we call 3D Gaussian Model (3DGM); it provides a manipulatable proxy for novel animation and texture transfer. By binding the 3D Gaussians in texture space and re-projecting them back to world space through implicit shell mapping, we show how our 3D modeling can serve as a valid rendering methodology for interactive applications. It is further noted that recently, 3D mesh reconstruction works have been able to produce high-quality mesh for rendering. Our work, on the other hand, only requires an approximated geometry for rendering an object in high fidelity. Applicationwise, we will show that our proxy-based 3DGM is capable of driving novel animation without animated training data and texture transferring via UV mapping of the 3D Gaussians. We believe the result indicates the potential of our work for enabling interactive applications for 3D Gaussian Splatting.", + "arxiv_url": "http://arxiv.org/abs/2402.19441v1", + "pdf_url": "http://arxiv.org/pdf/2402.19441v1", + "published_date": "2024-02-29", + "categories": [ + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction", + "authors": [ + "Jiaqi Lin", + "Zhihao Li", + "Xiao Tang", + "Jianzhuang Liu", + "Shiyong Liu", + "Jiayue Liu", + "Yangdi Lu", + "Xiaofei Wu", + "Songcen Xu", + "Youliang Yan", + "Wenming Yang" + ], + "abstract": "Existing NeRF-based methods for large scene reconstruction often have limitations in visual quality and rendering speed. While the recent 3D Gaussian Splatting works well on small-scale and object-centric scenes, scaling it up to large scenes poses challenges due to limited video memory, long optimization time, and noticeable appearance variations. To address these challenges, we present VastGaussian, the first method for high-quality reconstruction and real-time rendering on large scenes based on 3D Gaussian Splatting. 
We propose a progressive partitioning strategy to divide a large scene into multiple cells, where the training cameras and point cloud are properly distributed with an airspace-aware visibility criterion. These cells are merged into a complete scene after parallel optimization. We also introduce decoupled appearance modeling into the optimization process to reduce appearance variations in the rendered images. Our approach outperforms existing NeRF-based methods and achieves state-of-the-art results on multiple large scene datasets, enabling fast optimization and high-fidelity real-time rendering.", + "arxiv_url": "http://arxiv.org/abs/2402.17427v1", + "pdf_url": "http://arxiv.org/pdf/2402.17427v1", + "published_date": "2024-02-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GVA: Reconstructing Vivid 3D Gaussian Avatars from Monocular Videos", + "authors": [ + "Xinqi Liu", + "Chenming Wu", + "Jialun Liu", + "Xing Liu", + "Jinbo Wu", + "Chen Zhao", + "Haocheng Feng", + "Errui Ding", + "Jingdong Wang" + ], + "abstract": "In this paper, we present a novel method that facilitates the creation of vivid 3D Gaussian avatars from monocular video inputs (GVA). Our innovation lies in addressing the intricate challenges of delivering high-fidelity human body reconstructions and aligning 3D Gaussians with human skin surfaces accurately. The key contributions of this paper are twofold. Firstly, we introduce a pose refinement technique to improve hand and foot pose accuracy by aligning normal maps and silhouettes. Precise pose is crucial for correct shape and appearance reconstruction. Secondly, we address the problems of unbalanced aggregation and initialization bias that previously diminished the quality of 3D Gaussian avatars, through a novel surface-guided re-initialization method that ensures accurate alignment of 3D Gaussian points with avatar surfaces. Experimental results demonstrate that our proposed method achieves high-fidelity and vivid 3D Gaussian avatar reconstruction. Extensive experimental analyses validate the performance qualitatively and quantitatively, demonstrating that it achieves state-of-the-art performance in photo-realistic novel view synthesis while offering fine-grained control over the human body and hand pose. Project page: https://3d-aigc.github.io/GVA/.", + "arxiv_url": "http://arxiv.org/abs/2402.16607v2", + "pdf_url": "http://arxiv.org/pdf/2402.16607v2", + "published_date": "2024-02-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting", + "authors": [ + "Ziyi Yang", + "Xinyu Gao", + "Yangtian Sun", + "Yihua Huang", + "Xiaoyang Lyu", + "Wen Zhou", + "Shaohui Jiao", + "Xiaojuan Qi", + "Xiaogang Jin" + ], + "abstract": "The recent advancements in 3D Gaussian splatting (3D-GS) have not only facilitated real-time rendering through modern GPU rasterization pipelines but have also attained state-of-the-art rendering quality. Nevertheless, despite its exceptional rendering quality and performance on standard datasets, 3D-GS frequently encounters difficulties in accurately modeling specular and anisotropic components. This issue stems from the limited ability of spherical harmonics (SH) to represent high-frequency information. 
To overcome this challenge, we introduce Spec-Gaussian, an approach that utilizes an anisotropic spherical Gaussian (ASG) appearance field instead of SH for modeling the view-dependent appearance of each 3D Gaussian. Additionally, we have developed a coarse-to-fine training strategy to improve learning efficiency and eliminate floaters caused by overfitting in real-world scenes. Our experimental results demonstrate that our method surpasses existing approaches in terms of rendering quality. Thanks to ASG, we have significantly improved the ability of 3D-GS to model scenes with specular and anisotropic components without increasing the number of 3D Gaussians. This improvement extends the applicability of 3D-GS to handle intricate scenarios with specular and anisotropic surfaces. Project page is https://ingra14m.github.io/Spec-Gaussian-website/.", + "arxiv_url": "http://arxiv.org/abs/2402.15870v2", + "pdf_url": "http://arxiv.org/pdf/2402.15870v2", + "published_date": "2024-02-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianPro: 3D Gaussian Splatting with Progressive Propagation", + "authors": [ + "Kai Cheng", + "Xiaoxiao Long", + "Kaizhi Yang", + "Yao Yao", + "Wei Yin", + "Yuexin Ma", + "Wenping Wang", + "Xuejin Chen" + ], + "abstract": "The advent of 3D Gaussian Splatting (3DGS) has recently brought about a revolution in the field of neural rendering, facilitating high-quality renderings at real-time speed. However, 3DGS heavily depends on the initialized point cloud produced by Structure-from-Motion (SfM) techniques. When tackling large-scale scenes that unavoidably contain texture-less surfaces, SfM techniques always fail to produce enough points on these surfaces and cannot provide a good initialization for 3DGS. As a result, 3DGS suffers from difficult optimization and low-quality renderings. In this paper, inspired by classical multi-view stereo (MVS) techniques, we propose GaussianPro, a novel method that applies a progressive propagation strategy to guide the densification of the 3D Gaussians. Compared to the simple split and clone strategies used in 3DGS, our method leverages the priors of the existing reconstructed geometries of the scene and patch matching techniques to produce new Gaussians with accurate positions and orientations. Experiments on both large-scale and small-scale scenes validate the effectiveness of our method, where our method significantly surpasses 3DGS on the Waymo dataset, exhibiting an improvement of 1.15dB in terms of PSNR.", + "arxiv_url": "http://arxiv.org/abs/2402.14650v1", + "pdf_url": "http://arxiv.org/pdf/2402.14650v1", + "published_date": "2024-02-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Identifying Unnecessary 3D Gaussians using Clustering for Fast Rendering of 3D Gaussian Splatting", + "authors": [ + "Joongho Jo", + "Hyeongwon Kim", + "Jongsun Park" + ], + "abstract": "3D Gaussian splatting (3D-GS) is a new rendering approach that outperforms the neural radiance field (NeRF) in terms of both speed and image quality. 3D-GS represents 3D scenes by utilizing millions of 3D Gaussians and projects these Gaussians onto the 2D image plane for rendering. 
However, during the rendering process, a substantial number of unnecessary 3D Gaussians exist for the current view direction, resulting in significant computation costs associated with their identification. In this paper, we propose a computational reduction technique that quickly identifies unnecessary 3D Gaussians in real-time for rendering the current view without compromising image quality. This is accomplished through the offline clustering of 3D Gaussians that are close in distance, followed by the projection of these clusters onto a 2D image plane during runtime. Additionally, we analyze the bottleneck associated with the proposed technique when executed on GPUs and propose an efficient hardware architecture that seamlessly supports the proposed scheme. For the Mip-NeRF360 dataset, the proposed technique excludes 63% of 3D Gaussians on average before the 2D image projection, which reduces the overall rendering computation by almost 38.3% without sacrificing peak-signal-to-noise-ratio (PSNR). The proposed accelerator also achieves a speedup of 10.7x compared to a GPU.", + "arxiv_url": "http://arxiv.org/abs/2402.13827v2", + "pdf_url": "http://arxiv.org/pdf/2402.13827v2", + "published_date": "2024-02-21", + "categories": [ + "cs.CV", + "cs.AR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "How NeRFs and 3D Gaussian Splatting are Reshaping SLAM: a Survey", + "authors": [ + "Fabio Tosi", + "Youmin Zhang", + "Ziren Gong", + "Erik Sandström", + "Stefano Mattoccia", + "Martin R. Oswald", + "Matteo Poggi" + ], + "abstract": "Over the past two decades, research in the field of Simultaneous Localization and Mapping (SLAM) has undergone a significant evolution, highlighting its critical role in enabling autonomous exploration of unknown environments. This evolution ranges from hand-crafted methods, through the era of deep learning, to more recent developments focused on Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) representations. Recognizing the growing body of research and the absence of a comprehensive survey on the topic, this paper aims to provide the first comprehensive overview of SLAM progress through the lens of the latest advancements in radiance fields. It sheds light on the background, evolutionary path, inherent strengths and limitations, and serves as a fundamental reference to highlight the dynamic progress and specific challenges.", + "arxiv_url": "http://arxiv.org/abs/2402.13255v2", + "pdf_url": "http://arxiv.org/pdf/2402.13255v2", + "published_date": "2024-02-20", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianHair: Hair Modeling and Rendering with Light-aware Gaussians", + "authors": [ + "Haimin Luo", + "Min Ouyang", + "Zijun Zhao", + "Suyi Jiang", + "Longwen Zhang", + "Qixuan Zhang", + "Wei Yang", + "Lan Xu", + "Jingyi Yu" + ], + "abstract": "Hairstyle reflects culture and ethnicity at first glance. In the digital era, various realistic human hairstyles are also critical to high-fidelity digital human assets for beauty and inclusivity. Yet, realistic hair modeling and real-time rendering for animation is a formidable challenge due to its sheer number of strands, complicated structures of geometry, and sophisticated interaction with light. 
This paper presents GaussianHair, a novel explicit hair representation. It enables comprehensive modeling of hair geometry and appearance from images, fostering innovative illumination effects and dynamic animation capabilities. At the heart of GaussianHair is the novel concept of representing each hair strand as a sequence of connected cylindrical 3D Gaussian primitives. This approach not only retains the hair's geometric structure and appearance but also allows for efficient rasterization onto a 2D image plane, facilitating differentiable volumetric rendering. We further enhance this model with the \"GaussianHair Scattering Model\", adept at recreating the slender structure of hair strands and accurately capturing their local diffuse color in uniform lighting. Through extensive experiments, we substantiate that GaussianHair achieves breakthroughs in both geometric and appearance fidelity, transcending the limitations encountered in state-of-the-art methods for hair reconstruction. Beyond representation, GaussianHair extends to support editing, relighting, and dynamic rendering of hair, offering seamless integration with conventional CG pipeline workflows. Complementing these advancements, we have compiled an extensive dataset of real human hair, each with meticulously detailed strand geometry, to propel further research in this field.", + "arxiv_url": "http://arxiv.org/abs/2402.10483v1", + "pdf_url": "http://arxiv.org/pdf/2402.10483v1", + "published_date": "2024-02-16", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianObject: High-Quality 3D Object Reconstruction from Four Views with Gaussian Splatting", + "authors": [ + "Chen Yang", + "Sikuang Li", + "Jiemin Fang", + "Ruofan Liang", + "Lingxi Xie", + "Xiaopeng Zhang", + "Wei Shen", + "Qi Tian" + ], + "abstract": "Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience. However, images from sparse views only contain very limited 3D information, leading to two significant challenges: 1) Difficulty in building multi-view consistency as images for matching are too few; 2) Partially omitted or highly compressed object information as view coverage is insufficient. To tackle these challenges, we propose GaussianObject, a framework to represent and render the 3D object with Gaussian splatting that achieves high rendering quality with only 4 input images. We first introduce techniques of visual hull and floater elimination, which explicitly inject structure priors into the initial optimization process to help build multi-view consistency, yielding a coarse 3D Gaussian representation. Then we construct a Gaussian repair model based on diffusion models to supplement the omitted object information, where Gaussians are further refined. We design a self-generating strategy to obtain image pairs for training the repair model. We further design a COLMAP-free variant, where pre-given accurate camera poses are not required, which achieves competitive quality and facilitates wider applications. GaussianObject is evaluated on several challenging datasets, including MipNeRF360, OmniObject3D, OpenIllumination, and our-collected unposed images, achieving superior performance from only four views and significantly outperforming previous SOTA methods. 
Our demo is available at https://gaussianobject.github.io/, and the code has been released at https://github.com/GaussianObject/GaussianObject.", + "arxiv_url": "http://arxiv.org/abs/2402.10259v4", + "pdf_url": "http://arxiv.org/pdf/2402.10259v4", + "published_date": "2024-02-15", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "https://github.com/GaussianObject/GaussianObject", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering", + "authors": [ + "Abdullah Hamdi", + "Luke Melas-Kyriazi", + "Jinjie Mai", + "Guocheng Qian", + "Ruoshi Liu", + "Carl Vondrick", + "Bernard Ghanem", + "Andrea Vedaldi" + ], + "abstract": "Advancements in 3D Gaussian Splatting have significantly accelerated 3D reconstruction and generation. However, it may require a large number of Gaussians, which creates a substantial memory footprint. This paper introduces GES (Generalized Exponential Splatting), a novel representation that employs Generalized Exponential Function (GEF) to model 3D scenes, requiring far fewer particles to represent a scene and thus significantly outperforming Gaussian Splatting methods in efficiency with a plug-and-play replacement ability for Gaussian-based utilities. GES is validated theoretically and empirically in both principled 1D setup and realistic 3D scenes. It is shown to represent signals with sharp edges more accurately, which are typically challenging for Gaussians due to their inherent low-pass characteristics. Our empirical analysis demonstrates that GEF outperforms Gaussians in fitting natural-occurring signals (e.g. squares, triangles, and parabolic signals), thereby reducing the need for extensive splitting operations that increase the memory footprint of Gaussian Splatting. With the aid of a frequency-modulated loss, GES achieves competitive performance in novel-view synthesis benchmarks while requiring less than half the memory storage of Gaussian Splatting and increasing the rendering speed by up to 39%. The code is available on the project website https://abdullahamdi.com/ges .", + "arxiv_url": "http://arxiv.org/abs/2402.10128v2", + "pdf_url": "http://arxiv.org/pdf/2402.10128v2", + "published_date": "2024-02-15", + "categories": [ + "cs.CV", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Magic-Me: Identity-Specific Video Customized Diffusion", + "authors": [ + "Ze Ma", + "Daquan Zhou", + "Chun-Hsiao Yeh", + "Xue-She Wang", + "Xiuyu Li", + "Huanrui Yang", + "Zhen Dong", + "Kurt Keutzer", + "Jiashi Feng" + ], + "abstract": "Creating content with specified identities (ID) has attracted significant interest in the field of generative models. In the field of text-to-image generation (T2I), subject-driven creation has achieved great progress with the identity controlled via reference images. However, its extension to video generation is not well explored. In this work, we propose a simple yet effective subject identity controllable video generation framework, termed Video Custom Diffusion (VCD). With a specified identity defined by a few images, VCD reinforces the identity characteristics and injects frame-wise correlation at the initialization stage for stable video outputs. 
To achieve this, we propose three novel components that are essential for high-quality identity preservation and stable video generation: 1) a noise initialization method with 3D Gaussian Noise Prior for better inter-frame stability; 2) an ID module based on extended Textual Inversion trained with the cropped identity to disentangle the ID information from the background; and 3) Face VCD and Tiled VCD modules to reinforce faces and upscale the video to higher resolution while preserving the identity's features. We conducted extensive experiments to verify that VCD is able to generate stable videos with better ID than the baselines. Besides, with the transferability of the encoded identity in the ID module, VCD also works well with publicly available personalized text-to-image models. The codes are available at https://github.com/Zhen-Dong/Magic-Me.", + "arxiv_url": "http://arxiv.org/abs/2402.09368v2", + "pdf_url": "http://arxiv.org/pdf/2402.09368v2", + "published_date": "2024-02-14", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "https://github.com/Zhen-Dong/Magic-Me", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "IM-3D: Iterative Multiview Diffusion and Reconstruction for High-Quality 3D Generation", + "authors": [ + "Luke Melas-Kyriazi", + "Iro Laina", + "Christian Rupprecht", + "Natalia Neverova", + "Andrea Vedaldi", + "Oran Gafni", + "Filippos Kokkinos" + ], + "abstract": "Most text-to-3D generators build upon off-the-shelf text-to-image models trained on billions of images. They use variants of Score Distillation Sampling (SDS), which is slow, somewhat unstable, and prone to artifacts. A mitigation is to fine-tune the 2D generator to be multi-view aware, which can help distillation or can be combined with reconstruction networks to output 3D objects directly. In this paper, we further explore the design space of text-to-3D models. We significantly improve multi-view generation by considering video instead of image generators. Combined with a 3D reconstruction algorithm which, by using Gaussian splatting, can optimize a robust image-based loss, we directly produce high-quality 3D outputs from the generated views. Our new method, IM-3D, reduces the number of evaluations of the 2D generator network 10-100x, resulting in a much more efficient pipeline, better quality, fewer geometric inconsistencies, and higher yield of usable 3D assets.", + "arxiv_url": "http://arxiv.org/abs/2402.08682v1", + "pdf_url": "http://arxiv.org/pdf/2402.08682v1", + "published_date": "2024-02-13", + "categories": [ + "cs.CV", + "cs.AI", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting", + "authors": [ + "Xiaoyu Zhou", + "Xingjian Ran", + "Yajiao Xiong", + "Jinlin He", + "Zhiwei Lin", + "Yongtao Wang", + "Deqing Sun", + "Ming-Hsuan Yang" + ], + "abstract": "We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation. We first utilize large language models (LLMs) to generate the initial layout and introduce a layout-guided 3D Gaussian representation for 3D content generation with adaptive geometric constraints. 
We then propose an instance-scene compositional optimization mechanism with conditioned diffusion to collaboratively generate realistic 3D scenes with consistent geometry, texture, scale, and accurate interactions among multiple objects while simultaneously adjusting the coarse layout priors extracted from the LLMs to align with the generated scene. Experiments show that GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing while ensuring the high fidelity of object-level entities within the scene. The source codes and models will be available at gala3d.github.io.", + "arxiv_url": "http://arxiv.org/abs/2402.07207v2", + "pdf_url": "http://arxiv.org/pdf/2402.07207v2", + "published_date": "2024-02-11", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Gaussian as a New Era: A Survey", + "authors": [ + "Ben Fei", + "Jingyi Xu", + "Rui Zhang", + "Qingyuan Zhou", + "Weidong Yang", + "Ying He" + ], + "abstract": "3D Gaussian Splatting (3D-GS) has emerged as a significant advancement in the field of Computer Graphics, offering explicit scene representation and novel view synthesis without the reliance on neural networks, such as Neural Radiance Fields (NeRF). This technique has found diverse applications in areas such as robotics, urban mapping, autonomous navigation, and virtual reality/augmented reality, to name just a few. Given the growing popularity and expanding research in 3D Gaussian Splatting, this paper presents a comprehensive survey of relevant papers from the past year. We organize the survey into taxonomies based on characteristics and applications, providing an introduction to the theoretical underpinnings of 3D Gaussian Splatting. Our goal through this survey is to acquaint new researchers with 3D Gaussian Splatting, serve as a valuable reference for seminal works in the field, and inspire future research directions, as discussed in our concluding section.", + "arxiv_url": "http://arxiv.org/abs/2402.07181v2", + "pdf_url": "http://arxiv.org/pdf/2402.07181v2", + "published_date": "2024-02-11", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Deepfake for the Good: Generating Avatars through Face-Swapping with Implicit Deepfake Generation", + "authors": [ + "Georgii Stanishevskii", + "Jakub Steczkiewicz", + "Tomasz Szczepanik", + "Sławomir Tadeja", + "Jacek Tabor", + "Przemysław Spurek" + ], + "abstract": "Numerous emerging deep-learning techniques have had a substantial impact on computer graphics. Among the most promising breakthroughs are the rise of Neural Radiance Fields (NeRFs) and Gaussian Splatting (GS). NeRFs encode the object's shape and color in neural network weights using a handful of images with known camera positions to generate novel views. In contrast, GS provides accelerated training and inference without a decrease in rendering quality by encoding the object's characteristics in a collection of Gaussian distributions. These two techniques have found many use cases in spatial computing and other domains. On the other hand, the emergence of deepfake methods has sparked considerable controversy. Deepfakes refer to artificial intelligence-generated videos that closely mimic authentic footage. 
Using generative models, they can modify facial features, enabling the creation of altered identities or expressions that exhibit a remarkably realistic appearance to a real person. Despite these controversies, deepfakes can offer a next-generation solution for avatar creation and gaming when they are of sufficient quality. To that end, we show how to combine all these emerging technologies to obtain a more plausible outcome. Our ImplicitDeepfake uses the classical deepfake algorithm to modify all training images separately and then trains NeRF and GS on the modified faces. Such simple strategies can produce plausible 3D deepfake-based avatars.", + "arxiv_url": "http://arxiv.org/abs/2402.06390v2", + "pdf_url": "http://arxiv.org/pdf/2402.06390v2", + "published_date": "2024-02-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-CLIP: Gaussian Splatting for Contrastive Language-Image-3D Pretraining from Real-World Data", + "authors": [ + "Haoyuan Li", + "Yanpeng Zhou", + "Yihan Zeng", + "Hang Xu", + "Xiaodan Liang" + ], + "abstract": "3D shapes represented as point clouds have achieved advancements in multimodal pre-training to align image and language descriptions, which is crucial to object identification, classification, and retrieval. However, the discrete representation of a point cloud loses the object's surface shape information and creates a gap between rendering results and 2D correspondences. To address this problem, we propose GS-CLIP for the first attempt to introduce 3DGS (3D Gaussian Splatting) into multimodal pre-training to enhance 3D representation. GS-CLIP leverages a pre-trained vision-language model for a learned common visual and textual space on massive real-world image-text pairs and then learns a 3D Encoder for aligning 3DGS optimized per object. Additionally, a novel Gaussian-Aware Fusion is proposed to extract and fuse global explicit features. As a general framework for language-image-3D pre-training, GS-CLIP is agnostic to 3D backbone networks. Experiments on challenging benchmarks show that GS-CLIP significantly improves the state of the art, outperforming the previously best results.", + "arxiv_url": "http://arxiv.org/abs/2402.06198v2", + "pdf_url": "http://arxiv.org/pdf/2402.06198v2", + "published_date": "2024-02-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting", + "authors": [ + "Zhenglin Zhou", + "Fan Ma", + "Hehe Fan", + "Yi Yang" + ], + "abstract": "Creating digital avatars from textual prompts has long been a desirable yet challenging task. Despite the promising outcomes obtained through 2D diffusion priors in recent works, current methods face challenges in achieving high-quality and animated avatars effectively. In this paper, we present $\\textbf{HeadStudio}$, a novel framework that utilizes 3D Gaussian splatting to generate realistic and animated avatars from text prompts. Our method drives 3D Gaussians semantically to create a flexible and achievable appearance through the intermediate FLAME representation. Specifically, we incorporate the FLAME into both 3D representation and score distillation: 1) FLAME-based 3D Gaussian splatting, driving 3D Gaussian points by rigging each point to a FLAME mesh. 
2) FLAME-based score distillation sampling, utilizing a FLAME-based fine-grained control signal to guide score distillation from the text prompt. Extensive experiments demonstrate the efficacy of HeadStudio in generating animatable avatars from textual prompts, exhibiting visually appealing appearances. The avatars are capable of rendering high-quality real-time ($\\geq 40$ fps) novel views at a resolution of 1024. They can be smoothly controlled by real-world speech and video. We hope that HeadStudio can advance digital avatar creation and that the present method can be widely applied across various domains.", + "arxiv_url": "http://arxiv.org/abs/2402.06149v1", + "pdf_url": "http://arxiv.org/pdf/2402.06149v1", + "published_date": "2024-02-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Mesh-based Gaussian Splatting for Real-time Large-scale Deformation", + "authors": [ + "Lin Gao", + "Jie Yang", + "Bo-Tao Zhang", + "Jia-Mu Sun", + "Yu-Jie Yuan", + "Hongbo Fu", + "Yu-Kun Lai" + ], + "abstract": "Neural implicit representations, including Neural Distance Fields and Neural Radiance Fields, have demonstrated significant capabilities for reconstructing surfaces with complicated geometry and topology, and generating novel views of a scene. Nevertheless, it is challenging for users to directly deform or manipulate these implicit representations with large deformations in real time. Gaussian Splatting (GS) has recently become a promising method with explicit geometry for representing static scenes and facilitating high-quality and real-time synthesis of novel views. However, it cannot be easily deformed due to the use of discrete Gaussians and the lack of explicit topology. To address this, we develop a novel GS-based method that enables interactive deformation. Our key idea is to design an innovative mesh-based GS representation, which is integrated into Gaussian learning and manipulation. 3D Gaussians are defined over an explicit mesh, and they are bound with each other: the rendering of 3D Gaussians guides the mesh face split for adaptive refinement, and the mesh face split directs the splitting of 3D Gaussians. Moreover, the explicit mesh constraints help regularize the Gaussian distribution, suppressing poor-quality Gaussians (e.g., misaligned Gaussians, long-narrow shaped Gaussians), thus enhancing visual quality and avoiding artifacts during deformation. Based on this representation, we further introduce a large-scale Gaussian deformation technique to enable deformable GS, which alters the parameters of 3D Gaussians according to the manipulation of the associated mesh. Our method benefits from existing mesh deformation datasets for more realistic data-driven Gaussian deformation. 
Extensive experiments show that our approach achieves high-quality reconstruction and effective deformation, while maintaining the promising rendering results at a high frame rate(65 FPS on average).", + "arxiv_url": "http://arxiv.org/abs/2402.04796v1", + "pdf_url": "http://arxiv.org/pdf/2402.04796v1", + "published_date": "2024-02-07", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Rig3DGS: Creating Controllable Portraits from Casual Monocular Videos", + "authors": [ + "Alfredo Rivero", + "ShahRukh Athar", + "Zhixin Shu", + "Dimitris Samaras" + ], + "abstract": "Creating controllable 3D human portraits from casual smartphone videos is highly desirable due to their immense value in AR/VR applications. The recent development of 3D Gaussian Splatting (3DGS) has shown improvements in rendering quality and training efficiency. However, it still remains a challenge to accurately model and disentangle head movements and facial expressions from a single-view capture to achieve high-quality renderings. In this paper, we introduce Rig3DGS to address this challenge. We represent the entire scene, including the dynamic subject, using a set of 3D Gaussians in a canonical space. Using a set of control signals, such as head pose and expressions, we transform them to the 3D space with learned deformations to generate the desired rendering. Our key innovation is a carefully designed deformation method which is guided by a learnable prior derived from a 3D morphable model. This approach is highly efficient in training and effective in controlling facial expressions, head positions, and view synthesis across various captures. We demonstrate the effectiveness of our learned deformation through extensive quantitative and qualitative experiments. The project page can be found at http://shahrukhathar.github.io/2024/02/05/Rig3DGS.html", + "arxiv_url": "http://arxiv.org/abs/2402.03723v1", + "pdf_url": "http://arxiv.org/pdf/2402.03723v1", + "published_date": "2024-02-06", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "4D-Rotor Gaussian Splatting: Towards Efficient Novel View Synthesis for Dynamic Scenes", + "authors": [ + "Yuanxing Duan", + "Fangyin Wei", + "Qiyu Dai", + "Yuhang He", + "Wenzheng Chen", + "Baoquan Chen" + ], + "abstract": "We consider the problem of novel-view synthesis (NVS) for dynamic scenes. Recent neural approaches have accomplished exceptional NVS results for static 3D scenes, but extensions to 4D time-varying scenes remain non-trivial. Prior efforts often encode dynamics by learning a canonical space plus implicit or explicit deformation fields, which struggle in challenging scenarios like sudden movements or generating high-fidelity renderings. In this paper, we introduce 4D Gaussian Splatting (4DRotorGS), a novel method that represents dynamic scenes with anisotropic 4D XYZT Gaussians, inspired by the success of 3D Gaussian Splatting in static scenes. We model dynamics at each timestamp by temporally slicing the 4D Gaussians, which naturally compose dynamic 3D Gaussians and can be seamlessly projected into images. As an explicit spatial-temporal representation, 4DRotorGS demonstrates powerful capabilities for modeling complicated dynamics and fine details--especially for scenes with abrupt motions. 
We further implement our temporal slicing and splatting techniques in a highly optimized CUDA acceleration framework, achieving real-time inference rendering speeds of up to 277 FPS on an RTX 3090 GPU and 583 FPS on an RTX 4090 GPU. Rigorous evaluations on scenes with diverse motions showcase the superior efficiency and effectiveness of 4DRotorGS, which consistently outperforms existing methods both quantitatively and qualitatively.", + "arxiv_url": "http://arxiv.org/abs/2402.03307v3", + "pdf_url": "http://arxiv.org/pdf/2402.03307v3", + "published_date": "2024-02-05", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM", + "authors": [ + "Mingrui Li", + "Shuhong Liu", + "Heng Zhou", + "Guohao Zhu", + "Na Cheng", + "Tianchen Deng", + "Hongyu Wang" + ], + "abstract": "We present SGS-SLAM, the first semantic visual SLAM system based on Gaussian Splatting. It incorporates appearance, geometry, and semantic features through multi-channel optimization, addressing the oversmoothing limitations of neural implicit SLAM systems in high-quality rendering, scene understanding, and object-level geometry. We introduce a unique semantic feature loss that effectively compensates for the shortcomings of traditional depth and color losses in object optimization. Through a semantic-guided keyframe selection strategy, we prevent erroneous reconstructions caused by cumulative errors. Extensive experiments demonstrate that SGS-SLAM delivers state-of-the-art performance in camera pose estimation, map reconstruction, precise semantic segmentation, and object-level geometric accuracy, while ensuring real-time rendering capabilities.", + "arxiv_url": "http://arxiv.org/abs/2402.03246v6", + "pdf_url": "http://arxiv.org/pdf/2402.03246v6", + "published_date": "2024-02-05", + "categories": [ + "cs.CV", + "cs.AI", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaMeS: Mesh-Based Adapting and Modification of Gaussian Splatting", + "authors": [ + "Joanna Waczyńska", + "Piotr Borycki", + "Sławomir Tadeja", + "Jacek Tabor", + "Przemysław Spurek" + ], + "abstract": "Recently, a range of neural network-based methods for image rendering have been introduced. One such widely-researched neural radiance field (NeRF) relies on a neural network to represent 3D scenes, allowing for realistic view synthesis from a small number of 2D images. However, most NeRF models are constrained by long training and inference times. In comparison, Gaussian Splatting (GS) is a novel, state-of-the-art technique for rendering points in a 3D scene by approximating their contribution to image pixels through Gaussian distributions, warranting fast training and swift, real-time rendering. A drawback of GS is the absence of a well-defined approach for its conditioning due to the necessity to condition several hundred thousand Gaussian components. To solve this, we introduce the Gaussian Mesh Splatting (GaMeS) model, which allows modification of Gaussian components in a similar way as meshes. We parameterize each Gaussian component by the vertices of the mesh face. Furthermore, our model needs mesh initialization on input or estimated mesh during training. 
We also define Gaussian splats solely based on their location on the mesh, allowing for automatic adjustments in position, scale, and rotation during animation. As a result, we obtain a real-time rendering of editable GS.", + "arxiv_url": "http://arxiv.org/abs/2402.01459v3", + "pdf_url": "http://arxiv.org/pdf/2402.01459v3", + "published_date": "2024-02-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianStyle: Gaussian Head Avatar via StyleGAN", + "authors": [ + "Pinxin Liu", + "Luchuan Song", + "Daoan Zhang", + "Hang Hua", + "Yunlong Tang", + "Huaijin Tu", + "Jiebo Luo", + "Chenliang Xu" + ], + "abstract": "Existing methods like Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have made significant strides in facial attribute control such as facial animation and components editing, yet they struggle with fine-grained representation and scalability in dynamic head modeling. To address these limitations, we propose GaussianStyle, a novel framework that integrates the volumetric strengths of 3DGS with the powerful implicit representation of StyleGAN. The GaussianStyle preserves structural information, such as expressions and poses, using Gaussian points, while projecting the implicit volumetric representation into StyleGAN to capture high-frequency details and mitigate the over-smoothing commonly observed in neural texture rendering. Experimental outcomes indicate that our method achieves state-of-the-art performance in reenactment, novel view synthesis, and animation.", + "arxiv_url": "http://arxiv.org/abs/2402.00827v3", + "pdf_url": "http://arxiv.org/pdf/2402.00827v3", + "published_date": "2024-02-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "360-GS: Layout-guided Panoramic Gaussian Splatting For Indoor Roaming", + "authors": [ + "Jiayang Bai", + "Letian Huang", + "Jie Guo", + "Wen Gong", + "Yuanqi Li", + "Yanwen Guo" + ], + "abstract": "3D Gaussian Splatting (3D-GS) has recently attracted great attention with real-time and photo-realistic renderings. This technique typically takes perspective images as input and optimizes a set of 3D elliptical Gaussians by splatting them onto the image planes, resulting in 2D Gaussians. However, applying 3D-GS to panoramic inputs presents challenges in effectively modeling the projection onto the spherical surface of ${360^\\circ}$ images using 2D Gaussians. In practical applications, input panoramas are often sparse, leading to unreliable initialization of 3D Gaussians and subsequent degradation of 3D-GS quality. In addition, due to the under-constrained geometry of texture-less planes (e.g., walls and floors), 3D-GS struggles to model these flat regions with elliptical Gaussians, resulting in significant floaters in novel views. To address these issues, we propose 360-GS, a novel $360^{\\circ}$ Gaussian splatting for a limited set of panoramic inputs. Instead of splatting 3D Gaussians directly onto the spherical surface, 360-GS projects them onto the tangent plane of the unit sphere and then maps them to the spherical projections. This adaptation enables the representation of the projection using Gaussians.
We guide the optimization of 360-GS by exploiting layout priors within panoramas, which are simple to obtain and contain strong structural information about the indoor scene. Our experimental results demonstrate that 360-GS allows panoramic rendering and outperforms state-of-the-art methods with fewer artifacts in novel view synthesis, thus providing immersive roaming in indoor scenarios.", + "arxiv_url": "http://arxiv.org/abs/2402.00763v1", + "pdf_url": "http://arxiv.org/pdf/2402.00763v1", + "published_date": "2024-02-01", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy", + "authors": [ + "Letian Huang", + "Jiayang Bai", + "Jie Guo", + "Yuanqi Li", + "Yanwen Guo" + ], + "abstract": "3D Gaussian Splatting has garnered extensive attention and application in real-time neural rendering. Concurrently, concerns have been raised about the limitations of this technology in aspects such as point cloud storage, performance, and robustness in sparse viewpoints, leading to various improvements. However, there has been a notable lack of attention to the fundamental problem of projection errors introduced by the local affine approximation inherent in the splatting itself, and the consequential impact of these errors on the quality of photo-realistic rendering. This paper addresses the projection error function of 3D Gaussian Splatting, commencing with the residual error from the first-order Taylor expansion of the projection function. The analysis establishes a correlation between the error and the Gaussian mean position. Subsequently, leveraging function optimization theory, this paper analyzes the function's minima to provide an optimal projection strategy for Gaussian Splatting referred to as Optimal Gaussian Splatting, which can accommodate a variety of camera models. Experimental validation further confirms that this projection methodology reduces artifacts, resulting in a more convincingly realistic rendering.", + "arxiv_url": "http://arxiv.org/abs/2402.00752v4", + "pdf_url": "http://arxiv.org/pdf/2402.00752v4", + "published_date": "2024-02-01", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "StopThePop: Sorted Gaussian Splatting for View-Consistent Real-time Rendering", + "authors": [ + "Lukas Radl", + "Michael Steiner", + "Mathias Parger", + "Alexander Weinrauch", + "Bernhard Kerbl", + "Markus Steinberger" + ], + "abstract": "Gaussian Splatting has emerged as a prominent model for constructing 3D representations from images across diverse domains. However, the efficiency of the 3D Gaussian Splatting rendering pipeline relies on several simplifications. Notably, reducing Gaussians to 2D splats with a single view-space depth introduces popping and blending artifacts during view rotation. Addressing this issue requires accurate per-pixel depth computation, yet a full per-pixel sort proves excessively costly compared to a global sort operation. In this paper, we present a novel hierarchical rasterization approach that systematically resorts and culls splats with minimal processing overhead.
Our software rasterizer effectively eliminates popping artifacts and view inconsistencies, as demonstrated through both quantitative and qualitative measurements. Simultaneously, our method mitigates the potential for cheating view-dependent effects with popping, ensuring a more authentic representation. Despite the elimination of cheating, our approach achieves comparable quantitative results for test images, while increasing the consistency for novel view synthesis in motion. Due to its design, our hierarchical approach is only 4% slower on average than the original Gaussian Splatting. Notably, enforcing consistency enables a reduction in the number of Gaussians by approximately half with nearly identical quality and view-consistency. Consequently, rendering performance is nearly doubled, making our approach 1.6x faster than the original Gaussian Splatting, with a 50% reduction in memory requirements.", + "arxiv_url": "http://arxiv.org/abs/2402.00525v3", + "pdf_url": "http://arxiv.org/pdf/2402.00525v3", + "published_date": "2024-02-01", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition", + "authors": [ + "Xu Hu", + "Yuxi Wang", + "Lue Fan", + "Junsong Fan", + "Junran Peng", + "Zhen Lei", + "Qing Li", + "Zhaoxiang Zhang" + ], + "abstract": "3D Gaussian Splatting has emerged as an alternative 3D representation for novel view synthesis, benefiting from its high-quality rendering results and real-time rendering speed. However, the 3D Gaussians learned by 3D-GS have ambiguous structures without any geometry constraints. This inherent issue in 3D-GS leads to a rough boundary when segmenting individual objects. To remedy these problems, we propose SAGD, a conceptually simple yet effective boundary-enhanced segmentation pipeline for 3D-GS to improve segmentation accuracy while preserving segmentation speed. Specifically, we introduce a Gaussian Decomposition scheme, which ingeniously utilizes the special structure of 3D Gaussians to identify and then decompose the boundary Gaussians. Moreover, to achieve fast interactive 3D segmentation, we introduce a novel training-free pipeline by lifting a 2D foundation model to 3D-GS. Extensive experiments demonstrate that our approach achieves high-quality 3D segmentation without rough boundary issues, which can be easily applied to other scene editing tasks.", + "arxiv_url": "http://arxiv.org/abs/2401.17857v3", + "pdf_url": "http://arxiv.org/pdf/2401.17857v3", + "published_date": "2024-01-31", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality", + "authors": [ + "Ying Jiang", + "Chang Yu", + "Tianyi Xie", + "Xuan Li", + "Yutao Feng", + "Huamin Wang", + "Minchen Li", + "Henry Lau", + "Feng Gao", + "Yin Yang", + "Chenfanfu Jiang" + ], + "abstract": "As consumer Virtual Reality (VR) and Mixed Reality (MR) technologies gain momentum, there's a growing focus on the development of engagements with 3D virtual content. Unfortunately, traditional techniques for content creation, editing, and interaction within these virtual spaces are fraught with difficulties.
They tend to be not only engineering-intensive but also require extensive expertise, which adds to the frustration and inefficiency in virtual object manipulation. Our proposed VR-GS system represents a leap forward in human-centered 3D content interaction, offering a seamless and intuitive user experience. By developing a physical dynamics-aware interactive Gaussian Splatting in a Virtual Reality setting, and constructing a highly efficient two-level embedding strategy alongside deformable body simulations, VR-GS ensures real-time execution with highly realistic dynamic responses. The components of our Virtual Reality system are designed for high efficiency and effectiveness, starting from detailed scene reconstruction and object segmentation, advancing through multi-view image in-painting, and extending to interactive physics-based editing. The system also incorporates real-time deformation embedding and dynamic shadow casting, ensuring a comprehensive and engaging virtual experience. Our project page is available at: https://yingjiang96.github.io/VR-GS/.", + "arxiv_url": "http://arxiv.org/abs/2401.16663v2", + "pdf_url": "http://arxiv.org/pdf/2401.16663v2", + "published_date": "2024-01-30", + "categories": [ + "cs.HC", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DG: A Framework for Using Generative AI for Handling Sparse Learner Performance Data From Intelligent Tutoring Systems", + "authors": [ + "Liang Zhang", + "Jionghao Lin", + "Conrad Borchers", + "Meng Cao", + "Xiangen Hu" + ], + "abstract": "Learning performance data (e.g., quiz scores and attempts) is significant for understanding learner engagement and knowledge mastery level. However, the learning performance data collected from Intelligent Tutoring Systems (ITSs) often suffers from sparsity, impacting the accuracy of learner modeling and knowledge assessments. To address this, we introduce the 3DG framework (3-Dimensional tensor for Densification and Generation), a novel approach combining tensor factorization with advanced generative models, including Generative Adversarial Network (GAN) and Generative Pre-trained Transformer (GPT), for enhanced data imputation and augmentation. The framework operates by first representing the data as a three-dimensional tensor, capturing dimensions of learners, questions, and attempts. It then densifies the data through tensor factorization and augments it using Generative AI models, tailored to individual learning patterns identified via clustering. Applied to data from an AutoTutor lesson by the Center for the Study of Adult Literacy (CSAL), the 3DG framework effectively generated scalable, personalized simulations of learning performance.
Comparative analysis revealed GAN's superior reliability over GPT-4 in this context, underscoring its potential in addressing data sparsity challenges in ITSs and contributing to the advancement of personalized educational technology.", + "arxiv_url": "http://arxiv.org/abs/2402.01746v1", + "pdf_url": "http://arxiv.org/pdf/2402.01746v1", + "published_date": "2024-01-29", + "categories": [ + "cs.CY", + "cs.AI", + "cs.LG" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Endo-4DGS: Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting", + "authors": [ + "Yiming Huang", + "Beilei Cui", + "Long Bai", + "Ziqi Guo", + "Mengya Xu", + "Mobarakol Islam", + "Hongliang Ren" + ], + "abstract": "In the realm of robot-assisted minimally invasive surgery, dynamic scene reconstruction can significantly enhance downstream tasks and improve surgical outcomes. Neural Radiance Fields (NeRF)-based methods have recently risen to prominence for their exceptional ability to reconstruct scenes but are hampered by slow inference speed, prolonged training, and inconsistent depth estimation. Some previous work utilizes ground truth depth for optimization, but such depth is hard to acquire in the surgical domain. To overcome these obstacles, we present Endo-4DGS, a real-time endoscopic dynamic reconstruction approach that utilizes 3D Gaussian Splatting (GS) for 3D representation. Specifically, we propose lightweight MLPs to capture temporal dynamics with Gaussian deformation fields. To obtain a satisfactory Gaussian Initialization, we exploit a powerful depth estimation foundation model, Depth-Anything, to generate pseudo-depth maps as a geometry prior. We additionally propose confidence-guided learning to tackle the ill-posed problems in monocular depth estimation and enhance the depth-guided reconstruction with surface normal constraints and depth regularization. Our approach has been validated on two surgical datasets, where it can effectively render in real-time, compute efficiently, and reconstruct with remarkable accuracy.", + "arxiv_url": "http://arxiv.org/abs/2401.16416v4", + "pdf_url": "http://arxiv.org/pdf/2401.16416v4", + "published_date": "2024-01-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering", + "authors": [ + "Yutao Feng", + "Xiang Feng", + "Yintong Shang", + "Ying Jiang", + "Chang Yu", + "Zeshun Zong", + "Tianjia Shao", + "Hongzhi Wu", + "Kun Zhou", + "Chenfanfu Jiang", + "Yin Yang" + ], + "abstract": "We demonstrate the feasibility of integrating physics-based animations of solids and fluids with 3D Gaussian Splatting (3DGS) to create novel effects in virtual scenes reconstructed using 3DGS. Leveraging the coherence of the Gaussian Splatting and Position-Based Dynamics (PBD) in the underlying representation, we manage rendering, view synthesis, and the dynamics of solids and fluids in a cohesive manner. Similar to GaussianShader, we enhance each Gaussian kernel with an added normal, aligning the kernel's orientation with the surface normal to refine the PBD simulation. This approach effectively eliminates spiky noises that arise from rotational deformation in solids. It also allows us to integrate physically based rendering to augment the dynamic surface reflections on fluids.
Consequently, our framework is capable of realistically reproducing surface highlights on dynamic fluids and facilitating interactions between scene objects and fluids from new views. For more information, please visit our project page at \\url{https://gaussiansplashing.github.io/}.", + "arxiv_url": "http://arxiv.org/abs/2401.15318v2", + "pdf_url": "http://arxiv.org/pdf/2401.15318v2", + "published_date": "2024-01-27", + "categories": [ + "cs.GR", + "cs.AI", + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "TIP-Editor: An Accurate 3D Editor Following Both Text-Prompts And Image-Prompts", + "authors": [ + "Jingyu Zhuang", + "Di Kang", + "Yan-Pei Cao", + "Guanbin Li", + "Liang Lin", + "Ying Shan" + ], + "abstract": "Text-driven 3D scene editing has gained significant attention owing to its convenience and user-friendliness. However, existing methods still lack accurate control of the specified appearance and location of the editing result due to the inherent limitations of the text description. To this end, we propose a 3D scene editing framework, TIP-Editor, that accepts both text and image prompts and a 3D bounding box to specify the editing region. With the image prompt, users can conveniently specify the detailed appearance/style of the target content in complement to the text description, enabling accurate control of the appearance. Specifically, TIP-Editor employs a stepwise 2D personalization strategy to better learn the representation of the existing scene and the reference image, in which a localization loss is proposed to encourage correct object placement as specified by the bounding box. Additionally, TIP-Editor utilizes explicit and flexible 3D Gaussian splatting as the 3D representation to facilitate local editing while keeping the background unchanged. Extensive experiments have demonstrated that TIP-Editor conducts accurate editing following the text and image prompts in the specified bounding box region, consistently outperforming the baselines in editing quality, and the alignment to the prompts, qualitatively and quantitatively.", + "arxiv_url": "http://arxiv.org/abs/2401.14828v3", + "pdf_url": "http://arxiv.org/pdf/2401.14828v3", + "published_date": "2024-01-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GauU-Scene: A Scene Reconstruction Benchmark on Large Scale 3D Reconstruction Dataset Using Gaussian Splatting", + "authors": [ + "Butian Xiong", + "Zhuo Li", + "Zhen Li" + ], + "abstract": "We introduce a novel large-scale scene reconstruction benchmark using the newly developed 3D representation approach, Gaussian Splatting, on our expansive U-Scene dataset. U-Scene encompasses over one and a half square kilometres, featuring a comprehensive RGB dataset coupled with LiDAR ground truth. For data acquisition, we employed the Matrix 300 drone equipped with the high-accuracy Zenmuse L1 LiDAR, enabling precise rooftop data collection. This dataset offers a unique blend of urban and academic environments for advanced spatial analysis and covers more than 1.5 km$^2$. Our evaluation of U-Scene with Gaussian Splatting includes a detailed analysis across various novel viewpoints.
We also juxtapose these results with those derived from our accurate point cloud dataset, highlighting significant differences that underscore the importance of combining multi-modal information.", + "arxiv_url": "http://arxiv.org/abs/2401.14032v1", + "pdf_url": "http://arxiv.org/pdf/2401.14032v1", + "published_date": "2024-01-25", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EndoGaussians: Single View Dynamic Gaussian Splatting for Deformable Endoscopic Tissues Reconstruction", + "authors": [ + "Yangsen Chen", + "Hao Wang" + ], + "abstract": "The accurate 3D reconstruction of deformable soft body tissues from endoscopic videos is a pivotal challenge in medical applications such as VR surgery and medical image analysis. Existing methods often struggle with accuracy and the ambiguity of hallucinated tissue parts, limiting their practical utility. In this work, we introduce EndoGaussians, a novel approach that employs Gaussian Splatting for dynamic endoscopic 3D reconstruction. This method marks the first use of Gaussian Splatting in this context, overcoming the limitations of previous NeRF-based techniques. Our method sets new state-of-the-art standards, as demonstrated by quantitative assessments on various endoscope datasets. These advancements make our method a promising tool for medical professionals, offering more reliable and efficient 3D reconstructions for practical applications in the medical field.", + "arxiv_url": "http://arxiv.org/abs/2401.13352v1", + "pdf_url": "http://arxiv.org/pdf/2401.13352v1", + "published_date": "2024-01-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PSAvatar: A Point-based Shape Model for Real-Time Head Avatar Animation with 3D Gaussian Splatting", + "authors": [ + "Zhongyuan Zhao", + "Zhenyu Bao", + "Qing Li", + "Guoping Qiu", + "Kanglin Liu" + ], + "abstract": "Despite much progress, achieving real-time high-fidelity head avatar animation is still difficult and existing methods have to trade-off between speed and quality. 3DMM based methods often fail to model non-facial structures such as eyeglasses and hairstyles, while neural implicit models suffer from deformation inflexibility and rendering inefficiency. Although 3D Gaussian has been demonstrated to possess promising capability for geometry representation and radiance field reconstruction, applying 3D Gaussian in head avatar creation remains a major challenge since it is difficult for 3D Gaussian to model the head shape variations caused by changing poses and expressions. In this paper, we introduce PSAvatar, a novel framework for animatable head avatar creation that utilizes discrete geometric primitive to create a parametric morphable shape model and employs 3D Gaussian for fine detail representation and high fidelity rendering. The parametric morphable shape model is a Point-based Morphable Shape Model (PMSM) which uses points instead of meshes for 3D representation to achieve enhanced representation flexibility. The PMSM first converts the FLAME mesh to points by sampling on the surfaces as well as off the meshes to enable the reconstruction of not only surface-like structures but also complex geometries such as eyeglasses and hairstyles.
By aligning these points with the head shape in an analysis-by-synthesis manner, the PMSM makes it possible to utilize 3D Gaussian for fine detail representation and appearance modeling, thus enabling the creation of high-fidelity avatars. We show that PSAvatar can reconstruct high-fidelity head avatars of a variety of subjects and the avatars can be animated in real-time ($\\ge$ 25 fps at a resolution of 512 $\\times$ 512).", + "arxiv_url": "http://arxiv.org/abs/2401.12900v5", + "pdf_url": "http://arxiv.org/pdf/2401.12900v5", + "published_date": "2024-01-23", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EndoGaussian: Real-time Gaussian Splatting for Dynamic Endoscopic Scene Reconstruction", + "authors": [ + "Yifan Liu", + "Chenxin Li", + "Chen Yang", + "Yixuan Yuan" + ], + "abstract": "Reconstructing deformable tissues from endoscopic videos is essential in many downstream surgical applications. However, existing methods suffer from slow rendering speed, greatly limiting their practical use. In this paper, we introduce EndoGaussian, a real-time endoscopic scene reconstruction framework built on 3D Gaussian Splatting (3DGS). By integrating the efficient Gaussian representation and highly-optimized rendering engine, our framework significantly boosts the rendering speed to a real-time level. To adapt 3DGS for endoscopic scenes, we propose two strategies, Holistic Gaussian Initialization (HGI) and Spatio-temporal Gaussian Tracking (SGT), to handle the non-trivial Gaussian initialization and tissue deformation problems, respectively. In HGI, we leverage recent depth estimation models to predict depth maps of input binocular/monocular image sequences, based on which pixels are re-projected and combined for holistic initialization. In SGT, we propose to model surface dynamics using a deformation field, which is composed of an efficient encoding voxel and a lightweight deformation decoder, allowing for Gaussian tracking with minor training and rendering burden. Experiments on public datasets demonstrate our efficacy against prior SOTAs in many aspects, including better rendering speed (195 FPS real-time, 100$\\times$ gain), better rendering quality (37.848 PSNR), and less training overhead (within 2 min/scene), showing significant promise for intraoperative surgery applications. Code is available at: \\url{https://yifliu3.github.io/EndoGaussian/}.", + "arxiv_url": "http://arxiv.org/abs/2401.12561v2", + "pdf_url": "http://arxiv.org/pdf/2401.12561v2", + "published_date": "2024-01-23", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EndoGS: Deformable Endoscopic Tissues Reconstruction with Gaussian Splatting", + "authors": [ + "Lingting Zhu", + "Zhao Wang", + "Jiahao Cui", + "Zhenchao Jin", + "Guying Lin", + "Lequan Yu" + ], + "abstract": "Surgical 3D reconstruction is a critical area of research in robotic surgery, with recent works adopting variants of dynamic radiance fields to achieve success in 3D reconstruction of deformable tissues from single-viewpoint videos. However, these methods often suffer from time-consuming optimization or inferior quality, limiting their adoption in downstream tasks.
Inspired by 3D Gaussian Splatting, a recent trending 3D representation, we present EndoGS, applying Gaussian Splatting for deformable endoscopic tissue reconstruction. Specifically, our approach incorporates deformation fields to handle dynamic scenes, depth-guided supervision with spatial-temporal weight masks to optimize 3D targets with tool occlusion from a single viewpoint, and surface-aligned regularization terms to capture better geometry. As a result, EndoGS reconstructs and renders high-quality deformable endoscopic tissues from a single-viewpoint video, estimated depth maps, and labeled tool masks. Experiments on DaVinci robotic surgery videos demonstrate that EndoGS achieves superior rendering quality. Code is available at https://github.com/HKU-MedAI/EndoGS.", + "arxiv_url": "http://arxiv.org/abs/2401.11535v3", + "pdf_url": "http://arxiv.org/pdf/2401.11535v3", + "published_date": "2024-01-21", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "https://github.com/HKU-MedAI/EndoGS", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianBody: Clothed Human Reconstruction via 3d Gaussian Splatting", + "authors": [ + "Mengtian Li", + "Shengxiang Yao", + "Zhifeng Xie", + "Keyu Chen" + ], + "abstract": "In this work, we propose a novel clothed human reconstruction method called GaussianBody, based on 3D Gaussian Splatting. Compared with the costly neural radiance based models, 3D Gaussian Splatting has recently demonstrated great performance in terms of training time and rendering quality. However, applying the static 3D Gaussian Splatting model to the dynamic human reconstruction problem is non-trivial due to complicated non-rigid deformations and rich cloth details. To address these challenges, our method considers explicit pose-guided deformation to associate dynamic Gaussians across the canonical space and the observation space; introducing a physically-based prior with regularized transformations helps mitigate ambiguity between the two spaces. During the training process, we further propose a pose refinement strategy to update the pose regression for compensating the inaccurate initial estimation and a split-with-scale mechanism to enhance the density of regressed point clouds. The experiments validate that our method can achieve state-of-the-art photorealistic novel-view rendering results with high-quality details for dynamic clothed human bodies, along with explicit geometry reconstruction.", + "arxiv_url": "http://arxiv.org/abs/2401.09720v2", + "pdf_url": "http://arxiv.org/pdf/2401.09720v2", + "published_date": "2024-01-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Efficient4D: Fast Dynamic 3D Object Generation from a Single-view Video", + "authors": [ + "Zijie Pan", + "Zeyu Yang", + "Xiatian Zhu", + "Li Zhang" + ], + "abstract": "Generating a dynamic 3D object from a single-view video is challenging due to the lack of 4D labeled data. An intuitive approach is to extend previous image-to-3D pipelines by transferring off-the-shelf image generation models such as score distillation sampling. However, this approach would be slow and expensive to scale due to the need for back-propagating the information-limited supervision signals through a large pretrained model.
To address this, we propose an efficient video-to-4D object generation framework called Efficient4D. It generates high-quality spacetime-consistent images under different camera views, and then uses them as labeled data to directly reconstruct the 4D content through a 4D Gaussian splatting model. Importantly, our method can achieve real-time rendering under continuous camera trajectories. To enable robust reconstruction under sparse views, we introduce inconsistency-aware confidence-weighted loss design, along with a lightly weighted score distillation loss. Extensive experiments on both synthetic and real videos show that Efficient4D offers a remarkable 10-fold increase in speed when compared to prior art alternatives while preserving the quality of novel view synthesis. For example, Efficient4D takes only 10 minutes to model a dynamic object, vs 120 minutes by the previous art model Consistent4D.", + "arxiv_url": "http://arxiv.org/abs/2401.08742v3", + "pdf_url": "http://arxiv.org/pdf/2401.08742v3", + "published_date": "2024-01-16", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities", + "authors": [ + "Xu Yan", + "Haiming Zhang", + "Yingjie Cai", + "Jingming Guo", + "Weichao Qiu", + "Bin Gao", + "Kaiqiang Zhou", + "Yue Zhao", + "Huan Jin", + "Jiantao Gao", + "Zhen Li", + "Lihui Jiang", + "Wei Zhang", + "Hongbo Zhang", + "Dengxin Dai", + "Bingbing Liu" + ], + "abstract": "The rise of large foundation models, trained on extensive datasets, is revolutionizing the field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by extracting intricate patterns and performing effectively across diverse tasks, thereby serving as potent building blocks for a wide range of AI applications. Autonomous driving, a vibrant front in AI applications, remains challenged by the lack of dedicated vision foundation models (VFMs). The scarcity of comprehensive training data, the need for multi-sensor integration, and the diverse task-specific architectures pose significant obstacles to the development of VFMs in this field. This paper delves into the critical challenge of forging VFMs tailored specifically for autonomous driving, while also outlining future directions. Through a systematic analysis of over 250 papers, we dissect essential techniques for VFM development, including data preparation, pre-training strategies, and downstream task adaptation. Moreover, we explore key advancements such as NeRF, diffusion models, 3D Gaussian Splatting, and world models, presenting a comprehensive roadmap for future research. 
To empower researchers, we have built and maintained https://github.com/zhanghm1995/Forge_VFM4AD, an open-access repository constantly updated with the latest advancements in forging VFMs for autonomous driving.", + "arxiv_url": "http://arxiv.org/abs/2401.08045v1", + "pdf_url": "http://arxiv.org/pdf/2401.08045v1", + "published_date": "2024-01-16", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/zhanghm1995/Forge_VFM4AD", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Shadow Casting for Neural Characters", + "authors": [ + "Luis Bolanos", + "Shih-Yang Su", + "Helge Rhodin" + ], + "abstract": "Neural character models can now reconstruct detailed geometry and texture from video, but they lack explicit shadows and shading, leading to artifacts when generating novel views and poses or during relighting. It is particularly difficult to include shadows as they are a global effect and the required casting of secondary rays is costly. We propose a new shadow model using a Gaussian density proxy that replaces sampling with a simple analytic formula. It supports dynamic motion and is tailored for shadow computation, thereby avoiding the affine projection approximation and sorting required by the closely related Gaussian splatting. Combined with a deferred neural rendering model, our Gaussian shadows enable Lambertian shading and shadow casting with minimal overhead. We demonstrate improved reconstructions, with better separation of albedo, shading, and shadows in challenging outdoor scenes with direct sun light and hard shadows. Our method is able to optimize the light direction without any input from the user. As a result, novel poses have fewer shadow artifacts and relighting in novel scenes is more realistic compared to the state-of-the-art methods, providing new ways to pose neural characters in novel environments, increasing their applicability.", + "arxiv_url": "http://arxiv.org/abs/2401.06116v1", + "pdf_url": "http://arxiv.org/pdf/2401.06116v1", + "published_date": "2024-01-11", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "TRIPS: Trilinear Point Splatting for Real-Time Radiance Field Rendering", + "authors": [ + "Linus Franke", + "Darius Rückert", + "Laura Fink", + "Marc Stamminger" + ], + "abstract": "Point-based radiance field rendering has demonstrated impressive results for novel view synthesis, offering a compelling blend of rendering quality and computational efficiency. However, even the latest approaches in this domain are not without their shortcomings. 3D Gaussian Splatting [Kerbl and Kopanas et al. 2023] struggles when tasked with rendering highly detailed scenes, due to blurring and cloudy artifacts. On the other hand, ADOP [R\\\"uckert et al. 2022] can accommodate crisper images, but the neural reconstruction network decreases performance; it grapples with temporal instability and is unable to effectively address large gaps in the point cloud. In this paper, we present TRIPS (Trilinear Point Splatting), an approach that combines ideas from both Gaussian Splatting and ADOP. The fundamental concept behind our novel technique involves rasterizing points into a screen-space image pyramid, with the selection of the pyramid layer determined by the projected point size.
This approach allows rendering arbitrarily large points using a single trilinear write. A lightweight neural network is then used to reconstruct a hole-free image including detail beyond splat resolution. Importantly, our render pipeline is entirely differentiable, allowing for automatic optimization of both point sizes and positions. Our evaluation demonstrates that TRIPS surpasses existing state-of-the-art methods in terms of rendering quality while maintaining a real-time frame rate of 60 frames per second on readily available hardware. This performance extends to challenging scenarios, such as scenes featuring intricate geometry, expansive landscapes, and auto-exposed footage. The project page is located at: https://lfranke.github.io/trips/", + "arxiv_url": "http://arxiv.org/abs/2401.06003v2", + "pdf_url": "http://arxiv.org/pdf/2401.06003v2", + "published_date": "2024-01-11", + "categories": [ + "cs.CV", + "cs.GR", + "I.3; I.4" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Learning Segmented 3D Gaussians via Efficient Feature Unprojection for Zero-shot Neural Scene Segmentation", + "authors": [ + "Bin Dou", + "Tianyu Zhang", + "Zhaohui Wang", + "Yongjia Ma", + "Zejian Yuan" + ], + "abstract": "Zero-shot neural scene segmentation, which reconstructs a 3D neural segmentation field without manual annotations, serves as an effective way for scene understanding. However, existing models, especially the efficient 3D Gaussian-based methods, struggle to produce compact segmentation results. This issue stems primarily from their redundant learnable attributes assigned on individual Gaussians, leading to a lack of robustness against the 3D-inconsistencies in zero-shot generated raw labels. To address this problem, our work, named Compact Segmented 3D Gaussians (CoSegGaussians), proposes the Feature Unprojection and Fusion module as the segmentation field, which utilizes a shallow decoder generalizable for all Gaussians based on high-level features. Specifically, leveraging the learned Gaussian geometric parameters, semantic-aware image-based features are introduced into the scene via our unprojection technique. The lifted features, together with spatial information, are fed into the multi-scale aggregation decoder to generate segmentation identities for all Gaussians. Furthermore, we design CoSeg Loss to boost model robustness against 3D-inconsistent noises. Experimental results show that our model surpasses baselines on the zero-shot semantic segmentation task, improving by ~10% mIoU over the best baseline. Code and more results will be available at https://David-Dou.github.io/CoSegGaussians.", + "arxiv_url": "http://arxiv.org/abs/2401.05925v4", + "pdf_url": "http://arxiv.org/pdf/2401.05925v4", + "published_date": "2024-01-11", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "AGG: Amortized Generative 3D Gaussians for Single Image to 3D", + "authors": [ + "Dejia Xu", + "Ye Yuan", + "Morteza Mardani", + "Sifei Liu", + "Jiaming Song", + "Zhangyang Wang", + "Arash Vahdat" + ], + "abstract": "Given the growing need for automatic 3D content creation pipelines, various 3D representations have been studied to generate 3D objects from a single image. Due to its superior rendering efficiency, 3D Gaussian splatting-based models have recently excelled in both 3D reconstruction and generation.
3D Gaussian splatting approaches for image to 3D generation are often optimization-based, requiring many computationally expensive score-distillation steps. To overcome these challenges, we introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image, eliminating the need for per-instance optimization. Utilizing an intermediate hybrid representation, AGG decomposes the generation of 3D Gaussian locations and other appearance attributes for joint optimization. Moreover, we propose a cascaded pipeline that first generates a coarse representation of the 3D data and later upsamples it with a 3D Gaussian super-resolution module. Our method is evaluated against existing optimization-based 3D Gaussian frameworks and sampling-based pipelines utilizing other 3D representations, where AGG showcases competitive generation abilities both qualitatively and quantitatively while being several orders of magnitude faster. Project page: https://ir1d.github.io/AGG/", + "arxiv_url": "http://arxiv.org/abs/2401.04099v1", + "pdf_url": "http://arxiv.org/pdf/2401.04099v1", + "published_date": "2024-01-08", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "A Survey on 3D Gaussian Splatting", + "authors": [ + "Guikun Chen", + "Wenguan Wang" + ], + "abstract": "3D Gaussian splatting (GS) has recently emerged as a transformative technique in the realm of explicit radiance field and computer graphics. This innovative approach, characterized by the utilization of millions of learnable 3D Gaussians, represents a significant departure from mainstream neural radiance field approaches, which predominantly use implicit, coordinate-based models to map spatial coordinates to pixel values. 3D GS, with its explicit scene representation and differentiable rendering algorithm, not only promises real-time rendering capability but also introduces unprecedented levels of editability. This positions 3D GS as a potential game-changer for the next generation of 3D reconstruction and representation. In the present paper, we provide the first systematic overview of the recent developments and critical contributions in the domain of 3D GS. We begin with a detailed exploration of the underlying principles and the driving forces behind the emergence of 3D GS, laying the groundwork for understanding its significance. A focal point of our discussion is the practical applicability of 3D GS. By enabling unprecedented rendering speed, 3D GS opens up a plethora of applications, ranging from virtual reality to interactive media and beyond. This is complemented by a comparative analysis of leading 3D GS models, evaluated across various benchmark tasks to highlight their performance and practical utility. The survey concludes by identifying current challenges and suggesting potential avenues for future research in this domain. 
Through this survey, we aim to provide a valuable resource for both newcomers and seasoned researchers, fostering further exploration and advancement in applicable and explicit radiance field representation.", + "arxiv_url": "http://arxiv.org/abs/2401.03890v4", + "pdf_url": "http://arxiv.org/pdf/2401.03890v4", + "published_date": "2024-01-08", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR", + "cs.MM" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Progress and Prospects in 3D Generative AI: A Technical Overview including 3D human", + "authors": [ + "Song Bai", + "Jie Li" + ], + "abstract": "While AI-generated text and 2D images continue to expand their territory, 3D generation has gradually emerged as a trend that cannot be ignored. Since the year 2023, an abundance of research papers has emerged in the domain of 3D generation. This growth encompasses not just the creation of 3D objects, but also the rapid development of 3D character and motion generation. Several key factors contribute to this progress. The enhanced fidelity in stable diffusion, coupled with control methods that ensure multi-view consistency, and realistic human models like SMPL-X, contribute synergistically to the production of 3D models with remarkable consistency and near-realistic appearances. The advancements in neural network-based 3D storing and rendering models, such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have accelerated the efficiency and realism of neural rendered models. Furthermore, the multimodality capabilities of large language models have enabled language inputs to transcend into human motion outputs. This paper aims to provide a comprehensive overview and summary of the relevant papers published mostly during the latter half of 2023. It will begin by discussing the AI-generated object models in 3D, followed by the generated 3D human models, and finally, the generated 3D human motions, culminating in a conclusive summary and a vision for the future.", + "arxiv_url": "http://arxiv.org/abs/2401.02620v1", + "pdf_url": "http://arxiv.org/pdf/2401.02620v1", + "published_date": "2024-01-05", + "categories": [ + "cs.AI", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Characterizing Satellite Geometry via Accelerated 3D Gaussian Splatting", + "authors": [ + "Van Minh Nguyen", + "Emma Sandidge", + "Trupti Mahendrakar", + "Ryan T. White" + ], + "abstract": "The accelerating deployment of spacecraft in orbit has generated interest in on-orbit servicing (OOS), inspection of spacecraft, and active debris removal (ADR). Such missions require precise rendezvous and proximity operations in the vicinity of non-cooperative, possibly unknown, resident space objects. Safety concerns with manned missions and lag times with ground-based control necessitate complete autonomy. This requires robust characterization of the target's geometry. In this article, we present an approach for mapping geometries of satellites on orbit based on 3D Gaussian Splatting that can run on computing resources available on current spaceflight hardware. We demonstrate model training and 3D rendering performance on a hardware-in-the-loop satellite mock-up under several realistic lighting and motion conditions.
Our model is shown to be capable of training on-board and rendering higher quality novel views of an unknown satellite nearly 2 orders of magnitude faster than previous NeRF-based algorithms. Such on-board capabilities are critical to enable downstream machine intelligence tasks necessary for autonomous guidance, navigation, and control tasks.", + "arxiv_url": "http://arxiv.org/abs/2401.02588v1", + "pdf_url": "http://arxiv.org/pdf/2401.02588v1", + "published_date": "2024-01-05", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PEGASUS: Physically Enhanced Gaussian Splatting Simulation System for 6DoF Object Pose Dataset Generation", + "authors": [ + "Lukas Meyer", + "Floris Erich", + "Yusuke Yoshiyasu", + "Marc Stamminger", + "Noriaki Ando", + "Yukiyasu Domae" + ], + "abstract": "We introduce Physically Enhanced Gaussian Splatting Simulation System (PEGASUS) for 6DOF object pose dataset generation, a versatile dataset generator based on 3D Gaussian Splatting. Environment and object representations can be easily obtained using commodity cameras to reconstruct with Gaussian Splatting. PEGASUS allows the composition of new scenes by merging the respective underlying Gaussian Splatting point cloud of an environment with one or multiple objects. Leveraging a physics engine enables the simulation of natural object placement within a scene through interaction between meshes extracted for the objects and the environment. Consequently, an extensive amount of new scenes - static or dynamic - can be created by combining different environments and objects. By rendering scenes from various perspectives, diverse data points such as RGB images, depth maps, semantic masks, and 6DoF object poses can be extracted. Our study demonstrates that training on data generated by PEGASUS enables pose estimation networks to successfully transfer from synthetic data to real-world data. Moreover, we introduce the Ramen dataset, comprising 30 Japanese cup noodle items. This dataset includes spherical scans that capture images from both object hemispheres and the Gaussian Splatting reconstruction, making them compatible with PEGASUS.", + "arxiv_url": "http://arxiv.org/abs/2401.02281v2", + "pdf_url": "http://arxiv.org/pdf/2401.02281v2", + "published_date": "2024-01-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding", + "authors": [ + "Xingxing Zuo", + "Pouya Samangouei", + "Yunwen Zhou", + "Yan Di", + "Mingyang Li" + ], + "abstract": "Precisely perceiving the geometric and semantic properties of real-world 3D objects is crucial for the continued evolution of augmented reality and robotic applications. To this end, we present Foundation Model Embedded Gaussian Splatting (FMGS), which incorporates vision-language embeddings of foundation models into 3D Gaussian Splatting (GS). The key contribution of this work is an efficient method to reconstruct and represent 3D vision-language models. This is achieved by distilling feature maps generated from image-based foundation models into those rendered from our 3D model.
To ensure high-quality rendering and fast training, we introduce a novel scene representation by integrating strengths from both GS and multi-resolution hash encodings (MHE). Our effective training procedure also introduces a pixel alignment loss that makes the rendered feature distance of the same semantic entities close, following the pixel-level semantic boundaries. Our results demonstrate remarkable multi-view semantic consistency, facilitating diverse downstream tasks, beating state-of-the-art methods by 10.2 percent on open-vocabulary language-based object detection, despite being 851X faster at inference. This research explores the intersection of vision, language, and 3D scene representation, paving the way for enhanced scene understanding in uncontrolled real-world environments. We plan to release the code on the project page.", + "arxiv_url": "http://arxiv.org/abs/2401.01970v2", + "pdf_url": "http://arxiv.org/pdf/2401.01970v2", + "published_date": "2024-01-03", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting", + "authors": [ + "Yunzhi Yan", + "Haotong Lin", + "Chenxu Zhou", + "Weijie Wang", + "Haiyang Sun", + "Kun Zhan", + "Xianpeng Lang", + "Xiaowei Zhou", + "Sida Peng" + ], + "abstract": "This paper aims to tackle the problem of modeling dynamic urban streets for autonomous driving scenes. Recent methods extend NeRF by incorporating tracked vehicle poses to animate vehicles, enabling photo-realistic view synthesis of dynamic urban street scenes. However, their slow training and rendering speeds remain significant limitations. We introduce Street Gaussians, a new explicit scene representation that tackles these limitations. Specifically, the dynamic urban scene is represented as a set of point clouds equipped with semantic logits and 3D Gaussians, each associated with either a foreground vehicle or the background. To model the dynamics of foreground object vehicles, each object point cloud is optimized with optimizable tracked poses, along with a 4D spherical harmonics model for the dynamic appearance. The explicit representation allows easy composition of object vehicles and background, which in turn allows for scene editing operations and rendering at 135 FPS (1066 $\\times$ 1600 resolution) within half an hour of training. The proposed method is evaluated on multiple challenging benchmarks, including KITTI and Waymo Open datasets. Experiments show that the proposed method consistently outperforms state-of-the-art methods across all datasets. The code will be released to ensure reproducibility.", + "arxiv_url": "http://arxiv.org/abs/2401.01339v3", + "pdf_url": "http://arxiv.org/pdf/2401.01339v3", + "published_date": "2024-01-02", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Deblurring 3D Gaussian Splatting", + "authors": [ + "Byeonghyeon Lee", + "Howoong Lee", + "Xiangyu Sun", + "Usman Ali", + "Eunbyung Park" + ], + "abstract": "Recent studies in Radiance Fields have paved the robust way for novel view synthesis with their photorealistic rendering quality.
Nevertheless, they usually employ neural networks and volumetric rendering, which are costly to train and impede their broad use in various real-time applications due to the lengthy rendering time. Lately, a 3D Gaussian splatting-based approach has been proposed to model the 3D scene, and it achieves remarkable visual quality while rendering the images in real-time. However, it suffers from severe degradation in the rendering quality if the training images are blurry. Blurriness commonly occurs due to lens defocusing, object motion, and camera shake, and it inevitably intervenes in clean image acquisition. Several previous studies have attempted to render clean and sharp images from blurry input images using neural fields. The majority of those works, however, are designed only for volumetric rendering-based neural radiance fields and are not straightforwardly applicable to rasterization-based 3D Gaussian splatting methods. Thus, we propose a novel real-time deblurring framework, Deblurring 3D Gaussian Splatting, using a small Multi-Layer Perceptron (MLP) that manipulates the covariance of each 3D Gaussian to model the scene blurriness. While Deblurring 3D Gaussian Splatting can still enjoy real-time rendering, it can reconstruct fine and sharp details from blurry images. A variety of experiments have been conducted on the benchmark, and the results have revealed the effectiveness of our approach for deblurring. Qualitative results are available at https://benhenryl.github.io/Deblurring-3D-Gaussian-Splatting/", + "arxiv_url": "http://arxiv.org/abs/2401.00834v3", + "pdf_url": "http://arxiv.org/pdf/2401.00834v3", + "published_date": "2024-01-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "4DGen: Grounded 4D Content Generation with Spatial-temporal Consistency", + "authors": [ + "Yuyang Yin", + "Dejia Xu", + "Zhangyang Wang", + "Yao Zhao", + "Yunchao Wei" + ], + "abstract": "Aided by text-to-image and text-to-video diffusion models, existing 4D content creation pipelines utilize score distillation sampling to optimize the entire dynamic 3D scene. However, as these pipelines generate 4D content from text or image inputs, they incur significant time and effort in prompt engineering through trial and error. This work introduces 4DGen, a novel, holistic framework for grounded 4D content creation that decomposes the 4D generation task into multiple stages. We identify static 3D assets and monocular video sequences as key components in constructing the 4D content. Our pipeline facilitates conditional 4D generation, enabling users to specify geometry (3D assets) and motion (monocular videos), thus offering superior control over content creation. Furthermore, we construct our 4D representation using dynamic 3D Gaussians, which permits efficient, high-resolution supervision through rendering during training, thereby facilitating high-quality 4D generation. Additionally, we employ spatial-temporal pseudo labels on anchor frames, along with seamless consistency priors implemented through 3D-aware score distillation sampling and smoothness regularizations. Compared to existing baselines, our approach yields competitive results in faithfully reconstructing input signals and realistically inferring renderings from novel viewpoints and timesteps.
Most importantly, our method supports grounded generation, offering users enhanced control, a feature difficult to achieve with previous methods. Project page: https://vita-group.github.io/4DGen/", + "arxiv_url": "http://arxiv.org/abs/2312.17225v2", + "pdf_url": "http://arxiv.org/pdf/2312.17225v2", + "published_date": "2023-12-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DreamGaussian4D: Generative 4D Gaussian Splatting", + "authors": [ + "Jiawei Ren", + "Liang Pan", + "Jiaxiang Tang", + "Chi Zhang", + "Ang Cao", + "Gang Zeng", + "Ziwei Liu" + ], + "abstract": "4D content generation has achieved remarkable progress recently. However, existing methods suffer from long optimization times, a lack of motion controllability, and a low quality of details. In this paper, we introduce DreamGaussian4D (DG4D), an efficient 4D generation framework that builds on Gaussian Splatting (GS). Our key insight is that combining explicit modeling of spatial transformations with static GS makes an efficient and powerful representation for 4D generation. Moreover, video generation methods have the potential to offer valuable spatial-temporal priors, enhancing the high-quality 4D generation. Specifically, we propose an integral framework with two major modules: 1) Image-to-4D GS - we initially generate static GS with DreamGaussianHD, followed by HexPlane-based dynamic generation with Gaussian deformation; and 2) Video-to-Video Texture Refinement - we refine the generated UV-space texture maps and meanwhile enhance their temporal consistency by utilizing a pre-trained image-to-video diffusion model. Notably, DG4D reduces the optimization time from several hours to just a few minutes, allows the generated 3D motion to be visually controlled, and produces animated meshes that can be realistically rendered in 3D engines.", + "arxiv_url": "http://arxiv.org/abs/2312.17142v3", + "pdf_url": "http://arxiv.org/pdf/2312.17142v3", + "published_date": "2023-12-28", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis", + "authors": [ + "Zhan Li", + "Zhang Chen", + "Zhong Li", + "Yi Xu" + ], + "abstract": "Novel view synthesis of dynamic scenes has been an intriguing yet challenging problem. Despite recent advancements, simultaneously achieving high-resolution photorealistic results, real-time rendering, and compact storage remains a formidable task. To address these challenges, we propose Spacetime Gaussian Feature Splatting as a novel dynamic scene representation, composed of three pivotal components. First, we formulate expressive Spacetime Gaussians by enhancing 3D Gaussians with temporal opacity and parametric motion/rotation. This enables Spacetime Gaussians to capture static, dynamic, as well as transient content within a scene. Second, we introduce splatted feature rendering, which replaces spherical harmonics with neural features. These features facilitate the modeling of view- and time-dependent appearance while maintaining small size. Third, we leverage the guidance of training error and coarse depth to sample new Gaussians in areas that are challenging to converge with existing pipelines. 
Experiments on several established real-world datasets demonstrate that our method achieves state-of-the-art rendering quality and speed, while retaining compact storage. At 8K resolution, our lite-version model can render at 60 FPS on an Nvidia RTX 4090 GPU. Our code is available at https://github.com/oppo-us-research/SpacetimeGaussians.", + "arxiv_url": "http://arxiv.org/abs/2312.16812v2", + "pdf_url": "http://arxiv.org/pdf/2312.16812v2", + "published_date": "2023-12-28", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "https://github.com/oppo-us-research/SpacetimeGaussians", + "keywords": [ + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LangSplat: 3D Language Gaussian Splatting", + "authors": [ + "Minghan Qin", + "Wanhua Li", + "Jiawei Zhou", + "Haoqian Wang", + "Hanspeter Pfister" + ], + "abstract": "Humans live in a 3D world and commonly use natural language to interact with a 3D scene. Modeling a 3D language field to support open-ended language queries in 3D has gained increasing attention recently. This paper introduces LangSplat, which constructs a 3D language field that enables precise and efficient open-vocabulary querying within 3D spaces. Unlike existing methods that ground CLIP language embeddings in a NeRF model, LangSplat advances the field by utilizing a collection of 3D Gaussians, each encoding language features distilled from CLIP, to represent the language field. By employing a tile-based splatting technique for rendering language features, we circumvent the costly rendering process inherent in NeRF. Instead of directly learning CLIP embeddings, LangSplat first trains a scene-wise language autoencoder and then learns language features on the scene-specific latent space, thereby alleviating substantial memory demands imposed by explicit modeling. Existing methods struggle with imprecise and vague 3D language fields, which fail to discern clear boundaries between objects. We delve into this issue and propose to learn hierarchical semantics using SAM, thereby eliminating the need for extensively querying the language field across various scales and the regularization of DINO features. Extensive experimental results show that LangSplat significantly outperforms the previous state-of-the-art method LERF by a large margin. Notably, LangSplat is extremely efficient, achieving a 199 $\\times$ speedup compared to LERF at the resolution of 1440 $\\times$ 1080. We strongly recommend readers to check out our video results at https://langsplat.github.io/", + "arxiv_url": "http://arxiv.org/abs/2312.16084v2", + "pdf_url": "http://arxiv.org/pdf/2312.16084v2", + "published_date": "2023-12-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "2D-Guided 3D Gaussian Segmentation", + "authors": [ + "Kun Lan", + "Haoran Li", + "Haolin Shi", + "Wenjun Wu", + "Yong Liao", + "Lin Wang", + "Pengyuan Zhou" + ], + "abstract": "Recently, 3D Gaussian, as an explicit 3D representation method, has demonstrated strong competitiveness over NeRF (Neural Radiance Fields) in terms of expressing complex scenes and training duration. These advantages signal a wide range of applications for 3D Gaussians in 3D understanding and editing. Meanwhile, the segmentation of 3D Gaussians is still in its infancy. 
The existing segmentation methods are not only cumbersome but also incapable of segmenting multiple objects simultaneously in a short amount of time. In response, this paper introduces a 3D Gaussian segmentation method implemented with 2D segmentation as supervision. This approach uses input 2D segmentation maps to guide the learning of the added 3D Gaussian semantic information, while nearest neighbor clustering and statistical filtering refine the segmentation results. Experiments show that our concise method can achieve comparable performances on mIOU and mAcc for multi-object segmentation as previous single-object segmentation methods.", + "arxiv_url": "http://arxiv.org/abs/2312.16047v1", + "pdf_url": "http://arxiv.org/pdf/2312.16047v1", + "published_date": "2023-12-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Sparse-view CT Reconstruction with 3D Gaussian Volumetric Representation", + "authors": [ + "Yingtai Li", + "Xueming Fu", + "Shang Zhao", + "Ruiyang Jin", + "S. Kevin Zhou" + ], + "abstract": "Sparse-view CT is a promising strategy for reducing the radiation dose of traditional CT scans, but reconstructing high-quality images from incomplete and noisy data is challenging. Recently, 3D Gaussian has been applied to model complex natural scenes, demonstrating fast convergence and better rendering of novel views compared to implicit neural representations (INRs). Taking inspiration from the successful application of 3D Gaussians in natural scene modeling and novel view synthesis, we investigate their potential for sparse-view CT reconstruction. We leverage prior information from the filtered-backprojection reconstructed image to initialize the Gaussians; and update their parameters via comparing difference in the projection space. Performance is further enhanced by adaptive density control. Compared to INRs, 3D Gaussians benefit more from prior information to explicitly bypass learning in void spaces and allocate the capacity efficiently, accelerating convergence. 3D Gaussians also efficiently learn high-frequency details. Trained in a self-supervised manner, 3D Gaussians avoid the need for large-scale paired data. Our experiments on the AAPM-Mayo dataset demonstrate that 3D Gaussians can provide superior performance compared to INR-based methods. This work is in progress, and the code will be publicly available.", + "arxiv_url": "http://arxiv.org/abs/2312.15676v1", + "pdf_url": "http://arxiv.org/pdf/2312.15676v1", + "published_date": "2023-12-25", + "categories": [ + "eess.IV", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Human101: Training 100+FPS Human Gaussians in 100s from 1 View", + "authors": [ + "Mingwei Li", + "Jiachen Tao", + "Zongxin Yang", + "Yi Yang" + ], + "abstract": "Reconstructing the human body from single-view videos plays a pivotal role in the virtual reality domain. One prevalent application scenario necessitates the rapid reconstruction of high-fidelity 3D digital humans while simultaneously ensuring real-time rendering and interaction. Existing methods often struggle to fulfill both requirements. In this paper, we introduce Human101, a novel framework adept at producing high-fidelity dynamic 3D human reconstructions from 1-view videos by training 3D Gaussians in 100 seconds and rendering in 100+ FPS. 
Our method leverages the strengths of 3D Gaussian Splatting, which provides an explicit and efficient representation of 3D humans. Standing apart from prior NeRF-based pipelines, Human101 ingeniously applies a Human-centric Forward Gaussian Animation method to deform the parameters of 3D Gaussians, thereby enhancing rendering speed (i.e., rendering 1024-resolution images at an impressive 60+ FPS and rendering 512-resolution images at 100+ FPS). Experimental results indicate that our approach substantially eclipses current methods, clocking up to a 10 times surge in frames per second and delivering comparable or superior rendering quality. Code and demos will be released at https://github.com/longxiang-ai/Human101.", + "arxiv_url": "http://arxiv.org/abs/2312.15258v1", + "pdf_url": "http://arxiv.org/pdf/2312.15258v1", + "published_date": "2023-12-23", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/longxiang-ai/Human101", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Deformable 3D Gaussian Splatting for Animatable Human Avatars", + "authors": [ + "HyunJun Jung", + "Nikolas Brasch", + "Jifei Song", + "Eduardo Perez-Pellitero", + "Yiren Zhou", + "Zhihao Li", + "Nassir Navab", + "Benjamin Busam" + ], + "abstract": "Recent advances in neural radiance fields enable novel view synthesis of photo-realistic images in dynamic settings, which can be applied to scenarios with human animation. Commonly used implicit backbones to establish accurate models, however, require many input views and additional annotations such as human masks, UV maps and depth maps. In this work, we propose ParDy-Human (Parameterized Dynamic Human Avatar), a fully explicit approach to construct a digital avatar from as little as a single monocular sequence. ParDy-Human introduces parameter-driven dynamics into 3D Gaussian Splatting where 3D Gaussians are deformed by a human pose model to animate the avatar. Our method is composed of two parts: A first module that deforms canonical 3D Gaussians according to SMPL vertices and a consecutive module that further takes their designed joint encodings and predicts per Gaussian deformations to deal with dynamics beyond SMPL vertex deformations. Images are then synthesized by a rasterizer. ParDy-Human constitutes an explicit model for realistic dynamic human avatars which requires significantly fewer training views and images. Our avatars learning is free of additional annotations such as masks and can be trained with variable backgrounds while inferring full-resolution images efficiently even on consumer hardware. 
We provide experimental evidence to show that ParDy-Human outperforms state-of-the-art methods on ZJU-MoCap and THUman4.0 datasets both quantitatively and visually.", + "arxiv_url": "http://arxiv.org/abs/2312.15059v1", + "pdf_url": "http://arxiv.org/pdf/2312.15059v1", + "published_date": "2023-12-22", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models", + "authors": [ + "Huan Ling", + "Seung Wook Kim", + "Antonio Torralba", + "Sanja Fidler", + "Karsten Kreis" + ], + "abstract": "Text-guided diffusion models have revolutionized image and video generation and have also been successfully used for optimization-based 3D object synthesis. Here, we instead focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects using score distillation methods with an additional temporal dimension. Compared to previous work, we pursue a novel compositional generation-based approach, and combine text-to-image, text-to-video, and 3D-aware multiview diffusion models to provide feedback during 4D object optimization, thereby simultaneously enforcing temporal consistency, high-quality visual appearance and realistic geometry. Our method, called Align Your Gaussians (AYG), leverages dynamic 3D Gaussian Splatting with deformation fields as 4D representation. Crucial to AYG is a novel method to regularize the distribution of the moving 3D Gaussians and thereby stabilize the optimization and induce motion. We also propose a motion amplification mechanism as well as a new autoregressive synthesis scheme to generate and combine multiple 4D sequences for longer generation. These techniques allow us to synthesize vivid dynamic scenes, outperform previous work qualitatively and quantitatively and achieve state-of-the-art text-to-4D performance. Due to the Gaussian 4D representation, different 4D animations can be seamlessly combined, as we demonstrate. AYG opens up promising avenues for animation, simulation and digital content creation as well as synthetic data generation.", + "arxiv_url": "http://arxiv.org/abs/2312.13763v2", + "pdf_url": "http://arxiv.org/pdf/2312.13763v2", + "published_date": "2023-12-21", + "categories": [ + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Splatting with NeRF-based Color and Opacity", + "authors": [ + "Dawid Malarz", + "Weronika Smolak", + "Jacek Tabor", + "Sławomir Tadeja", + "Przemysław Spurek" + ], + "abstract": "Neural Radiance Fields (NeRFs) have demonstrated the remarkable potential of neural networks to capture the intricacies of 3D objects. By encoding the shape and color information within neural network weights, NeRFs excel at producing strikingly sharp novel views of 3D objects. Recently, numerous generalizations of NeRFs utilizing generative models have emerged, expanding its versatility. In contrast, Gaussian Splatting (GS) offers a similar render quality with faster training and inference as it does not need neural networks to work. It encodes information about the 3D objects in the set of Gaussian distributions that can be rendered in 3D similarly to classical meshes. Unfortunately, GS are difficult to condition since they usually require circa hundred thousand Gaussian components. 
To mitigate the caveats of both models, we propose a hybrid model Viewing Direction Gaussian Splatting (VDGS) that uses GS representation of the 3D object's shape and NeRF-based encoding of color and opacity. Our model uses Gaussian distributions with trainable positions (i.e. means of Gaussian), shape (i.e. covariance of Gaussian), color and opacity, and a neural network that takes Gaussian parameters and viewing direction to produce changes in the said color and opacity. As a result, our model better describes shadows, light reflections, and the transparency of 3D objects without adding additional texture and light components.", + "arxiv_url": "http://arxiv.org/abs/2312.13729v5", + "pdf_url": "http://arxiv.org/pdf/2312.13729v5", + "published_date": "2023-12-21", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Splatter Image: Ultra-Fast Single-View 3D Reconstruction", + "authors": [ + "Stanislaw Szymanowicz", + "Christian Rupprecht", + "Andrea Vedaldi" + ], + "abstract": "We introduce the \\method, an ultra-efficient approach for monocular 3D object reconstruction. Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images. We apply Gaussian Splatting to monocular reconstruction by learning a neural network that, at test time, performs reconstruction in a feed-forward manner, at 38 FPS. Our main innovation is the surprisingly straightforward design of this network, which, using 2D operators, maps the input image to one 3D Gaussian per pixel. The resulting set of Gaussians thus has the form an image, the Splatter Image. We further extend the method take several images as input via cross-view attention. Owning to the speed of the renderer (588 FPS), we use a single GPU for training while generating entire images at each iteration to optimize perceptual metrics like LPIPS. On several synthetic, real, multi-category and large-scale benchmark datasets, we achieve better results in terms of PSNR, LPIPS, and other metrics while training and evaluating much faster than prior works. Code, models, demo and more results are available at https://szymanowiczs.github.io/splatter-image.", + "arxiv_url": "http://arxiv.org/abs/2312.13150v2", + "pdf_url": "http://arxiv.org/pdf/2312.13150v2", + "published_date": "2023-12-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting", + "authors": [ + "Richard Shaw", + "Michal Nazarczuk", + "Jifei Song", + "Arthur Moreau", + "Sibi Catley-Chandar", + "Helisa Dhamo", + "Eduardo Perez-Pellitero" + ], + "abstract": "Novel view synthesis has shown rapid progress recently, with methods capable of producing increasingly photorealistic results. 3D Gaussian Splatting has emerged as a promising method, producing high-quality renderings of scenes and enabling interactive viewing at real-time frame rates. However, it is limited to static scenes. In this work, we extend 3D Gaussian Splatting to reconstruct dynamic scenes. We model a scene's dynamics using dynamic MLPs, learning deformations from temporally-local canonical representations to per-frame 3D Gaussians. 
To disentangle static and dynamic regions, tuneable parameters weigh each Gaussian's respective MLP parameters, improving the dynamics modelling of imbalanced scenes. We introduce a sliding window training strategy that partitions the sequence into smaller manageable windows to handle arbitrary length scenes while maintaining high rendering quality. We propose an adaptive sampling strategy to determine appropriate window size hyperparameters based on the scene's motion, balancing training overhead with visual quality. Training a separate dynamic 3D Gaussian model for each sliding window allows the canonical representation to change, enabling the reconstruction of scenes with significant geometric changes. Temporal consistency is enforced using a fine-tuning step with self-supervising consistency loss on randomly sampled novel views. As a result, our method produces high-quality renderings of general dynamic scenes with competitive quantitative performance, which can be viewed in real-time in our dynamic interactive viewer.", + "arxiv_url": "http://arxiv.org/abs/2312.13308v2", + "pdf_url": "http://arxiv.org/pdf/2312.13308v2", + "published_date": "2023-12-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Compact 3D Scene Representation via Self-Organizing Gaussian Grids", + "authors": [ + "Wieland Morgenstern", + "Florian Barthel", + "Anna Hilsmann", + "Peter Eisert" + ], + "abstract": "3D Gaussian Splatting has recently emerged as a highly promising technique for modeling of static 3D scenes. In contrast to Neural Radiance Fields, it utilizes efficient rasterization allowing for very fast rendering at high-quality. However, the storage size is significantly higher, which hinders practical deployment, e.g. on resource constrained devices. In this paper, we introduce a compact scene representation organizing the parameters of 3D Gaussian Splatting (3DGS) into a 2D grid with local homogeneity, ensuring a drastic reduction in storage requirements without compromising visual quality during rendering. Central to our idea is the explicit exploitation of perceptual redundancies present in natural scenes. In essence, the inherent nature of a scene allows for numerous permutations of Gaussian parameters to equivalently represent it. To this end, we propose a novel highly parallel algorithm that regularly arranges the high-dimensional Gaussian parameters into a 2D grid while preserving their neighborhood structure. During training, we further enforce local smoothness between the sorted parameters in the grid. The uncompressed Gaussians use the same structure as 3DGS, ensuring a seamless integration with established renderers. Our method achieves a reduction factor of 17x to 42x in size for complex scenes with no increase in training time, marking a substantial leap forward in the domain of 3D scene distribution and consumption. 
Additional information can be found on our project page: https://fraunhoferhhi.github.io/Self-Organizing-Gaussians/", + "arxiv_url": "http://arxiv.org/abs/2312.13299v2", + "pdf_url": "http://arxiv.org/pdf/2312.13299v2", + "published_date": "2023-12-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction", + "authors": [ + "David Charatan", + "Sizhe Li", + "Andrea Tagliasacchi", + "Vincent Sitzmann" + ], + "abstract": "We introduce pixelSplat, a feed-forward model that learns to reconstruct 3D radiance fields parameterized by 3D Gaussian primitives from pairs of images. Our model features real-time and memory-efficient rendering for scalable training as well as fast 3D reconstruction at inference time. To overcome local minima inherent to sparse and locally supported representations, we predict a dense probability distribution over 3D and sample Gaussian means from that probability distribution. We make this sampling operation differentiable via a reparameterization trick, allowing us to back-propagate gradients through the Gaussian splatting representation. We benchmark our method on wide-baseline novel view synthesis on the real-world RealEstate10k and ACID datasets, where we outperform state-of-the-art light field transformers and accelerate rendering by 2.5 orders of magnitude while reconstructing an interpretable and editable 3D radiance field.", + "arxiv_url": "http://arxiv.org/abs/2312.12337v4", + "pdf_url": "http://arxiv.org/pdf/2312.12337v4", + "published_date": "2023-12-19", + "categories": [ + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning", + "authors": [ + "Ye Yuan", + "Xueting Li", + "Yangyi Huang", + "Shalini De Mello", + "Koki Nagano", + "Jan Kautz", + "Umar Iqbal" + ], + "abstract": "Gaussian splatting has emerged as a powerful 3D representation that harnesses the advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper, we seek to leverage Gaussian splatting to generate realistic animatable avatars from textual descriptions, addressing the limitations (e.g., flexibility and efficiency) imposed by mesh or NeRF-based representations. However, a naive application of Gaussian splatting cannot generate high-quality animatable avatars and suffers from learning instability; it also cannot capture fine avatar geometries and often leads to degenerate body parts. To tackle these problems, we first propose a primitive-based 3D Gaussian representation where Gaussians are defined inside pose-driven primitives to facilitate animation. Second, to stabilize and amortize the learning of millions of Gaussians, we propose to use neural implicit fields to predict the Gaussian attributes (e.g., colors). Finally, to capture fine avatar geometries and extract detailed meshes, we propose a novel SDF-based implicit mesh learning approach for 3D Gaussians that regularizes the underlying geometries and extracts highly detailed textured meshes. Our proposed method, GAvatar, enables the large-scale generation of diverse animatable avatars using only text prompts. 
GAvatar significantly surpasses existing methods in terms of both appearance and geometry quality, and achieves extremely fast rendering (100 fps) at 1K resolution.", + "arxiv_url": "http://arxiv.org/abs/2312.11461v2", + "pdf_url": "http://arxiv.org/pdf/2312.11461v2", + "published_date": "2023-12-18", + "categories": [ + "cs.CV", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis", + "authors": [ + "Yiqing Liang", + "Numair Khan", + "Zhengqin Li", + "Thu Nguyen-Phuoc", + "Douglas Lanman", + "James Tompkin", + "Lei Xiao" + ], + "abstract": "We propose a method for dynamic scene reconstruction using deformable 3D Gaussians that is tailored for monocular video. Building upon the efficiency of Gaussian splatting, our approach extends the representation to accommodate dynamic elements via a deformable set of Gaussians residing in a canonical space, and a time-dependent deformation field defined by a multi-layer perceptron (MLP). Moreover, under the assumption that most natural scenes have large regions that remain static, we allow the MLP to focus its representational power by additionally including a static Gaussian point cloud. The concatenated dynamic and static point clouds form the input for the Gaussian Splatting rasterizer, enabling real-time rendering. The differentiable pipeline is optimized end-to-end with a self-supervised rendering loss. Our method achieves results that are comparable to state-of-the-art dynamic neural radiance field methods while allowing much faster optimization and rendering. Project website: https://lynl7130.github.io/gaufre/index.html", + "arxiv_url": "http://arxiv.org/abs/2312.11458v1", + "pdf_url": "http://arxiv.org/pdf/2312.11458v1", + "published_date": "2023-12-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Exploring the Feasibility of Generating Realistic 3D Models of Endangered Species Using DreamGaussian: An Analysis of Elevation Angle's Impact on Model Generation", + "authors": [ + "Selcuk Anil Karatopak", + "Deniz Sen" + ], + "abstract": "Many species face the threat of extinction. It's important to study these species and gather information about them as much as possible to preserve biodiversity. Due to the rarity of endangered species, there is a limited amount of data available, making it difficult to apply data requiring generative AI methods to this domain. We aim to study the feasibility of generating consistent and real-like 3D models of endangered animals using limited data. Such a phenomenon leads us to utilize zero-shot stable diffusion models that can generate a 3D model out of a single image of the target species. This paper investigates the intricate relationship between elevation angle and the output quality of 3D model generation, focusing on the innovative approach presented in DreamGaussian. DreamGaussian, a novel framework utilizing Generative Gaussian Splatting along with novel mesh extraction and refinement algorithms, serves as the focal point of our study. We conduct a comprehensive analysis, analyzing the effect of varying elevation angles on DreamGaussian's ability to reconstruct 3D scenes accurately. 
Through an empirical evaluation, we demonstrate how changes in elevation angle impact the generated images' spatial coherence, structural integrity, and perceptual realism. We observed that giving a correct elevation angle with the input image significantly affects the result of the generated 3D model. We hope this study to be influential for the usability of AI to preserve endangered animals; while the penultimate aim is to obtain a model that can output biologically consistent 3D models via small samples, the qualitative interpretation of an existing state-of-the-art model such as DreamGaussian will be a step forward in our goal.", + "arxiv_url": "http://arxiv.org/abs/2312.09682v1", + "pdf_url": "http://arxiv.org/pdf/2312.09682v1", + "published_date": "2023-12-15", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Text2Immersion: Generative Immersive Scene with 3D Gaussians", + "authors": [ + "Hao Ouyang", + "Kathryn Heal", + "Stephen Lombardi", + "Tiancheng Sun" + ], + "abstract": "We introduce Text2Immersion, an elegant method for producing high-quality 3D immersive scenes from text prompts. Our proposed pipeline initiates by progressively generating a Gaussian cloud using pre-trained 2D diffusion and depth estimation models. This is followed by a refining stage on the Gaussian cloud, interpolating and refining it to enhance the details of the generated scene. Distinct from prevalent methods that focus on single object or indoor scenes, or employ zoom-out trajectories, our approach generates diverse scenes with various objects, even extending to the creation of imaginary scenes. Consequently, Text2Immersion can have wide-ranging implications for various applications such as virtual reality, game development, and automated content creation. Extensive evaluations demonstrate that our system surpasses other methods in rendering quality and diversity, further progressing towards text-driven 3D scene generation. We will make the source code publicly accessible at the project page.", + "arxiv_url": "http://arxiv.org/abs/2312.09242v1", + "pdf_url": "http://arxiv.org/pdf/2312.09242v1", + "published_date": "2023-12-14", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting", + "authors": [ + "Zhiyin Qian", + "Shaofei Wang", + "Marko Mihajlovic", + "Andreas Geiger", + "Siyu Tang" + ], + "abstract": "We introduce an approach that creates animatable human avatars from monocular videos using 3D Gaussian Splatting (3DGS). Existing methods based on neural radiance fields (NeRFs) achieve high-quality novel-view/novel-pose image synthesis but often require days of training, and are extremely slow at inference time. Recently, the community has explored fast grid structures for efficient training of clothed avatars. Albeit being extremely fast at training, these methods can barely achieve an interactive rendering frame rate with around 15 FPS. In this paper, we use 3D Gaussian Splatting and learn a non-rigid deformation network to reconstruct animatable clothed human avatars that can be trained within 30 minutes and rendered at real-time frame rates (50+ FPS). 
Given the explicit nature of our representation, we further introduce as-isometric-as-possible regularizations on both the Gaussian mean vectors and the covariance matrices, enhancing the generalization of our model on highly articulated unseen poses. Experimental results show that our method achieves comparable and even better performance compared to state-of-the-art approaches on animatable avatar creation from a monocular input, while being 400x and 250x faster in training and inference, respectively.", + "arxiv_url": "http://arxiv.org/abs/2312.09228v3", + "pdf_url": "http://arxiv.org/pdf/2312.09228v3", + "published_date": "2023-12-14", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers", + "authors": [ + "Zi-Xin Zou", + "Zhipeng Yu", + "Yuan-Chen Guo", + "Yangguang Li", + "Ding Liang", + "Yan-Pei Cao", + "Song-Hai Zhang" + ], + "abstract": "Recent advancements in 3D reconstruction from single images have been driven by the evolution of generative models. Prominent among these are methods based on Score Distillation Sampling (SDS) and the adaptation of diffusion models in the 3D domain. Despite their progress, these techniques often face limitations due to slow optimization or rendering processes, leading to extensive training and optimization times. In this paper, we introduce a novel approach for single-view reconstruction that efficiently generates a 3D model from a single image via feed-forward inference. Our method utilizes two transformer-based networks, namely a point decoder and a triplane decoder, to reconstruct 3D objects using a hybrid Triplane-Gaussian intermediate representation. This hybrid representation strikes a balance, achieving a faster rendering speed compared to implicit representations while simultaneously delivering superior rendering quality than explicit representations. The point decoder is designed for generating point clouds from single images, offering an explicit representation which is then utilized by the triplane decoder to query Gaussian features for each point. This design choice addresses the challenges associated with directly regressing explicit 3D Gaussian attributes characterized by their non-structural nature. Subsequently, the 3D Gaussians are decoded by an MLP to enable rapid rendering through splatting. Both decoders are built upon a scalable, transformer-based architecture and have been efficiently trained on large-scale 3D datasets. The evaluations conducted on both synthetic datasets and real-world images demonstrate that our method not only achieves higher quality but also ensures a faster runtime in comparison to previous state-of-the-art techniques. 
Please see our project page at https://zouzx.github.io/TriplaneGaussian/.", + "arxiv_url": "http://arxiv.org/abs/2312.09147v2", + "pdf_url": "http://arxiv.org/pdf/2312.09147v2", + "published_date": "2023-12-14", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "iComMa: Inverting 3D Gaussian Splatting for Camera Pose Estimation via Comparing and Matching", + "authors": [ + "Yuan Sun", + "Xuan Wang", + "Yunfan Zhang", + "Jie Zhang", + "Caigui Jiang", + "Yu Guo", + "Fei Wang" + ], + "abstract": "We present a method named iComMa to address the 6D camera pose estimation problem in computer vision. Conventional pose estimation methods typically rely on the target's CAD model or necessitate specific network training tailored to particular object classes. Some existing methods have achieved promising results in mesh-free object and scene pose estimation by inverting the Neural Radiance Fields (NeRF). However, they still struggle with adverse initializations such as large rotations and translations. To address this issue, we propose an efficient method for accurate camera pose estimation by inverting 3D Gaussian Splatting (3DGS). Specifically, a gradient-based differentiable framework optimizes camera pose by minimizing the residual between the query image and the rendered image, requiring no training. An end-to-end matching module is designed to enhance the model's robustness against adverse initializations, while minimizing pixel-level comparing loss aids in precise pose estimation. Experimental results on synthetic and complex real-world data demonstrate the effectiveness of the proposed approach in challenging conditions and the accuracy of camera pose estimation.", + "arxiv_url": "http://arxiv.org/abs/2312.09031v2", + "pdf_url": "http://arxiv.org/pdf/2312.09031v2", + "published_date": "2023-12-14", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes", + "authors": [ + "Xiaoyu Zhou", + "Zhiwei Lin", + "Xiaojun Shan", + "Yongtao Wang", + "Deqing Sun", + "Ming-Hsuan Yang" + ], + "abstract": "We present DrivingGaussian, an efficient and effective framework for surrounding dynamic autonomous driving scenes. For complex scenes with moving objects, we first sequentially and progressively model the static background of the entire scene with incremental static 3D Gaussians. We then leverage a composite dynamic Gaussian graph to handle multiple moving objects, individually reconstructing each object and restoring their accurate positions and occlusion relationships within the scene. We further use a LiDAR prior for Gaussian Splatting to reconstruct scenes with greater details and maintain panoramic consistency. DrivingGaussian outperforms existing methods in dynamic driving scene reconstruction and enables photorealistic surround-view synthesis with high-fidelity and multi-camera consistency. 
Our project page is at: https://github.com/VDIGPKU/DrivingGaussian.", + "arxiv_url": "http://arxiv.org/abs/2312.07920v3", + "pdf_url": "http://arxiv.org/pdf/2312.07920v3", + "published_date": "2023-12-13", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/VDIGPKU/DrivingGaussian", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "COLMAP-Free 3D Gaussian Splatting", + "authors": [ + "Yang Fu", + "Sifei Liu", + "Amey Kulkarni", + "Jan Kautz", + "Alexei A. Efros", + "Xiaolong Wang" + ], + "abstract": "While neural rendering has led to impressive advances in scene reconstruction and novel view synthesis, it relies heavily on accurately pre-computed camera poses. To relax this constraint, multiple efforts have been made to train Neural Radiance Fields (NeRFs) without pre-processed camera poses. However, the implicit representations of NeRFs provide extra challenges to optimize the 3D structure and camera poses at the same time. On the other hand, the recently proposed 3D Gaussian Splatting provides new opportunities given its explicit point cloud representations. This paper leverages both the explicit geometric representation and the continuity of the input video stream to perform novel view synthesis without any SfM preprocessing. We process the input frames in a sequential manner and progressively grow the 3D Gaussians set by taking one input frame at a time, without the need to pre-compute the camera poses. Our method significantly improves over previous approaches in view synthesis and camera pose estimation under large motion changes. Our project page is https://oasisyang.github.io/colmap-free-3dgs", + "arxiv_url": "http://arxiv.org/abs/2312.07504v2", + "pdf_url": "http://arxiv.org/pdf/2312.07504v2", + "published_date": "2023-12-12", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Splatting SLAM", + "authors": [ + "Hidenobu Matsuki", + "Riku Murai", + "Paul H. J. Kelly", + "Andrew J. Davison" + ], + "abstract": "We present the first application of 3D Gaussian Splatting in monocular SLAM, the most fundamental but the hardest setup for Visual SLAM. Our method, which runs live at 3fps, utilises Gaussians as the only 3D representation, unifying the required representation for accurate, efficient tracking, mapping, and high-quality rendering. Designed for challenging monocular settings, our approach is seamlessly extendable to RGB-D SLAM when an external depth sensor is available. Several innovations are required to continuously reconstruct 3D scenes with high fidelity from a live camera. First, to move beyond the original 3DGS algorithm, which requires accurate poses from an offline Structure from Motion (SfM) system, we formulate camera tracking for 3DGS using direct optimisation against the 3D Gaussians, and show that this enables fast and robust tracking with a wide basin of convergence. Second, by utilising the explicit nature of the Gaussians, we introduce geometric verification and regularisation to handle the ambiguities occurring in incremental 3D dense reconstruction. 
Finally, we introduce a full SLAM system which not only achieves state-of-the-art results in novel view synthesis and trajectory estimation but also reconstruction of tiny and even transparent objects.", + "arxiv_url": "http://arxiv.org/abs/2312.06741v2", + "pdf_url": "http://arxiv.org/pdf/2312.06741v2", + "published_date": "2023-12-11", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering", + "authors": [ + "Haokai Pang", + "Heming Zhu", + "Adam Kortylewski", + "Christian Theobalt", + "Marc Habermann" + ], + "abstract": "Real-time rendering of photorealistic and controllable human avatars stands as a cornerstone in Computer Vision and Graphics. While recent advances in neural implicit rendering have unlocked unprecedented photorealism for digital avatars, real-time performance has mostly been demonstrated for static scenes only. To address this, we propose ASH, an animatable Gaussian splatting approach for photorealistic rendering of dynamic humans in real-time. We parameterize the clothed human as animatable 3D Gaussians, which can be efficiently splatted into image space to generate the final rendering. However, naively learning the Gaussian parameters in 3D space poses a severe challenge in terms of compute. Instead, we attach the Gaussians onto a deformable character model, and learn their parameters in 2D texture space, which allows leveraging efficient 2D convolutional architectures that easily scale with the required number of Gaussians. We benchmark ASH with competing methods on pose-controllable avatars, demonstrating that our method outperforms existing real-time methods by a large margin and shows comparable or even better results than offline methods.", + "arxiv_url": "http://arxiv.org/abs/2312.05941v2", + "pdf_url": "http://arxiv.org/pdf/2312.05941v2", + "published_date": "2023-12-10", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CoGS: Controllable Gaussian Splatting", + "authors": [ + "Heng Yu", + "Joel Julin", + "Zoltán Á. Milacski", + "Koichiro Niinuma", + "László A. Jeni" + ], + "abstract": "Capturing and re-animating the 3D structure of articulated objects present significant barriers. On one hand, methods requiring extensively calibrated multi-view setups are prohibitively complex and resource-intensive, limiting their practical applicability. On the other hand, while single-camera Neural Radiance Fields (NeRFs) offer a more streamlined approach, they have excessive training and rendering costs. 3D Gaussian Splatting would be a suitable alternative but for two reasons. Firstly, existing methods for 3D dynamic Gaussians require synchronized multi-view cameras, and secondly, the lack of controllability in dynamic scenarios. We present CoGS, a method for Controllable Gaussian Splatting, that enables the direct manipulation of scene elements, offering real-time control of dynamic scenes without the prerequisite of pre-computing control signals. We evaluated CoGS using both synthetic and real-world datasets that include dynamic objects that differ in degree of difficulty. 
In our evaluations, CoGS consistently outperformed existing dynamic and controllable neural representations in terms of visual fidelity.", + "arxiv_url": "http://arxiv.org/abs/2312.05664v2", + "pdf_url": "http://arxiv.org/pdf/2312.05664v2", + "published_date": "2023-12-09", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GIR: 3D Gaussian Inverse Rendering for Relightable Scene Factorization", + "authors": [ + "Yahao Shi", + "Yanmin Wu", + "Chenming Wu", + "Xing Liu", + "Chen Zhao", + "Haocheng Feng", + "Jian Zhang", + "Bin Zhou", + "Errui Ding", + "Jingdong Wang" + ], + "abstract": "This paper presents a 3D Gaussian Inverse Rendering (GIR) method, employing 3D Gaussian representations to effectively factorize the scene into material properties, light, and geometry. The key contributions lie in three-fold. We compute the normal of each 3D Gaussian using the shortest eigenvector, with a directional masking scheme forcing accurate normal estimation without external supervision. We adopt an efficient voxel-based indirect illumination tracing scheme that stores direction-aware outgoing radiance in each 3D Gaussian to disentangle secondary illumination for approximating multi-bounce light transport. To further enhance the illumination disentanglement, we represent a high-resolution environmental map with a learnable low-resolution map and a lightweight, fully convolutional network. Our method achieves state-of-the-art performance in both relighting and novel view synthesis tasks among the recently proposed inverse rendering methods while achieving real-time rendering. This substantiates our proposed method's efficacy and broad applicability, highlighting its potential as an influential tool in various real-time interactive graphics applications such as material editing and relighting. The code will be released at https://github.com/guduxiaolang/GIR.", + "arxiv_url": "http://arxiv.org/abs/2312.05133v2", + "pdf_url": "http://arxiv.org/pdf/2312.05133v2", + "published_date": "2023-12-08", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/guduxiaolang/GIR", + "keywords": [ + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting", + "authors": [ + "Xiaofeng Yang", + "Yiwen Chen", + "Cheng Chen", + "Chi Zhang", + "Yi Xu", + "Xulei Yang", + "Fayao Liu", + "Guosheng Lin" + ], + "abstract": "We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks. Despite the critical importance of these tasks, existing methodologies often struggle to generate high-caliber results. We begin by examining the inherent limitations in previous diffusion priors. We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation. To address this issue, we propose a novel, unified framework that iteratively optimizes both the 3D model and the diffusion prior. Leveraging the different learnable parameters of the diffusion prior, our approach offers multiple configurations, affording various trade-offs between performance and implementation complexity. 
Notably, our experimental results demonstrate that our method markedly surpasses existing techniques, establishing new state-of-the-art in the realm of text-to-3D generation. Furthermore, our approach exhibits impressive performance on both NeRF and the newly introduced 3D Gaussian Splatting backbones. Additionally, our framework yields insightful contributions to the understanding of recent score distillation methods, such as the VSD and DDS loss.", + "arxiv_url": "http://arxiv.org/abs/2312.04820v1", + "pdf_url": "http://arxiv.org/pdf/2312.04820v1", + "published_date": "2023-12-08", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS", + "authors": [ + "Sharath Girish", + "Kamal Gupta", + "Abhinav Shrivastava" + ], + "abstract": "Recently, 3D Gaussian splatting (3D-GS) has gained popularity in novel-view scene synthesis. It addresses the challenges of lengthy training times and slow rendering speeds associated with Neural Radiance Fields (NeRFs). Through rapid, differentiable rasterization of 3D Gaussians, 3D-GS achieves real-time rendering and accelerated training. They, however, demand substantial memory resources for both training and storage, as they require millions of Gaussians in their point cloud representation for each scene. We present a technique utilizing quantized embeddings to significantly reduce per-point memory storage requirements and a coarse-to-fine training strategy for a faster and more stable optimization of the Gaussian point clouds. Our approach develops a pruning stage which results in scene representations with fewer Gaussians, leading to faster training times and rendering speeds for real-time rendering of high resolution scenes. We reduce storage memory by more than an order of magnitude all while preserving the reconstruction quality. We validate the effectiveness of our approach on a variety of datasets and scenes preserving the visual quality while consuming 10-20x lesser memory and faster training/inference speed. Project page and code is available https://efficientgaussian.github.io", + "arxiv_url": "http://arxiv.org/abs/2312.04564v3", + "pdf_url": "http://arxiv.org/pdf/2312.04564v3", + "published_date": "2023-12-07", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar", + "authors": [ + "Yufan Chen", + "Lizhen Wang", + "Qijing Li", + "Hongjiang Xiao", + "Shengping Zhang", + "Hongxun Yao", + "Yebin Liu" + ], + "abstract": "The ability to animate photo-realistic head avatars reconstructed from monocular portrait video sequences represents a crucial step in bridging the gap between the virtual and real worlds. Recent advancements in head avatar techniques, including explicit 3D morphable meshes (3DMM), point clouds, and neural implicit representation have been exploited for this ongoing research. However, 3DMM-based methods are constrained by their fixed topologies, point-based approaches suffer from a heavy training burden due to the extensive quantity of points involved, and the last ones suffer from limitations in deformation flexibility and rendering efficiency. 
In response to these challenges, we propose MonoGaussianAvatar (Monocular Gaussian Point-based Head Avatar), a novel approach that harnesses 3D Gaussian point representation coupled with a Gaussian deformation field to learn explicit head avatars from monocular portrait videos. We define our head avatars with Gaussian points characterized by adaptable shapes, enabling flexible topology. These points exhibit movement with a Gaussian deformation field in alignment with the target pose and expression of a person, facilitating efficient deformation. Additionally, the Gaussian points have controllable shape, size, color, and opacity combined with Gaussian splatting, allowing for efficient training and rendering. Experiments demonstrate the superior performance of our method, which achieves state-of-the-art results among previous methods.", + "arxiv_url": "http://arxiv.org/abs/2312.04558v1", + "pdf_url": "http://arxiv.org/pdf/2312.04558v1", + "published_date": "2023-12-07", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Relightable Gaussian Codec Avatars", + "authors": [ + "Shunsuke Saito", + "Gabriel Schwartz", + "Tomas Simon", + "Junxuan Li", + "Giljoo Nam" + ], + "abstract": "The fidelity of relighting is bounded by both geometry and appearance representations. For geometry, both mesh and volumetric approaches have difficulty modeling intricate structures like 3D hair geometry. For appearance, existing relighting models are limited in fidelity and often too slow to render in real-time with high-resolution continuous environments. In this work, we present Relightable Gaussian Codec Avatars, a method to build high-fidelity relightable head avatars that can be animated to generate novel expressions. Our geometry model based on 3D Gaussians can capture 3D-consistent sub-millimeter details such as hair strands and pores on dynamic face sequences. To support diverse materials of human heads such as the eyes, skin, and hair in a unified manner, we present a novel relightable appearance model based on learnable radiance transfer. Together with global illumination-aware spherical harmonics for the diffuse components, we achieve real-time relighting with all-frequency reflections using spherical Gaussians. This appearance model can be efficiently relit under both point light and continuous illumination. We further improve the fidelity of eye reflections and enable explicit gaze control by introducing relightable explicit eye models. Our method outperforms existing approaches without compromising real-time performance. We also demonstrate real-time relighting of avatars on a tethered consumer VR headset, showcasing the efficiency and fidelity of our avatars.", + "arxiv_url": "http://arxiv.org/abs/2312.03704v2", + "pdf_url": "http://arxiv.org/pdf/2312.03704v2", + "published_date": "2023-12-06", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting", + "authors": [ + "Yuheng Jiang", + "Zhehao Shen", + "Penghao Wang", + "Zhuo Su", + "Yu Hong", + "Yingliang Zhang", + "Jingyi Yu", + "Lan Xu" + ], + "abstract": "We have recently seen tremendous progress in photo-real human modeling and rendering. 
Yet, efficiently rendering realistic human performance and integrating it into the rasterization pipeline remains challenging. In this paper, we present HiFi4G, an explicit and compact Gaussian-based approach for high-fidelity human performance rendering from dense footage. Our core intuition is to marry the 3D Gaussian representation with non-rigid tracking, achieving a compact and compression-friendly representation. We first propose a dual-graph mechanism to obtain motion priors, with a coarse deformation graph for effective initialization and a fine-grained Gaussian graph to enforce subsequent constraints. Then, we utilize a 4D Gaussian optimization scheme with adaptive spatial-temporal regularizers to effectively balance the non-rigid prior and Gaussian updating. We also present a companion compression scheme with residual compensation for immersive experiences on various platforms. It achieves a substantial compression rate of approximately 25 times, with less than 2MB of storage per frame. Extensive experiments demonstrate the effectiveness of our approach, which significantly outperforms existing approaches in terms of optimization speed, rendering quality, and storage overhead.", + "arxiv_url": "http://arxiv.org/abs/2312.03461v2", + "pdf_url": "http://arxiv.org/pdf/2312.03461v2", + "published_date": "2023-12-06", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle", + "authors": [ + "Youtian Lin", + "Zuozhuo Dai", + "Siyu Zhu", + "Yao Yao" + ], + "abstract": "We introduce Gaussian-Flow, a novel point-based approach for fast dynamic scene reconstruction and real-time rendering from both multi-view and monocular videos. In contrast to the prevalent NeRF-based approaches hampered by slow training and rendering speeds, our approach harnesses recent advancements in point-based 3D Gaussian Splatting (3DGS). Specifically, a novel Dual-Domain Deformation Model (DDDM) is proposed to explicitly model attribute deformations of each Gaussian point, where the time-dependent residual of each attribute is captured by a polynomial fitting in the time domain, and a Fourier series fitting in the frequency domain. The proposed DDDM is capable of modeling complex scene deformations across long video footage, eliminating the need for training separate 3DGS for each frame or introducing an additional implicit neural field to model 3D dynamics. Moreover, the explicit deformation modeling for discretized Gaussian points ensures ultra-fast training and rendering of a 4D scene, which is comparable to the original 3DGS designed for static 3D reconstruction. Our proposed approach showcases a substantial efficiency improvement, achieving a $5\\times$ faster training speed compared to the per-frame 3DGS modeling. In addition, quantitative results demonstrate that the proposed Gaussian-Flow significantly outperforms previous leading methods in novel view rendering quality. 
Project page: https://nju-3dv.github.io/projects/Gaussian-Flow", + "arxiv_url": "http://arxiv.org/abs/2312.03431v1", + "pdf_url": "http://arxiv.org/pdf/2312.03431v1", + "published_date": "2023-12-06", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian-SLAM: Photo-realistic Dense SLAM with Gaussian Splatting", + "authors": [ + "Vladimir Yugay", + "Yue Li", + "Theo Gevers", + "Martin R. Oswald" + ], + "abstract": "We present a dense simultaneous localization and mapping (SLAM) method that uses 3D Gaussians as a scene representation. Our approach enables interactive-time reconstruction and photo-realistic rendering from real-world single-camera RGBD videos. To this end, we propose a novel effective strategy for seeding new Gaussians for newly explored areas and their effective online optimization that is independent of the scene size and thus scalable to larger scenes. This is achieved by organizing the scene into sub-maps which are independently optimized and do not need to be kept in memory. We further accomplish frame-to-model camera tracking by minimizing photometric and geometric losses between the input and rendered frames. The Gaussian representation allows for high-quality photo-realistic real-time rendering of real-world scenes. Evaluation on synthetic and real-world datasets demonstrates competitive or superior performance in mapping, tracking, and rendering compared to existing neural dense SLAM methods.", + "arxiv_url": "http://arxiv.org/abs/2312.10070v2", + "pdf_url": "http://arxiv.org/pdf/2312.10070v2", + "published_date": "2023-12-06", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields", + "authors": [ + "Shijie Zhou", + "Haoran Chang", + "Sicheng Jiang", + "Zhiwen Fan", + "Zehao Zhu", + "Dejia Xu", + "Pradyumna Chari", + "Suya You", + "Zhangyang Wang", + "Achuta Kadambi" + ], + "abstract": "3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundation models. However, these methods have two major limitations: (a) they are limited by the rendering speed of NeRF pipelines, and (b) implicitly represented feature fields suffer from continuity artifacts reducing feature quality. Recently, 3D Gaussian Splatting has shown state-of-the-art performance on real-time radiance field rendering. In this work, we go one step further: in addition to radiance field rendering, we enable 3D Gaussian splatting on arbitrary-dimension semantic features via 2D foundation model distillation. This translation is not straightforward: naively incorporating feature fields in the 3DGS framework encounters significant challenges, notably the disparities in spatial resolution and channel consistency between RGB images and feature maps. We propose architectural and training changes to efficiently avert this problem. 
Our proposed method is general, and our experiments showcase novel view semantic segmentation, language-guided editing and segment anything through learning feature fields from state-of-the-art 2D foundation models such as SAM and CLIP-LSeg. Across experiments, our distillation method is able to provide comparable or better results, while being significantly faster to both train and render. Additionally, to the best of our knowledge, we are the first method to enable point and bounding-box prompting for radiance field manipulation, by leveraging the SAM model. Project website at: https://feature-3dgs.github.io/", + "arxiv_url": "http://arxiv.org/abs/2312.03203v3", + "pdf_url": "http://arxiv.org/pdf/2312.03203v3", + "published_date": "2023-12-06", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing", + "authors": [ + "Yushi Lan", + "Feitong Tan", + "Di Qiu", + "Qiangeng Xu", + "Kyle Genova", + "Zeng Huang", + "Sean Fanello", + "Rohit Pandey", + "Thomas Funkhouser", + "Chen Change Loy", + "Yinda Zhang" + ], + "abstract": "We present a novel framework for generating photorealistic 3D human heads and subsequently manipulating and reposing them with remarkable flexibility. The proposed approach leverages an implicit function representation of 3D human heads, employing 3D Gaussians anchored on a parametric face model. To enhance representational capabilities and encode spatial information, we embed a lightweight tri-plane payload within each Gaussian rather than directly storing color and opacity. Additionally, we parameterize the Gaussians in a 2D UV space via a 3DMM, enabling effective utilization of the diffusion model for 3D head avatar generation. Our method facilitates the creation of diverse and realistic 3D human heads with fine-grained editing over facial features and expressions. Extensive experiments demonstrate the effectiveness of our method.", + "arxiv_url": "http://arxiv.org/abs/2312.03763v3", + "pdf_url": "http://arxiv.org/pdf/2312.03763v3", + "published_date": "2023-12-05", + "categories": [ + "cs.CV", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GauHuman: Articulated Gaussian Splatting from Monocular Human Videos", + "authors": [ + "Shoukang Hu", + "Ziwei Liu" + ], + "abstract": "We present GauHuman, a 3D human model with Gaussian Splatting for both fast training (1 ~ 2 minutes) and real-time rendering (up to 189 FPS), compared with existing NeRF-based implicit representation modelling frameworks demanding hours of training and seconds of rendering per frame. Specifically, GauHuman encodes Gaussian Splatting in the canonical space and transforms 3D Gaussians from canonical space to posed space with linear blend skinning (LBS), in which effective pose and LBS refinement modules are designed to learn fine details of 3D humans under negligible computational cost. Moreover, to enable fast optimization of GauHuman, we initialize and prune 3D Gaussians with 3D human prior, while splitting/cloning via KL divergence guidance, along with a novel merge operation for further speeding up. Extensive experiments on ZJU_Mocap and MonoCap datasets demonstrate that GauHuman achieves state-of-the-art performance quantitatively and qualitatively with fast training and real-time rendering speed.
Notably, without sacrificing rendering quality, GauHuman can fast model the 3D human performer with ~13k 3D Gaussians.", + "arxiv_url": "http://arxiv.org/abs/2312.02973v1", + "pdf_url": "http://arxiv.org/pdf/2312.02973v1", + "published_date": "2023-12-05", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting", + "authors": [ + "Helisa Dhamo", + "Yinyu Nie", + "Arthur Moreau", + "Jifei Song", + "Richard Shaw", + "Yiren Zhou", + "Eduardo Pérez-Pellitero" + ], + "abstract": "3D head animation has seen major quality and runtime improvements over the last few years, particularly empowered by the advances in differentiable rendering and neural radiance fields. Real-time rendering is a highly desirable goal for real-world applications. We propose HeadGaS, a model that uses 3D Gaussian Splats (3DGS) for 3D head reconstruction and animation. In this paper we introduce a hybrid model that extends the explicit 3DGS representation with a base of learnable latent features, which can be linearly blended with low-dimensional parameters from parametric head models to obtain expression-dependent color and opacity values. We demonstrate that HeadGaS delivers state-of-the-art results in real-time inference frame rates, surpassing baselines by up to 2dB, while accelerating rendering speed by over x10.", + "arxiv_url": "http://arxiv.org/abs/2312.02902v2", + "pdf_url": "http://arxiv.org/pdf/2312.02902v2", + "published_date": "2023-12-05", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HHAvatar: Gaussian Head Avatar with Dynamic Hairs", + "authors": [ + "Zhanfeng Liao", + "Yuelang Xu", + "Zhe Li", + "Qijing Li", + "Boyao Zhou", + "Ruifeng Bai", + "Di Xu", + "Hongwen Zhang", + "Yebin Liu" + ], + "abstract": "Creating high-fidelity 3D head avatars has always been a research hotspot, but it remains a great challenge under lightweight sparse view setups. In this paper, we propose HHAvatar represented by controllable 3D Gaussians for high-fidelity head avatar with dynamic hair modeling. We first use 3D Gaussians to represent the appearance of the head, and then jointly optimize neutral 3D Gaussians and a fully learned MLP-based deformation field to capture complex expressions. The two parts benefit each other, thereby our method can model fine-grained dynamic details while ensuring expression accuracy. Furthermore, we devise a well-designed geometry-guided initialization strategy based on implicit SDF and Deep Marching Tetrahedra for the stability and convergence of the training procedure. To address the problem of dynamic hair modeling, we introduce a hybrid head model into our avatar representation based Gaussian Head Avatar and a training method that considers timing information and an occlusion perception module to model the non-rigid motion of hair. 
Experiments show that our approach outperforms other state-of-the-art sparse-view methods, achieving ultra high-fidelity rendering quality at 2K resolution even under exaggerated expressions and driving hairs reasonably with the motion of the head.", + "arxiv_url": "http://arxiv.org/abs/2312.03029v3", + "pdf_url": "http://arxiv.org/pdf/2312.03029v3", + "published_date": "2023-12-05", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis", + "authors": [ + "Shunyuan Zheng", + "Boyao Zhou", + "Ruizhi Shao", + "Boning Liu", + "Shengping Zhang", + "Liqiang Nie", + "Yebin Liu" + ], + "abstract": "We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in a real-time manner. The proposed method enables 2K-resolution rendering under a sparse-view camera setting. Unlike the original Gaussian Splatting or neural implicit rendering methods that necessitate per-subject optimizations, we introduce Gaussian parameter maps defined on the source views and directly regress Gaussian Splatting properties for instant novel view synthesis without any fine-tuning or optimization. To this end, we train our Gaussian parameter regression module on a large amount of human scan data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. The proposed framework is fully differentiable and experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed.", + "arxiv_url": "http://arxiv.org/abs/2312.02155v3", + "pdf_url": "http://arxiv.org/pdf/2312.02155v3", + "published_date": "2023-12-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "MANUS: Markerless Grasp Capture using Articulated 3D Gaussians", + "authors": [ + "Chandradeep Pokhariya", + "Ishaan N Shah", + "Angela Xing", + "Zekun Li", + "Kefan Chen", + "Avinash Sharma", + "Srinath Sridhar" + ], + "abstract": "Understanding how we grasp objects with our hands has important applications in areas like robotics and mixed reality. However, this challenging problem requires accurate modeling of the contact between hands and objects. To capture grasps, existing methods use skeletons, meshes, or parametric models that do not represent hand shape accurately, resulting in inaccurate contacts. We present MANUS, a method for Markerless Hand-Object Grasp Capture using Articulated 3D Gaussians. We build a novel articulated 3D Gaussians representation that extends 3D Gaussian splatting for high-fidelity representation of articulating hands. Since our representation uses Gaussian primitives, it enables us to efficiently and accurately estimate contacts between the hand and the object. For the most accurate results, our method requires tens of camera views that current datasets do not provide. We therefore build MANUS-Grasps, a new dataset that contains hand-object grasps viewed from 50+ cameras across 30+ scenes, 3 subjects, comprising over 7M frames.
In addition to extensive qualitative results, we also show that our method outperforms others on a quantitative contact evaluation method that uses paint transfer from the object to the hand.", + "arxiv_url": "http://arxiv.org/abs/2312.02137v2", + "pdf_url": "http://arxiv.org/pdf/2312.02137v2", + "published_date": "2023-12-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Re-Nerfing: Improving Novel View Synthesis through Novel View Synthesis", + "authors": [ + "Felix Tristram", + "Stefano Gasperini", + "Nassir Navab", + "Federico Tombari" + ], + "abstract": "Recent neural rendering and reconstruction techniques, such as NeRFs or Gaussian Splatting, have shown remarkable novel view synthesis capabilities but require hundreds of images of the scene from diverse viewpoints to render high-quality novel views. With fewer images available, these methods start to fail since they can no longer correctly triangulate the underlying 3D geometry and converge to a non-optimal solution. These failures can manifest as floaters or blurry renderings in sparsely observed areas of the scene. In this paper, we propose Re-Nerfing, a simple and general add-on approach that leverages novel view synthesis itself to tackle this problem. Using an already trained NVS method, we render novel views between existing ones and augment the training data to optimize a second model. This introduces additional multi-view constraints and allows the second model to converge to a better solution. With Re-Nerfing we achieve significant improvements upon multiple pipelines based on NeRF and Gaussian-Splatting in sparse view settings of the mip-NeRF 360 and LLFF datasets. Notably, Re-Nerfing does not require prior knowledge or extra supervision signals, making it a flexible and practical add-on.", + "arxiv_url": "http://arxiv.org/abs/2312.02255v3", + "pdf_url": "http://arxiv.org/pdf/2312.02255v3", + "published_date": "2023-12-04", + "categories": [ + "cs.CV", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "neural rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians", + "authors": [ + "Liangxiao Hu", + "Hongwen Zhang", + "Yuxiang Zhang", + "Boyao Zhou", + "Boning Liu", + "Shengping Zhang", + "Liqiang Nie" + ], + "abstract": "We present GaussianAvatar, an efficient approach to creating realistic human avatars with dynamic 3D appearances from a single video. We start by introducing animatable 3D Gaussians to explicitly represent humans in various poses and clothing styles. Such an explicit and animatable representation can fuse 3D appearances more efficiently and consistently from 2D observations. Our representation is further augmented with dynamic properties to support pose-dependent appearance modeling, where a dynamic appearance network along with an optimizable feature tensor is designed to learn the motion-to-appearance mapping. Moreover, by leveraging the differentiable motion condition, our method enables a joint optimization of motions and appearances during avatar modeling, which helps to tackle the long-standing issue of inaccurate motion estimation in monocular settings. 
The efficacy of GaussianAvatar is validated on both the public dataset and our collected dataset, demonstrating its superior performances in terms of appearance quality and rendering efficiency.", + "arxiv_url": "http://arxiv.org/abs/2312.02134v3", + "pdf_url": "http://arxiv.org/pdf/2312.02134v3", + "published_date": "2023-12-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAM", + "authors": [ + "Nikhil Keetha", + "Jay Karhade", + "Krishna Murthy Jatavallabhula", + "Gengshan Yang", + "Sebastian Scherer", + "Deva Ramanan", + "Jonathon Luiten" + ], + "abstract": "Dense simultaneous localization and mapping (SLAM) is crucial for robotics and augmented reality applications. However, current methods are often hampered by the non-volumetric or implicit way they represent a scene. This work introduces SplaTAM, an approach that, for the first time, leverages explicit volumetric representations, i.e., 3D Gaussians, to enable high-fidelity reconstruction from a single unposed RGB-D camera, surpassing the capabilities of existing methods. SplaTAM employs a simple online tracking and mapping system tailored to the underlying Gaussian representation. It utilizes a silhouette mask to elegantly capture the presence of scene density. This combination enables several benefits over prior representations, including fast rendering and dense optimization, quickly determining if areas have been previously mapped, and structured map expansion by adding more Gaussians. Extensive experiments show that SplaTAM achieves up to 2x superior performance in camera pose estimation, map construction, and novel-view synthesis over existing methods, paving the way for more immersive high-fidelity SLAM applications.", + "arxiv_url": "http://arxiv.org/abs/2312.02126v3", + "pdf_url": "http://arxiv.org/pdf/2312.02126v3", + "published_date": "2023-12-04", + "categories": [ + "cs.CV", + "cs.AI", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Mathematical Supplement for the $\\texttt{gsplat}$ Library", + "authors": [ + "Vickie Ye", + "Angjoo Kanazawa" + ], + "abstract": "This report provides the mathematical details of the gsplat library, a modular toolbox for efficient differentiable Gaussian splatting, as proposed by Kerbl et al. It provides a self-contained reference for the computations involved in the forward and backward passes of differentiable Gaussian splatting. 
To facilitate practical usage and development, we provide a user friendly Python API that exposes each component of the forward and backward passes in rasterization at github.com/nerfstudio-project/gsplat .", + "arxiv_url": "http://arxiv.org/abs/2312.02121v1", + "pdf_url": "http://arxiv.org/pdf/2312.02121v1", + "published_date": "2023-12-04", + "categories": [ + "cs.MS", + "cs.CV", + "cs.GR", + "cs.NA", + "math.NA" + ], + "github_url": "https://github.com/nerfstudio-project/gsplat", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianAvatars: Photorealistic Head Avatars with Rigged 3D Gaussians", + "authors": [ + "Shenhan Qian", + "Tobias Kirschstein", + "Liam Schoneveld", + "Davide Davoli", + "Simon Giebenhain", + "Matthias Nießner" + ], + "abstract": "We introduce GaussianAvatars, a new method to create photorealistic head avatars that are fully controllable in terms of expression, pose, and viewpoint. The core idea is a dynamic 3D representation based on 3D Gaussian splats that are rigged to a parametric morphable face model. This combination facilitates photorealistic rendering while allowing for precise animation control via the underlying parametric model, e.g., through expression transfer from a driving sequence or by manually changing the morphable model parameters. We parameterize each splat by a local coordinate frame of a triangle and optimize for explicit displacement offset to obtain a more accurate geometric representation. During avatar reconstruction, we jointly optimize for the morphable model parameters and Gaussian splat parameters in an end-to-end fashion. We demonstrate the animation capabilities of our photorealistic avatar in several challenging scenarios. For instance, we show reenactments from a driving video, where our method outperforms existing works by a significant margin.", + "arxiv_url": "http://arxiv.org/abs/2312.02069v2", + "pdf_url": "http://arxiv.org/pdf/2312.02069v2", + "published_date": "2023-12-04", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes", + "authors": [ + "Yi-Hua Huang", + "Yang-Tian Sun", + "Ziyi Yang", + "Xiaoyang Lyu", + "Yan-Pei Cao", + "Xiaojuan Qi" + ], + "abstract": "Novel view synthesis for dynamic scenes is still a challenging problem in computer vision and graphics. Recently, Gaussian splatting has emerged as a robust technique to represent static scenes and enable high-quality and real-time novel view synthesis. Building upon this technique, we propose a new representation that explicitly decomposes the motion and appearance of dynamic scenes into sparse control points and dense Gaussians, respectively. Our key idea is to use sparse control points, significantly fewer in number than the Gaussians, to learn compact 6 DoF transformation bases, which can be locally interpolated through learned interpolation weights to yield the motion field of 3D Gaussians. We employ a deformation MLP to predict time-varying 6 DoF transformations for each control point, which reduces learning complexities, enhances learning abilities, and facilitates obtaining temporal and spatial coherent motion patterns. Then, we jointly learn the 3D Gaussians, the canonical space locations of control points, and the deformation MLP to reconstruct the appearance, geometry, and dynamics of 3D scenes. 
During learning, the location and number of control points are adaptively adjusted to accommodate varying motion complexities in different regions, and an ARAP loss following the principle of as rigid as possible is developed to enforce spatial continuity and local rigidity of learned motions. Finally, thanks to the explicit sparse motion representation and its decomposition from appearance, our method can enable user-controlled motion editing while retaining high-fidelity appearances. Extensive experiments demonstrate that our approach outperforms existing approaches on novel view synthesis with a high rendering speed and enables novel appearance-preserved motion editing applications. Project page: https://yihua7.github.io/SC-GS-web/", + "arxiv_url": "http://arxiv.org/abs/2312.14937v3", + "pdf_url": "http://arxiv.org/pdf/2312.14937v3", + "published_date": "2023-12-04", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianHead: High-fidelity Head Avatars with Learnable Gaussian Derivation", + "authors": [ + "Jie Wang", + "Jiu-Cheng Xie", + "Xianyan Li", + "Feng Xu", + "Chi-Man Pun", + "Hao Gao" + ], + "abstract": "Constructing vivid 3D head avatars for given subjects and realizing a series of animations on them is valuable yet challenging. This paper presents GaussianHead, which models the actional human head with anisotropic 3D Gaussians. In our framework, a motion deformation field and multi-resolution tri-plane are constructed respectively to deal with the head's dynamic geometry and complex texture. Notably, we impose an exclusive derivation scheme on each Gaussian, which generates its multiple doppelgangers through a set of learnable parameters for position transformation. With this design, we can compactly and accurately encode the appearance information of Gaussians, even those fitting the head's particular components with sophisticated structures. In addition, an inherited derivation strategy for newly added Gaussians is adopted to facilitate training acceleration. Extensive experiments show that our method can produce high-fidelity renderings, outperforming state-of-the-art approaches in reconstruction, cross-identity reenactment, and novel view synthesis tasks. Our code is available at: https://github.com/chiehwangs/gaussian-head.", + "arxiv_url": "http://arxiv.org/abs/2312.01632v4", + "pdf_url": "http://arxiv.org/pdf/2312.01632v4", + "published_date": "2023-12-04", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/chiehwangs/gaussian-head", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding", + "authors": [ + "Jun Xiang", + "Xuan Gao", + "Yudong Guo", + "Juyong Zhang" + ], + "abstract": "We propose FlashAvatar, a novel and lightweight 3D animatable avatar representation that could reconstruct a digital avatar from a short monocular video sequence in minutes and render high-fidelity photo-realistic images at 300FPS on a consumer-grade GPU. To achieve this, we maintain a uniform 3D Gaussian field embedded in the surface of a parametric face model and learn extra spatial offset to model non-surface regions and subtle facial details. 
While full use of geometric priors can capture high-frequency facial details and preserve exaggerated expressions, proper initialization can help reduce the number of Gaussians, thus enabling super-fast rendering speed. Extensive experimental results demonstrate that FlashAvatar outperforms existing works regarding visual quality and personalized details and is almost an order of magnitude faster in rendering speed. Project page: https://ustc3dv.github.io/FlashAvatar/", + "arxiv_url": "http://arxiv.org/abs/2312.02214v2", + "pdf_url": "http://arxiv.org/pdf/2312.02214v2", + "published_date": "2023-12-03", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction", + "authors": [ + "Devikalyan Das", + "Christopher Wewer", + "Raza Yunus", + "Eddy Ilg", + "Jan Eric Lenssen" + ], + "abstract": "Reconstructing dynamic objects from monocular videos is a severely underconstrained and challenging problem, and recent work has approached it in various directions. However, owing to the ill-posed nature of this problem, there has been no solution that can provide consistent, high-quality novel views from camera positions that are significantly different from the training views. In this work, we introduce Neural Parametric Gaussians (NPGs) to take on this challenge by imposing a two-stage approach: first, we fit a low-rank neural deformation model, which then is used as regularization for non-rigid reconstruction in the second stage. The first stage learns the object's deformations such that it preserves consistency in novel views. The second stage obtains high reconstruction quality by optimizing 3D Gaussians that are driven by the coarse model. To this end, we introduce a local 3D Gaussian representation, where temporally shared Gaussians are anchored in and deformed by local oriented volumes. The resulting combined model can be rendered as radiance fields, resulting in high-quality photo-realistic reconstructions of the non-rigidly deforming objects. We demonstrate that NPGs achieve superior results compared to previous works, especially in challenging scenarios with few multi-view cues.", + "arxiv_url": "http://arxiv.org/abs/2312.01196v2", + "pdf_url": "http://arxiv.org/pdf/2312.01196v2", + "published_date": "2023-12-02", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D", + "authors": [ + "Pengsheng Guo", + "Hans Hao", + "Adam Caccavale", + "Zhongzheng Ren", + "Edward Zhang", + "Qi Shan", + "Aditya Sankar", + "Alexander G. Schwing", + "Alex Colburn", + "Fangchang Ma" + ], + "abstract": "In the realm of text-to-3D generation, utilizing 2D diffusion models through score distillation sampling (SDS) frequently leads to issues such as blurred appearances and multi-faced geometry, primarily due to the intrinsically noisy nature of the SDS loss. Our analysis identifies the core of these challenges as the interaction among noise levels in the 2D diffusion process, the architecture of the diffusion network, and the 3D model representation. To overcome these limitations, we present StableDreamer, a methodology incorporating three advances. 
First, inspired by InstructNeRF2NeRF, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss. This finding provides a novel tool to debug SDS, which we use to show the impact of time-annealing noise levels on reducing multi-faced geometries. Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition. Based on this observation, StableDreamer introduces a two-stage training strategy that effectively combines these aspects, resulting in high-fidelity 3D models. Third, we adopt an anisotropic 3D Gaussians representation, replacing Neural Radiance Fields (NeRFs), to enhance the overall quality, reduce memory usage during training, and accelerate rendering speeds, and better capture semi-transparent objects. StableDreamer reduces multi-face geometries, generates fine details, and converges stably.", + "arxiv_url": "http://arxiv.org/abs/2312.02189v1", + "pdf_url": "http://arxiv.org/pdf/2312.02189v1", + "published_date": "2023-12-02", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DISTWAR: Fast Differentiable Rendering on Raster-based Rendering Pipelines", + "authors": [ + "Sankeerth Durvasula", + "Adrian Zhao", + "Fan Chen", + "Ruofan Liang", + "Pawan Kumar Sanjaya", + "Nandita Vijaykumar" + ], + "abstract": "Differentiable rendering is a technique used in an important emerging class of visual computing applications that involves representing a 3D scene as a model that is trained from 2D images using gradient descent. Recent works (e.g. 3D Gaussian Splatting) use a rasterization pipeline to enable rendering high quality photo-realistic imagery at high speeds from these learned 3D models. These methods have been demonstrated to be very promising, providing state-of-art quality for many important tasks. However, training a model to represent a scene is still a time-consuming task even when using powerful GPUs. In this work, we observe that the gradient computation phase during training is a significant bottleneck on GPUs due to the large number of atomic operations that need to be processed. These atomic operations overwhelm atomic units in the L2 partitions causing stalls. To address this challenge, we leverage the observations that during the gradient computation: (1) for most warps, all threads atomically update the same memory locations; and (2) warps generate varying amounts of atomic traffic (since some threads may be inactive). We propose DISTWAR, a software-approach to accelerate atomic operations based on two key ideas: First, we enable warp-level reduction of threads at the SM sub-cores using registers to leverage the locality in intra-warp atomic updates. Second, we distribute the atomic computation between the warp-level reduction at the SM and the L2 atomic units to increase the throughput of atomic computation. Warps with many threads performing atomic updates to the same memory locations are scheduled at the SM, and the rest using L2 atomic units. We implement DISTWAR using existing warp-level primitives. We evaluate DISTWAR on widely used raster-based differentiable rendering workloads. 
We demonstrate significant speedups of 2.44x on average (up to 5.7x).", + "arxiv_url": "http://arxiv.org/abs/2401.05345v1", + "pdf_url": "http://arxiv.org/pdf/2401.05345v1", + "published_date": "2023-12-01", + "categories": [ + "cs.CV", + "cs.GR", + "cs.PF" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Segment Any 3D Gaussians", + "authors": [ + "Jiazhong Cen", + "Jiemin Fang", + "Chen Yang", + "Lingxi Xie", + "Xiaopeng Zhang", + "Wei Shen", + "Qi Tian" + ], + "abstract": "This paper presents SAGA (Segment Any 3D GAussians), a highly efficient 3D promptable segmentation method based on 3D Gaussian Splatting (3D-GS). Given 2D visual prompts as input, SAGA can segment the corresponding 3D target represented by 3D Gaussians within 4 ms. This is achieved by attaching a scale-gated affinity feature to each 3D Gaussian to endow it with a new property towards multi-granularity segmentation. Specifically, a scale-aware contrastive training strategy is proposed for the scale-gated affinity feature learning. It 1) distills the segmentation capability of the Segment Anything Model (SAM) from 2D masks into the affinity features and 2) employs a soft scale gate mechanism to deal with multi-granularity ambiguity in 3D segmentation through adjusting the magnitude of each feature channel according to a specified 3D physical scale. Evaluations demonstrate that SAGA achieves real-time multi-granularity segmentation with quality comparable to state-of-the-art methods. As one of the first methods addressing promptable segmentation in 3D-GS, the simplicity and effectiveness of SAGA pave the way for future advancements in this field. Our code will be released.", + "arxiv_url": "http://arxiv.org/abs/2312.00860v2", + "pdf_url": "http://arxiv.org/pdf/2312.00860v2", + "published_date": "2023-12-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Grouping: Segment and Edit Anything in 3D Scenes", + "authors": [ + "Mingqiao Ye", + "Martin Danelljan", + "Fisher Yu", + "Lei Ke" + ], + "abstract": "The recent Gaussian Splatting achieves high-quality and real-time novel-view synthesis of the 3D scenes. However, it is solely concentrated on the appearance and geometry modeling, while lacking in fine-grained object-level scene understanding. To address this issue, we propose Gaussian Grouping, which extends Gaussian Splatting to jointly reconstruct and segment anything in open-world 3D scenes. We augment each Gaussian with a compact Identity Encoding, allowing the Gaussians to be grouped according to their object instance or stuff membership in the 3D scene. Instead of resorting to expensive 3D labels, we supervise the Identity Encodings during the differentiable rendering by leveraging the 2D mask predictions by Segment Anything Model (SAM), along with introduced 3D spatial consistency regularization. Compared to the implicit NeRF representation, we show that the discrete and grouped 3D Gaussians can reconstruct, segment and edit anything in 3D with high visual quality, fine granularity and efficiency. Based on Gaussian Grouping, we further propose a local Gaussian Editing scheme, which shows efficacy in versatile scene editing applications, including 3D object removal, inpainting, colorization, style transfer and scene recomposition.
Our code and models are at https://github.com/lkeab/gaussian-grouping.", + "arxiv_url": "http://arxiv.org/abs/2312.00732v2", + "pdf_url": "http://arxiv.org/pdf/2312.00732v2", + "published_date": "2023-12-01", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "https://github.com/lkeab/gaussian-grouping", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting", + "authors": [ + "Zehao Zhu", + "Zhiwen Fan", + "Yifan Jiang", + "Zhangyang Wang" + ], + "abstract": "Novel view synthesis from limited observations remains an important and persistent task. However, high efficiency in existing NeRF-based few-shot view synthesis is often compromised to obtain an accurate 3D representation. To address this challenge, we propose a few-shot view synthesis framework based on 3D Gaussian Splatting that enables real-time and photo-realistic view synthesis with as few as three training views. The proposed method, dubbed FSGS, handles the extremely sparse initialized SfM points with a thoughtfully designed Gaussian Unpooling process. Our method iteratively distributes new Gaussians around the most representative locations, subsequently infilling local details in vacant areas. We also integrate a large-scale pre-trained monocular depth estimator within the Gaussians optimization process, leveraging online augmented views to guide the geometric optimization towards an optimal solution. Starting from sparse points observed from limited input viewpoints, our FSGS can accurately grow into unseen regions, comprehensively covering the scene and boosting the rendering quality of novel views. Overall, FSGS achieves state-of-the-art performance in both accuracy and rendering efficiency across diverse datasets, including LLFF, Mip-NeRF360, and Blender. Project website: https://zehaozhu.github.io/FSGS/.", + "arxiv_url": "http://arxiv.org/abs/2312.00451v2", + "pdf_url": "http://arxiv.org/pdf/2312.00451v2", + "published_date": "2023-12-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance", + "authors": [ + "Hanlin Chen", + "Chen Li", + "Gim Hee Lee" + ], + "abstract": "Existing neural implicit surface reconstruction methods have achieved impressive performance in multi-view 3D reconstruction by leveraging explicit geometry priors such as depth maps or point clouds as regularization. However, the reconstruction results still lack fine details because of the over-smoothed depth map or sparse point cloud. In this work, we propose a neural implicit surface reconstruction pipeline with guidance from 3D Gaussian Splatting to recover highly detailed surfaces. The advantage of 3D Gaussian Splatting is that it can generate dense point clouds with detailed structure. Nonetheless, a naive adoption of 3D Gaussian Splatting can fail since the generated points are the centers of 3D Gaussians that do not necessarily lie on the surface. We thus introduce a scale regularizer to pull the centers close to the surface by enforcing the 3D Gaussians to be extremely thin. Moreover, we propose to refine the point cloud from 3D Gaussian Splatting with the normal priors from the surface predicted by neural implicit models instead of using a fixed set of points as guidance.
Consequently, the quality of surface reconstruction improves from the guidance of the more accurate 3D Gaussian splatting. By jointly optimizing the 3D Gaussian Splatting and the neural implicit model, our approach benefits from both representations and generates complete surfaces with intricate details. Experiments on Tanks and Temples verify the effectiveness of our proposed method.", + "arxiv_url": "http://arxiv.org/abs/2312.00846v1", + "pdf_url": "http://arxiv.org/pdf/2312.00846v1", + "published_date": "2023-12-01", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SparseGS: Real-Time 360° Sparse View Synthesis using Gaussian Splatting", + "authors": [ + "Haolin Xiong", + "Sairisheek Muttukuru", + "Rishi Upadhyay", + "Pradyumna Chari", + "Achuta Kadambi" + ], + "abstract": "The problem of novel view synthesis has grown significantly in popularity recently with the introduction of Neural Radiance Fields (NeRFs) and other implicit scene representation methods. A recent advance, 3D Gaussian Splatting (3DGS), leverages an explicit representation to achieve real-time rendering with high-quality results. However, 3DGS still requires an abundance of training views to generate a coherent scene representation. In few shot settings, similar to NeRF, 3DGS tends to overfit to training views, causing background collapse and excessive floaters, especially as the number of training views are reduced. We propose a method to enable training coherent 3DGS-based radiance fields of 360-degree scenes from sparse training views. We integrate depth priors with generative and explicit constraints to reduce background collapse, remove floaters, and enhance consistency from unseen viewpoints. Experiments show that our method outperforms base 3DGS by 6.4% in LPIPS and by 12.2% in PSNR, and NeRF-based methods by at least 17.6% in LPIPS on the MipNeRF-360 dataset with substantially less training and inference cost.", + "arxiv_url": "http://arxiv.org/abs/2312.00206v2", + "pdf_url": "http://arxiv.org/pdf/2312.00206v2", + "published_date": "2023-11-30", + "categories": [ + "cs.CV", + "cs.LG", + "eess.IV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting", + "authors": [ + "Agelos Kratimenos", + "Jiahui Lei", + "Kostas Daniilidis" + ], + "abstract": "Accurately and efficiently modeling dynamic scenes and motions is considered so challenging a task due to temporal dynamics and motion complexity. To address these challenges, we propose DynMF, a compact and efficient representation that decomposes a dynamic scene into a few neural trajectories. We argue that the per-point motions of a dynamic scene can be decomposed into a small set of explicit or learned trajectories. Our carefully designed neural framework consisting of a tiny set of learned basis queried only in time allows for rendering speed similar to 3D Gaussian Splatting, surpassing 120 FPS, while at the same time, requiring only double the storage compared to static scenes. Our neural representation adequately constrains the inherently underconstrained motion field of a dynamic scene leading to effective and fast optimization. 
This is done by binding each point to motion coefficients that enforce the per-point sharing of basis trajectories. By carefully applying a sparsity loss to the motion coefficients, we are able to disentangle the motions that comprise the scene, independently control them, and generate novel motion combinations that have never been seen before. We can reach state-of-the-art render quality within just 5 minutes of training and in less than half an hour, we can synthesize novel views of dynamic scenes with superior photorealistic quality. Our representation is interpretable, efficient, and expressive enough to offer real-time view synthesis of complex dynamic scene motions, in monocular and multi-view scenarios.", + "arxiv_url": "http://arxiv.org/abs/2312.00112v1", + "pdf_url": "http://arxiv.org/pdf/2312.00112v1", + "published_date": "2023-11-30", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DeformGS: Scene Flow in Highly Deformable Scenes for Deformable Object Manipulation", + "authors": [ + "Bardienus P. Duisterhof", + "Zhao Mandi", + "Yunchao Yao", + "Jia-Wei Liu", + "Jenny Seidenschwarz", + "Mike Zheng Shou", + "Deva Ramanan", + "Shuran Song", + "Stan Birchfield", + "Bowen Wen", + "Jeffrey Ichnowski" + ], + "abstract": "Teaching robots to fold, drape, or reposition deformable objects such as cloth will unlock a variety of automation applications. While remarkable progress has been made for rigid object manipulation, manipulating deformable objects poses unique challenges, including frequent occlusions, infinite-dimensional state spaces and complex dynamics. Just as object pose estimation and tracking have aided robots for rigid manipulation, dense 3D tracking (scene flow) of highly deformable objects will enable new applications in robotics while aiding existing approaches, such as imitation learning or creating digital twins with real2sim transfer. We propose DeformGS, an approach to recover scene flow in highly deformable scenes, using simultaneous video captures of a dynamic scene from multiple cameras. DeformGS builds on recent advances in Gaussian splatting, a method that learns the properties of a large number of Gaussians for state-of-the-art and fast novel-view synthesis. DeformGS learns a deformation function to project a set of Gaussians with canonical properties into world space. The deformation function uses a neural-voxel encoding and a multilayer perceptron (MLP) to infer Gaussian position, rotation, and a shadow scalar. We enforce physics-inspired regularization terms based on conservation of momentum and isometry, which leads to trajectories with smaller trajectory errors. We also leverage existing foundation models SAM and XMEM to produce noisy masks, and learn a per-Gaussian mask for better physics-inspired regularization. DeformGS achieves high-quality 3D tracking on highly deformable scenes with shadows and occlusions. In experiments, DeformGS improves 3D tracking by an average of 55.8% compared to the state-of-the-art. With sufficient texture, DeformGS achieves a median tracking error of 3.3 mm on a cloth of 1.5 x 1.5 m in area.
Website: https://deformgs.github.io", + "arxiv_url": "http://arxiv.org/abs/2312.00583v2", + "pdf_url": "http://arxiv.org/pdf/2312.00583v2", + "published_date": "2023-11-30", + "categories": [ + "cs.CV", + "cs.RO" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering", + "authors": [ + "Tao Lu", + "Mulin Yu", + "Linning Xu", + "Yuanbo Xiangli", + "Limin Wang", + "Dahua Lin", + "Bo Dai" + ], + "abstract": "Neural rendering methods have significantly advanced photo-realistic 3D scene rendering in various academic and industrial applications. The recent 3D Gaussian Splatting method has achieved the state-of-the-art rendering quality and speed combining the benefits of both primitive-based representations and volumetric representations. However, it often leads to heavily redundant Gaussians that try to fit every training view, neglecting the underlying scene geometry. Consequently, the resulting model becomes less robust to significant view changes, texture-less area and lighting effects. We introduce Scaffold-GS, which uses anchor points to distribute local 3D Gaussians, and predicts their attributes on-the-fly based on viewing direction and distance within the view frustum. Anchor growing and pruning strategies are developed based on the importance of neural Gaussians to reliably improve the scene coverage. We show that our method effectively reduces redundant Gaussians while delivering high-quality rendering. We also demonstrate an enhanced capability to accommodate scenes with varying levels-of-detail and view-dependent observations, without sacrificing the rendering speed.", + "arxiv_url": "http://arxiv.org/abs/2312.00109v1", + "pdf_url": "http://arxiv.org/pdf/2312.00109v1", + "published_date": "2023-11-30", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Periodic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering", + "authors": [ + "Yurui Chen", + "Chun Gu", + "Junzhe Jiang", + "Xiatian Zhu", + "Li Zhang" + ], + "abstract": "Modeling dynamic, large-scale urban scenes is challenging due to their highly intricate geometric structures and unconstrained dynamics in both space and time. Prior methods often employ high-level architectural priors, separating static and dynamic elements, resulting in suboptimal capture of their synergistic interactions. To address this challenge, we present a unified representation model, called Periodic Vibration Gaussian (PVG). PVG builds upon the efficient 3D Gaussian splatting technique, originally designed for static scene representation, by introducing periodic vibration-based temporal dynamics. This innovation enables PVG to elegantly and uniformly represent the characteristics of various objects and elements in dynamic urban scenes. To enhance temporally coherent and large scene representation learning with sparse training data, we introduce a novel temporal smoothing mechanism and a position-aware adaptive control strategy respectively. Extensive experiments on Waymo Open Dataset and KITTI benchmarks demonstrate that PVG surpasses state-of-the-art alternatives in both reconstruction and novel view synthesis for both dynamic and static scenes.
Notably, PVG achieves this without relying on manually labeled object bounding boxes or expensive optical flow estimation. Moreover, PVG exhibits 900-fold acceleration in rendering over the best alternative.", + "arxiv_url": "http://arxiv.org/abs/2311.18561v2", + "pdf_url": "http://arxiv.org/pdf/2311.18561v2", + "published_date": "2023-11-30", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding", + "authors": [ + "Jin-Chuan Shi", + "Miao Wang", + "Hao-Bin Duan", + "Shao-Hua Guan" + ], + "abstract": "Open-vocabulary querying in 3D space is challenging but essential for scene understanding tasks such as object localization and segmentation. Language-embedded scene representations have made progress by incorporating language features into 3D spaces. However, their efficacy heavily depends on neural networks that are resource-intensive in training and rendering. Although recent 3D Gaussians offer efficient and high-quality novel view synthesis, directly embedding language features in them leads to prohibitive memory usage and decreased performance. In this work, we introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks. Instead of embedding high-dimensional raw semantic features on 3D Gaussians, we propose a dedicated quantization scheme that drastically alleviates the memory requirement, and a novel embedding procedure that achieves smoother yet high accuracy query, countering the multi-view feature inconsistencies and the high-frequency inductive bias in point-based representations. Our comprehensive experiments show that our representation achieves the best visual quality and language querying accuracy across current language-embedded representations, while maintaining real-time rendering frame rates on a single desktop GPU.", + "arxiv_url": "http://arxiv.org/abs/2311.18482v1", + "pdf_url": "http://arxiv.org/pdf/2311.18482v1", + "published_date": "2023-11-30", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization", + "authors": [ + "KL Navaneet", + "Kossar Pourahmadi Meibodi", + "Soroush Abbasi Koohpayegani", + "Hamed Pirsiavash" + ], + "abstract": "3D Gaussian Splatting (3DGS) is a new method for modeling and rendering 3D radiance fields that achieves much faster learning and rendering time compared to SOTA NeRF methods. However, it comes with a drawback in the much larger storage demand compared to NeRF methods since it needs to store the parameters for several 3D Gaussians. We notice that many Gaussians may share similar parameters, so we introduce a simple vector quantization method based on K-means to quantize the Gaussian parameters while optimizing them. Then, we store the small codebook along with the index of the code for each Gaussian. We compress the indices further by sorting them and using a method similar to run-length encoding. Moreover, we use a simple regularizer to encourage zero opacity (invisible Gaussians) to reduce the storage and rendering time by a large factor through reducing the number of Gaussians. 
We do extensive experiments on standard benchmarks as well as an existing 3D dataset that is an order of magnitude larger than the standard benchmarks used in this field. We show that our simple yet effective method can reduce the storage cost for 3DGS by 40 to 50x and rendering time by 2 to 3x with a very small drop in the quality of rendered images.", + "arxiv_url": "http://arxiv.org/abs/2311.18159v3", + "pdf_url": "http://arxiv.org/pdf/2311.18159v3", + "published_date": "2023-11-30", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HUGS: Human Gaussian Splats", + "authors": [ + "Muhammed Kocabas", + "Jen-Hao Rick Chang", + "James Gabriel", + "Oncel Tuzel", + "Anurag Ranjan" + ], + "abstract": "Recent advances in neural rendering have improved both training and rendering times by orders of magnitude. While these methods demonstrate state-of-the-art quality and speed, they are designed for photogrammetry of static scenes and do not generalize well to freely moving humans in the environment. In this work, we introduce Human Gaussian Splats (HUGS) that represents an animatable human together with the scene using 3D Gaussian Splatting (3DGS). Our method takes only a monocular video with a small number of (50-100) frames, and it automatically learns to disentangle the static scene and a fully animatable human avatar within 30 minutes. We utilize the SMPL body model to initialize the human Gaussians. To capture details that are not modeled by SMPL (e.g. cloth, hairs), we allow the 3D Gaussians to deviate from the human body model. Utilizing 3D Gaussians for animated humans brings new challenges, including the artifacts created when articulating the Gaussians. We propose to jointly optimize the linear blend skinning weights to coordinate the movements of individual Gaussians during animation. Our approach enables novel-pose synthesis of human and novel view synthesis of both the human and the scene. We achieve state-of-the-art rendering quality with a rendering speed of 60 FPS while being ~100x faster to train over previous work. Our code will be announced here: https://github.com/apple/ml-hugs", + "arxiv_url": "http://arxiv.org/abs/2311.17910v1", + "pdf_url": "http://arxiv.org/pdf/2311.17910v1", + "published_date": "2023-11-29", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "https://github.com/apple/ml-hugs", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting", + "authors": [ + "Alexander Vilesov", + "Pradyumna Chari", + "Achuta Kadambi" + ], + "abstract": "With the onset of diffusion-based generative models and their ability to generate text-conditioned images, content generation has received a massive invigoration. Recently, these models have been shown to provide useful guidance for the generation of 3D graphics assets. However, existing work in text-conditioned 3D generation faces fundamental constraints: (i) inability to generate detailed, multi-object scenes, (ii) inability to textually control multi-object configurations, and (iii) physically realistic scene composition. In this work, we propose CG3D, a method for compositionally generating scalable 3D assets that resolves these constraints. 
We find that explicit Gaussian radiance fields, parameterized to allow for compositions of objects, possess the capability to enable semantically and physically consistent scenes. By utilizing a guidance framework built around this explicit representation, we show state of the art results, capable of even exceeding the guiding diffusion model in terms of object combinations and physics accuracy.", + "arxiv_url": "http://arxiv.org/abs/2311.17907v1", + "pdf_url": "http://arxiv.org/pdf/2311.17907v1", + "published_date": "2023-11-29", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FisherRF: Active View Selection and Uncertainty Quantification for Radiance Fields using Fisher Information", + "authors": [ + "Wen Jiang", + "Boshu Lei", + "Kostas Daniilidis" + ], + "abstract": "This study addresses the challenging problem of active view selection and uncertainty quantification within the domain of Radiance Fields. Neural Radiance Fields (NeRF) have greatly advanced image rendering and reconstruction, but the limited availability of 2D images poses uncertainties stemming from occlusions, depth ambiguities, and imaging errors. Efficiently selecting informative views becomes crucial, and quantifying NeRF model uncertainty presents intricate challenges. Existing approaches either depend on model architecture or are based on assumptions regarding density distributions that are not generally applicable. By leveraging Fisher Information, we efficiently quantify observed information within Radiance Fields without ground truth data. This can be used for the next best view selection and pixel-wise uncertainty quantification. Our method overcomes existing limitations on model architecture and effectiveness, achieving state-of-the-art results in both view selection and uncertainty quantification, demonstrating its potential to advance the field of Radiance Fields. Our method with the 3D Gaussian Splatting backend could perform view selections at 70 fps.", + "arxiv_url": "http://arxiv.org/abs/2311.17874v1", + "pdf_url": "http://arxiv.org/pdf/2311.17874v1", + "published_date": "2023-11-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Gaussian Shell Maps for Efficient 3D Human Generation", + "authors": [ + "Rameen Abdal", + "Wang Yifan", + "Zifan Shi", + "Yinghao Xu", + "Ryan Po", + "Zhengfei Kuang", + "Qifeng Chen", + "Dit-Yan Yeung", + "Gordon Wetzstein" + ], + "abstract": "Efficient generation of 3D digital humans is important in several industries, including virtual reality, social media, and cinematic production. 3D generative adversarial networks (GANs) have demonstrated state-of-the-art (SOTA) quality and diversity for generated assets. Current 3D GAN architectures, however, typically rely on volume representations, which are slow to render, thereby hampering the GAN training and requiring multi-view-inconsistent 2D upsamplers. Here, we introduce Gaussian Shell Maps (GSMs) as a framework that connects SOTA generator network architectures with emerging 3D Gaussian rendering primitives using an articulable multi shell--based scaffold. In this setting, a CNN generates a 3D texture stack with features that are mapped to the shells. The latter represent inflated and deflated versions of a template surface of a digital human in a canonical body pose. 
Instead of rasterizing the shells directly, we sample 3D Gaussians on the shells whose attributes are encoded in the texture features. These Gaussians are efficiently and differentiably rendered. The ability to articulate the shells is important during GAN training and, at inference time, to deform a body into arbitrary user-defined poses. Our efficient rendering scheme bypasses the need for view-inconsistent upsamplers and achieves high-quality multi-view consistent renderings at a native resolution of $512 \\times 512$ pixels. We demonstrate that GSMs successfully generate 3D humans when trained on single-view datasets, including SHHQ and DeepFashion.", + "arxiv_url": "http://arxiv.org/abs/2311.17857v1", + "pdf_url": "http://arxiv.org/pdf/2311.17857v1", + "published_date": "2023-11-29", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianShader: 3D Gaussian Splatting with Shading Functions for Reflective Surfaces", + "authors": [ + "Yingwenqi Jiang", + "Jiadong Tu", + "Yuan Liu", + "Xifeng Gao", + "Xiaoxiao Long", + "Wenping Wang", + "Yuexin Ma" + ], + "abstract": "The advent of neural 3D Gaussians has recently brought about a revolution in the field of neural rendering, facilitating the generation of high-quality renderings at real-time speeds. However, the explicit and discrete representation encounters challenges when applied to scenes featuring reflective surfaces. In this paper, we present GaussianShader, a novel method that applies a simplified shading function on 3D Gaussians to enhance the neural rendering in scenes with reflective surfaces while preserving the training and rendering efficiency. The main challenge in applying the shading function lies in the accurate normal estimation on discrete 3D Gaussians. Specifically, we proposed a novel normal estimation framework based on the shortest axis directions of 3D Gaussians with a delicately designed loss to make the consistency between the normals and the geometries of Gaussian spheres. Experiments show that GaussianShader strikes a commendable balance between efficiency and visual quality. Our method surpasses Gaussian Splatting in PSNR on specular object datasets, exhibiting an improvement of 1.57dB. When compared to prior works handling reflective surfaces, such as Ref-NeRF, our optimization time is significantly accelerated (23h vs. 0.58h). Please click on our project website to see more results.", + "arxiv_url": "http://arxiv.org/abs/2311.17977v1", + "pdf_url": "http://arxiv.org/pdf/2311.17977v1", + "published_date": "2023-11-29", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LightGaussian: Unbounded 3D Gaussian Compression with 15x Reduction and 200+ FPS", + "authors": [ + "Zhiwen Fan", + "Kevin Wang", + "Kairun Wen", + "Zehao Zhu", + "Dejia Xu", + "Zhangyang Wang" + ], + "abstract": "Recent advances in real-time neural rendering using point-based techniques have enabled broader adoption of 3D representations. However, foundational approaches like 3D Gaussian Splatting impose substantial storage overhead, as Structure-from-Motion (SfM) points can grow to millions, often requiring gigabyte-level disk space for a single unbounded scene. This growth presents scalability challenges and hinders splatting efficiency. 
To address this, we introduce LightGaussian, a method for transforming 3D Gaussians into a more compact format. Inspired by Network Pruning, LightGaussian identifies Gaussians with minimal global significance on scene reconstruction, and applies a pruning and recovery process to reduce redundancy while preserving visual quality. Knowledge distillation and pseudo-view augmentation then transfer spherical harmonic coefficients to a lower degree, yielding compact representations. Gaussian Vector Quantization, based on each Gaussian's global significance, further lowers bitwidth with minimal accuracy loss. LightGaussian achieves an average 15x compression rate while boosting FPS from 144 to 237 within the 3D-GS framework, enabling efficient complex scene representation on the Mip-NeRF 360 and Tank & Temple datasets. The proposed Gaussian pruning approach is also adaptable to other 3D representations (e.g., Scaffold-GS), demonstrating strong generalization capabilities.", + "arxiv_url": "http://arxiv.org/abs/2311.17245v6", + "pdf_url": "http://arxiv.org/pdf/2311.17245v6", + "published_date": "2023-11-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting", + "authors": [ + "Xian Liu", + "Xiaohang Zhan", + "Jiaxiang Tang", + "Ying Shan", + "Gang Zeng", + "Dahua Lin", + "Xihui Liu", + "Ziwei Liu" + ], + "abstract": "Realistic 3D human generation from text prompts is a desirable yet challenging task. Existing methods optimize 3D representations like mesh or neural fields via score distillation sampling (SDS), which suffers from inadequate fine details or excessive training time. In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance. Our key insight is that 3D Gaussian Splatting is an efficient renderer with periodic Gaussian shrinkage or growing, where such adaptive density control can be naturally guided by intrinsic human structures. Specifically, 1) we first propose a Structure-Aware SDS that simultaneously optimizes human appearance and geometry. The multi-modal score function from both RGB and depth space is leveraged to distill the Gaussian densification and pruning process. 2) Moreover, we devise an Annealed Negative Prompt Guidance by decomposing SDS into a noisier generative score and a cleaner classifier score, which well addresses the over-saturation issue. The floating artifacts are further eliminated based on Gaussian size in a prune-only phase to enhance generation smoothness. Extensive experiments demonstrate the superior efficiency and competitive quality of our framework, rendering vivid 3D humans under diverse scenarios. 
Project Page: https://alvinliu0.github.io/projects/HumanGaussian", + "arxiv_url": "http://arxiv.org/abs/2311.17061v2", + "pdf_url": "http://arxiv.org/pdf/2311.17061v2", + "published_date": "2023-11-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Point'n Move: Interactive Scene Object Manipulation on Gaussian Splatting Radiance Fields", + "authors": [ + "Jiajun Huang", + "Hongchuan Yu" + ], + "abstract": "We propose Point'n Move, a method that achieves interactive scene object manipulation with exposed region inpainting. Interactivity here further comes from intuitive object selection and real-time editing. To achieve this, we adopt Gaussian Splatting Radiance Field as the scene representation and fully leverage its explicit nature and speed advantage. Its explicit representation formulation allows us to devise a 2D prompt points to 3D mask dual-stage self-prompting segmentation algorithm, perform mask refinement and merging, minimize change as well as provide good initialization for scene inpainting and perform editing in real-time without per-editing training, all leads to superior quality and performance. We test our method by performing editing on both forward-facing and 360 scenes. We also compare our method against existing scene object removal methods, showing superior quality despite being more capable and having a speed advantage.", + "arxiv_url": "http://arxiv.org/abs/2311.16737v1", + "pdf_url": "http://arxiv.org/pdf/2311.16737v1", + "published_date": "2023-11-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Human Gaussian Splatting: Real-time Rendering of Animatable Avatars", + "authors": [ + "Arthur Moreau", + "Jifei Song", + "Helisa Dhamo", + "Richard Shaw", + "Yiren Zhou", + "Eduardo Pérez-Pellitero" + ], + "abstract": "This work addresses the problem of real-time rendering of photorealistic human body avatars learned from multi-view videos. While the classical approaches to model and render virtual humans generally use a textured mesh, recent research has developed neural body representations that achieve impressive visual quality. However, these models are difficult to render in real-time and their quality degrades when the character is animated with body poses different than the training observations. We propose an animatable human model based on 3D Gaussian Splatting, that has recently emerged as a very efficient alternative to neural radiance fields. The body is represented by a set of gaussian primitives in a canonical space which is deformed with a coarse to fine approach that combines forward skinning and local non-rigid refinement. We describe how to learn our Human Gaussian Splatting (HuGS) model in an end-to-end fashion from multi-view observations, and evaluate it against the state-of-the-art approaches for novel pose synthesis of clothed body. 
Our method achieves 1.5 dB PSNR improvement over the state-of-the-art on the THuman4 dataset while being able to render in real-time (80 fps for 512x512 resolution).", + "arxiv_url": "http://arxiv.org/abs/2311.17113v2", + "pdf_url": "http://arxiv.org/pdf/2311.17113v2", + "published_date": "2023-11-28", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering", + "authors": [ + "Zhiwen Yan", + "Weng Fei Low", + "Yu Chen", + "Gim Hee Lee" + ], + "abstract": "3D Gaussians have recently emerged as a highly efficient representation for 3D reconstruction and rendering. Despite their high rendering quality and speed at high resolutions, both deteriorate drastically when rendered at lower resolutions or from far-away camera positions. During low-resolution or far-away rendering, the pixel size of the image can fall below the Nyquist frequency compared to the screen size of each splatted 3D Gaussian, leading to aliasing effects. The rendering is also drastically slowed down by the sequential alpha blending of more splatted Gaussians per pixel. To address these issues, we propose a multi-scale 3D Gaussian splatting algorithm, which maintains Gaussians at different scales to represent the same scene. Higher-resolution images are rendered with more small Gaussians, and lower-resolution images are rendered with fewer larger Gaussians. With similar training time, our algorithm can achieve 13\\%-66\\% PSNR and 160\\%-2400\\% rendering speed improvement at 4$\\times$-128$\\times$ scale rendering on the Mip-NeRF360 dataset compared to single-scale 3D Gaussian splatting. Our code and more results are available on our project website https://jokeryan.github.io/projects/ms-gs/", + "arxiv_url": "http://arxiv.org/abs/2311.17089v2", + "pdf_url": "http://arxiv.org/pdf/2311.17089v2", + "published_date": "2023-11-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GART: Gaussian Articulated Template Models", + "authors": [ + "Jiahui Lei", + "Yufu Wang", + "Georgios Pavlakos", + "Lingjie Liu", + "Kostas Daniilidis" + ], + "abstract": "We introduce the Gaussian Articulated Template Model (GART), an explicit, efficient, and expressive representation for non-rigid articulated subject capturing and rendering from monocular videos. GART utilizes a mixture of moving 3D Gaussians to explicitly approximate a deformable subject's geometry and appearance. It takes advantage of a categorical template model prior (SMPL, SMAL, etc.) with learnable forward skinning while further generalizing to more complex non-rigid deformations with novel latent bones. 
GART can be reconstructed via differentiable rendering from monocular videos in seconds or minutes and rendered in novel poses faster than 150fps.", + "arxiv_url": "http://arxiv.org/abs/2311.16099v1", + "pdf_url": "http://arxiv.org/pdf/2311.16099v1", + "published_date": "2023-11-27", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Animatable and Relightable Gaussians for High-fidelity Human Avatar Modeling", + "authors": [ + "Zhe Li", + "Yipengjing Sun", + "Zerong Zheng", + "Lizhen Wang", + "Shengping Zhang", + "Yebin Liu" + ], + "abstract": "Modeling animatable human avatars from RGB videos is a long-standing and challenging problem. Recent works usually adopt MLP-based neural radiance fields (NeRF) to represent 3D humans, but it remains difficult for pure MLPs to regress pose-dependent garment details. To this end, we introduce Animatable Gaussians, a new avatar representation that leverages powerful 2D CNNs and 3D Gaussian splatting to create high-fidelity avatars. To associate 3D Gaussians with the animatable avatar, we learn a parametric template from the input videos, and then parameterize the template on two front & back canonical Gaussian maps where each pixel represents a 3D Gaussian. The learned template is adaptive to the wearing garments for modeling looser clothes like dresses. Such template-guided 2D parameterization enables us to employ a powerful StyleGAN-based CNN to learn the pose-dependent Gaussian maps for modeling detailed dynamic appearances. Furthermore, we introduce a pose projection strategy for better generalization given novel poses. To tackle the realistic relighting of animatable avatars, we introduce physically-based rendering into the avatar representation for decomposing avatar materials and environment illumination. Overall, our method can create lifelike avatars with dynamic, realistic, generalized and relightable appearances. Experiments show that our method outperforms other state-of-the-art approaches.", + "arxiv_url": "http://arxiv.org/abs/2311.16096v4", + "pdf_url": "http://arxiv.org/pdf/2311.16096v4", + "published_date": "2023-11-27", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Relightable 3D Gaussians: Realistic Point Cloud Relighting with BRDF Decomposition and Ray Tracing", + "authors": [ + "Jian Gao", + "Chun Gu", + "Youtian Lin", + "Zhihao Li", + "Hao Zhu", + "Xun Cao", + "Li Zhang", + "Yao Yao" + ], + "abstract": "In this paper, we present a novel differentiable point-based rendering framework to achieve photo-realistic relighting. To make the reconstructed scene relightable, we enhance vanilla 3D Gaussians by associating extra properties, including normal vectors, BRDF parameters, and incident lighting from various directions. From a collection of multi-view images, the 3D scene is optimized through 3D Gaussian Splatting while BRDF and lighting are decomposed by physically based differentiable rendering. To produce plausible shadow effects in photo-realistic relighting, we introduce an innovative point-based ray tracing with the bounding volume hierarchies for efficient visibility pre-computation. Extensive experiments demonstrate our improved BRDF estimation, novel view synthesis and relighting results compared to state-of-the-art approaches. 
The proposed framework showcases the potential to revolutionize the mesh-based graphics pipeline with a point-based pipeline enabling editing, tracing, and relighting.", + "arxiv_url": "http://arxiv.org/abs/2311.16043v2", + "pdf_url": "http://arxiv.org/pdf/2311.16043v2", + "published_date": "2023-11-27", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions", + "authors": [ + "Junjie Wang", + "Jiemin Fang", + "Xiaopeng Zhang", + "Lingxi Xie", + "Qi Tian" + ], + "abstract": "Recently, impressive results have been achieved in 3D scene editing with text instructions based on a 2D diffusion model. However, current diffusion models primarily generate images by predicting noise in the latent space, and the editing is usually applied to the whole image, which makes it challenging to perform delicate, especially localized, editing for 3D scenes. Inspired by recent 3D Gaussian splatting, we propose a systematic framework, named GaussianEditor, to edit 3D scenes delicately via 3D Gaussians with text instructions. Benefiting from the explicit property of 3D Gaussians, we design a series of techniques to achieve delicate editing. Specifically, we first extract the region of interest (RoI) corresponding to the text instruction, aligning it to 3D Gaussians. The Gaussian RoI is further used to control the editing process. Our framework can achieve more delicate and precise editing of 3D scenes than previous methods while enjoying much faster training speed, i.e. within 20 minutes on a single V100 GPU, more than twice as fast as Instruct-NeRF2NeRF (45 minutes -- 2 hours).", + "arxiv_url": "http://arxiv.org/abs/2311.16037v2", + "pdf_url": "http://arxiv.org/pdf/2311.16037v2", + "published_date": "2023-11-27", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Mip-Splatting: Alias-free 3D Gaussian Splatting", + "authors": [ + "Zehao Yu", + "Anpei Chen", + "Binbin Huang", + "Torsten Sattler", + "Andreas Geiger" + ], + "abstract": "Recently, 3D Gaussian Splatting has demonstrated impressive novel view synthesis results, reaching high fidelity and efficiency. However, strong artifacts can be observed when changing the sampling rate, \\eg, by changing focal length or camera distance. We find that the source for this phenomenon can be attributed to the lack of 3D frequency constraints and the usage of a 2D dilation filter. To address this problem, we introduce a 3D smoothing filter which constrains the size of the 3D Gaussian primitives based on the maximal sampling frequency induced by the input views, eliminating high-frequency artifacts when zooming in. Moreover, replacing 2D dilation with a 2D Mip filter, which simulates a 2D box filter, effectively mitigates aliasing and dilation issues. 
Our evaluation, including scenarios such as training on single-scale images and testing on multiple scales, validates the effectiveness of our approach.", + "arxiv_url": "http://arxiv.org/abs/2311.16493v1", + "pdf_url": "http://arxiv.org/pdf/2311.16493v1", + "published_date": "2023-11-27", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Animatable 3D Gaussian: Fast and High-Quality Reconstruction of Multiple Human Avatars", + "authors": [ + "Yang Liu", + "Xiang Huang", + "Minghan Qin", + "Qinwei Lin", + "Haoqian Wang" + ], + "abstract": "Neural radiance fields are capable of reconstructing high-quality drivable human avatars but are expensive to train and render and not suitable for multi-human scenes with complex shadows. To reduce consumption, we propose Animatable 3D Gaussian, which learns human avatars from input images and poses. We extend 3D Gaussians to dynamic human scenes by modeling a set of skinned 3D Gaussians and a corresponding skeleton in canonical space and deforming 3D Gaussians to posed space according to the input poses. We introduce a multi-head hash encoder for pose-dependent shape and appearance and a time-dependent ambient occlusion module to achieve high-quality reconstructions in scenes containing complex motions and dynamic shadows. On both novel view synthesis and novel pose synthesis tasks, our method achieves higher reconstruction quality than InstantAvatar with less training time (1/60), less GPU memory (1/4), and faster rendering speed (7x). Our method can be easily extended to multi-human scenes and achieve comparable novel view synthesis results on a scene with ten people in only 25 seconds of training.", + "arxiv_url": "http://arxiv.org/abs/2311.16482v3", + "pdf_url": "http://arxiv.org/pdf/2311.16482v3", + "published_date": "2023-11-27", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-IR: 3D Gaussian Splatting for Inverse Rendering", + "authors": [ + "Zhihao Liang", + "Qi Zhang", + "Ying Feng", + "Ying Shan", + "Kui Jia" + ], + "abstract": "We propose GS-IR, a novel inverse rendering approach based on 3D Gaussian Splatting (GS) that leverages forward mapping volume rendering to achieve photorealistic novel view synthesis and relighting results. Unlike previous works that use implicit neural representations and volume rendering (e.g. NeRF), which suffer from low expressive power and high computational complexity, we extend GS, a top-performance representation for novel view synthesis, to estimate scene geometry, surface material, and environment illumination from multi-view images captured under unknown lighting conditions. There are two main problems when introducing GS to inverse rendering: 1) GS does not support producing plausible normals natively; 2) forward mapping (e.g. rasterization and splatting) cannot trace the occlusion like backward mapping (e.g. ray tracing). To address these challenges, our GS-IR proposes an efficient optimization scheme that incorporates a depth-derivation-based regularization for normal estimation and a baking-based occlusion to model indirect lighting. The flexible and expressive GS representation allows us to achieve fast and compact geometry reconstruction, photorealistic novel view synthesis, and effective physically-based rendering. 
We demonstrate the superiority of our method over baseline methods through qualitative and quantitative evaluations on various challenging scenes.", + "arxiv_url": "http://arxiv.org/abs/2311.16473v3", + "pdf_url": "http://arxiv.org/pdf/2311.16473v3", + "published_date": "2023-11-26", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting", + "authors": [ + "Yiwen Chen", + "Zilong Chen", + "Chi Zhang", + "Feng Wang", + "Xiaofeng Yang", + "Yikai Wang", + "Zhongang Cai", + "Lei Yang", + "Huaping Liu", + "Guosheng Lin" + ], + "abstract": "3D editing plays a crucial role in many areas such as gaming and virtual reality. Traditional 3D editing methods, which rely on representations like meshes and point clouds, often fall short in realistically depicting complex scenes. On the other hand, methods based on implicit 3D representations, like Neural Radiance Field (NeRF), render complex scenes effectively but suffer from slow processing speeds and limited control over specific scene areas. In response to these challenges, our paper presents GaussianEditor, an innovative and efficient 3D editing algorithm based on Gaussian Splatting (GS), a novel 3D representation. GaussianEditor enhances precision and control in editing through our proposed Gaussian semantic tracing, which traces the editing target throughout the training process. Additionally, we propose Hierarchical Gaussian splatting (HGS) to achieve stabilized and fine results under stochastic generative guidance from 2D diffusion models. We also develop editing strategies for efficient object removal and integration, a challenging task for existing methods. Our comprehensive experiments demonstrate GaussianEditor's superior control, efficacy, and rapid performance, marking a significant advancement in 3D editing. Project Page: https://buaacyw.github.io/gaussian-editor/", + "arxiv_url": "http://arxiv.org/abs/2311.14521v4", + "pdf_url": "http://arxiv.org/pdf/2311.14521v4", + "published_date": "2023-11-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Compact 3D Gaussian Representation for Radiance Field", + "authors": [ + "Joo Chan Lee", + "Daniel Rho", + "Xiangyu Sun", + "Jong Hwan Ko", + "Eunbyung Park" + ], + "abstract": "Neural Radiance Fields (NeRFs) have demonstrated remarkable potential in capturing complex 3D scenes with high fidelity. However, one persistent challenge that hinders the widespread adoption of NeRFs is the computational bottleneck due to the volumetric rendering. On the other hand, 3D Gaussian splatting (3DGS) has recently emerged as an alternative representation that leverages a 3D Gaussian-based representation and adopts the rasterization pipeline to render the images rather than volumetric rendering, achieving very fast rendering speed and promising image quality. However, a significant drawback arises as 3DGS entails a substantial number of 3D Gaussians to maintain the high fidelity of the rendered images, which requires a large amount of memory and storage. To address this critical issue, we place a specific emphasis on two key objectives: reducing the number of Gaussian points without sacrificing performance and compressing the Gaussian attributes, such as view-dependent color and covariance. 
To this end, we propose a learnable mask strategy that significantly reduces the number of Gaussians while preserving high performance. In addition, we propose a compact but effective representation of view-dependent color by employing a grid-based neural field rather than relying on spherical harmonics. Finally, we learn codebooks to compactly represent the geometric attributes of Gaussians by vector quantization. With model compression techniques such as quantization and entropy coding, we consistently show over 25$\\times$ reduced storage and enhanced rendering speed, while maintaining the quality of the scene representation, compared to 3DGS. Our work provides a comprehensive framework for 3D scene representation, achieving high performance, fast training, compactness, and real-time rendering. Our project page is available at https://maincold2.github.io/c3dgs/.", + "arxiv_url": "http://arxiv.org/abs/2311.13681v2", + "pdf_url": "http://arxiv.org/pdf/2311.13681v2", + "published_date": "2023-11-22", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Animatable 3D Gaussians for High-fidelity Synthesis of Human Motions", + "authors": [ + "Keyang Ye", + "Tianjia Shao", + "Kun Zhou" + ], + "abstract": "We present a novel animatable 3D Gaussian model for rendering high-fidelity free-view human motions in real time. Compared to existing NeRF-based methods, the model offers better capability in synthesizing high-frequency details without the jittering problem across video frames. The core of our model is a novel augmented 3D Gaussian representation, which attaches each Gaussian with a learnable code. The learnable code serves as a pose-dependent appearance embedding for refining the erroneous appearance caused by geometric transformation of Gaussians, based on which an appearance refinement model is learned to produce residual Gaussian properties to match the appearance in the target pose. To force the Gaussians to learn the foreground human only without background interference, we further design a novel alpha loss to explicitly constrain the Gaussians within the human body. We also propose to jointly optimize the human joint parameters to improve the appearance accuracy. The animatable 3D Gaussian model can be learned with shallow MLPs, so new human motions can be synthesized in real time (66 fps on average). Experiments show that our model has superior performance over NeRF-based methods.", + "arxiv_url": "http://arxiv.org/abs/2311.13404v2", + "pdf_url": "http://arxiv.org/pdf/2311.13404v2", + "published_date": "2023-11-22", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Depth-Regularized Optimization for 3D Gaussian Splatting in Few-Shot Images", + "authors": [ + "Jaeyoung Chung", + "Jeongtaek Oh", + "Kyoung Mu Lee" + ], + "abstract": "In this paper, we present a method to optimize Gaussian splatting with a limited number of images while avoiding overfitting. Representing a 3D scene by combining numerous Gaussian splats has yielded outstanding visual quality. However, it tends to overfit the training views when only a small number of images are available. To address this issue, we introduce a dense depth map as a geometry guide to mitigate overfitting. 
We obtain the depth map using a pre-trained monocular depth estimation model and align its scale and offset using sparse COLMAP feature points. The adjusted depth aids in the color-based optimization of 3D Gaussian splatting, mitigating floating artifacts, and ensuring adherence to geometric constraints. We verify the proposed method on the NeRF-LLFF dataset with varying numbers of few-shot images. Our approach demonstrates robust geometry compared to the original method that relies solely on images. Project page: robot0321.github.io/DepthRegGS", + "arxiv_url": "http://arxiv.org/abs/2311.13398v3", + "pdf_url": "http://arxiv.org/pdf/2311.13398v3", + "published_date": "2023-11-22", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes", + "authors": [ + "Jaeyoung Chung", + "Suyoung Lee", + "Hyeongjin Nam", + "Jaerin Lee", + "Kyoung Mu Lee" + ], + "abstract": "With the widespread usage of VR devices and content, demand for 3D scene generation techniques has grown. Existing 3D scene generation models, however, limit the target scene to a specific domain, primarily due to their training strategies using 3D scan datasets that are far from the real world. To address this limitation, we propose LucidDreamer, a domain-free scene generation pipeline that fully leverages the power of existing large-scale diffusion-based generative models. Our LucidDreamer has two alternate steps: Dreaming and Alignment. First, to generate multi-view consistent images from inputs, we set the point cloud as a geometrical guideline for each image generation. Specifically, we project a portion of the point cloud to the desired view and provide the projection as guidance for inpainting using the generative model. The inpainted images are lifted to 3D space with estimated depth maps, composing new points. Second, to aggregate the new points into the 3D scene, we propose an aligning algorithm which harmoniously integrates the portions of newly generated 3D scenes. The finally obtained 3D scene serves as initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly detailed compared to previous 3D scene generation methods, with no constraint on the domain of the target scene. Project page: https://luciddreamer-cvlab.github.io/", + "arxiv_url": "http://arxiv.org/abs/2311.13384v2", + "pdf_url": "http://arxiv.org/pdf/2311.13384v2", + "published_date": "2023-11-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering", + "authors": [ + "Antoine Guédon", + "Vincent Lepetit" + ], + "abstract": "We propose a method to allow precise and extremely fast mesh extraction from 3D Gaussian Splatting. Gaussian Splatting has recently become very popular as it yields realistic rendering while being significantly faster to train than NeRFs. It is however challenging to extract a mesh from the millions of tiny 3D gaussians as these gaussians tend to be unorganized after optimization and no method has been proposed so far. Our first key contribution is a regularization term that encourages the gaussians to align well with the surface of the scene. 
We then introduce a method that exploits this alignment to extract a mesh from the Gaussians using Poisson reconstruction, which is fast, scalable, and preserves details, in contrast to the Marching Cubes algorithm usually applied to extract meshes from Neural SDFs. Finally, we introduce an optional refinement strategy that binds gaussians to the surface of the mesh, and jointly optimizes these Gaussians and the mesh through Gaussian splatting rendering. This enables easy editing, sculpting, rigging, animating, compositing and relighting of the Gaussians using traditional software by manipulating the mesh instead of the gaussians themselves. Retrieving such an editable mesh for realistic rendering is done within minutes with our method, compared to hours with the state-of-the-art methods on neural SDFs, while providing a better rendering quality. Our project page is the following: https://anttwo.github.io/sugar/", + "arxiv_url": "http://arxiv.org/abs/2311.12775v3", + "pdf_url": "http://arxiv.org/pdf/2311.12775v3", + "published_date": "2023-11-21", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis", + "authors": [ + "Kai Katsumata", + "Duc Minh Vo", + "Hideki Nakayama" + ], + "abstract": "3D Gaussian Splatting (3DGS) has shown remarkable success in synthesizing novel views given multiple views of a static scene. Yet, 3DGS faces challenges when applied to dynamic scenes because 3D Gaussian parameters need to be updated per timestep, requiring a large amount of memory and at least a dozen observations per timestep. To address these limitations, we present a compact dynamic 3D Gaussian representation that models positions and rotations as functions of time with a few parameter approximations while keeping other properties of 3DGS including scale, color and opacity invariant. Our method can dramatically reduce memory usage and relax a strict multi-view assumption. In our experiments on monocular and multi-view scenarios, we show that our method not only matches state-of-the-art methods, often linked with slower rendering speeds, in terms of high rendering quality but also significantly surpasses them by achieving a rendering speed of $118$ frames per second (FPS) at a resolution of 1,352$\\times$1,014 on a single GPU.", + "arxiv_url": "http://arxiv.org/abs/2311.12897v2", + "pdf_url": "http://arxiv.org/pdf/2311.12897v2", + "published_date": "2023-11-21", + "categories": [ + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics", + "authors": [ + "Tianyi Xie", + "Zeshun Zong", + "Yuxing Qiu", + "Xuan Li", + "Yutao Feng", + "Yin Yang", + "Chenfanfu Jiang" + ], + "abstract": "We introduce PhysGaussian, a new method that seamlessly integrates physically grounded Newtonian dynamics within 3D Gaussians to achieve high-quality novel motion synthesis. Employing a custom Material Point Method (MPM), our approach enriches 3D Gaussian kernels with physically meaningful kinematic deformation and mechanical stress attributes, all evolved in line with continuum mechanics principles. 
A defining characteristic of our method is the seamless integration between physical simulation and visual rendering: both components utilize the same 3D Gaussian kernels as their discrete representations. This negates the necessity for triangle/tetrahedron meshing, marching cubes, \"cage meshes,\" or any other geometry embedding, highlighting the principle of \"what you see is what you simulate (WS$^2$).\" Our method demonstrates exceptional versatility across a wide variety of materials--including elastic entities, metals, non-Newtonian fluids, and granular materials--showcasing its strong capabilities in creating diverse visual content with novel viewpoints and movements. Our project page is at: https://xpandora.github.io/PhysGaussian/", + "arxiv_url": "http://arxiv.org/abs/2311.12198v3", + "pdf_url": "http://arxiv.org/pdf/2311.12198v3", + "published_date": "2023-11-20", + "categories": [ + "cs.GR", + "cs.AI", + "cs.CV", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting", + "authors": [ + "Chi Yan", + "Delin Qu", + "Dan Xu", + "Bin Zhao", + "Zhigang Wang", + "Dong Wang", + "Xuelong Li" + ], + "abstract": "In this paper, we introduce \\textbf{GS-SLAM} that first utilizes 3D Gaussian representation in the Simultaneous Localization and Mapping (SLAM) system. It facilitates a better balance between efficiency and accuracy. Compared to recent SLAM methods employing neural implicit representations, our method utilizes a real-time differentiable splatting rendering pipeline that offers significant speedup to map optimization and RGB-D rendering. Specifically, we propose an adaptive expansion strategy that adds new or deletes noisy 3D Gaussians in order to efficiently reconstruct new observed scene geometry and improve the mapping of previously observed areas. This strategy is essential to extend 3D Gaussian representation to reconstruct the whole scene rather than synthesize a static object in existing methods. Moreover, in the pose tracking process, an effective coarse-to-fine technique is designed to select reliable 3D Gaussian representations to optimize camera pose, resulting in runtime reduction and robust estimation. Our method achieves competitive performance compared with existing state-of-the-art real-time methods on the Replica, TUM-RGBD datasets. Project page: https://gs-slam.github.io/.", + "arxiv_url": "http://arxiv.org/abs/2311.11700v4", + "pdf_url": "http://arxiv.org/pdf/2311.11700v4", + "published_date": "2023-11-20", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching", + "authors": [ + "Yixun Liang", + "Xin Yang", + "Jiantao Lin", + "Haodong Li", + "Xiaogang Xu", + "Yingcong Chen" + ], + "abstract": "The recent advancements in text-to-3D generation mark a significant milestone in generative models, unlocking new possibilities for creating imaginative 3D assets across various real-world scenarios. While recent advancements in text-to-3D generation have shown promise, they often fall short in rendering detailed and high-quality 3D models. This problem is especially prevalent as many methods base themselves on Score Distillation Sampling (SDS). 
This paper identifies a notable deficiency in SDS, that it brings inconsistent and low-quality updating direction for the 3D model, causing the over-smoothing effect. To address this, we propose a novel approach called Interval Score Matching (ISM). ISM employs deterministic diffusing trajectories and utilizes interval-based score matching to counteract over-smoothing. Furthermore, we incorporate 3D Gaussian Splatting into our text-to-3D generation pipeline. Extensive experiments show that our model largely outperforms the state-of-the-art in quality and training efficiency.", + "arxiv_url": "http://arxiv.org/abs/2311.11284v3", + "pdf_url": "http://arxiv.org/pdf/2311.11284v3", + "published_date": "2023-11-19", + "categories": [ + "cs.CV", + "cs.GR", + "cs.MM" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianDiffusion: 3D Gaussian Splatting for Denoising Diffusion Probabilistic Models with Structured Noise", + "authors": [ + "Xinhai Li", + "Huaibin Wang", + "Kuo-Kun Tseng" + ], + "abstract": "Text-to-3D, known for its efficient generation methods and expansive creative potential, has garnered significant attention in the AIGC domain. However, the pixel-wise rendering of NeRF and its ray marching light sampling constrain the rendering speed, impacting its utility in downstream industrial applications. Gaussian Splatting has recently shown a trend of replacing the traditional pointwise sampling technique commonly used in NeRF-based methodologies, and it is changing various aspects of 3D reconstruction. This paper introduces a novel text to 3D content generation framework, Gaussian Diffusion, based on Gaussian Splatting and produces more realistic renderings. The challenge of achieving multi-view consistency in 3D generation significantly impedes modeling complexity and accuracy. Taking inspiration from SJC, we explore employing multi-view noise distributions to perturb images generated by 3D Gaussian Splatting, aiming to rectify inconsistencies in multi-view geometry. We ingeniously devise an efficient method to generate noise that produces Gaussian noise from diverse viewpoints, all originating from a shared noise source. Furthermore, vanilla 3D Gaussian-based generation tends to trap models in local minima, causing artifacts like floaters, burrs, or proliferative elements. To mitigate these issues, we propose the variational Gaussian Splatting technique to enhance the quality and stability of 3D appearance. To our knowledge, our approach represents the first comprehensive utilization of Gaussian Diffusion across the entire spectrum of 3D content generation processes.", + "arxiv_url": "http://arxiv.org/abs/2311.11221v3", + "pdf_url": "http://arxiv.org/pdf/2311.11221v3", + "published_date": "2023-11-19", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "SplatArmor: Articulated Gaussian splatting for animatable humans from monocular RGB videos", + "authors": [ + "Rohit Jena", + "Ganesh Subramanian Iyer", + "Siddharth Choudhary", + "Brandon Smith", + "Pratik Chaudhari", + "James Gee" + ], + "abstract": "We propose SplatArmor, a novel approach for recovering detailed and animatable human models by `armoring' a parameterized body model with 3D Gaussians. 
Our approach represents the human as a set of 3D Gaussians within a canonical space, whose articulation is defined by extending the skinning of the underlying SMPL geometry to arbitrary locations in the canonical space. To account for pose-dependent effects, we introduce an SE(3) field, which allows us to capture both the location and anisotropy of the Gaussians. Furthermore, we propose the use of a neural color field to provide color regularization and 3D supervision for the precise positioning of these Gaussians. We show that Gaussian splatting provides an interesting alternative to neural rendering-based methods by leveraging a rasterization primitive without facing any of the non-differentiability and optimization challenges typically faced in such approaches. The rasterization paradigm allows us to leverage forward skinning, and does not suffer from the ambiguities associated with inverse skinning and warping. We show compelling results on the ZJU MoCap and People Snapshot datasets, which underscore the effectiveness of our method for controllable human synthesis.", + "arxiv_url": "http://arxiv.org/abs/2311.10812v1", + "pdf_url": "http://arxiv.org/pdf/2311.10812v1", + "published_date": "2023-11-17", + "categories": [ + "cs.CV", + "cs.GR", + "cs.LG" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "neural rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis", + "authors": [ + "Simon Niedermayr", + "Josef Stumpfegger", + "Rüdiger Westermann" + ], + "abstract": "Recently, high-fidelity scene reconstruction with an optimized 3D Gaussian splat representation has been introduced for novel view synthesis from sparse image sets. Making such representations suitable for applications like network streaming and rendering on low-power devices requires significantly reduced memory consumption as well as improved rendering efficiency. We propose a compressed 3D Gaussian splat representation that utilizes sensitivity-aware vector clustering with quantization-aware training to compress directional colors and Gaussian parameters. The learned codebooks have low bitrates and achieve a compression rate of up to $31\\times$ on real-world scenes with only minimal degradation of visual quality. We demonstrate that the compressed splat representation can be efficiently rendered with hardware rasterization on lightweight GPUs at up to $4\\times$ higher framerates than reported via an optimized GPU compute pipeline. Extensive experiments across multiple datasets demonstrate the robustness and rendering speed of the proposed approach.", + "arxiv_url": "http://arxiv.org/abs/2401.02436v2", + "pdf_url": "http://arxiv.org/pdf/2401.02436v2", + "published_date": "2023-11-17", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Drivable 3D Gaussian Avatars", + "authors": [ + "Wojciech Zielonka", + "Timur Bagautdinov", + "Shunsuke Saito", + "Michael Zollhöfer", + "Justus Thies", + "Javier Romero" + ], + "abstract": "We present Drivable 3D Gaussian Avatars (D3GA), the first 3D controllable model for human bodies rendered with Gaussian splats. Current photorealistic drivable avatars require either accurate 3D registrations during training, dense input images during testing, or both. The ones based on neural radiance fields also tend to be prohibitively slow for telepresence applications. 
This work uses the recently presented 3D Gaussian Splatting (3DGS) technique to render realistic humans at real-time framerates, using dense calibrated multi-view videos as input. To deform those primitives, we depart from the commonly used point deformation method of linear blend skinning (LBS) and use a classic volumetric deformation method: cage deformations. Given their smaller size, we drive these deformations with joint angles and keypoints, which are more suitable for communication applications. Our experiments on nine subjects with varied body shapes, clothes, and motions obtain higher-quality results than state-of-the-art methods when using the same training and test data.", + "arxiv_url": "http://arxiv.org/abs/2311.08581v1", + "pdf_url": "http://arxiv.org/pdf/2311.08581v1", + "published_date": "2023-11-14", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Dynamic Gaussian Splatting from Markerless Motion Capture can Reconstruct Infants Movements", + "authors": [ + "R. James Cotton", + "Colleen Peyton" + ], + "abstract": "Easy access to precise 3D tracking of movement could benefit many aspects of rehabilitation. A challenge to achieving this goal is that while there are many datasets and pretrained algorithms for able-bodied adults, algorithms trained on these datasets often fail to generalize to clinical populations including people with disabilities, infants, and neonates. Reliable movement analysis of infants and neonates is important as spontaneous movement behavior is an important indicator of neurological function and neurodevelopmental disability, which can help guide early interventions. We explored the application of dynamic Gaussian splatting to sparse markerless motion capture (MMC) data. Our approach leverages semantic segmentation masks to focus on the infant, significantly improving the initialization of the scene. Our results demonstrate the potential of this method in rendering novel views of scenes and tracking infant movements. This work paves the way for advanced movement analysis tools that can be applied to diverse clinical populations, with a particular emphasis on early detection in infants.", + "arxiv_url": "http://arxiv.org/abs/2310.19441v1", + "pdf_url": "http://arxiv.org/pdf/2310.19441v1", + "published_date": "2023-10-30", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting", + "authors": [ + "Zeyu Yang", + "Hongye Yang", + "Zijie Pan", + "Li Zhang" + ], + "abstract": "Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time is challenging due to scene complexity and temporal dynamics. Despite advancements in neural implicit models, limitations persist: (i) Inadequate Scene Structure: Existing methods struggle to reveal the spatial and temporal structure of dynamic scenes from directly learning the complex 6D plenoptic function. (ii) Scaling Deformation Modeling: Explicitly modeling scene element deformation becomes impractical for complex dynamics. To address these issues, we consider the spacetime as an entirety and propose to approximate the underlying spatio-temporal 4D volume of a dynamic scene by optimizing a collection of 4D primitives, with explicit geometry and appearance modeling. 
Learning to optimize the 4D primitives enables us to synthesize novel views at any desired time with our tailored rendering routine. Our model is conceptually simple, consisting of a 4D Gaussian parameterized by anisotropic ellipses that can rotate arbitrarily in space and time, as well as view-dependent and time-evolved appearance represented by the coefficient of 4D spherindrical harmonics. This approach offers simplicity, flexibility for variable-length video and end-to-end training, and efficient real-time rendering, making it suitable for capturing complex dynamic scene motions. Experiments across various benchmarks, including monocular and multi-view scenarios, demonstrate our 4DGS model's superior visual quality and efficiency.", + "arxiv_url": "http://arxiv.org/abs/2310.10642v3", + "pdf_url": "http://arxiv.org/pdf/2310.10642v3", + "published_date": "2023-10-16", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models", + "authors": [ + "Taoran Yi", + "Jiemin Fang", + "Junjie Wang", + "Guanjun Wu", + "Lingxi Xie", + "Xiaopeng Zhang", + "Wenyu Liu", + "Qi Tian", + "Xinggang Wang" + ], + "abstract": "In recent times, the generation of 3D assets from text prompts has shown impressive results. Both 2D and 3D diffusion models can help generate decent 3D objects based on prompts. 3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain. 2D diffusion models enjoy strong abilities of generalization and fine generation, but 3D consistency is hard to guarantee. This paper attempts to bridge the power from the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation. A fast 3D object generation framework, named as GaussianDreamer, is proposed, where the 3D diffusion model provides priors for initialization and the 2D diffusion model enriches the geometry and appearance. Operations of noisy point growing and color perturbation are introduced to enhance the initialized Gaussians. Our GaussianDreamer can generate a high-quality 3D instance or 3D avatar within 15 minutes on one GPU, much faster than previous methods, while the generated instances can be directly rendered in real time. Demos and code are available at https://taoranyi.com/gaussiandreamer/.", + "arxiv_url": "http://arxiv.org/abs/2310.08529v3", + "pdf_url": "http://arxiv.org/pdf/2310.08529v3", + "published_date": "2023-10-12", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "4D Gaussian Splatting for Real-Time Dynamic Scene Rendering", + "authors": [ + "Guanjun Wu", + "Taoran Yi", + "Jiemin Fang", + "Lingxi Xie", + "Xiaopeng Zhang", + "Wei Wei", + "Wenyu Liu", + "Qi Tian", + "Xinggang Wang" + ], + "abstract": "Representing and rendering dynamic scenes has been an important but challenging task. Especially, to accurately model complex motions, high efficiency is usually hard to guarantee. To achieve real-time dynamic scene rendering while also enjoying high training and storage efficiency, we propose 4D Gaussian Splatting (4D-GS) as a holistic representation for dynamic scenes rather than applying 3D-GS for each individual frame. 
In 4D-GS, a novel explicit representation containing both 3D Gaussians and 4D neural voxels is proposed. A decomposed neural voxel encoding algorithm inspired by HexPlane is proposed to efficiently build Gaussian features from 4D neural voxels and then a lightweight MLP is applied to predict Gaussian deformations at novel timestamps. Our 4D-GS method achieves real-time rendering under high resolutions, 82 FPS at an 800$\\times$800 resolution on an RTX 3090 GPU while maintaining comparable or better quality than previous state-of-the-art methods. More demos and code are available at https://guanjunwu.github.io/4dgs/.", + "arxiv_url": "http://arxiv.org/abs/2310.08528v3", + "pdf_url": "http://arxiv.org/pdf/2310.08528v3", + "published_date": "2023-10-12", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation", + "authors": [ + "Jiaxiang Tang", + "Jiawei Ren", + "Hang Zhou", + "Ziwei Liu", + "Gang Zeng" + ], + "abstract": "Recent advances in 3D content creation mostly leverage optimization-based 3D generation via score distillation sampling (SDS). Though promising results have been exhibited, these methods often suffer from slow per-sample optimization, limiting their practical usage. In this paper, we propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously. Our key insight is to design a generative 3D Gaussian Splatting model with companioned mesh extraction and texture refinement in UV space. In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks. To further enhance the texture quality and facilitate downstream applications, we introduce an efficient algorithm to convert 3D Gaussians into textured meshes and apply a fine-tuning stage to refine the details. Extensive experiments demonstrate the superior efficiency and competitive generation quality of our proposed approach. Notably, DreamGaussian produces high-quality textured meshes in just 2 minutes from a single-view image, achieving approximately 10 times acceleration compared to existing methods.", + "arxiv_url": "http://arxiv.org/abs/2309.16653v2", + "pdf_url": "http://arxiv.org/pdf/2309.16653v2", + "published_date": "2023-09-28", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Text-to-3D using Gaussian Splatting", + "authors": [ + "Zilong Chen", + "Feng Wang", + "Yikai Wang", + "Huaping Liu" + ], + "abstract": "Automatic text-to-3D generation that combines Score Distillation Sampling (SDS) with the optimization of volume rendering has achieved remarkable progress in synthesizing realistic 3D objects. Yet most existing text-to-3D methods by SDS and volume rendering suffer from inaccurate geometry, e.g., the Janus issue, since it is hard to explicitly integrate 3D priors into implicit 3D representations. Besides, it is usually time-consuming for them to generate elaborate 3D models with rich colors. In response, this paper proposes GSGEN, a novel method that adopts Gaussian Splatting, a recent state-of-the-art representation, to text-to-3D generation. 
GSGEN aims at generating high-quality 3D objects and addressing existing shortcomings by exploiting the explicit nature of Gaussian Splatting that enables the incorporation of 3D prior. Specifically, our method adopts a progressive optimization strategy, which includes a geometry optimization stage and an appearance refinement stage. In geometry optimization, a coarse representation is established under 3D point cloud diffusion prior along with the ordinary 2D SDS optimization, ensuring a sensible and 3D-consistent rough shape. Subsequently, the obtained Gaussians undergo an iterative appearance refinement to enrich texture details. In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity. With these designs, our approach can generate 3D assets with delicate details and accurate geometry. Extensive evaluations demonstrate the effectiveness of our method, especially for capturing high-frequency components. Our code is available at https://github.com/gsgen3d/gsgen", + "arxiv_url": "http://arxiv.org/abs/2309.16585v4", + "pdf_url": "http://arxiv.org/pdf/2309.16585v4", + "published_date": "2023-09-28", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/gsgen3d/gsgen", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction", + "authors": [ + "Ziyi Yang", + "Xinyu Gao", + "Wen Zhou", + "Shaohui Jiao", + "Yuqing Zhang", + "Xiaogang Jin" + ], + "abstract": "Implicit neural representation has paved the way for new approaches to dynamic scene reconstruction and rendering. Nonetheless, cutting-edge dynamic neural rendering methods rely heavily on these implicit representations, which frequently struggle to capture the intricate details of objects in the scene. Furthermore, implicit methods have difficulty achieving real-time rendering in general dynamic scenes, limiting their use in a variety of tasks. To address the issues, we propose a deformable 3D Gaussians Splatting method that reconstructs scenes using 3D Gaussians and learns them in canonical space with a deformation field to model monocular dynamic scenes. We also introduce an annealing smoothing training mechanism with no extra overhead, which can mitigate the impact of inaccurate poses on the smoothness of time interpolation tasks in real-world datasets. Through a differential Gaussian rasterizer, the deformable 3D Gaussians not only achieve higher rendering quality but also real-time rendering speed. Experiments show that our method outperforms existing methods significantly in terms of both rendering quality and speed, making it well-suited for tasks such as novel-view synthesis, time interpolation, and real-time rendering.", + "arxiv_url": "http://arxiv.org/abs/2309.13101v2", + "pdf_url": "http://arxiv.org/pdf/2309.13101v2", + "published_date": "2023-09-22", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "neural rendering", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Flexible Techniques for Differentiable Rendering with 3D Gaussians", + "authors": [ + "Leonid Keselman", + "Martial Hebert" + ], + "abstract": "Fast, reliable shape reconstruction is an essential ingredient in many computer vision applications. 
Neural Radiance Fields demonstrated that photorealistic novel view synthesis is within reach, but was gated by performance requirements for fast reconstruction of real scenes and objects. Several recent approaches have built on alternative shape representations, in particular, 3D Gaussians. We develop extensions to these renderers, such as integrating differentiable optical flow, exporting watertight meshes and rendering per-ray normals. Additionally, we show how two of the recent methods are interoperable with each other. These reconstructions are quick, robust, and easily performed on GPU or CPU. For code and visual examples, see https://leonidk.github.io/fmb-plus", + "arxiv_url": "http://arxiv.org/abs/2308.14737v1", + "pdf_url": "http://arxiv.org/pdf/2308.14737v1", + "published_date": "2023-08-28", + "categories": [ + "cs.CV", + "cs.AI", + "cs.GR", + "I.2.10; I.3.7; I.4.0" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Dynamic 3D Gaussians: Tracking by Persistent Dynamic View Synthesis", + "authors": [ + "Jonathon Luiten", + "Georgios Kopanas", + "Bastian Leibe", + "Deva Ramanan" + ], + "abstract": "We present a method that simultaneously addresses the tasks of dynamic scene novel-view synthesis and six degree-of-freedom (6-DOF) tracking of all dense scene elements. We follow an analysis-by-synthesis framework, inspired by recent work that models scenes as a collection of 3D Gaussians which are optimized to reconstruct input images via differentiable rendering. To model dynamic scenes, we allow Gaussians to move and rotate over time while enforcing that they have persistent color, opacity, and size. By regularizing Gaussians' motion and rotation with local-rigidity constraints, we show that our Dynamic 3D Gaussians correctly model the same area of physical space over time, including the rotation of that space. Dense 6-DOF tracking and dynamic reconstruction emerges naturally from persistent dynamic view synthesis, without requiring any correspondence or flow as input. We demonstrate a large number of downstream applications enabled by our representation, including first-person view synthesis, dynamic compositional scene synthesis, and 4D video editing.", + "arxiv_url": "http://arxiv.org/abs/2308.09713v1", + "pdf_url": "http://arxiv.org/pdf/2308.09713v1", + "published_date": "2023-08-18", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D Gaussian Splatting for Real-Time Radiance Field Rendering", + "authors": [ + "Bernhard Kerbl", + "Georgios Kopanas", + "Thomas Leimkühler", + "George Drettakis" + ], + "abstract": "Radiance Field methods have recently revolutionized novel-view synthesis of scenes captured with multiple photos or videos. However, achieving high visual quality still requires neural networks that are costly to train and render, while recent faster methods inevitably trade off speed for quality. For unbounded and complete scenes (rather than isolated objects) and 1080p resolution rendering, no current method can achieve real-time display rates. We introduce three key elements that allow us to achieve state-of-the-art visual quality while maintaining competitive training times and importantly allow high-quality real-time (>= 30 fps) novel-view synthesis at 1080p resolution. 
First, starting from sparse points produced during camera calibration, we represent the scene with 3D Gaussians that preserve desirable properties of continuous volumetric radiance fields for scene optimization while avoiding unnecessary computation in empty space; Second, we perform interleaved optimization/density control of the 3D Gaussians, notably optimizing anisotropic covariance to achieve an accurate representation of the scene; Third, we develop a fast visibility-aware rendering algorithm that supports anisotropic splatting and both accelerates training and allows realtime rendering. We demonstrate state-of-the-art visual quality and real-time rendering on several established datasets.", + "arxiv_url": "http://arxiv.org/abs/2308.04079v1", + "pdf_url": "http://arxiv.org/pdf/2308.04079v1", + "published_date": "2023-08-08", + "categories": [ + "cs.GR", + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "real-time rendering" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Decoherence in Neutrino Oscillation between 3D Gaussian Wave Packets", + "authors": [ + "Haruhi Mitani", + "Kin-ya Oda" + ], + "abstract": "There is renewed attention to whether we can observe the decoherence effect in neutrino oscillation due to the separation of wave packets with different masses in near-future experiments. As a contribution to this endeavor, we extend the existing formulation based on a single 1D Gaussian wave function to an amplitude between two distinct 3D Gaussian wave packets, corresponding to the neutrinos being produced and detected, with different central momenta and spacetime positions and with different widths. We find that the spatial widths-squared for the production and detection appear additively in the (de)coherence length and in the localization factor for governing the propagation of the wave packet, whereas they appear as the reduced one (inverse of the sum of inverse) in the momentum conservation factor. The overall probability is governed by the ratio of the reduced to the sum.", + "arxiv_url": "http://arxiv.org/abs/2307.12230v2", + "pdf_url": "http://arxiv.org/pdf/2307.12230v2", + "published_date": "2023-07-23", + "categories": [ + "hep-ph", + "hep-th" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "NEAT: Distilling 3D Wireframes from Neural Attraction Fields", + "authors": [ + "Nan Xue", + "Bin Tan", + "Yuxi Xiao", + "Liang Dong", + "Gui-Song Xia", + "Tianfu Wu", + "Yujun Shen" + ], + "abstract": "This paper studies the problem of structured 3D reconstruction using wireframes that consist of line segments and junctions, focusing on the computation of structured boundary geometries of scenes. Instead of leveraging matching-based solutions from 2D wireframes (or line segments) for 3D wireframe reconstruction as done in prior arts, we present NEAT, a rendering-distilling formulation using neural fields to represent 3D line segments with 2D observations, and bipartite matching for perceiving and distilling of a sparse set of 3D global junctions. The proposed {NEAT} enjoys the joint optimization of the neural fields and the global junctions from scratch, using view-dependent 2D observations without precomputed cross-view feature matching. Comprehensive experiments on the DTU and BlendedMVS datasets demonstrate our NEAT's superiority over state-of-the-art alternatives for 3D wireframe reconstruction. 
Moreover, the distilled 3D global junctions by NEAT, are a better initialization than SfM points, for the recently-emerged 3D Gaussian Splatting for high-fidelity novel view synthesis using about 20 times fewer initial 3D points. Project page: \\url{https://xuenan.net/neat}.", + "arxiv_url": "http://arxiv.org/abs/2307.10206v2", + "pdf_url": "http://arxiv.org/pdf/2307.10206v2", + "published_date": "2023-07-14", + "categories": [ + "cs.CV", + "cs.GR" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "DiViNeT: 3D Reconstruction from Disparate Views via Neural Template Regularization", + "authors": [ + "Aditya Vora", + "Akshay Gadi Patil", + "Hao Zhang" + ], + "abstract": "We present a volume rendering-based neural surface reconstruction method that takes as few as three disparate RGB images as input. Our key idea is to regularize the reconstruction, which is severely ill-posed and leaving significant gaps between the sparse views, by learning a set of neural templates to act as surface priors. Our method, coined DiViNet, operates in two stages. It first learns the templates, in the form of 3D Gaussian functions, across different scenes, without 3D supervision. In the reconstruction stage, our predicted templates serve as anchors to help \"stitch'' the surfaces over sparse regions. We demonstrate that our approach is not only able to complete the surface geometry but also reconstructs surface details to a reasonable extent from a few disparate input views. On the DTU and BlendedMVS datasets, our approach achieves the best reconstruction quality among existing methods in the presence of such sparse views and performs on par, if not better, with competing methods when dense views are employed as inputs.", + "arxiv_url": "http://arxiv.org/abs/2306.04699v4", + "pdf_url": "http://arxiv.org/pdf/2306.04699v4", + "published_date": "2023-06-07", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Control4D: Efficient 4D Portrait Editing with Text", + "authors": [ + "Ruizhi Shao", + "Jingxiang Sun", + "Cheng Peng", + "Zerong Zheng", + "Boyao Zhou", + "Hongwen Zhang", + "Yebin Liu" + ], + "abstract": "We introduce Control4D, an innovative framework for editing dynamic 4D portraits using text instructions. Our method addresses the prevalent challenges in 4D editing, notably the inefficiencies of existing 4D representations and the inconsistent editing effect caused by diffusion-based editors. We first propose GaussianPlanes, a novel 4D representation that makes Gaussian Splatting more structured by applying plane-based decomposition in 3D space and time. This enhances both efficiency and robustness in 4D editing. Furthermore, we propose to leverage a 4D generator to learn a more continuous generation space from inconsistent edited images produced by the diffusion-based editor, which effectively improves the consistency and quality of 4D editing. Comprehensive evaluation demonstrates the superiority of Control4D, including significantly reduced training time, high-quality rendering, and spatial-temporal consistency in 4D portrait editing. 
The link to our project website is https://control4darxiv.github.io.", + "arxiv_url": "http://arxiv.org/abs/2305.20082v2", + "pdf_url": "http://arxiv.org/pdf/2305.20082v2", + "published_date": "2023-05-31", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction", + "authors": [ + "Xinhang Liu", + "Jiaben Chen", + "Shiu-hong Kao", + "Yu-Wing Tai", + "Chi-Keung Tang" + ], + "abstract": "Novel view synthesis via Neural Radiance Fields (NeRFs) or 3D Gaussian Splatting (3DGS) typically necessitates dense observations with hundreds of input images to circumvent artifacts. We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images, by leveraging a diffusion model pre-trained from multiview datasets. Different from using diffusion priors to regularize representation optimization, our method directly uses diffusion-generated images to train NeRF/3DGS as if they were real input views. Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality photorealistic pseudo-observations. To resolve consistency among pseudo-observations and real input views, we develop an uncertainty measure to guide the diffusion model's generation. Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times. Extensive experiments across diverse and challenging datasets validate that our approach outperforms existing state-of-the-art methods and is capable of synthesizing novel views with super-resolution in the few-view setting.", + "arxiv_url": "http://arxiv.org/abs/2305.15171v4", + "pdf_url": "http://arxiv.org/pdf/2305.15171v4", + "published_date": "2023-05-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "gaussian splatting", + "3d gaussian", + "nerf" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "NOVUM: Neural Object Volumes for Robust Object Classification", + "authors": [ + "Artur Jesslen", + "Guofeng Zhang", + "Angtian Wang", + "Wufei Ma", + "Alan Yuille", + "Adam Kortylewski" + ], + "abstract": "Discriminative models for object classification typically learn image-based representations that do not capture the compositional and 3D nature of objects. In this work, we show that explicitly integrating 3D compositional object representations into deep networks for image classification leads to a largely enhanced generalization in out-of-distribution scenarios. In particular, we introduce a novel architecture, referred to as NOVUM, that consists of a feature extractor and a neural object volume for every target object class. Each neural object volume is a composition of 3D Gaussians that emit feature vectors. This compositional object representation allows for a highly robust and fast estimation of the object class by independently matching the features of the 3D Gaussians of each category to features extracted from an input image. Additionally, the object pose can be estimated via inverse rendering of the corresponding neural object volume. 
To enable the classification of objects, the neural features at each 3D Gaussian are trained discriminatively to be distinct from (i) the features of 3D Gaussians in other categories, (ii) features of other 3D Gaussians of the same object, and (iii) the background features. Our experiments show that NOVUM offers intriguing advantages over standard architectures due to the 3D compositional structure of the object representation, namely: (1) An exceptional robustness across a spectrum of real-world and synthetic out-of-distribution shifts and (2) an enhanced human interpretability compared to standard models, all while maintaining real-time inference and a competitive accuracy on in-distribution data.", + "arxiv_url": "http://arxiv.org/abs/2305.14668v4", + "pdf_url": "http://arxiv.org/pdf/2305.14668v4", + "published_date": "2023-05-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Generation of artificial facial drug abuse images using Deep De-identified anonymous Dataset augmentation through Genetics Algorithm (3DG-GA)", + "authors": [ + "Hazem Zein", + "Lou Laurent", + "Régis Fournier", + "Amine Nait-Ali" + ], + "abstract": "In biomedical research and artificial intelligence, access to large, well-balanced, and representative datasets is crucial for developing trustworthy applications that can be used in real-world scenarios. However, obtaining such datasets can be challenging, as they are often restricted to hospitals and specialized facilities. To address this issue, the study proposes to generate highly realistic synthetic faces exhibiting drug abuse traits through augmentation. The proposed method, called \"3DG-GA\", Deep De-identified anonymous Dataset Generation, uses Genetics Algorithm as a strategy for synthetic faces generation. The algorithm includes GAN artificial face generation, forgery detection, and face recognition. Initially, a dataset of 120 images of actual facial drug abuse is used. By preserving, the drug traits, the 3DG-GA provides a dataset containing 3000 synthetic facial drug abuse images. The dataset will be open to the scientific community, which can reproduce our results and benefit from the generated datasets while avoiding legal or ethical restrictions.", + "arxiv_url": "http://arxiv.org/abs/2304.06106v1", + "pdf_url": "http://arxiv.org/pdf/2304.06106v1", + "published_date": "2023-04-12", + "categories": [ + "cs.CV", + "cs.AI" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Quantitative perfusion and water transport time model from multi b-value diffusion magnetic resonance imaging validated against neutron capture microspheres", + "authors": [ + "M. Liu", + "N. Saadat", + "Y. Jeong", + "S. Roth", + "M. Niekrasz", + "M. Giurcanu", + "T. Carroll", + "G. Christoforidis" + ], + "abstract": "Intravoxel Incoherent Motion (IVIM) is a non-contrast magnetic resonance imaging diffusion-based scan that uses a multitude of b-values to measure various speeds of molecular perfusion and diffusion, sidestepping inaccuracy of arterial input functions or bolus kinetics in quantitative imaging. We test a new method of IVIM quantification and compare our values to reference standard neutron capture microspheres across normocapnia, CO2 induced hypercapnia, and middle cerebral artery occlusion in a controlled animal model. 
Perfusion quantification in ml/100g/min compared to microsphere perfusion uses the 3D gaussian probability distribution and defined water transport time as when 50% of the molecules remain in the tissue of interest. Perfusion, water transport time, and infarct volume was compared to reference standards. Simulations were studied to suppress non-specific cerebrospinal fluid (CSF). Linear regression analysis of quantitative perfusion returned correlation (slope = .55, intercept = 52.5, $R^2$= .64). Linear regression for water transport time asymmetry in infarcted tissue was excellent (slope = .59, intercept = .3, $R^2$ = .93). Strong linear agreement also was found for infarct volume (slope = 1.01, $R^2$= .79). Simulation of CSF suppression via inversion recovery returned blood signal reduced by 82% from combined T1 and T2 effects. Intra-physiologic state comparison of perfusion shows potential partial volume effects which require further study especially in disease states. The accuracy and sensitivity of IVIM provides evidence that observed signal changes reflect cytotoxic edema and tissue perfusion. Partial volume contamination of CSF may be better removed during post-processing rather than with inversion recovery to avoid artificial loss of blood signal.", + "arxiv_url": "http://arxiv.org/abs/2304.01888v1", + "pdf_url": "http://arxiv.org/pdf/2304.01888v1", + "published_date": "2023-04-04", + "categories": [ + "physics.med-ph", + "eess.IV" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Light-Weight Pointcloud Representation with Sparse Gaussian Process", + "authors": [ + "Mahmoud Ali", + "Lantao Liu" + ], + "abstract": "This paper presents a framework to represent high-fidelity pointcloud sensor observations for efficient communication and storage. The proposed approach exploits Sparse Gaussian Process to encode pointcloud into a compact form. Our approach represents both the free space and the occupied space using only one model (one 2D Sparse Gaussian Process) instead of the existing two-model framework (two 3D Gaussian Mixture Models). We achieve this by proposing a variance-based sampling technique that effectively discriminates between the free and occupied space. The new representation requires less memory footprint and can be transmitted across limitedbandwidth communication channels. The framework is extensively evaluated in simulation and it is also demonstrated using a real mobile robot equipped with a 3D LiDAR. Our method results in a 70 to 100 times reduction in the communication rate compared to sending the raw pointcloud.", + "arxiv_url": "http://arxiv.org/abs/2301.11251v1", + "pdf_url": "http://arxiv.org/pdf/2301.11251v1", + "published_date": "2023-01-26", + "categories": [ + "cs.RO" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "FedGS: Federated Graph-based Sampling with Arbitrary Client Availability", + "authors": [ + "Zheng Wang", + "Xiaoliang Fan", + "Jianzhong Qi", + "Haibing Jin", + "Peizhen Yang", + "Siqi Shen", + "Cheng Wang" + ], + "abstract": "While federated learning has shown strong results in optimizing a machine learning model without direct access to the original data, its performance may be hindered by intermittent client availability which slows down the convergence and biases the final learned model. 
There are significant challenges to achieve both stable and bias-free training under arbitrary client availability. To address these challenges, we propose a framework named Federated Graph-based Sampling (FedGS), to stabilize the global model update and mitigate the long-term bias given arbitrary client availability simultaneously. First, we model the data correlations of clients with a Data-Distribution-Dependency Graph (3DG) that helps keep the sampled clients data apart from each other, which is theoretically shown to improve the approximation to the optimal model update. Second, constrained by the far-distance in data distribution of the sampled clients, we further minimize the variance of the numbers of times that the clients are sampled, to mitigate long-term bias. To validate the effectiveness of FedGS, we conduct experiments on three datasets under a comprehensive set of seven client availability modes. Our experimental results confirm FedGS's advantage in both enabling a fair client-sampling scheme and improving the model performance under arbitrary client availability. Our code is available at \\url{https://github.com/WwZzz/FedGS}.", + "arxiv_url": "http://arxiv.org/abs/2211.13975v3", + "pdf_url": "http://arxiv.org/pdf/2211.13975v3", + "published_date": "2022-11-25", + "categories": [ + "cs.LG" + ], + "github_url": "https://github.com/WwZzz/FedGS", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3DG-STFM: 3D Geometric Guided Student-Teacher Feature Matching", + "authors": [ + "Runyu Mao", + "Chen Bai", + "Yatong An", + "Fengqing Zhu", + "Cheng Lu" + ], + "abstract": "We tackle the essential task of finding dense visual correspondences between a pair of images. This is a challenging problem due to various factors such as poor texture, repetitive patterns, illumination variation, and motion blur in practical scenarios. In contrast to methods that use dense correspondence ground-truths as direct supervision for local feature matching training, we train 3DG-STFM: a multi-modal matching model (Teacher) to enforce the depth consistency under 3D dense correspondence supervision and transfer the knowledge to 2D unimodal matching model (Student). Both teacher and student models consist of two transformer-based matching modules that obtain dense correspondences in a coarse-to-fine manner. The teacher model guides the student model to learn RGB-induced depth information for the matching purpose on both coarse and fine branches. We also evaluate 3DG-STFM on a model compression task. To the best of our knowledge, 3DG-STFM is the first student-teacher learning method for the local feature matching task. The experiments show that our method outperforms state-of-the-art methods on indoor and outdoor camera pose estimations, and homography estimation problems. Code is available at: https://github.com/Ryan-prime/3DG-STFM.", + "arxiv_url": "http://arxiv.org/abs/2207.02375v2", + "pdf_url": "http://arxiv.org/pdf/2207.02375v2", + "published_date": "2022-07-06", + "categories": [ + "cs.CV" + ], + "github_url": "https://github.com/Ryan-prime/3DG-STFM", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Contour Generation with Realistic Inter-observer Variation", + "authors": [ + "Eliana Vásquez Osorio", + "Jane Shortall", + "Jennifer Robbins", + "Marcel van Herk" + ], + "abstract": "Contours are used in radiotherapy treatment planning to identify regions to be irradiated with high dose and regions to be spared. 
Therefore, any contouring uncertainty influences the whole treatment. Even though this is the biggest remaining source of uncertainty when daily IGRT or adaptation is used, it has not been accounted for quantitatively in treatment planning. Using probabilistic planning allows to directly account for contouring uncertainties in plan optimisation. The first step is to create an algorithm that can generate many realistic contours with variation matching actual inter-observer variation. We propose a methodology to generate random contours, based on measured spatial inter-observer variation, IOV, and a single parameter that controls its geometrical dependency: alpha, the width of the 3D Gaussian used as point spread function (PSF). We used a level set formulation of the median shape, with the level set function defined as the signed distance transform. To create a new contour, we added the median level set and a noise map which was weighted with the IOV map and then convolved with the PSF. Thresholding the level set function reconstructs the newly generated contour. We used data from 18 patients from the golden atlas, consisting of five prostate delineations on T2-w MRI scans. To evaluate the similarity between the contours, we calculated the maximum distance to agreement to the median shape (maxDTA), and the minimum dose of the contours using an ideal dose distribution. We used the two-sample Kolmogorov-Smirnov test to compare the distributions for maxDTA and minDose between the generated and manually delineated contours. Only alpha=0.75cm produced maxDTA and minDose distributions that were not significantly different from the manually delineated structures. Accounting for the PSF is essential to correctly simulate inter-observer variation.", + "arxiv_url": "http://arxiv.org/abs/2204.10098v1", + "pdf_url": "http://arxiv.org/pdf/2204.10098v1", + "published_date": "2022-04-21", + "categories": [ + "physics.med-ph" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "The Sloan Digital Sky Survey Peculiar Velocity Catalogue", + "authors": [ + "Cullan Howlett", + "Khaled Said", + "John R. Lucey", + "Matthew Colless", + "Fei Qin", + "Yan Lai", + "R. Brent Tully", + "Tamara M. Davis" + ], + "abstract": "We present a new catalogue of distances and peculiar velocities (PVs) of $34,059$ early-type galaxies derived from Fundamental Plane (FP) measurements using data from the Sloan Digital Sky Survey (SDSS). This $7016\\,\\mathrm{deg}^{2}$ homogeneous sample comprises the largest set of peculiar velocities produced to date and extends the reach of PV surveys up to a redshift limit of $z=0.1$. Our SDSS-based FP distance measurements have a mean uncertainty of 23%. Alongside the data, we produce an ensemble of 2,048 mock galaxy catalogues that reproduce the data selection function, and are used to validate our fitting pipelines and check for systematic errors. We uncover a significant trend between group richness and mean surface brightness within the sample, which may hint at an environmental dependence within the FP or the presence of unresolved systematics, and can result in biased peculiar velocities. This is removed using multiple FP fits as function of group richness, a procedure made tractable through a new analytic derivation for the integral of a 3D Gaussian over non-trivial limits. 
Our catalogue is calibrated to the zero-point of the CosmicFlows-III sample with an uncertainty of $0.004$ dex (not including cosmic variance or the error within CosmicFlows-III itself), which is validated using independent cross-checks with the predicted zero-point from the 2M++ reconstruction of our local velocity field. Finally, as an example of what is possible with our new catalogue, we obtain preliminary bulk flow measurements up to a depth of $135\\,h^{-1}\\mathrm{Mpc}$. We find a slightly larger-than-expected bulk flow at high redshift, although this could be caused by the presence of the Shapley supercluster which lies outside the SDSS PV footprint.", + "arxiv_url": "http://arxiv.org/abs/2201.03112v2", + "pdf_url": "http://arxiv.org/pdf/2201.03112v2", + "published_date": "2022-01-09", + "categories": [ + "astro-ph.CO", + "astro-ph.GA" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "The Yang-Mills heat flow with random distributional initial data", + "authors": [ + "Sky Cao", + "Sourav Chatterjee" + ], + "abstract": "We construct local solutions to the Yang-Mills heat flow (in the DeTurck gauge) for a certain class of random distributional initial data, which includes the 3D Gaussian free field. The main idea, which goes back to work of Bourgain as well as work of Da Prato-Debussche, is to decompose the solution into a rougher linear part and a smoother nonlinear part, and to control the latter by probabilistic arguments. In a companion work, we use the main results of this paper to propose a way towards the construction of 3D Yang-Mills measures.", + "arxiv_url": "http://arxiv.org/abs/2111.10652v4", + "pdf_url": "http://arxiv.org/pdf/2111.10652v4", + "published_date": "2021-11-20", + "categories": [ + "math.PR", + "hep-th", + "math-ph", + "math.AP", + "math.MP", + "35R60, 35A01, 60G60, 81T13" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Topology and geometry of Gaussian random fields II: on critical points, excursion sets, and persistent homology", + "authors": [ + "Pratyush Pranav" + ], + "abstract": "This paper is second in the series, following Pranav et al. (2019), focused on the characterization of geometric and topological properties of 3D Gaussian random fields. We focus on the formalism of persistent homology, the mainstay of Topological Data Analysis (TDA), in the context of excursion set formalism. We also focus on the structure of critical points of stochastic fields, and their relationship with formation and evolution of structures in the universe. The topological background is accompanied by an investigation of Gaussian field simulations based on the LCDM spectrum, as well as power-law spectra with varying spectral indices. We present the statistical properties in terms of the intensity and difference maps constructed from the persistence diagrams, as well as their distribution functions. We demonstrate that the intensity maps encapsulate information about the distribution of power across the hierarchies of structures in more detailed than the Betti numbers or the Euler characteristic. In particular, the white noise ($n = 0$) case with flat spectrum stands out as the divide between models with positive and negative spectral index. It has the highest proportion of low significance features. 
This level of information is not available from the geometric Minkowski functionals or the topological Euler characteristic, or even the Betti numbers, and demonstrates the usefulness of hierarchical topological methods. Another important result is the observation that topological characteristics of Gaussian fields depend on the power spectrum, as opposed to the geometric measures that are insensitive to the power spectrum characteristics.", + "arxiv_url": "http://arxiv.org/abs/2109.08721v1", + "pdf_url": "http://arxiv.org/pdf/2109.08721v1", + "published_date": "2021-09-17", + "categories": [ + "astro-ph.CO", + "math.AT" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "GaussiGAN: Controllable Image Synthesis with 3D Gaussians from Unposed Silhouettes", + "authors": [ + "Youssef A. Mejjati", + "Isa Milefchik", + "Aaron Gokaslan", + "Oliver Wang", + "Kwang In Kim", + "James Tompkin" + ], + "abstract": "We present an algorithm that learns a coarse 3D representation of objects from unposed multi-view 2D mask supervision, then uses it to generate detailed mask and image texture. In contrast to existing voxel-based methods for unposed object reconstruction, our approach learns to represent the generated shape and pose with a set of self-supervised canonical 3D anisotropic Gaussians via a perspective camera, and a set of per-image transforms. We show that this approach can robustly estimate a 3D space for the camera and object, while recent baselines sometimes struggle to reconstruct coherent 3D spaces in this setting. We show results on synthetic datasets with realistic lighting, and demonstrate object insertion with interactive posing. With our work, we help move towards structured representations that handle more real-world variation in learning-based object reconstruction.", + "arxiv_url": "http://arxiv.org/abs/2106.13215v1", + "pdf_url": "http://arxiv.org/pdf/2106.13215v1", + "published_date": "2021-06-24", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Probabilistic Localization of Insect-Scale Drones on Floating-Gate Inverter Arrays", + "authors": [ + "Priyesh Shukla", + "Ankith Muralidhar", + "Nick Iliev", + "Theja Tulabandhula", + "Sawyer B. Fuller", + "Amit Ranjan Trivedi" + ], + "abstract": "We propose a novel compute-in-memory (CIM)-based ultra-low-power framework for probabilistic localization of insect-scale drones. The conventional probabilistic localization approaches rely on the three-dimensional (3D) Gaussian Mixture Model (GMM)-based representation of a 3D map. A GMM model with hundreds of mixture functions is typically needed to adequately learn and represent the intricacies of the map. Meanwhile, localization using complex GMM map models is computationally intensive. Since insect-scale drones operate under extremely limited area/power budget, continuous localization using GMM models entails much higher operating energy -- thereby, limiting flying duration and/or size of the drone due to a larger battery. Addressing the computational challenges of localization in an insect-scale drone using a CIM approach, we propose a novel framework of 3D map representation using a harmonic mean of \"Gaussian-like\" mixture (HMGM) model. 
The likelihood function useful for drone localization can be efficiently implemented by connecting many multi-input inverters in parallel, each programmed with the parameters of the 3D map model represented as HMGM. When the depth measurements are projected to the input of the implementation, the summed current of the inverters emulates the likelihood of the measurement. We have characterized our approach on an RGB-D indoor localization dataset. The average localization error in our approach is $\\sim$0.1125 m which is only slightly degraded than software-based evaluation ($\\sim$0.08 m). Meanwhile, our localization framework is ultra-low-power, consuming as little as $\\sim$17 $\\mu$W power while processing a depth frame in 1.33 ms over hundred pose hypotheses in the particle-filtering (PF) algorithm used to localize the drone.", + "arxiv_url": "http://arxiv.org/abs/2102.08247v2", + "pdf_url": "http://arxiv.org/pdf/2102.08247v2", + "published_date": "2021-02-16", + "categories": [ + "cs.RO", + "cs.AR", + "eess.IV", + "B.7; I.2.9" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Visual Analysis of Large Multivariate Scattered Data using Clustering and Probabilistic Summaries", + "authors": [ + "Tobias Rapp", + "Christoph Peters", + "Carsten Dachsbacher" + ], + "abstract": "Rapidly growing data sizes of scientific simulations pose significant challenges for interactive visualization and analysis techniques. In this work, we propose a compact probabilistic representation to interactively visualize large scattered datasets. In contrast to previous approaches that represent blocks of volumetric data using probability distributions, we model clusters of arbitrarily structured multivariate data. In detail, we discuss how to efficiently represent and store a high-dimensional distribution for each cluster. We observe that it suffices to consider low-dimensional marginal distributions for two or three data dimensions at a time to employ common visual analysis techniques. Based on this observation, we represent high-dimensional distributions by combinations of low-dimensional Gaussian mixture models. We discuss the application of common interactive visual analysis techniques to this representation. In particular, we investigate several frequency-based views, such as density plots in 1D and 2D, density-based parallel coordinates, and a time histogram. We visualize the uncertainty introduced by the representation, discuss a level-of-detail mechanism, and explicitly visualize outliers. Furthermore, we propose a spatial visualization by splatting anisotropic 3D Gaussians for which we derive a closed-form solution. Lastly, we describe the application of brushing and linking to this clustered representation. 
Our evaluation on several large, real-world datasets demonstrates the scaling of our approach.", + "arxiv_url": "http://arxiv.org/abs/2008.09544v2", + "pdf_url": "http://arxiv.org/pdf/2008.09544v2", + "published_date": "2020-08-21", + "categories": [ + "cs.GR" + ], + "github_url": "", + "keywords": [ + "3d gaussian" + ], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Algebraic 3D Graphic Statics: reciprocal constructions", + "authors": [ + "Márton Hablicsek", + "Masoud Akbarzadeh", + "Yi Guo" + ], + "abstract": "The recently developed 3D graphic statics (3DGS) lacks a rigorous mathematical definition relating the geometrical and topological properties of the reciprocal polyhedral diagrams as well as a precise method for the geometric construction of these diagrams. This paper provides a fundamental algebraic formulation for 3DGS by developing equilibrium equations around the edges of the primal diagram and satisfying the equations by the closeness of the polygons constructed by the edges of the corresponding faces in the dual/reciprocal diagram. The research provides multiple numerical methods for solving the equilibrium equations and explains the advantage of using each technique. The approach of this paper can be used for compression-and-tension combined form-finding and analysis as it allows constructing both the form and force diagram based on the interpretation of the input diagram. Besides, the paper expands on the geometric/static degrees of (in)determinacies of the diagrams using the algebraic formulation and shows how these properties can be used for the constrained manipulation of the polyhedrons in an interactive environment without breaking the reciprocity between the two.", + "arxiv_url": "http://arxiv.org/abs/2007.15720v1", + "pdf_url": "http://arxiv.org/pdf/2007.15720v1", + "published_date": "2020-07-30", + "categories": [ + "cs.CG", + "J.6; J.2" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "Algebraic 3D Graphic Statics: Constrained Areas", + "authors": [ + "Masoud Akbarzadeh", + "Marton Hablicsek" + ], + "abstract": "This research provides algorithms and numerical methods to geometrically control the magnitude of the internal and external forces in the reciprocal diagrams of 3D/Polyhedral Graphic statics (3DGS). In 3DGS, the form of the structure and its equilibrium of forces is represented by two polyhedral diagrams that are geometrically and topologically related. The areas of the faces of the force diagram represent the magnitude of the internal and external forces in the system. For the first time, the methods of this research allow the user to control and constrain the areas and edge lengths of the faces of general polyhedrons that can be convex, self-intersecting, or concave. As a result, a designer can explicitly control the force magnitudes in the force diagram and explore the equilibrium of a variety of compression and tension-combined funicular structural forms. In this method, a quadratic formulation is used to compute the area of a single face based on its edge lengths. The approach is applied to manipulating the face geometry with a predefined area and the edge lengths. Subsequently, the geometry of the polyhedron is updated with newly changed faces. This approach is a multi-step algorithm where each step includes computing the geometry of a single face and updating the polyhedral geometry. 
One of the unique results of this framework is the construction of the zero-area, self-intersecting faces, where the sum of the signed areas of a self-intersecting face is zero, representing a member with zero force in the form diagram. The methodology of this research can clarify the equilibrium of some systems that could not be previously justified using reciprocal polyhedral diagrams. Therefore, it generalizes the principle of the equilibrium of polyhedral frames and opens a completely new horizon in the design of highly-sophisticated funicular polyhedral structures beyond compression-only systems.", + "arxiv_url": "http://arxiv.org/abs/2007.15133v1", + "pdf_url": "http://arxiv.org/pdf/2007.15133v1", + "published_date": "2020-07-29", + "categories": [ + "cs.CG", + "physics.app-ph", + "J.6; J.2" + ], + "github_url": "", + "keywords": [], + "citations": 0, + "semantic_url": "" + }, + { + "title": "3D-GMNet: Single-View 3D Shape Recovery as A Gaussian Mixture", + "authors": [ + "Kohei Yamashita", + "Shohei Nobuhara", + "Ko Nishino" + ], + "abstract": "In this paper, we introduce 3D-GMNet, a deep neural network for 3D object shape reconstruction from a single image. As the name suggests, 3D-GMNet recovers 3D shape as a Gaussian mixture. In contrast to voxels, point clouds, or meshes, a Gaussian mixture representation provides an analytical expression with a small memory footprint while accurately representing the target 3D shape. At the same time, it offers a number of additional advantages including instant pose estimation and controllable level-of-detail reconstruction, while also enabling interpretation as a point cloud, volume, and a mesh model. We train 3D-GMNet end-to-end with single input images and corresponding 3D models by introducing two novel loss functions, a 3D Gaussian mixture loss and a 2D multi-view loss, which collectively enable accurate shape reconstruction as kernel density estimation. We thoroughly evaluate the effectiveness of 3D-GMNet with synthetic and real images of objects. The results show accurate reconstruction with a compact representation that also realizes novel applications of single-image 3D reconstruction.", + "arxiv_url": "http://arxiv.org/abs/1912.04663v2", + "pdf_url": "http://arxiv.org/pdf/1912.04663v2", + "published_date": "2019-12-10", + "categories": [ + "cs.CV" + ], + "github_url": "", + "keywords": [ + "3d gaussian", + "3d reconstruction" + ], + "citations": 0, + "semantic_url": "" + } +] \ No newline at end of file