Zhang, Xiaoshuai

Learning to Capture, Understand, and Generate Large-Scale 3D Scenes

2024

Advisor(s): Su, Hao

Abstract

As the world becomes increasingly digitized, the demand for advanced 3D scene understanding has expanded beyond academic research into practical applications such as virtual reality (VR), augmented reality (AR), autonomous robotics, urban planning, and entertainment industries like gaming and film. The central aim of this dissertation is to push the boundaries of how we capture, interpret, and generate these large-scale 3D scenes, advancing both theoretical understanding and practical implementations.

Our key contributions include a novel framework, NeRFusion, for fast and scalable radiance field reconstruction, specifically designed for large indoor environments. By utilizing recurrent neural networks and sparse voxel grids, this framework achieves a balance between geometric accuracy and photorealism, significantly improving efficiency over traditional methods. Additionally, this dissertation introduces nerflets, an innovative 3D scene representation that breaks down complex scenes into smaller, interpretable radiance fields. This allows for more efficient storage and enhanced semantic understanding, enabling advanced tasks like 3D panoptic segmentation and interactive scene editing. The dissertation also proposes the ConDense pre-training scheme, which unifies 2D and 3D feature learning. Through a ray-marching process inspired by Neural Radiance Fields (NeRF), ConDense ensures consistent 2D-3D feature alignment during pre-training, improving performance across various downstream tasks, including 2D and 3D classification and segmentation as well as cross-modality scene query and retrieval. Finally, the dissertation briefly touches on a novel methodology for generating 3D scenes by combining 2D diffusion models with 3D implicit scene representations, highlighting a promising direction for further study in the field.

Together, these contributions advance 3D scene capture, understanding, and generation, offering solutions that are both practical and theoretically significant. They also provide valuable tools for industries reliant on high-fidelity 3D environments, paving the way for more intelligent, interactive digital worlds.

UC San Diego
