EscherNet: A generative model for scalable view synthesis

X Kong, S Liu, X Lyu, M Taher, X Qi… - Proceedings of the …, 2024 - openaccess.thecvf.com
Proceedings of the IEEE/CVF Conference on Computer Vision and …, 2024
Abstract
We introduce EscherNet, a multi-view conditioned diffusion model for view synthesis. EscherNet learns implicit and generative 3D representations coupled with a specialised camera positional encoding, allowing precise and continuous relative control of the camera transformation between an arbitrary number of reference and target views. EscherNet offers exceptional generality, flexibility, and scalability in view synthesis: it can generate more than 100 consistent target views simultaneously on a single consumer-grade GPU, despite being trained with a fixed number of 3 reference views to 3 target views. As a result, EscherNet not only addresses zero-shot novel view synthesis but also naturally unifies single- and multi-image 3D reconstruction, combining these diverse tasks into a single cohesive framework. Our extensive experiments demonstrate that EscherNet achieves state-of-the-art performance in multiple benchmarks, even when compared to methods specifically tailored for each individual problem. This remarkable versatility opens up new directions for designing scalable neural architectures for 3D vision. Project page: https://kxhit.github.io/EscherNet.