HEAL-SWIN: A vision transformer on the sphere
Proceedings of the IEEE/CVF Conference on Computer Vision and …, 2024•openaccess.thecvf.com
Abstract
High-resolution wide-angle fisheye images are becoming increasingly important for robotics applications such as autonomous driving. However, using ordinary convolutional neural networks or vision transformers on this data is problematic due to projection and distortion losses introduced when projecting to a rectangular grid on the plane. We introduce the HEAL-SWIN transformer, which combines the highly uniform Hierarchical Equal Area iso-Latitude Pixelation (HEALPix) grid used in astrophysics and cosmology with the Hierarchical Shifted-Window (SWIN) transformer to yield an efficient and flexible model capable of training on high-resolution, distortion-free spherical data. In HEAL-SWIN, the nested structure of the HEALPix grid is used to perform the patching and windowing operations of the SWIN transformer, enabling the network to process spherical representations with minimal computational overhead. We demonstrate the superior performance of our model on both synthetic and real automotive datasets, as well as a selection of other image datasets, for semantic segmentation, depth regression, and classification tasks. Our code is publicly available.