Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation

Themyr, Loic; Rambour, Clement; Thome, Nicolas; Collins, Toby; Hostettler, Alexandre

arXiv:2212.07890 (cs)

[Submitted on 15 Dec 2022]

Abstract: Transformers have proved to be very effective for visual recognition tasks. In particular, vision transformers construct compressed global representations through self-attention and learnable class tokens. Multi-resolution transformers have shown recent successes in semantic segmentation but can only capture local interactions in high-resolution feature maps. This paper extends the notion of global tokens to build GLobal Attention Multi-resolution (GLAM) transformers. GLAM is a generic module that can be integrated into most existing transformer backbones. GLAM includes learnable global tokens, which, unlike previous methods, can model interactions between all image regions, and extracts powerful representations during training. Extensive experiments show that GLAM-Swin and GLAM-Swin-UNet perform substantially better than their vanilla counterparts on ADE20K and Cityscapes. Moreover, GLAM can be used to segment large 3D medical images, and GLAM-nnFormer achieves new state-of-the-art performance on the BCV dataset.
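
The idea of global tokens in a windowed transformer can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`attention`, `glam_like_block`, `two_blocks`), the identity query/key/value projections, the single head, and the absence of residual connections, normalization, and MLP layers are all simplifying assumptions made for illustration. The sketch only shows how tokens attached to each window can carry information between windows: each window first attends over its own local tokens plus its global tokens, then the global tokens of all windows attend to each other.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(tokens):
    """Single-head self-attention with identity projections (illustrative only).

    tokens: array of shape (batch, n_tokens, dim).
    """
    dim = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(dim)
    return softmax(scores, axis=-1) @ tokens


def glam_like_block(windows, global_tokens):
    """Hypothetical block: window attention followed by global-token attention.

    windows:       (num_windows, tokens_per_window, dim) local tokens
    global_tokens: (num_windows, num_global, dim) global tokens per window
    """
    num_windows, num_global, dim = global_tokens.shape
    # Step 1: attention inside each window over [global ; local] tokens.
    x = np.concatenate([global_tokens, windows], axis=1)
    x = attention(x)
    g, local = x[:, :num_global], x[:, num_global:]
    # Step 2: the global tokens of all windows attend to each other.
    g = attention(g.reshape(1, num_windows * num_global, dim))
    g = g.reshape(num_windows, num_global, dim)
    return local, g


def two_blocks(windows, global_tokens):
    # Stack two blocks so the mixed global tokens feed back into the windows.
    local, g = glam_like_block(windows, global_tokens)
    return glam_like_block(local, g)


rng = np.random.default_rng(0)
windows = rng.normal(size=(4, 16, 8))        # 4 windows, 16 tokens each, dim 8
global_tokens = rng.normal(size=(4, 2, 8))   # 2 global tokens per window
local, g = glam_like_block(windows, global_tokens)
```

In this sketch, the local tokens of a window depend only on that window after one block, but after two stacked blocks they also depend on every other window, because the global tokens exchanged information in step 2 of the first block.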

Comments:	Winter Conference on Applications of Computer Vision (WACV 2023)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
MSC classes:	68T45
Cite as:	arXiv:2212.07890 [cs.CV]
	(or arXiv:2212.07890v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2212.07890

Submission history

From: Loic Themyr
[v1] Thu, 15 Dec 2022 15:19:09 UTC (6,111 KB)
