Authors:
Mohamad Alansari 1; Hamad AlRemeithi 1,2; Bilal Hassan 1,3; Sara Alansari 1; Jorge Dias 1,3; Majid Khonji 1,3; Naoufel Werghi 1,3,4 and Sajid Javed 1,3
Affiliations:
1 Department of Electrical Engineering and Computer Science, Khalifa University, Abu Dhabi, U.A.E.
2 Research and Technology Development Department, Tawazun Technology & Innovation, Abu Dhabi, U.A.E.
3 Center for Autonomous Robotic Systems, Khalifa University, Abu Dhabi, U.A.E.
4 Center for Cyber-Physical Systems, Khalifa University, Abu Dhabi, U.A.E.
Keyword(s):
Attention Mechanisms, Computational Resources, Pyramid Vision Transformers, Scene Understanding, Semantic Segmentation.
Abstract:
Semantic segmentation, a core task in computer vision, assigns a semantic class label to every image pixel. Transformer-based models, recognized for their strong performance, have been pivotal in advancing this field. Our contribution, the Vision-Perceptual Transformer Network (VPTN), combines transformer encoders with a feature-pyramid-based decoder to deliver precise segmentation maps at minimal computational cost. VPTN's strength lies in its integration of the pyramiding technique, which improves the handling of multi-scale variations. In direct comparisons with Vision Transformer-based networks and their variants, VPTN consistently excels: on average, it achieves 4.2%, 3.41%, and 6.24% higher mean Intersection over Union (mIoU) than the Dense Prediction Transformer (DPT), Data-efficient image Transformer (DeiT), and Swin Transformer networks, while requiring only 15.63%, 3.18%, and 10.05% of their Giga Floating-Point Operations (GFLOPs). Our validation spans five diverse datasets: Cityscapes, BDD100K, Mapillary Vistas, CamVid, and ADE20K. VPTN sets the state of the art (SOTA) on BDD100K and CamVid and consistently outperforms existing deep learning models on the other datasets, with mIoU scores of 82.6%, 67.29%, 61.2%, 86.3%, and 55.3%, respectively. Impressively, it does so with an average computational complexity of just 11.44% of SOTA models. VPTN represents a significant advance in semantic segmentation, balancing efficiency and performance, and shows particular promise for autonomous driving and computer vision applications in natural settings.
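The abstract describes VPTN as pairing pyramid transformer encoder stages with a feature-pyramid decoder that fuses multi-scale features into a dense prediction. The PyTorch sketch below is our own minimal illustration of that general design, not the authors' implementation: every module name, layer width, head count, and the simple sum-fusion decoder are assumptions made for clarity.

```python
# Illustrative sketch of a pyramid-transformer encoder + feature-pyramid
# decoder for semantic segmentation (assumed design, not the VPTN code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidStage(nn.Module):
    """One encoder stage: strided patch embedding, then a transformer block.

    Each stage halves the spatial resolution, producing the multi-scale
    feature pyramid the abstract refers to.
    """
    def __init__(self, in_ch, out_ch, num_heads):
        super().__init__()
        self.patch_embed = nn.Conv2d(in_ch, out_ch, kernel_size=3,
                                     stride=2, padding=1)
        self.block = nn.TransformerEncoderLayer(
            d_model=out_ch, nhead=num_heads, dim_feedforward=2 * out_ch,
            batch_first=True, norm_first=True)

    def forward(self, x):
        x = self.patch_embed(x)                 # (B, C, H/2, W/2)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)   # (B, H*W, C) token sequence
        tokens = self.block(tokens)
        return tokens.transpose(1, 2).reshape(b, c, h, w)

class VPTNSketch(nn.Module):
    """Hypothetical encoder-decoder: pyramid stages + lightweight FPN head."""
    def __init__(self, num_classes, dims=(32, 64, 128, 256), heads=(1, 2, 4, 8)):
        super().__init__()
        chans = (3,) + dims[:-1]
        self.stages = nn.ModuleList(
            PyramidStage(i, o, h) for i, o, h in zip(chans, dims, heads))
        # Decoder: project every scale to a common width, upsample, and sum.
        self.lateral = nn.ModuleList(nn.Conv2d(d, 64, 1) for d in dims)
        self.head = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)                     # collect one map per scale
        size = feats[0].shape[-2:]              # finest pyramid resolution
        fused = sum(F.interpolate(lat(f), size=size, mode="bilinear",
                                  align_corners=False)
                    for lat, f in zip(self.lateral, feats))
        return self.head(fused)                 # class logits per pixel

if __name__ == "__main__":
    model = VPTNSketch(num_classes=19)          # e.g. Cityscapes uses 19 classes
    logits = model(torch.randn(1, 3, 128, 128))
    print(logits.shape)                         # torch.Size([1, 19, 64, 64])
```

The efficiency argument in the abstract hinges on this kind of structure: attention runs on progressively downsampled token grids rather than full resolution, while the cheap pyramid decoder recovers spatial detail, which is consistent with the reported GFLOPs savings over DPT, DeiT, and Swin baselines.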