Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation

Jin, Youngwan; Park, Incheol; Song, Hanbin; Ju, Hyeongjin; Nalcakan, Yagiz; Kim, Shiho

Computer Science > Computer Vision and Pattern Recognition

arXiv:2409.16706 (cs)

[Submitted on 25 Sep 2024]

Abstract: This paper proposes Pix2Next, a novel image-to-image translation framework designed to address the challenge of generating high-quality Near-Infrared (NIR) images from RGB inputs. Our approach leverages a state-of-the-art Vision Foundation Model (VFM) within an encoder-decoder architecture, incorporating cross-attention mechanisms to enhance feature integration. This design captures detailed global representations and preserves essential spectral characteristics, treating RGB-to-NIR translation as more than a simple domain transfer problem. A multi-scale PatchGAN discriminator ensures realistic image generation at various detail levels, while carefully designed loss functions couple global context understanding with local feature preservation. We performed experiments on the RANUS dataset to demonstrate Pix2Next's advantages in quantitative metrics and visual quality, improving the FID score by 34.81% compared to existing methods. Furthermore, we demonstrate the practical utility of Pix2Next by showing improved performance on a downstream object detection task using generated NIR data to augment limited real NIR datasets. The proposed approach enables the scaling up of NIR datasets without additional data acquisition or annotation efforts, potentially accelerating advancements in NIR-based computer vision applications.
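
The abstract says Pix2Next integrates VFM features into its encoder-decoder through cross-attention, but it does not give the exact formulation. As an illustration only, here is standard single-head scaled dot-product cross-attention in NumPy. It assumes that the generator's encoder tokens act as queries, that the VFM tokens supply keys and values, and that the result is added back onto the encoder stream through a residual connection. The projection matrices, token shapes, and the helper names `cross_attention` and `fuse_with_vfm` are hypothetical and do not come from the paper.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (N_q, C_q) tokens that ask for information (assumed: encoder tokens).
    context: (N_k, C_k) tokens that supply it (assumed: VFM tokens).
    w_q: (C_q, d); w_k, w_v: (C_k, d) learned projections (hypothetical here).
    Returns the (N_q, d) attended values and the (N_q, N_k) attention weights.
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product similarity
    weights = softmax(scores, axis=-1)       # each query row sums to 1
    return weights @ v, weights


def fuse_with_vfm(enc_tokens, vfm_tokens, params):
    """Hypothetical fusion step: attend to VFM tokens, project back, add residually."""
    attended, weights = cross_attention(
        enc_tokens, vfm_tokens, params["w_q"], params["w_k"], params["w_v"]
    )
    fused = enc_tokens + attended @ params["w_o"]  # w_o: (d, C_q) keeps encoder width
    return fused, weights


# Toy usage with random stand-in features (shapes are illustrative only).
rng = np.random.default_rng(0)
enc = rng.standard_normal((64, 32))   # e.g. a flattened 8x8 encoder map, 32 channels
vfm = rng.standard_normal((256, 48))  # e.g. flattened 16x16 VFM patch tokens, 48 channels
d = 16
params = {
    "w_q": rng.standard_normal((32, d)) * 0.1,
    "w_k": rng.standard_normal((48, d)) * 0.1,
    "w_v": rng.standard_normal((48, d)) * 0.1,
    "w_o": rng.standard_normal((d, 32)) * 0.1,
}
fused, attn = fuse_with_vfm(enc, vfm, params)
print(fused.shape, attn.shape)  # (64, 32) (64, 256)
```

In this sketch the VFM tokens serve only as keys and values. Each encoder location can therefore draw global context from the foundation model, while the residual path leaves the local encoder features intact. This is one plausible reading of the abstract's aim of coupling global context with local feature preservation, not a description of the authors' implementation.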

Comments:	19 pages, 12 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2409.16706 [cs.CV]
	(or arXiv:2409.16706v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2409.16706

Submission history

From: Youngwan Jin
[v1] Wed, 25 Sep 2024 07:51:47 UTC (44,732 KB)