CTFusion: Convolutions Integrate with Transformers for Multi-modal Image Fusion

Z Shen, J Wang, Z Pan, J Wang, Y Li - Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2022 - Springer
Abstract
In this paper, we propose a novel pseudo-end-to-end pre-trained multi-modal image fusion network, termed CTFusion, which takes advantage of both convolution operations and vision transformers for multi-modal image fusion. Unlike existing pre-trained models based on public datasets, which require two training stages with a single input and a manually designed fusion strategy, our method is a simple single-stage pseudo-end-to-end model that uses a dual-input adaptive fusion method and can be tested directly. Specifically, the fusion network first adopts a dual dense convolution network to extract rich semantic information; the resulting feature maps are then converted into tokens and fed into a multi-path transformer fusion block to model the global-local information of the source images. Finally, a subsequent convolutional neural network block produces the fused image. Extensive experiments on two publicly available multi-modal datasets demonstrate that the proposed model outperforms state-of-the-art methods.
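The pipeline the abstract describes (convolutional encoding per modality, tokenization, attention-based fusion, convolutional reconstruction) can be sketched in miniature. The following is a toy illustration only, not the paper's architecture: all function names, shapes, and the single-matrix "encoder" are assumptions, and the dense convolutions and multi-path transformer are reduced to a linear projection and one cross-attention step.

```python
import numpy as np

def encode(img, n_features=8, seed=0):
    # Toy stand-in for the dense-convolution encoder (hypothetical):
    # projects each pixel to an n_features-dim token.
    rng = np.random.default_rng(seed)
    h, w = img.shape
    W = rng.standard_normal((1, n_features)) * 0.1
    return img.reshape(h * w, 1) @ W          # (h*w, n_features) tokens

def attention_fuse(tok_a, tok_b):
    # Sketch of transformer-style fusion: tokens of modality A attend
    # over tokens of modality B, with a residual connection.
    scale = np.sqrt(tok_a.shape[1])
    scores = tok_a @ tok_b.T / scale          # (N, N) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return tok_a + weights @ tok_b            # fused tokens

def decode(tokens, shape):
    # Toy stand-in for the final CNN block: average features
    # back to a single-channel image.
    return tokens.mean(axis=1).reshape(shape)

# Two toy "modalities" (e.g. infrared and visible) of the same scene.
h, w = 8, 8
ir = np.linspace(0.0, 1.0, h * w).reshape(h, w)
vis = ir[::-1]

fused = decode(attention_fuse(encode(ir), encode(vis)), (h, w))
print(fused.shape)  # (8, 8)
```

The residual form `tok_a + weights @ tok_b` mirrors the adaptive, learned fusion the abstract contrasts with hand-designed fusion rules: the mixing weights come from token similarity rather than a fixed strategy.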