CTFusion: Convolutions Integrate with Transformers for Multi-modal Image Fusion

Z Shen, J Wang, Z Pan, J Wang, Y Li - Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2022 - Springer
Abstract
In this paper, we propose a novel pseudo-end-to-end pre-trained multi-modal image fusion network, termed CTFusion, which takes advantage of both convolution operations and vision transformers for multi-modal image fusion. Unlike existing pre-trained models based on public datasets, which require two training stages with a single input and a manually designed fusion strategy, our method is a simple single-stage pseudo-end-to-end model that uses a dual-input adaptive fusion method and can be tested directly. Specifically, the fusion network first adopts a dual dense convolution network to extract rich semantic information; the resulting feature maps are then converted into tokens and fed into a multi-path transformer fusion block to model the global-local information of the source images. Finally, a subsequent convolutional neural network block produces the fused image. Extensive experiments on two publicly available multi-modal datasets demonstrate that the proposed model outperforms state-of-the-art methods.
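The pipeline the abstract describes (convolutional encoding per modality, tokenization, attention-based fusion, convolutional reconstruction) can be sketched in miniature. The following is a toy illustration only, not the paper's architecture: all function names, shapes, and the single-matrix "encoder" are assumptions, and the dense convolutions and multi-path transformer are reduced to a linear projection and one cross-attention step.

```python
import numpy as np

def encode(img, n_features=8, seed=0):
    # Toy stand-in for the dense-convolution encoder (hypothetical):
    # projects each pixel to an n_features-dim token.
    rng = np.random.default_rng(seed)
    h, w = img.shape
    W = rng.standard_normal((1, n_features)) * 0.1
    return img.reshape(h * w, 1) @ W          # (h*w, n_features) tokens

def attention_fuse(tok_a, tok_b):
    # Sketch of transformer-style fusion: tokens of modality A attend
    # over tokens of modality B, with a residual connection.
    scale = np.sqrt(tok_a.shape[1])
    scores = tok_a @ tok_b.T / scale          # (N, N) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return tok_a + weights @ tok_b            # fused tokens

def decode(tokens, shape):
    # Toy stand-in for the final CNN block: average features
    # back to a single-channel image.
    return tokens.mean(axis=1).reshape(shape)

# Two toy "modalities" (e.g. infrared and visible) of the same scene.
h, w = 8, 8
ir = np.linspace(0.0, 1.0, h * w).reshape(h, w)
vis = ir[::-1]

fused = decode(attention_fuse(encode(ir), encode(vis)), (h, w))
print(fused.shape)  # (8, 8)
```

The residual form `tok_a + weights @ tok_b` mirrors the adaptive, learned fusion the abstract contrasts with hand-designed fusion rules: the mixing weights come from token similarity rather than a fixed strategy.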