SAR and Optical Image Registration Based on Deep Learning with Co-Attention Matching Module
Abstract
1. Introduction
- A pair-based deep learning registration framework is proposed. Previous methods are image-based or patch-based, extracting keypoints from a single image or patch in isolation, whereas the approach proposed in this paper operates on the image pair. First, deep and dense structure features are extracted from each image by a DCNN in a joint detect-and-describe way, and the detected keypoint candidates are then corrected using the structure features of the corresponding areas in the other image (a detect-and-describe scoring sketch follows this list).
- A CAMM for integrating the structure feature maps of both images is proposed, exploiting the structure features of both images to generate reliable keypoint feature maps. First, the dependencies among all the corresponding features of an image pair are computed with the co-attention mechanism. Then, for the keypoint feature map to be extracted, the keypoint features of the other image are aggregated into a complementary feature map according to the attention scores (see the co-attention sketch after this list).
- A sampling strategy for SAR and optical image descriptor learning is proposed. To better train the Siamese network, new sampling strategies are introduced so that the descriptors learn modality-invariant representations. Owing to the differences in imaging mechanism between SAR and optical images, pixels that correspond semantically in the image pair may not coincide in spatial position; positive samples are therefore selected from the neighborhoods of the corresponding pixels, instead of directly adopting the patches centered at those pixels. Moreover, patches randomly sampled from the whole image are added to the negative samples as distractors, strengthening the distinguishing ability of the descriptor (see the sampling sketch after this list).
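As a concrete illustration of the joint detect-and-describe extraction in the first contribution, the following PyTorch snippet derives a soft keypoint score map from dense structure features in the style of D2-Net. It is a minimal sketch under our own assumptions (the function name, the 3 × 3 neighborhood, and non-negative ReLU features); the paper's exact scoring may differ.

```python
import torch
import torch.nn.functional as F

def soft_detection_score(feat: torch.Tensor) -> torch.Tensor:
    """D2-Net-style soft keypoint score from a dense feature map.

    feat: (B, C, H, W) structure features from the backbone,
    assumed non-negative (e.g., after a ReLU).
    Returns a (B, 1, H, W) score map; keypoints are its local maxima.
    """
    eps = 1e-8
    # Spatial softness: how much each pixel dominates its 3x3 neighborhood.
    exp = torch.exp(feat)
    local_sum = F.avg_pool2d(exp, kernel_size=3, stride=1, padding=1) * 9.0
    alpha = exp / (local_sum + eps)            # soft local-max ratio per channel
    # Channel softness: how much the strongest channel dominates at each pixel.
    beta = feat / (feat.max(dim=1, keepdim=True).values + eps)
    score = (alpha * beta).max(dim=1, keepdim=True).values
    return score / (score.sum(dim=(2, 3), keepdim=True) + eps)  # image-level normalization
```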
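The co-attention computation in the second contribution can likewise be sketched. The module below computes attention scores between every location of one feature map and every location of the other, then aggregates the other image's features into a complementary map. The class name `CoAttentionMatching`, the single attention head, and the 1 × 1 projections are illustrative assumptions, not the authors' exact CAMM.

```python
import torch
import torch.nn as nn

class CoAttentionMatching(nn.Module):
    """Minimal single-head co-attention between two dense feature maps."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        """Returns a complementary feature map for image A, built from image B."""
        b, c, h, w = feat_a.shape
        q = self.query(feat_a).flatten(2).transpose(1, 2)   # (B, HW_a, C)
        k = self.key(feat_b).flatten(2)                     # (B, C, HW_b)
        v = self.value(feat_b).flatten(2).transpose(1, 2)   # (B, HW_b, C)
        # Attention scores between every location in A and every location in B.
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)      # (B, HW_a, HW_b)
        # Aggregate B's features into a complementary map aligned with A.
        return (attn @ v).transpose(1, 2).reshape(b, c, h, w)

# Usage: fuse the complementary map with A's own features, e.g., by addition.
camm = CoAttentionMatching(channels=128)
fa, fb = torch.randn(1, 128, 32, 32), torch.randn(1, 128, 32, 32)
fused_a = fa + camm(fa, fb)
```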
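And the sampling strategy in the third contribution, sketched under assumed names and hyperparameters (patch size, neighborhood radius, number of distractors): positives come from a small neighborhood around each corresponding pixel rather than the pixel itself, and random patches are appended as negative distractors.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_descriptor_pairs(sar_img, opt_img, correspondences,
                            patch=32, radius=3, n_distractors=64):
    """Illustrative sampling for SAR-optical descriptor learning.

    correspondences: list of ((xs, ys), (xo, yo)) pixel correspondences,
    SAR coordinates first. All names/hyperparameters are assumptions.
    """
    half = patch // 2

    def crop(img, x, y):
        # Clamp the center so the patch stays inside the image.
        x = int(np.clip(x, half, img.shape[1] - half))
        y = int(np.clip(y, half, img.shape[0] - half))
        return img[y - half:y + half, x - half:x + half]

    anchors, positives, negatives = [], [], []
    for (xs, ys), (xo, yo) in correspondences:
        # Positive: a patch near (not exactly at) the corresponding pixel,
        # tolerating small semantic misalignment between the modalities.
        dx, dy = rng.integers(-radius, radius + 1, size=2)
        anchors.append(crop(sar_img, xs, ys))
        positives.append(crop(opt_img, xo + dx, yo + dy))
    # Distractors: patches drawn at random from the whole optical image.
    for _ in range(n_distractors):
        rx = rng.integers(half, opt_img.shape[1] - half + 1)
        ry = rng.integers(half, opt_img.shape[0] - half + 1)
        negatives.append(crop(opt_img, rx, ry))
    return anchors, positives, negatives
```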
2. Related Works
2.1. Joint Image Keypoint Detection and Feature Description
2.2. Attention Mechanism
3. Methodology
3.1. Overall Framework
3.2. CAMM for Pairwise Structural Information
3.3. Loss Function
3.3.1. Detector Loss
3.3.2. Descriptor Loss
3.4. Model Inference
4. Experiment and Analysis
4.1. Experimental Datasets
4.2. Evaluation Metrics
- (a) Repeatability. This metric measures how repeatably keypoints are detected across the image pair. Suppose that image 1 is detected with $N_1$ keypoints whose coordinate set is $P_1 = \{p_1^i\}_{i=1}^{N_1}$, while image 2 is detected with $N_2$ keypoints whose coordinate set is $P_2 = \{p_2^j\}_{j=1}^{N_2}$. A point $p_1^i$ in image 1 is called an inner point if it satisfies the following condition:
$$\min_{1 \le j \le N_2} \left\| \mathcal{H}\left(p_1^i\right) - p_2^j \right\|_2 \le \varepsilon,$$
where $\mathcal{H}(\cdot)$ denotes the warp induced by the ground-truth homography from image 1 to image 2 and $\varepsilon$ is a pixel distance threshold. Repeatability is the proportion of inner points among the detected keypoints.
- (b) Localization error (LE). LE is the error between the coordinates of the inner points in (a) and the ground-truth coordinates, further evaluating the accuracy of the keypoints extracted by the detector (a sketch of (a) and (b) follows this list).
- (c) Mean matching accuracy (MMA) [34]. MMA is also a percentage of inner points, but unlike (a), the inner points here are defined as the inliers of the homography transformation model estimated by RANSAC. This metric measures the accuracy of the correspondences generated by mutual brute-force matching of the descriptors, reflecting the performance of both the detectors and the descriptors.
- (d) Nearest neighbor mean average precision (NN mAP) [26]. This criterion shows the detection ability of detectors and the distinguishing ability of descriptors. It is computed as the area under the curve (AUC) of the precision–recall (PR) curve, using the nearest-neighbor matching strategy (see the NN mAP sketch after this list).
- (e) Average corner error (ACE) [35]. This metric integrates the performance of the entire keypoint detection, feature description, and feature-matching pipeline. It is defined as the distance between the real and the estimated coordinates of the four image corners, computed from the ground-truth homography matrix and the one estimated by RANSAC, respectively (see the ACE sketch after this list).
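For concreteness, here is a small NumPy sketch of metrics (a) and (b): project image-1 keypoints through the ground-truth homography, count those whose nearest image-2 keypoint lies within the threshold, and average their distances. Function names and the default threshold are our assumptions.

```python
import numpy as np

def warp(points, H):
    """Apply a 3x3 homography to an (N, 2) array of (x, y) points."""
    pts = np.hstack([points, np.ones((len(points), 1))]) @ H.T
    return pts[:, :2] / pts[:, 2:3]

def repeatability_and_le(kpts1, kpts2, H_gt, eps=3.0):
    """Repeatability and localization error of detected keypoints.

    kpts1, kpts2: (N, 2) keypoint coordinates in images 1 and 2.
    H_gt: ground-truth homography mapping image 1 onto image 2.
    eps: pixel threshold below which a projected keypoint counts as inner.
    """
    proj = warp(kpts1, H_gt)                                     # image-1 keypoints in image-2 frame
    dists = np.linalg.norm(proj[:, None] - kpts2[None], axis=2)  # (N1, N2) distance matrix
    nearest = dists.min(axis=1)                                  # distance to closest image-2 keypoint
    inner = nearest < eps
    repeatability = inner.mean() if len(kpts1) else 0.0
    le = nearest[inner].mean() if inner.any() else float("nan")
    return repeatability, le
```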
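Metric (d) can be sketched with scikit-learn's average precision: score each descriptor's nearest neighbor by similarity and label it correct or not. The helper below is illustrative; it assumes L2-normalized descriptors, a known ground-truth match index per keypoint, and that both correct and incorrect matches occur.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def nn_map(desc1, desc2, correct_index):
    """NN mAP sketch under the nearest-neighbor matching strategy.

    desc1, desc2: (N, D) L2-normalized descriptors of putative keypoints.
    correct_index: (N,) index in desc2 of the true match for each desc1 row.
    Returns the area under the precision-recall curve (average precision).
    """
    sim = desc1 @ desc2.T                    # cosine similarity matrix
    nn = sim.argmax(axis=1)                  # nearest neighbor per descriptor
    scores = sim[np.arange(len(desc1)), nn]  # confidence = similarity to the NN
    labels = nn == correct_index             # is the NN the ground-truth match?
    return average_precision_score(labels, scores)
```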
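Metric (e) can be computed with standard OpenCV calls: estimate a homography from the matched keypoints with RANSAC (the same estimation underlying the MMA inliers in (c)) and compare where the estimated and ground-truth homographies send the four image corners. Variable names are illustrative.

```python
import cv2
import numpy as np

def average_corner_error(matched1, matched2, H_gt, img_shape):
    """ACE: mean distance between corners warped by the estimated and
    ground-truth homographies.

    matched1, matched2: (N, 2) float32 arrays of matched keypoints.
    H_gt: 3x3 ground-truth homography (image 1 -> image 2) as a NumPy array.
    img_shape: (height, width) of image 1.
    """
    H_est, _ = cv2.findHomography(matched1, matched2, cv2.RANSAC, 3.0)
    if H_est is None:
        return float("inf")  # matching failed; reported as '-' in the tables
    h, w = img_shape
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    true_c = cv2.perspectiveTransform(corners, H_gt)
    est_c = cv2.perspectiveTransform(corners, H_est)
    return float(np.linalg.norm(true_c - est_c, axis=2).mean())
```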
4.3. Experimental Results and Analysis
4.3.1. Detection Capability
4.3.2. Description Capability
4.3.3. Overall Registration Performance
4.3.4. Component Analysis
- (a) CAMM
- (b) Sampling strategies
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Xie, H.; Shi, S.; An, D.; Wang, G.; Wang, G.; Xiao, H.; Huang, X.; Zhou, Z.; Xie, C.; Wang, F.; et al. Fast Factorized Backprojection Algorithm for One-Stationary Bistatic Spotlight Circular SAR Image Formation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 1494–1510.
- Hu, X.; Xie, K.; Xie, H.; Zhang, L.; Hu, J.; He, J.; Yi, S.; Jiang, H. Fast Factorized Backprojection Algorithm in Orthogonal Elliptical Coordinate System for Ocean Scenes Imaging Using Geosynchronous Spaceborne-Airborne VHF UWB Bistatic SAR. Remote Sens. 2023, 15, 2215.
- Jiang, X.; Xie, H.; Chen, J.; Zhang, J.; Wang, G.; Xie, K. Arbitrary-Oriented Ship Detection Method Based on Long-Edge Decomposition Rotated Bounding Box Encoding in SAR Images. Remote Sens. 2022, 14, 3599.
- Xie, H.; Hu, J.; Duan, K.; Wang, G. High-Efficiency and High-Precision Reconstruction Strategy for P-Band Ultra-Wideband Bistatic Synthetic Aperture Radar Raw Data Including Motion Errors. IEEE Access 2020, 8, 31143–31158.
- Kulkarni, S.; Rege, P. Pixel Level Fusion Techniques for SAR and Optical Images: A Review. Inf. Fusion 2020, 59, 13–29.
- Wurm, M.; Stark, T.; Zhu, X.; Weigand, M.; Taubenboeck, H. Semantic Segmentation of Slums in Satellite Images Using Transfer Learning on Fully Convolutional Neural Networks. ISPRS J. Photogramm. Remote Sens. 2019, 150, 59–69.
- Sun, Y.; Lei, L.; Guan, D.; Kuang, G. Iterative Robust Graph for Unsupervised Change Detection of Heterogeneous Remote Sensing Images. IEEE Trans. Image Process. 2021, 30, 6277–6291.
- Hartmann, W.; Havlena, M.; Schindler, K. Recent Developments in Large-Scale Tie-Point Matching. ISPRS J. Photogramm. Remote Sens. 2016, 115, 47–62.
- Xiang, D.; Xie, Y.; Cheng, J.; Xu, Y.; Zhang, H.; Zheng, Y. Optical and SAR Image Registration Based on Feature Decoupling Network. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5235913.
- Ye, Y.; Shan, J.; Bruzzone, L.; Shen, L. Robust Registration of Multimodal Remote Sensing Images Based on Structural Similarity. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2941–2958.
- Inglada, J.; Giros, A. On the Possibility of Automatic Multisensor Image Registration. IEEE Trans. Geosci. Remote Sens. 2004, 42, 2104–2120.
- Hel-Or, Y.; Hel-Or, H.; David, E. Fast Template Matching in Non-Linear Tone-Mapped Images. In Proceedings of the International Conference on Computer Vision (ICCV), Barcelona, Spain, 6–13 November 2011; pp. 1355–1362.
- Jiang, X.; Ma, J.; Xiao, G.; Shao, Z.; Guo, X. A Review of Multimodal Image Matching: Methods and Applications. Inf. Fusion 2021, 73, 22–71.
- Ye, Y.; Bruzzone, L.; Shan, J.; Bovolo, F.; Zhu, Q. Fast and Robust Matching for Multimodal Remote Sensing Image Registration. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9059–9070.
- Li, J.; Hu, Q.; Ai, M. RIFT: Multi-Modal Image Matching Based on Radiation-Variation Insensitive Feature Transform. IEEE Trans. Image Process. 2020, 29, 3296–3310.
- Lowe, D. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Dellinger, F.; Delon, J.; Gousseau, Y.; Michel, J.; Tupin, F. SAR-SIFT: A SIFT-Like Algorithm for SAR Images. IEEE Trans. Geosci. Remote Sens. 2015, 53, 453–466.
- Ma, W.; Wen, Z.; Wu, Y.; Jiao, L.; Gong, M.; Zheng, Y.; Liu, L. Remote Sensing Image Registration with Modified SIFT and Enhanced Feature Matching. IEEE Geosci. Remote Sens. Lett. 2017, 14, 3–7.
- Zhu, X.; Montazeri, S.; Ali, M.; Hua, Y.; Wang, Y.; Mou, L.; Shi, Y.; Xu, F.; Bamler, R. Deep Learning Meets SAR: Concepts, Models, Pitfalls, and Perspectives. IEEE Geosci. Remote Sens. Mag. 2021, 9, 143–172.
- Zhang, H.; Ni, W.; Yan, W.; Xiang, D.; Wu, J.; Yang, X.; Bian, H. Registration of Multimodal Remote Sensing Image Based on Deep Fully Convolutional Neural Network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 3028–3042.
- Zhang, H.; Lei, L.; Ni, W.; Tang, T.; Wu, J.; Xiang, D.; Kuang, G. Optical and SAR Image Matching Using Pixelwise Deep Dense Features. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6000705.
- Cui, S.; Ma, A.; Zhang, L.; Xu, M.; Zhong, Y. MAP-Net: SAR and Optical Image Matching via Image-Based Convolutional Network with Attention Mechanism and Spatial Pyramid Aggregated Pooling. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1000513.
- Li, L.; Han, L.; Ye, Y. Self-Supervised Keypoint Detection and Cross-Fusion Matching Networks for Multimodal Remote Sensing Image Registration. Remote Sens. 2022, 14, 3599.
- Wiles, O.; Ehrhardt, S.; Zisserman, A. Co-Attention for Conditioned Image Matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 15915–15924.
- Yi, K.; Trulls, E.; Lepetit, V.; Fua, P. LIFT: Learned Invariant Feature Transform. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 457–483.
- DeTone, D.; Malisiewicz, T.; Rabinovich, A. SuperPoint: Self-Supervised Interest Point Detection and Description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Salt Lake City, UT, USA, 18–23 June 2018; pp. 337–349.
- Dusmanu, M.; Rocco, I.; Pajdla, T.; Pollefeys, M.; Sivic, J.; Torii, A.; Sattler, T. D2-Net: A Trainable CNN for Joint Description and Detection of Local Features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8084–8093.
- Revaud, J.; Weinzaepfel, P.; De Souza, C.; Pion, N.; Csurka, G.; Cabon, Y.; Humenberger, M. R2D2: Repeatable and Reliable Detector and Descriptor. arXiv 2019, arXiv:1906.06195.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; pp. 234–241.
- He, K.; Lu, Y.; Sclaroff, S. Local Descriptors Optimized for Average Precision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 596–605.
- Schmitt, M.; Hughes, L.; Zhu, X. The SEN1-2 Dataset for Deep Learning in SAR-Optical Data Fusion. arXiv 2018, arXiv:1807.01569.
- Alsallakh, B.; Kokhlikyan, N.; Miglani, V.; Yuan, J.; Reblitz-Richardson, O. Mind the Pad -- CNNs Can Develop Blind Spots. arXiv 2020, arXiv:2010.02178.
- Mikolajczyk, K.; Schmid, C. Scale & Affine Invariant Interest Point Detectors. Int. J. Comput. Vis. 2004, 60, 63–86.
- Ye, Y.; Tang, T.; Zhu, B.; Yang, C.; Li, B.; Hao, S. A Multiscale Framework with Unsupervised Learning for Remote Sensing Image Registration. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5622215.
| | SIFT | SAR-SIFT | RIFT | SuperPoint | R2D2 | CAM-Net |
|---|---|---|---|---|---|---|
| LE | 1.9149 | 1.8319 | 1.7311 | 1.9429 | 1.9018 | 1.9619 |
| Repeatability | 0.1309 | 0.1402 | 0.4238 | 0.3914 | 0.3626 | 0.4338 |
| | SIFT | SAR-SIFT | RIFT | SuperPoint | R2D2 | CAM-Net |
|---|---|---|---|---|---|---|
| NN mAP | 0.0011 | 0.1692 | 0.0595 | 0.1796 | 0.1794 | 0.3090 |
| | SIFT | SAR-SIFT | RIFT | SuperPoint | R2D2 | CAM-Net |
|---|---|---|---|---|---|---|
| ACE | - | - | 49.7911 | 37.0389 | 27.0098 | 7.1459 |
| | Repeatability | LE | NN mAP | ACE |
|---|---|---|---|---|
| CAM-Net | 0.4338 | 1.9619 | 0.3090 | 7.1459 |
| CAM-Net w/o CAMM | 0.3839 | 1.9293 | 0.2476 | 20.5072 |
| | PSP | NSA | Repeatability | LE | NN mAP | ACE |
|---|---|---|---|---|---|---|
| CAM-Net w/o PSP and NSA | | | 0.2062 | 1.9891 | 0.0307 | - |
| CAM-Net w/o NSA | √ | | 0.2030 | 1.9886 | 0.0417 | - |
| CAM-Net w/o PSP | | √ | 0.3238 | 1.9474 | 0.2950 | 8.0344 |
| CAM-Net | √ | √ | 0.4338 | 1.9619 | 0.3090 | 7.1459 |