RSMT: A Remote Sensing Image-to-Map Translation Model via Adversarial Deep Transfer Learning
Abstract
1. Introduction
1. We propose RSMT, a novel remote sensing image-to-map translation framework that generates maps for multiple regions through an adversarial deep transfer learning scheme. RSMT learns generalized translation patterns by extracting content and style representations from remote sensing images and maps, overcoming the limitation of previous map generation models, which only apply to test data drawn from the same region as the training set.
2. RSMT uses spatial attention feature maps extracted from the discriminator to help the generator explicitly capture the regions of interest in both source classes and unseen target classes. To further improve performance, we propose a feature map loss, based on the spatial attention computed by the discriminator, that preserves domain-specific features during training, and we introduce a novel map identity loss to improve the transfer capability of the generator (a minimal sketch of the attention-based feature map loss appears after this list).
3. To demonstrate the effectiveness of deploying the spatial attention mechanism in remote sensing image-to-map translation, we conduct extensive experiments on datasets from different regions worldwide to validate the usability and applicability of the proposed method. Quantitative and qualitative results show that RSMT significantly outperforms state-of-the-art models.
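To make the role of the discriminator-derived attention concrete, the following is a minimal PyTorch sketch of one way a spatial attention map and an attention-based feature map loss could be formed; the channel-wise pooling, the normalization, and the use of an L1 distance are illustrative assumptions rather than the exact RSMT formulation.

```python
import torch
import torch.nn.functional as F

def spatial_attention(feat: torch.Tensor) -> torch.Tensor:
    """Collapse a (N, C, H, W) discriminator feature map into a (N, 1, H, W) attention map."""
    attn = feat.abs().mean(dim=1, keepdim=True)                    # channel-wise activation energy
    attn = attn / (attn.amax(dim=(2, 3), keepdim=True) + 1e-8)     # normalize each map to [0, 1]
    return attn

def feature_map_loss(disc_feat_real: torch.Tensor, disc_feat_fake: torch.Tensor) -> torch.Tensor:
    """L1 distance between attention maps of real maps and generated maps."""
    return F.l1_loss(spatial_attention(disc_feat_fake), spatial_attention(disc_feat_real))
```

In such a design, the attention map can also be multiplied element-wise with the generator's input or intermediate features, which is how the generator would be made to explicitly attend to the highlighted regions.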
2. Related Work
2.1. Image-to-Image Translation
2.2. Attentional Mechanism
2.3. Deep Transfer Learning Based on Adversarial Mechanism
3. Methods
3.1. Overall Architecture
3.2. Generator
3.3. Discriminator with Spatial Attention Mechanism
3.4. Loss Function
3.4.1. Adversarial Loss
3.4.2. Map Consistency Loss
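The outline above names the two loss terms but does not reproduce their equations, so the following is a hedged Python sketch of a generic non-saturating adversarial loss together with an L1-style map consistency term; the GAN variant, the exact objective, and the weighting used by RSMT are assumptions here, and `lam` is an illustrative hyperparameter.

```python
import torch
import torch.nn.functional as F

def adversarial_loss_d(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    # Discriminator: push real samples toward 1 and generated samples toward 0.
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) +
            F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

def adversarial_loss_g(fake_logits: torch.Tensor) -> torch.Tensor:
    # Generator: make generated maps be classified as real.
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))

def map_consistency_loss(generated_map: torch.Tensor, reference_map: torch.Tensor,
                         lam: float = 10.0) -> torch.Tensor:
    # L1 penalty keeping generated map tiles close to the reference map tiles.
    return lam * F.l1_loss(generated_map, reference_map)
```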
4. Results
4.1. Datasets and Experimental Setups
4.2. Evaluation Metrics
4.2.1. Root Mean Square Error
4.2.2. Structural Similarity
4.2.3. Pixel Accuracy
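A minimal Python sketch of the three metrics listed above, assuming 8-bit RGB map tiles, scikit-image >= 0.19 for the `channel_axis` argument, and an illustrative tolerance `delta` for pixel accuracy; the paper's exact pixel-accuracy definition may differ.

```python
import numpy as np
from skimage.metrics import structural_similarity

def rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Root mean square error over all pixels and channels."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def ssim(pred: np.ndarray, target: np.ndarray) -> float:
    """Structural similarity for RGB tiles (last axis is the color channel)."""
    return float(structural_similarity(pred, target, channel_axis=-1, data_range=255))

def pixel_accuracy(pred: np.ndarray, target: np.ndarray, delta: int = 5) -> float:
    """Fraction of pixels whose every channel lies within `delta` of the target value."""
    close = np.all(np.abs(pred.astype(int) - target.astype(int)) <= delta, axis=-1)
    return float(close.mean())
```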
4.3. Baselines
- CycleGAN: The cycle-consistency constraint was proposed by CycleGAN [2], which learns a mapping G: X→Y and its inverse mapping F: Y→X. The objective restricts F(G(x)) ≈ x and G(F(y)) ≈ y (see the sketch after this list).
- MapGen-GAN: MapGen-GAN [3] is our previous work for map translation tasks. The framework is based on circularity and geometrical consistency constraints, transforming remote sensing images to maps directly and reducing the translation’s semantic distortions.
- FUNIT: FUNIT [4] is an image-to-image translation framework based on few-shot learning that can translate images into previously unseen target classes. It trains multiple image-to-image translation mappings within a single network and can even be trained on multiple datasets in the same network.
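As referenced in the CycleGAN entry above, the cycle-consistency constraint can be sketched in Python as follows; the mappings `G` and `F_inv` are placeholder callables and `lam` is an illustrative weight, not values taken from [2].

```python
import torch.nn.functional as F

def cycle_consistency_loss(G, F_inv, x, y, lam: float = 10.0):
    """L1 reconstruction error after a round trip through both mappings."""
    loss_x = F.l1_loss(F_inv(G(x)), x)   # x -> G(x) -> F(G(x)) should recover x
    loss_y = F.l1_loss(G(F_inv(y)), y)   # y -> F(y) -> G(F(y)) should recover y
    return lam * (loss_x + loss_y)
```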
4.4. Comparisons with Baselines
4.4.1. Objective Evaluation
4.4.2. Subjective Evaluation
4.5. Numbers of Input Map Classes during Training
4.6. Ablation Study
- RSMT-no-: RSMT without spatial attention fed to the generator.
- RSMT-no-: RSMT without feature map loss function constraint.
- RSMT-no-: RSMT without map consistency loss function constraint.
4.7. Limitations
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Fu, H.; Gong, M.; Wang, C.; Batmanghelich, K.; Zhang, K.; Tao, D. Geometry-consistent generative adversarial networks for one-sided unsupervised domain mapping. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2427–2436.
2. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
3. Song, J.; Li, J.; Chen, H.; Wu, J. MapGen-GAN: A Fast Translator for Remote Sensing Image to Map Via Unsupervised Adversarial Learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2341–2357.
4. Liu, M.Y.; Huang, X.; Mallya, A.; Karras, T.; Aila, T.; Lehtinen, J.; Kautz, J. Few-shot unsupervised image-to-image translation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 10551–10560.
5. Choi, Y.; Uh, Y.; Yoo, J.; Ha, J.W. Stargan v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 8188–8197.
6. Kim, J.; Kim, M.; Kang, H.; Lee, K.H. U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019.
7. Tang, H.; Xu, D.; Sebe, N.; Yan, Y. Attention-guided generative adversarial networks for unsupervised image-to-image translation. In Proceedings of the IEEE 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–8.
8. Alami Mejjati, Y.; Richardt, C.; Tompkin, J.; Cosker, D.; Kim, K.I. Unsupervised Attention-guided Image-to-Image Translation. Adv. Neural Inf. Process. Syst. 2018, 31, 3693–3703.
9. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Adv. Neural Inf. Process. Syst. 2014, 3, 2672–2680.
10. Kim, T.; Cha, M.; Kim, H.; Lee, J.K.; Kim, J. Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1857–1865.
11. Yi, Z.; Zhang, H.; Tan, P.; Gong, M. Dualgan: Unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2849–2857.
12. Park, T.; Efros, A.A.; Zhang, R.; Zhu, J.Y. Contrastive learning for unpaired image-to-image translation. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2020; pp. 319–345.
13. Wu, P.W.; Lin, Y.J.; Chang, C.H.; Chang, E.Y.; Liao, S.W. Relgan: Multi-domain image-to-image translation via relative attributes. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 5914–5922.
14. Alharbi, Y.; Smith, N.; Wonka, P. Latent filter scaling for multimodal unsupervised image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1458–1466.
15. Almahairi, A.; Rajeshwar, S.; Sordoni, A.; Bachman, P.; Courville, A. Augmented cyclegan: Learning many-to-many mappings from unpaired data. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 195–204.
16. Choi, Y.; Choi, M.; Kim, M.; Ha, J.W.; Kim, S.; Choo, J. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8789–8797.
17. Lin, J.; Xia, Y.; Wang, Y.; Qin, T.; Chen, Z. Image-to-image translation with multi-path consistency regularization. arXiv 2019, arXiv:1905.12498.
18. Hui, L.; Li, X.; Chen, J.; He, H.; Yang, J. Unsupervised multi-domain image translation with domain-specific encoders/decoders. In Proceedings of the IEEE 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 2044–2049.
19. Zhao, B.; Chang, B.; Jie, Z.; Sigal, L. Modular generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 150–165.
20. Luong, M.T.; Pham, H.; Manning, C.D. Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1412–1421.
21. Cheng, Y. Agreement-based joint training for bidirectional attention-based neural machine translation. In Joint Training for Neural Machine Translation; Springer: Singapore, 2019; pp. 11–23.
22. Bahuleyan, H.; Mou, L.; Vechtomova, O.; Poupart, P. Variational Attention for Sequence-to-Sequence Models. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 21–25 August 2018; pp. 1672–1682.
23. Chiu, C.C.; Sainath, T.N.; Wu, Y.; Prabhavalkar, R.; Nguyen, P.; Chen, Z.; Kannan, A.; Weiss, R.J.; Rao, K.; Gonina, E.; et al. State-of-the-art speech recognition with sequence-to-sequence models. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 4774–4778.
24. Chen, L.C.; Yang, Y.; Wang, J.; Xu, W.; Yuille, A.L. Attention to scale: Scale-aware semantic image segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3640–3649.
25. Rout, L.; Misra, I.; Moorthi, S.M.; Dhar, D. S2a: Wasserstein gan with spatio-spectral laplacian attention for multi-spectral band synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 188–189.
26. Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhudinov, R.; Zemel, R.; Bengio, Y. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015; pp. 2048–2057.
27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
28. Zhang, H.; Goodfellow, I.; Metaxas, D.; Odena, A. Self-attention generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 7354–7363.
29. Emami, H.; Aliabadi, M.M.; Dong, M.; Chinnam, R.B. Spa-gan: Spatial attention gan for image-to-image translation. IEEE Trans. Multimed. 2020, 23, 391–401.
30. Tang, H.; Liu, H.; Xu, D.; Torr, P.H.; Sebe, N. Attentiongan: Unpaired image-to-image translation using attention-guided generative adversarial networks. arXiv 2019, arXiv:1911.11897.
31. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301.
32. Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A survey on deep transfer learning. In International Conference on Artificial Neural Networks; Springer: Cham, Switzerland, 2018; pp. 270–279.
33. Ganin, Y.; Lempitsky, V. Unsupervised domain adaptation by backpropagation. In Proceedings of the International Conference on Machine Learning, Lille, France, 7–9 July 2015; pp. 1180–1189.
34. Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M. Domain-adversarial neural networks. arXiv 2014, arXiv:1412.4446.
35. Tzeng, E.; Hoffman, J.; Darrell, T.; Saenko, K. Simultaneous deep transfer across domains and tasks. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4068–4076.
36. Tzeng, E.; Hoffman, J.; Saenko, K.; Darrell, T. Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7167–7176.
37. Luo, Z.; Zou, Y.; Hoffman, J.; Fei-Fei, L. Label efficient learning of transferable representations across domains and tasks. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 164–176.
38. Zagoruyko, S.; Komodakis, N. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer. arXiv 2016, arXiv:1612.03928.
39. Lee, H.Y.; Tseng, H.Y.; Huang, J.B.; Singh, M.; Yang, M.H. Diverse image-to-image translation via disentangled representations. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 35–51.
40. Huang, X.; Liu, M.Y.; Belongie, S.; Kautz, J. Multimodal unsupervised image-to-image translation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 172–189.
41. Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1501–1510.
42. Park, T.; Liu, M.Y.; Wang, T.C.; Zhu, J.Y. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2337–2346.
43. Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82.
44. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
45. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved training of wasserstein GANs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5769–5779.
| Method | Los Angeles RMSE | Los Angeles SSIM | Los Angeles ACC (%) | Riyadh RMSE | Riyadh SSIM | Riyadh ACC (%) | Vancouver RMSE | Vancouver SSIM | Vancouver ACC (%) |
|---|---|---|---|---|---|---|---|---|---|
| CycleGAN | 30.2375 | 0.4931 | 22.1454 | 34.2375 | 0.4683 | 20.8642 | 27.9847 | 0.5543 | 23.8743 |
| DualGAN | 32.4654 | 0.4865 | 22.5231 | 35.1483 | 0.4526 | 19.3203 | 28.4632 | 0.5499 | 24.0580 |
| MapGenGAN | 33.7613 | 0.5046 | 25.6213 | 34.0984 | 0.4659 | 21.6732 | 27.0352 | 0.5460 | 26.8993 |
| FUNIT | 24.8874 | 0.6673 | 34.4627 | 27.4577 | 0.6475 | 32.7814 | 23.6244 | 0.6698 | 35.9849 |
| RSMT | 20.2485 | 0.6822 | 45.7756 | 23.5894 | 0.6622 | 42.6528 | 20.0353 | 0.6964 | 43.5721 |
| Method | Los Angeles Similarity | Los Angeles Fidelity | Los Angeles Availability | Riyadh Similarity | Riyadh Fidelity | Riyadh Availability | Vancouver Similarity | Vancouver Fidelity | Vancouver Availability |
|---|---|---|---|---|---|---|---|---|---|
| CycleGAN | 3.55 | 3.70 | 2.85 | 2.65 | 3.40 | 3.35 | 3.55 | 3.30 | 3.45 |
| DualGAN | 4.15 | 4.10 | 4.20 | 3.60 | 3.75 | 3.55 | 4.15 | 4.00 | 3.90 |
| MapGenGAN | 5.45 | 5.30 | 5.75 | 4.45 | 4.20 | 4.80 | 4.60 | 4.05 | 4.55 |
| FUNIT | 7.70 | 7.95 | 8.30 | 7.35 | 7.60 | 8.15 | 7.40 | 7.35 | 7.95 |
| RSMT | 8.95 | 9.25 | 9.20 | 8.65 | 8.50 | 9.00 | 9.05 | 8.75 | 8.90 |
| Method | RMSE | SSIM | ACC (%) | RMSE | SSIM | ACC (%) |
|---|---|---|---|---|---|---|
| RSMT-no- | 25.9228 | 0.5212 | 40.2405 | 24.8307 | 0.5456 | 41.7637 |
| RSMT-no- | 23.1028 | 0.5738 | 43.7245 | 22.9706 | 0.5869 | 44.0631 |
| RSMT-no- | 21.2417 | 0.6446 | 45.2739 | 21.0378 | 0.6798 | 45.0195 |
| RSMT | 20.7162 | 0.6784 | 45.5392 | 20.2485 | 0.6822 | 45.7756 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Song, J.; Li, J.; Chen, H.; Wu, J. RSMT: A Remote Sensing Image-to-Map Translation Model via Adversarial Deep Transfer Learning. Remote Sens. 2022, 14, 919. https://doi.org/10.3390/rs14040919