Deep Learning-Based Digital Surface Model Reconstruction of ZY-3 Satellite Imagery
Abstract
1. Introduction
2. Methods
- In the feature extraction and regularization modules, the upsampling step replaces transpose convolution with a combination of the PixelShuffle technique and interpolation. This effectively prevents information loss and produces high-resolution feature maps.
- The network's convolution layers use grouped convolution in place of standard convolution, reducing the parameter count and system overhead while still extracting the crucial information.
- After the coarse estimation stage, a variance-based uncertainty estimate is computed from the height map and used to adaptively narrow the height search range for the subsequent fine estimation stage, improving the precision of the final predicted height.
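The three modifications above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function names, the sub-pixel ordering (chosen to match the common PixelShuffle convention), and the use of one standard deviation for the refined search range are assumptions.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) feature map into (C, H*r, W*r).

    PixelShuffle upsamples by moving channel values into sub-pixel
    spatial positions, so no information is discarded in the process
    (in contrast to strided transpose convolution).
    """
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)       # split channels into an r x r sub-pixel grid
    x = x.transpose(0, 3, 1, 4, 2)     # -> (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a k x k 2D convolution; grouped convolution
    splits the input channels among `groups`, dividing the parameter
    count by the number of groups."""
    return c_out * (c_in // groups) * k * k

def refine_height_range(prob, heights, k=1.0):
    """Variance-based uncertainty: the coarse stage yields a probability
    distribution over height hypotheses; its standard deviation sets an
    adaptive search interval for the fine stage."""
    mean = (prob * heights).sum()
    std = np.sqrt((prob * (heights - mean) ** 2).sum())
    return mean - k * std, mean + k * std
```

For example, `conv_params(32, 32, 3)` gives 9216 weights for a standard 3 × 3 convolution, while `conv_params(32, 32, 3, groups=4)` gives 2304, a fourfold reduction.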
2.1. Feature Extraction Module
2.2. Cost Volume Construction
2.2.1. Affine Transformation Based on the RPC Model
2.2.2. Feature Volume Fusion
2.3. Cost Volume Regularization
2.4. Height Map Prediction
2.5. DSM Reconstruction Methods and Evaluation Metrics
- Mean absolute error (MAE): This metric is the average L1 distance between corresponding grid-cell height values of the ground truth DSM and the estimated DSM, as delineated in Equation (12).
- In Equation (12), the two DSM terms denote the ground truth DSM and the estimated DSM, respectively. The function F(X) counts valid grid cells: it returns 1 when X is true and 0 otherwise. The remaining two terms denote the estimated height value and the ground-truth height of each grid cell, respectively.
- Root mean square error (RMSE): This metric is the square root of the mean squared height residual over all grid cells between the ground truth DSM and the estimated DSM, as delineated in Equation (13).
- Values < 2.5 m and < 7.5 m: The proportion of grid cells whose L1 distance (also known as Manhattan distance) between the estimated height value and the ground-truth height value is less than 2.5 m and 7.5 m, respectively, as delineated in Equation (14).
- Comp: The percentage of grid cells with valid height values in the final DSM, as delineated in Equation (15).
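The metrics above can be sketched as a single NumPy function. This is a minimal illustration assuming both DSMs are 2D arrays with NaN marking invalid grid cells, not the authors' evaluation code; the function name and the NaN convention are assumptions.

```python
import numpy as np

def dsm_metrics(gt, est):
    """Evaluation metrics over valid grid cells of two DSMs.

    `gt` is the ground truth DSM and `est` the estimated DSM; invalid
    cells are NaN. MAE is the mean L1 distance (Equation (12)), RMSE
    the root of the mean squared residual (Equation (13)), and the
    <2.5 m / <7.5 m scores are the fraction of valid cells whose L1
    error stays under each threshold (Equation (14)).
    """
    valid = ~np.isnan(gt) & ~np.isnan(est)    # plays the role of F(X): count valid cells
    diff = np.abs(gt[valid] - est[valid])
    return {
        "MAE": diff.mean(),
        "RMSE": np.sqrt((diff ** 2).mean()),
        "<2.5 m": (diff < 2.5).mean() * 100.0,
        "<7.5 m": (diff < 7.5).mean() * 100.0,
        # Comp (Equation (15)): share of grid cells with a valid estimated height
        "Comp": (~np.isnan(est)).mean() * 100.0,
    }
```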
3. Experiments and Results
3.1. Experimental Environment and Dataset
3.2. DSM Reconstruction Process
- The open-source dataset provides a large-scale resource: ZY-3 satellite images of 5120 × 5120 pixels captured from three distinct views. Each image is first cropped according to the corresponding overlap rate so that the output matches the hardware capacity and the input size of the model.
- The network model estimates a height map for each small-scale image, taking 768 × 384 satellite image tiles as input.
- The height maps from the various viewpoints undergo threshold filtering and a left–right consistency check, and are projected back into the WGS-84 geodetic coordinate system on the object plane. The resulting three-dimensional geographic coordinates (latitude, longitude, and altitude) are then converted to planar coordinates with the UTM projection and stored as point clouds.
- The area is partitioned into grid cells, and the point cloud is resampled to generate a digital surface model (DSM). This DSM is then compared with the ground truth DSM to assess the quality of the reconstruction.
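The final gridding step above can be sketched as a simple rasterization: UTM points are binned into regular cells and the heights in each cell are reduced to one value. This is a minimal sketch under stated assumptions, not the authors' pipeline; the 5 m cell size and the mean reducer are placeholders.

```python
import numpy as np

def rasterize_to_dsm(points, cell=5.0):
    """Resample a point cloud of shape (N, 3) holding UTM easting,
    northing, and height into a regular north-up DSM grid by averaging
    the heights that fall in each cell; empty cells stay NaN."""
    e, n, h = points[:, 0], points[:, 1], points[:, 2]
    col = ((e - e.min()) // cell).astype(int)
    row = ((n.max() - n) // cell).astype(int)     # row 0 is the northernmost strip
    rows, cols = row.max() + 1, col.max() + 1
    acc = np.zeros((rows, cols))
    cnt = np.zeros((rows, cols))
    np.add.at(acc, (row, col), h)                 # unbuffered sum of heights per cell
    np.add.at(cnt, (row, col), 1)
    dsm = np.full((rows, cols), np.nan)
    filled = cnt > 0
    dsm[filled] = acc[filled] / cnt[filled]
    return dsm
```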
3.3. Analysis of Experimental Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Kai, F.; Yi, W.; Rui, Z. Deconstruction of Related Technologies of Ground Image Processing Based on High-Resolution Satellite Remote Sensing Images. Mob. Inf. Syst. 2023, 2023, 2896471.
- Xinming, T.; Qingxing, Y.; Xiaoming, G. China DSM Generation and Accuracy Assessment Using ZY3 Images. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 6757–6760.
- Yanan, Z.; Fuguang, D.; Changqing, Z. DEM Extraction and Accuracy Assessment Based on ZY-3 Stereo Images. In Proceedings of the 2012 2nd International Conference on Computer Science and Network Technology, Changchun, China, 29–31 December 2012; pp. 1439–1442.
- Yang, W.; Li, X.; Yang, B.; Yang, Y.; Yan, Y. Dense Matching for DSM Generation from ZY-3 Satellite Imagery. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 3677–3680.
- Hou, Y.; Liu, C.; An, B.; Liu, Y. Stereo Matching Algorithm Based on Improved Census Transform and Texture Filtering. Optik 2022, 249, 168186.
- Lv, D.; Jiao, G. Experiment of Stereo Matching Algorithm Based on Binocular Vision. J. Phys. Conf. Ser. 2020, 1574, 012173.
- Li, G.; Song, H.; Li, C. Matching Algorithm and Parallax Extraction Based on Binocular Stereo Vision. In Proceedings of the Smart Innovations in Communication and Computational Sciences; Panigrahi, B.K., Trivedi, M.C., Mishra, K.K., Tiwari, S., Singh, P.K., Eds.; Springer: Singapore, 2019; pp. 347–355.
- Hartley, R.I.; Saxena, T. The Cubic Rational Polynomial Camera Model. In Proceedings of the Image Understanding Workshop, New Orleans, LA, USA, 11–14 May 1997; Volume 649, p. 653.
- Zhang, G.; Yuan, X. On RPC Model of Satellite Imagery. Geo-Spat. Inf. Sci. 2006, 9, 285–292.
- Zhang, L.; Balz, T.; Liao, M. Satellite SAR Geocoding with Refined RPC Model. ISPRS J. Photogramm. Remote Sens. 2012, 69, 37–49.
- Qin, R. RPC Stereo Processor (RSP)—A Software Package for Digital Surface Model and Orthophoto Generation from Satellite Stereo Imagery. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 3, 77–82.
- De Franchis, C.; Meinhardt-Llopis, E.; Michel, J.; Morel, J.-M.; Facciolo, G. An Automatic and Modular Stereo Pipeline for Pushbroom Images. In Proceedings of the ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences, Zürich, Switzerland, 5–7 September 2014; Volume II–3, pp. 49–56.
- Facciolo, G.; de Franchis, C.; Meinhardt, E. MGM: A Significantly More Global Matching for Stereovision. In Proceedings of the BMVC 2015, Swansea, UK, 7–10 September 2015.
- Mandun, Z.; Lichao, Q.; Guodong, C.; Ming, Y. A Triangulation Method in 3D Reconstruction from Image Sequences. In Proceedings of the 2009 Second International Conference on Intelligent Networks and Intelligent Systems, Tianjin, China, 1–3 November 2009; pp. 306–308.
- Schönberger, J.L.; Frahm, J.-M. Structure-from-Motion Revisited. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 4104–4113.
- Liu, Y.; Li, C.; Gong, J. An Object Reconstruction Method Based on Binocular Stereo Vision. In Proceedings of the Intelligent Robotics and Applications, Wuhan, China, 16–18 August 2017; Huang, Y., Wu, H., Liu, H., Yin, Z., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 486–495.
- Zhang, K.; Snavely, N.; Sun, J. Leveraging Vision Reconstruction Pipelines for Satellite Imagery. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 2139–2148.
- Liu, J.; Ji, S. A Novel Recurrent Encoder-Decoder Structure for Large-Scale Multi-View Stereo Reconstruction from an Open Aerial Dataset. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020.
- Žbontar, J.; LeCun, Y. Computing the Stereo Matching Cost with a Convolutional Neural Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
- Ji, S.; Liu, J.; Lu, M. CNN-Based Dense Image Matching for Aerial Remote Sensing Images. Photogramm. Eng. Remote Sens. 2019, 85, 415–424.
- Yao, Y.; Luo, Z.; Li, S.; Fang, T.; Quan, L. MVSNet: Depth Inference for Unstructured Multi-View Stereo. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 785–801.
- Gu, X.; Fan, Z.; Zhu, S.; Dai, Z.; Tan, F.; Tan, P. Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2492–2501.
- Cheng, S.; Xu, Z.; Zhu, S.; Li, Z.; Li, L.E.; Ramamoorthi, R.; Su, H. Deep Stereo Using Adaptive Thin Volume Representation with Uncertainty Awareness. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2521–2531.
- Chen, K.; Zhou, Z.; Li, Y.; Ji, X.; Wu, J.; Coatrieux, J.-L.; Chen, Y.; Coatrieux, G. RED-Net: Residual and Enhanced Discriminative Network for Image Steganalysis in the Internet of Medical Things and Telemedicine. IEEE J. Biomed. Health Inform. 2024, 28, 1611–1622.
- Shewalkar, A.; Nyavanandi, D.; Ludwig, S. Performance Evaluation of Deep Neural Networks Applied to Speech Recognition: RNN, LSTM and GRU. J. Artif. Intell. Soft Comput. Res. 2019, 9, 235–245.
- Dey, R.; Salem, F.M. Gate-Variants of Gated Recurrent Unit (GRU) Neural Networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; pp. 1597–1600.
- Singh, R.D.; Mittal, A.; Bhatia, R.K. 3D Convolutional Neural Network for Object Recognition: A Review. Multimed. Tools Appl. 2019, 78, 15951–15995.
- Juarez-Salazar, R.; Zheng, J.; Diaz-Ramirez, V. Distorted Pinhole Camera Modeling and Calibration. Appl. Opt. 2020, 59, 11310–11318.
- Gao, J.; Liu, J.; Ji, S. Rational Polynomial Camera Model Warping for Deep Learning Based Satellite Multi-View Stereo Matching. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 6128–6137.
- Bi, J.; Zhu, Z.; Meng, Q. Transformer in Computer Vision. In Proceedings of the 2021 IEEE International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology (CEI), Fuzhou, China, 24–26 September 2021; pp. 178–188.
- Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 87–110.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer International Publishing: Cham, Switzerland, 2015; pp. 234–241.
- Sarvamangala, D.R.; Kulkarni, R.V. Convolutional Neural Networks in Medical Image Understanding: A Survey. Evol. Intell. 2022, 15, 1–22.
- Lu, H.; Zhang, Q. Applications of Deep Convolutional Neural Network in Computer Vision. J. Data Acquis. Process. 2016, 31, 1–17.
- Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions. J. Big Data 2021, 8, 53.
- Hisham, M.B.; Yaakob, S.N.; Raof, R.A.A.; Nazren, A.B.A.; Wafi, N.M. Template Matching Using Sum of Squared Difference and Normalized Cross Correlation. In Proceedings of the 2015 IEEE Student Conference on Research and Development (SCOReD), Kuala Lumpur, Malaysia, 13–14 December 2015; pp. 100–104.
- Bindu, N.S.; Sheshadri, H.S. A Comparative Study of Correlation Based Stereo Matching Algorithms: Illumination and Exposure. In Proceedings of the Intelligent Computing, Communication and Devices; Jain, L.C., Patnaik, S., Ichalkaranje, N., Eds.; Springer: New Delhi, India, 2015; pp. 191–201.
- Wei, L.; Zheng, C.; Hu, Y. Oriented Object Detection in Aerial Images Based on the Scaled Smooth L1 Loss Function. Remote Sens. 2023, 15, 1350.
- Feng, Y. An Overview of Deep Learning Optimization Methods and Learning Rate Attenuation Methods. Hans J. Data Min. 2018, 8, 186–200.
- Ding, X.; Yang, H.; Chan, R.H.; Hu, H.; Peng, Y.; Zeng, T. A New Initialization Method for Neural Networks with Weight Sharing. In Proceedings of the Mathematical Methods in Image Processing and Inverse Problems, Beijing, China, 21–24 April 2021; Tai, X.-C., Wei, S., Liu, H., Eds.; Springer: Singapore, 2021; pp. 165–179.
- Zou, F.; Shen, L.; Jie, Z.; Zhang, W.; Liu, W. A Sufficient Condition for Convergences of Adam and RMSProp. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 11119–11127.
- Luus, F.P.S.; Khan, N.; Akhalwaya, I. Active Learning with TensorBoard Projector. arXiv 2019.
| Name | Version, Parameters, and Role |
|---|---|
| Operating system | Windows 11 (Microsoft, Redmond, WA, USA) |
| CPU configuration | Intel(R) Core(TM) i5-12500 @ 3.10 GHz (Intel, Chandler, AZ, USA) |
| RAM | 16.0 GB |
| GPU configuration | NVIDIA GeForce RTX 3060, 12.0 GB (NVIDIA, Santa Clara, CA, USA) |
| Deep learning framework | PyTorch 1.8.0 (Meta, San Francisco, CA, USA); provides GPU acceleration |
| Parallel computing architecture | CUDA 11.1 / cuDNN 7.6.5; improves GPU parallel computing capability |
| Programming language | Python 3.7.15 |
| Management software | Anaconda3 (Anaconda, Austin, TX, USA); environment manager |
| Parameter Name | Parameter Value | Loss (m) ⬇ | MAE (m) ⬇ | RMSE (m) ⬇ | <2.5 m (%) ⬆ | <7.5 m (%) ⬆ |
|---|---|---|---|---|---|---|
| Initial learning rate | 0.002 | 3.205 | 2.431 | 3.751 | 79.91 | 96.09 |
| | 0.001 | 3.163 | 2.109 | 3.628 | 79.13 | 96.72 |
| | 0.0001 | 3.192 | 2.317 | 3.940 | 78.39 | 96.13 |
| Learning rate scheduler | StepLR | 3.151 | 2.212 | 4.198 | 79.63 | 96.62 |
| | LambdaLR | 3.195 | 2.278 | 3.832 | 78.93 | 96.01 |
| | SequentialLR | 3.391 | 2.319 | 4.271 | 76.27 | 96.26 |
| Optimizer | SGD | 3.209 | 2.391 | 3.821 | 78.95 | 96.31 |
| | RMSprop | 2.995 | 2.112 | 3.353 | 80.32 | 96.77 |
| | Adam | 3.281 | 2.302 | 3.611 | 79.26 | 96.39 |
| | AdamW | 3.271 | 2.293 | 3.297 | 79.79 | 96.64 |
| Weight initialization | Xavier | 3.112 | 2.209 | 3.416 | 79.89 | 96.69 |
| | Kaiming | 3.097 | 2.261 | 3.304 | 80.51 | 96.38 |
| Method | MAE (m) ⬇ | RMSE (m) ⬇ | <2.5 m (%) ⬆ | <7.5 m (%) ⬆ | Comp (%) ⬆ | Params | Model_Size (MB) |
|---|---|---|---|---|---|---|---|
| SatMVS (RED-Net) | 1.945 | 4.070 | 77.93 | 96.59 | 82.29 | 1,094,523 | 8.43 |
| SatMVS (CasMVSNet) | 2.020 | 3.841 | 76.79 | 96.73 | 81.54 | 934,304 | 7.20 |
| SatMVS (UCS-Net) | 2.026 | 3.921 | 77.01 | 96.54 | 82.21 | 938,496 | 7.24 |
| TC-SatMVSnet | 1.963 | 3.811 | 77.21 | 96.58 | 82.53 | 624,546 | 4.91 |
Zhao, Y.; Liu, Y.; Gao, S.; Liu, G.; Wan, Z.; Hu, D. Deep Learning-Based Digital Surface Model Reconstruction of ZY-3 Satellite Imagery. Remote Sens. 2024, 16, 2567. https://doi.org/10.3390/rs16142567