HAVANA: Hard Negative Sample-Aware Self-Supervised Contrastive Learning for Airborne Laser Scanning Point Cloud Semantic Segmentation
Abstract
1. Introduction
- A self-supervised contrastive learning scheme is introduced for point cloud semantic segmentation. Meaningful feature representations are learned from unlabeled large-scale ALS point clouds via contrastive learning and then transferred to small labeled sample sets for the downstream semantic segmentation task, improving segmentation performance.
- We design AbsPAN, a strategy for selecting positive and negative samples for contrastive learning. This strategy employs an unsupervised clustering algorithm to remove potentially false-negative samples, ensuring that contrastive learning obtains meaningful information.
- Even when the full training dataset is used, the proposed method outperforms training from scratch. This suggests that self-supervised pretraining is a promising way to improve the performance of deep learning methods for point cloud semantic segmentation.
2. Methodology
2.1. Overview
2.2. Point Cloud Contrastive Learning
2.2.1. Auxiliary Pre-Task
2.2.2. Backbone Network
2.2.3. AbsPAN: Negative Sample Mining Based on Clustering
- $N$ anchor points ($N$ is experimentally set to 4096 in this paper) are randomly selected in block 1. Then, for each anchor point $p_i$, its correspondence-matched point $q_i$ in block 2 is chosen as the positive sample;
- $M$ anchor points ($M$ is experimentally set to 2048 in this paper) are randomly selected from the pair for hardest negative sample selection. To ensure that the negative sample belongs to a different category, the point in block 2 whose features (embedded by the KP-FCNN) are closest to those of point $p_i$ is chosen as the hardest negative sample candidate, denoted $n_i$. If the pseudo label of $n_i$ differs from that of $p_i$, $n_i$ is accepted as a true candidate for the hardest negative sample of $p_i$. Otherwise, $n_i$ is removed and the point with the next-nearest features is validated, until a true candidate is found. After obtaining the hardest negative sample of $p_i$ in block 2, the same process is performed for $q_i$ to search for its hardest negative sample in block 1; a minimal sketch of this mining step follows.
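The mining step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' released code: K-means stands in for the unsupervised clustering that produces pseudo labels, and the function name, cluster count, and brute-force distance search are our illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def abspan_hardest_negatives(feat_a, feat_b, anchor_idx, n_clusters=10):
    """For each anchor in block A, return the index of the feature-nearest
    point in block B whose cluster pseudo label differs from the anchor's
    (the accepted hardest negative); mining for block B is symmetric."""
    # Pseudo labels from unsupervised clustering over both blocks' embeddings.
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(np.vstack([feat_a, feat_b]))
    labels_a = km.labels_[: len(feat_a)]
    labels_b = km.labels_[len(feat_a):]

    negatives = []
    for i in anchor_idx:
        # Feature-space distances from anchor i to every point in block B.
        d = np.linalg.norm(feat_b - feat_a[i], axis=1)
        # Walk candidates from nearest to farthest; accept the first whose
        # pseudo label differs (rejecting likely false negatives).
        for j in np.argsort(d):
            if labels_b[j] != labels_a[i]:
                negatives.append(j)
                break
    return np.asarray(negatives)
```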
2.2.4. Loss Function Design
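The full loss derivation is given in the paper; as a hedged illustration only, a matched-pair contrastive objective can take a PointInfoNCE form in the spirit of PointContrast (Xie et al., cited below), where each anchor's positive is its matched point. The temperature value is an assumption, and for brevity this sketch uses the other in-batch positives as negatives; swapping in the AbsPAN-mined hardest negatives only changes which rows populate the logits.

```python
import torch
import torch.nn.functional as F

def point_info_nce(feat_q, feat_k, temperature=0.07):
    """PointInfoNCE over N matched point pairs: row i of feat_q and row i
    of feat_k form a positive pair; all other rows of feat_k act as negatives."""
    q = F.normalize(feat_q, dim=1)                     # (N, D) anchor features
    k = F.normalize(feat_k, dim=1)                     # (N, D) positive features
    logits = q @ k.t() / temperature                   # (N, N) pairwise similarities
    target = torch.arange(q.size(0), device=q.device)  # positives lie on the diagonal
    return F.cross_entropy(logits, target)
```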
3. Experiments
3.1. Experimental Dataset
3.2. Evaluation Metrics
3.3. Parameter Setup
3.4. Experimental Results and Analysis
3.4.1. Effectiveness of SSL
3.4.2. Hardest-Contrastive vs. AbsPAN
3.4.3. Performance with Different Amounts of Training Data
3.4.4. Further Experiment Results
4. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Liu, X. High technologies of surveying and mapping for social progress. Sci. Surv. Mapp. 2019, 44, 1–15.
- Guo, R.; Lin, H.; He, B.; Zhao, Z. GIS framework for smart cities. Geomat. Inf. Sci. Wuhan Univ. 2020, 45, 1829–1835.
- Liu, W.; Zang, Y.; Xiong, Z.; Bian, X.; Wen, C.; Lu, X.; Wang, C.; Junior, J.M.; Gonçalves, W.N.; Li, J. 3D building model generation from MLS point cloud and 3D mesh using multi-source data fusion. Int. J. Appl. Earth Obs. Geoinf. 2023, 116, 103171.
- Weinmann, M.; Schmidt, A.; Mallet, C.; Hinz, S.; Rottensteiner, F.; Jutzi, B. Contextual classification of point cloud data by exploiting individual 3D neighbourhoods. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2015, II-3, 271–278.
- Weinmann, M.; Jutzi, B.; Hinz, S.; Mallet, C. Semantic point cloud interpretation based on optimal neighborhoods, relevant features and efficient classifiers. ISPRS J. Photogramm. Remote Sens. 2015, 105, 286–304.
- Niemeyer, J.; Rottensteiner, F.; Soergel, U. Contextual classification of lidar data and building object detection in urban areas. ISPRS J. Photogramm. Remote Sens. 2014, 87, 152–165.
- Rusu, R.B.; Marton, Z.C.; Blodow, N.; Dolha, M.; Beetz, M. Towards 3D point cloud based object maps for household environments. Robot. Auton. Syst. 2008, 56, 927–941.
- Tombari, F.; Salti, S.; Di Stefano, L. Unique signatures of histograms for local surface description. In Proceedings of the Computer Vision–ECCV 2010: 11th European Conference on Computer Vision, Heraklion, Greece, 5–11 September 2010; pp. 356–369.
- Jie, S.; Zulong, L. Airborne LiDAR feature selection for urban classification using random forests. Geomat. Inf. Sci. Wuhan Univ. 2014, 39, 1310.
- Weinmann, M. Feature relevance assessment for the semantic interpretation of 3D point cloud data. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2013, 2, 313–318.
- Zhao, R.; Pang, M.; Wang, J. Classifying airborne LiDAR point clouds via deep features learned by a multi-scale convolutional neural network. Int. J. Geogr. Inf. Sci. 2018, 32, 960–979.
- Schmohl, S.; Sörgel, U. Submanifold sparse convolutional networks for semantic segmentation of large-scale ALS point clouds. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, 4, 77–84.
- Tchapmi, L.; Choy, C.; Armeni, I.; Gwak, J.; Savarese, S. SEGCloud: Semantic segmentation of 3D point clouds. In Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October 2017; pp. 537–547.
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30.
- Jiang, M.; Wu, Y.; Zhao, T.; Zhao, Z.; Lu, C. PointSIFT: A SIFT-like network module for 3D point cloud semantic segmentation. arXiv 2018, arXiv:1807.00652.
- Arief, H.A.; Indahl, U.G.; Strand, G.H.; Tveite, H. Addressing overfitting on point cloud classification using Atrous XCRF. ISPRS J. Photogramm. Remote Sens. 2019, 155, 90–101.
- Wen, C.; Li, X.; Yao, X.; Peng, L.; Chi, T. Airborne LiDAR point cloud classification with global-local graph attention convolution neural network. ISPRS J. Photogramm. Remote Sens. 2021, 173, 181–194.
- Shorten, C.; Khoshgoftaar, T.M. A survey on image data augmentation for deep learning. J. Big Data 2019, 6, 1–48.
- Wang, P.; Yao, W. A new weakly supervised approach for ALS point cloud semantic segmentation. ISPRS J. Photogramm. Remote Sens. 2022, 188, 237–254.
- Lei, X.; Guan, H.; Ma, L.; Yu, Y.; Dong, Z.; Gao, K.; Delavar, M.R.; Li, J. WSPointNet: A multi-branch weakly supervised learning network for semantic segmentation of large-scale mobile laser scanning point clouds. Int. J. Appl. Earth Obs. Geoinf. 2022, 115, 103129.
- Wang, Y.; Zhang, J.; Kan, M.; Shan, S.; Chen, X. Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12275–12284.
- Ayush, K.; Uzkent, B.; Meng, C.; Tanmay, K.; Burke, M.; Lobell, D.; Ermon, S. Geography-aware self-supervised learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10181–10190.
- Sharma, C.; Kaul, M. Self-supervised few-shot learning on point clouds. Adv. Neural Inf. Process. Syst. 2020, 33, 7212–7221.
- Liu, Y.; Yi, L.; Zhang, S.; Fan, Q.; Funkhouser, T.; Dong, H. P4Contrast: Contrastive learning with pairs of point-pixel pairs for RGB-D scene understanding. arXiv 2020, arXiv:2012.13089.
- Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. RandLA-Net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11108–11117.
- Rao, Y.; Lu, J.; Zhou, J. Global-local bidirectional reasoning for unsupervised representation learning of 3D point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5376–5385.
- Xie, S.; Gu, J.; Guo, D.; Qi, C.R.; Guibas, L.; Litany, O. PointContrast: Unsupervised pre-training for 3D point cloud understanding. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; pp. 574–591.
- Sauder, J.; Sievers, B. Self-supervised deep learning on point clouds by reconstructing space. Adv. Neural Inf. Process. Syst. 2019, 32, 12962–12972.
- Poursaeed, O.; Jiang, T.; Qiao, H.; Xu, N.; Kim, V.G. Self-supervised learning of point clouds via orientation estimation. In Proceedings of the 2020 International Conference on 3D Vision (3DV), Fukuoka, Japan, 25–28 November 2020; pp. 1018–1028.
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 1597–1607.
- Wang, J.; Song, Y.; Leung, T.; Rosenberg, C.; Wang, J.; Philbin, J.; Chen, B.; Wu, Y. Learning fine-grained image similarity with deep ranking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1386–1393.
- Oh Song, H.; Xiang, Y.; Jegelka, S.; Savarese, S. Deep metric learning via lifted structured feature embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4004–4012.
- Sohn, K. Improved deep metric learning with multi-class n-pair loss objective. Adv. Neural Inf. Process. Syst. 2016, 29. Available online: https://dl.acm.org/doi/10.5555/3157096.3157304 (accessed on 14 January 2024).
- Choy, C.; Park, J.; Koltun, V. Fully convolutional geometric features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8958–8966.
- Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. KPConv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6411–6420.
- Hou, J.; Graham, B.; Nießner, M.; Xie, S. Exploring data-efficient 3D scene understanding with contrastive scene contexts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15587–15597.
- Varney, N.; Asari, V.K.; Graehling, Q. DALES: A large-scale aerial LiDAR data set for semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 186–187.
- Choy, C.; Gwak, J.; Savarese, S. 4D spatio-temporal ConvNets: Minkowski convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3075–3084.
- Zhang, Z.; Girdhar, R.; Joulin, A.; Misra, I. Self-supervised pretraining of 3D features on any point-cloud. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 10252–10263.
- Wen, C.; Yang, L.; Li, X.; Peng, L.; Chi, T. Directionally constrained fully convolutional neural network for airborne LiDAR point cloud classification. ISPRS J. Photogramm. Remote Sens. 2020, 162, 50–62.
- Huang, R.; Xu, Y.; Hong, D.; Yao, W.; Ghamisi, P.; Stilla, U. Deep point embedding for urban classification using ALS point clouds: A new perspective from local to global. ISPRS J. Photogramm. Remote Sens. 2020, 163, 62–81.
- Ye, Z.; Xu, Y.; Huang, R.; Tong, X.; Li, X.; Liu, X.; Luan, K.; Hoegner, L.; Stilla, U. LASDU: A large-scale aerial LiDAR dataset for semantic labeling in dense urban areas. ISPRS Int. J. Geo-Inf. 2020, 9, 450.
- Huang, R.; Xu, Y.; Stilla, U. GraNet: Global relation-aware attentional network for semantic segmentation of ALS point clouds. ISPRS J. Photogramm. Remote Sens. 2021, 177, 1–20.
| Geometric Feature | Design Formula |
| --- | --- |
| Planarity | $P_{\lambda} = (\lambda_2 - \lambda_3)/\lambda_1$ |
| Surface Variation | $C_{\lambda} = \lambda_3/(\lambda_1 + \lambda_2 + \lambda_3)$ |
| Verticality | $V = 1 - \lvert n_z \rvert$ |
| Normal Vector | $\mathbf{n} = (n_x, n_y, n_z)$, the unit eigenvector of the smallest eigenvalue |

where $\lambda_1 \ge \lambda_2 \ge \lambda_3$ denote the eigenvalues of the local 3D structure tensor.
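Assuming the standard eigenvalue-based definitions above (following the Weinmann et al. papers cited in the references), a minimal NumPy sketch for computing these features over one k-nearest neighbourhood; the helper name and the numerical guards are ours:

```python
import numpy as np

def geometric_features(neighbors):
    """Eigenvalue features for one local neighbourhood of shape (k, 3)."""
    cov = np.cov(neighbors.T)             # 3x3 local structure tensor
    eigval, eigvec = np.linalg.eigh(cov)  # eigenvalues in ascending order
    l3, l2, l1 = eigval                   # so that l1 >= l2 >= l3
    normal = eigvec[:, 0]                 # eigenvector of the smallest eigenvalue
    planarity = (l2 - l3) / max(l1, 1e-12)
    surface_variation = l3 / max(l1 + l2 + l3, 1e-12)
    verticality = 1.0 - abs(normal[2])    # 0 for horizontal surfaces
    return planarity, surface_variation, verticality, normal
```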
Hyper-Parameter | Layer 1 | Layer 2 | Layer 3 | Layer 4 | Layer 5 |
---|---|---|---|---|---|
Down-sampling grid size (m) | 0.4 | 0.8 | 1.6 | 3.2 | 6.4 |
Convolution radius (m) | 2.5 | 5.0 | 10 | 20 | 40 |
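Both hyper-parameters double from layer to layer, and the convolution radius stays at 6.25 times the sub-sampling grid size. A one-liner reproducing the schedule in the table (the variable names are ours):

```python
# Grid size doubles per layer; convolution radius is 6.25x the grid size.
first_grid, n_layers = 0.4, 5
schedule = [(first_grid * 2**i, 6.25 * first_grid * 2**i) for i in range(n_layers)]
# -> [(0.4, 2.5), (0.8, 5.0), (1.6, 10.0), (3.2, 20.0), (6.4, 40.0)]
```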
| Settings | Methods | Power | Low_VEG | Imp_SURF | Car | Fence/Hedge | Roof | Facade | Shrub | Tree | OA | Avg. F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Training dataset (100%) | PointNet++ | 57.9 | 79.6 | 90.6 | 66.1 | 31.5 | 91.6 | 54.3 | 41.6 | 77.0 | 81.2 | 65.6 |
| | PointSIFT | 55.7 | 80.7 | 90.9 | 77.8 | 30.5 | 92.5 | 56.9 | 44.4 | 79.6 | 82.2 | 67.7 |
| | D-FCN | 70.4 | 80.2 | 91.4 | 78.1 | 37.0 | 93.0 | 60.5 | 46.0 | 79.4 | 82.2 | 70.7 |
| | RandLA-Net | 76.4 | 80.2 | 91.7 | 78.4 | 37.4 | 94.2 | 60.1 | 45.2 | 79.9 | 82.8 | 71.5 |
| | DPE | 68.1 | 86.5 | 99.3 | 75.2 | 19.5 | 91.1 | 44.2 | 39.4 | 72.6 | 83.2 | 66.2 |
| | KP-FCNN | 63.1 | 82.3 | 91.4 | 72.5 | 25.2 | 94.4 | 60.3 | 44.9 | 81.2 | 83.7 | 68.4 |
| | HAVANA | 57.6 | 82.2 | 91.4 | 79.8 | 39.3 | 94.8 | 63.9 | 46.5 | 82.6 | 84.5 | 70.9 |
| Training dataset (10%) | PointNet++ * | 58.6 | 66.2 | 78.6 | 28.9 | 25.1 | 87.1 | 61.2 | 43.1 | 72.6 | 72.1 | 57.9 |
| | KP-FCNN * | 63.2 | 76.4 | 85.9 | 50.4 | 18.8 | 84.7 | 54.7 | 40.8 | 69.9 | 75.9 | 60.5 |
| | HAVANA * | 60.2 | 80.1 | 90.2 | 52.5 | 26.2 | 90.0 | 55.6 | 46.4 | 72.3 | 79.8 | 63.7 |
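The OA and Avg. F1 columns in these tables follow the usual confusion-matrix definitions; a minimal sketch (the function and variable names are ours):

```python
import numpy as np

def oa_and_avg_f1(conf):
    """conf[i, j]: number of points of true class i predicted as class j."""
    oa = np.trace(conf) / conf.sum()                        # overall accuracy
    precision = np.diag(conf) / np.maximum(conf.sum(0), 1)  # per predicted class
    recall = np.diag(conf) / np.maximum(conf.sum(1), 1)     # per true class
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return oa, f1.mean()                                    # (OA, Avg. F1)
```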
Methods | OA (%) | Avg. F1 (%) |
---|---|---|
MinkowskiNet | 74.6 | 58.8 |
MinkowskiNet (Hardest-Contrastive) | 76.4 | 59.5 |
MinkowskiNet (AbsPAN) | 76.8 | 61.4 |
KP-FCNN | 75.9 | 60.5 |
KP-FCNN (Hardest-Contrastive) | 78.9 | 63.1 |
KP-FCNN (AbsPAN) | 79.8 | 64.1 |
Methods | Artifacts | Buildings | Ground | Low_veg | Trees | OA | Avg. F1 |
---|---|---|---|---|---|---|---|
PointNet++ | 31.3 | 90.6 | 87.7 | 63.2 | 82.0 | 82.8 | 71.0 |
PointSIFT | 38.0 | 94.3 | 88.8 | 64.4 | 85.5 | 84.9 | 74.2 |
KP-FCNN | 44.2 | 95.7 | 88.7 | 65.6 | 85.9 | 85.4 | 76.0 |
DPE | 36.9 | 93.2 | 88.7 | 65.2 | 82.2 | 84.4 | 73.3 |
GraNet | 42.4 | 95.8 | 89.9 | 64.7 | 86.1 | 86.2 | 75.8 |
HAVANA (Ours) | 47.2 (+3.0) | 96.1 (+0.3) | 90.8 (+0.9) | 65.7 (+0.1) | 87.8 (+1.7) | 87.6 (+1.4) | 77.5 (+1.5) |