O2SAT: Object-Oriented-Segmentation-Guided Spatial-Attention Network for 3D Object Detection in Autonomous Vehicles
Abstract
1. Introduction
- We propose O2SAT, which augments object points lifted from the road surface. This enables an attention-based, channel-wise feature-reweighting module to learn more discriminative features, improving overall detection performance.
- We design the OOS module to segment the road surface; its road-aware orientation branch feeds both the object-aware sampling module and the 3D detection head, enabling anchor-free center-point generation without spatial limitations (an illustrative segmentation sketch follows this list).
- The SFR module leverages self-attention-based spatial encoding to embed contextual information into each point feature, enriching the representation and suppressing noise; this adaptive reweighting sharpens feature discrimination and thereby improves detection performance (see the reweighting sketch after this list).
- Experiments in diverse urban traffic environments demonstrate the effectiveness and robustness of our method across the KITTI and SlopedKITTI datasets.
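To make the road-segmentation step concrete, the following is a minimal sketch of one standard way to separate road points from object points: a RANSAC-style plane fit. It illustrates the general idea only, not the OOS module itself; the function `fit_road_plane` and its parameters (`n_iters`, `dist_thresh`) are hypothetical names introduced here.

```python
# Minimal RANSAC-style ground-plane fit: an illustration of road/object
# separation, NOT the paper's OOS module. All names here are our own.
import numpy as np

def fit_road_plane(points, n_iters=100, dist_thresh=0.2, seed=0):
    """Return a boolean mask marking inliers (likely road points)."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        # Sample 3 points; the plane normal is the cross product of two edges.
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-8:                 # degenerate (collinear) sample, skip
            continue
        normal /= norm
        # Keep the plane supported by the most points within dist_thresh.
        mask = np.abs((points - p0) @ normal) < dist_thresh
        if mask.sum() > best_mask.sum():
            best_mask = mask
    return best_mask

# Toy usage: a flat "road" plus a small raised "object" cluster.
road = np.c_[np.random.rand(500, 2) * 40 - 20, np.random.randn(500) * 0.02]
obj = np.c_[np.random.rand(100, 2) * 2 + 5, np.random.rand(100) * 1.5 + 0.5]
cloud = np.vstack([road, obj])
object_points = cloud[~fit_road_plane(cloud)]  # fed to downstream detection
```

In the full pipeline, the road-aware orientation branch additionally estimates road orientation and passes it to the sampling module and detection head; the plane fit above shows only the separation step.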
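Likewise, the sketch below shows the generic pattern behind self-attention-based spatial encoding followed by channel-wise feature reweighting, written in PyTorch. The module name `SpatialReweight`, the layer sizes, and the squeeze-and-excitation-style gate are our assumptions for illustration, not the paper's SFR implementation.

```python
# A minimal sketch of spatial feature reweighting: self-attention mixes
# context across points, then a learned channel gate rescales each feature
# channel. Illustrative only; names and sizes are assumptions.
import torch
import torch.nn as nn

class SpatialReweight(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        # Self-attention embeds contextual information into each point feature.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Channel-wise gate (squeeze-and-excitation style) reweights channels.
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.ReLU(),
            nn.Linear(dim // 4, dim), nn.Sigmoid(),
        )

    def forward(self, feats):                  # feats: (B, N, C) point features
        ctx, _ = self.attn(feats, feats, feats)
        feats = self.norm(feats + ctx)         # residual contextual encoding
        weights = self.gate(feats.mean(dim=1)) # (B, C) global channel weights
        return feats * weights.unsqueeze(1)    # suppress noisy channels

# Toy usage: 2 scenes, 1024 sampled points, 64-dim features.
x = torch.randn(2, 1024, 64)
y = SpatialReweight()(x)
print(y.shape)   # torch.Size([2, 1024, 64])
```

The residual connection keeps the original point features intact while the attention output adds context; the sigmoid gate then scales each channel between 0 and 1, which is one common way to suppress noisy channels.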
2. Related Work
2.1. Point Cloud Representations for 3D Object Detection
2.1.1. Voxel-Based Methods
2.1.2. Point-Based Methods
2.1.3. Ground Segmentation for 3D Object Detection
2.1.4. Attention-Based Methods
3. Methodology
3.1. Overview
3.2. Point-Based Backbone
3.3. Object-Oriented Segmentation
3.3.1. Road-Aware Orientation
3.3.2. Object-Aware Sampling Approach
Algorithm 1: Object-oriented segmentation based on road-aware orientation
3.4. Spatial Attention
3.4.1. Spatial Feature Encoding
3.4.2. Spatial Feature Reweighting
3.5. 3D Detection Head
4. Loss
5. Experiment and Results
5.1. Datasets
5.2. Metrics
5.3. Implementation Details
5.3.1. Network Architecture
5.3.2. Training
5.4. Main Results
5.5. Effectiveness
5.6. Ablation Studies
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Li, Y.; Ma, L.; Zhong, Z.; Liu, F.; Chapman, M.A.; Cao, D.; Li, J. Deep Learning for LiDAR Point Clouds in Autonomous Driving: A Review. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 3412–3432. [Google Scholar] [CrossRef]
- Mukhtar, A.; Xia, L.; Tang, T.B. Vehicle Detection Techniques for Collision Avoidance Systems: A Review. IEEE Trans. Intell. Transp. Syst. 2015, 16, 2318–2338. [Google Scholar] [CrossRef]
- Ye, Y.; Fu, L.; Li, B. Object detection and tracking using multi-layer laser for autonomous urban driving. In Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016; pp. 259–264. [Google Scholar] [CrossRef]
- Wang, G.; Wu, J.; He, R.; Yang, S. A Point Cloud Based Robust Road Curb Detection and Tracking Method. IEEE Access 2019, 7, 24611–24625. [Google Scholar] [CrossRef]
- Dieterle, T.; Particke, F.; Patino-Studencki, L.; Thielecke, J. Sensor data fusion of LIDAR with stereo RGB-D camera for object tracking. In Proceedings of the 2017 IEEE SENSORS, Glasgow, UK, 29 October–1 November 2017; pp. 1–3. [Google Scholar]
- Zhao, C.; Fu, C.; Dolan, J.M.; Wang, J. L-Shape Fitting-Based Vehicle Pose Estimation and Tracking Using 3D-LiDAR. IEEE Trans. Intell. Veh. 2021, 6, 787–798. [Google Scholar] [CrossRef]
- Li, Y.; Ibanez-Guzman, J. Lidar for Autonomous Driving: The Principles, Challenges, and Trends for Automotive Lidar and Perception Systems. IEEE Signal Process. Mag. 2020, 37, 50–61. [Google Scholar] [CrossRef]
- Sualeh, M.; Kim, G.W. Dynamic Multi-LiDAR Based Multiple Object Detection and Tracking. Sensors 2019, 19, 1474. [Google Scholar] [CrossRef]
- Kim, D.; Jo, K.; Lee, M.; Sunwoo, M. L-Shape Model Switching-Based Precise Motion Tracking of Moving Vehicles Using Laser Scanners. IEEE Trans. Intell. Transp. Syst. 2018, 19, 598–612. [Google Scholar] [CrossRef]
- Jin, X.; Yang, H.; Li, Z. Vehicle Detection Framework Based on LiDAR for Autonomous Driving. In Proceedings of the 2021 5th CAA International Conference on Vehicular Control and Intelligence (CVCI), Tianjin, China, 29–31 October 2021; pp. 1–5. [Google Scholar] [CrossRef]
- Mushtaq, H.; Deng, X.; Ali, M.; Hayat, B.; Raza Sherazi, H.H. DFA-SAT: Dynamic Feature Abstraction with Self-Attention-Based 3D Object Detection for Autonomous Driving. Sustainability 2023, 15, 13667. [Google Scholar] [CrossRef]
- Zhang, Y.; Hu, Q.; Xu, G.; Ma, Y.; Wan, J.; Guo, Y. Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022. [Google Scholar] [CrossRef]
- Yang, Z.; Sun, Y.; Liu, S.; Jia, J. 3DSSD: Point-based 3d single stage object detector. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar] [CrossRef]
- Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3D object detection network for autonomous driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar] [CrossRef]
- Zhou, Y.; Sun, P.; Zhang, Y.; Anguelov, D.; Gao, J.; Ouyang, T.; Guo, J.; Ngiam, J.; Vasudevan, V. End-to-end multi-view fusion for 3d object detection in lidar point clouds. In Proceedings of the Conference on Robot Learning, Virtual, 16–18 November 2020; pp. 923–932. [Google Scholar]
- Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. RandLA-Net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar] [CrossRef]
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D Object Detection from RGB-D Data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar] [CrossRef]
- Huang, W.; Liang, H.; Lin, L.; Wang, Z.; Wang, S.; Yu, B.; Niu, R. A Fast Point Cloud Ground Segmentation Approach Based on Coarse-To-Fine Markov Random Field. IEEE Trans. Intell. Transp. Syst. 2022, 23, 7841–7854. [Google Scholar] [CrossRef]
- Chu, P.M.; Cho, S.; Park, J.; Fong, S.; Cho, K. Enhanced Ground Segmentation Method for Lidar Point Clouds in Human-Centric Autonomous Robot Systems. Hum.-Centric Comput. Inf. Sci. 2019, 9, 17. [Google Scholar] [CrossRef]
- Qi, C.R.; Litany, O.; He, K.; Guibas, L. Deep hough voting for 3D object detection in point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar] [CrossRef]
- Shi, S.; Wang, X.; Li, H. PointRCNN: 3D object proposal generation and detection from point cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar] [CrossRef]
- Yang, Z.; Wang, L. Learning relationships for multi-view 3D object recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar] [CrossRef]
- Shi, S.; Jiang, L.; Deng, J.; Wang, Z.; Guo, C.; Shi, J.; Wang, X.; Li, H. PV-RCNN++: Point-voxel feature set abstraction with local vector representation for 3D object detection. Int. J. Comput. Vis. 2023, 131, 531–551. [Google Scholar]
- Liu, Z.; Tang, H.; Lin, Y.; Han, S. Point-voxel cnn for efficient 3d deep learning. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
- Jiang, T.; Song, N.; Liu, H.; Yin, R.; Gong, Y.; Yao, J. Vic-net: Voxelization information compensation network for point cloud 3d object detection. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 13408–13414. [Google Scholar]
- Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar] [CrossRef]
- Zhou, D.; Fang, J.; Song, X.; Guan, C.; Yin, J.; Dai, Y.; Yang, R. IoU Loss for 2D/3D Object Detection. In Proceedings of the 2019 International Conference on 3D Vision (3DV), Quebec City, QC, Canada, 16–19 September 2019. [Google Scholar] [CrossRef]
- Shi, H.; Hou, D.; Li, X. Center-Aware 3D Object Detection with Attention Mechanism Based on Roadside LiDAR. Sustainability 2023, 15, 2628. [Google Scholar] [CrossRef]
- Yin, T.; Zhou, X.; Krahenbuhl, P. Center-based 3d object detection and tracking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 11784–11793. [Google Scholar]
- Li, L.; Yang, F.; Zhu, H.; Li, D.; Li, Y.; Tang, L. An Improved RANSAC for 3D Point Cloud Plane Segmentation Based on Normal Distribution Transformation Cells. Remote Sens. 2017, 9, 433. [Google Scholar] [CrossRef]
- Miądlicki, K.; Pajor, M.; Saków, M. Ground plane estimation from sparse LIDAR data for loader crane sensor fusion system. In Proceedings of the 2017 22nd International Conference on Methods and Models in Automation and Robotics (MMAR), Międzyzdroje, Poland, 28–31 August 2017; pp. 717–722. [Google Scholar] [CrossRef]
- Narksri, P.; Takeuchi, E.; Ninomiya, Y.; Morales, Y.; Akai, N.; Kawaguchi, N. A Slope-robust Cascaded Ground Segmentation in 3D Point Cloud for Autonomous Vehicles. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 497–504. [Google Scholar] [CrossRef]
- Liang, M.; Yang, B.; Wang, S.; Urtasun, R. Deep Continuous Fusion for Multi-sensor 3D Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Volume 11220 LNCS. [Google Scholar] [CrossRef]
- Ku, J.; Mozifian, M.; Lee, J.; Harakeh, A.; Waslander, S.L. Joint 3D Proposal Generation and Object Detection from View Aggregation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018. [Google Scholar] [CrossRef]
- Yang, B.; Luo, W.; Urtasun, R. Pixor: Real-time 3d object detection from point clouds. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7652–7660. [Google Scholar]
- Yan, Y.; Mao, Y.; Li, B. Second: Sparsely embedded convolutional detection. Sensors 2018, 18, 3337. [Google Scholar] [CrossRef] [PubMed]
- Lang, A.H.; Vora, S.; Caesar, H.; Zhou, L.; Yang, J.; Beijbom, O. Pointpillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar] [CrossRef]
- Zhao, X.; Liu, Z.; Hu, R.; Huang, K. 3D object detection using scale invariant and feature reweighting networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 9267–9274. [Google Scholar]
- Xie, L.; Xiang, C.; Yu, Z.; Xu, G.; Yang, Z.; Cai, D.; He, X. PI-RCNN: An efficient multi-sensor 3D object detector with point-based attentive cont-conv fusion module. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12460–12467. [Google Scholar]
- Li, S.; Geng, K.; Yin, G.; Wang, Z.; Qian, M. MVMM: Multi-View Multi-Modal 3D Object Detection for Autonomous Driving. IEEE Trans. Ind. Inform. 2023, 20, 845–853. [Google Scholar] [CrossRef]
- Noh, J.; Lee, S.; Ham, B. HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021. [Google Scholar] [CrossRef]
- Liu, Z.; Zhao, X.; Huang, T.; Hu, R.; Zhou, Y.; Bai, X. TANet: Robust 3D object detection from point clouds with triple attention. Proc. AAAI Conf. Artif. Intell. 2020, 34, 11677–11684. [Google Scholar] [CrossRef]
- Rukhovich, D.; Vorontsova, A.; Konushin, A. ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022. [Google Scholar] [CrossRef]
- Xu, W.; Hu, J.; Chen, R.; An, Y.; Xiong, Z.; Liu, H. Keypoint-Aware Single-Stage 3D Object Detector for Autonomous Driving. Sensors 2022, 22, 1451. [Google Scholar] [CrossRef] [PubMed]
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph Cnn for learning on point clouds. ACM Trans. Graph. 2019, 38, 1–12. [Google Scholar] [CrossRef]
- Shi, S.; Guo, C.; Jiang, L.; Wang, Z.; Shi, J.; Wang, X.; Li, H. PV-RCNN: Point-voxel feature set abstraction for 3D object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar] [CrossRef]
- Guo, J.; Xing, X.; Quan, W.; Yan, D.M.; Gu, Q.; Liu, Y.; Zhang, X. Efficient Center Voting for Object Detection and 6D Pose Estimation in 3D Point Cloud. IEEE Trans. Image Process. 2021, 30, 5072–5084. [Google Scholar] [CrossRef]
- Chen, W.; Duan, J.; Basevi, H.; Chang, H.J.; Leonardis, A. PointPoseNet: Point Pose Network for Robust 6D Object Pose Estimation. In Proceedings of the 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 1–5 March 2020; pp. 2813–2822. [Google Scholar]
- Gao, G.; Lauri, M.; Wang, Y.; Hu, X.; Zhang, J.; Frintrop, S. 6D Object Pose Regression via Supervised Learning on Point Clouds. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 3643–3649. [Google Scholar] [CrossRef]
- He, Y.; Sun, W.; Huang, H.; Liu, J.; Fan, H.; Sun, J. PVN3D: A Deep Point-Wise 3D Keypoints Voting Network for 6DoF Pose Estimation. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11629–11638. [Google Scholar] [CrossRef]
- Hagelskjær, F.; Buch, A.G. Pointvotenet: Accurate Object Detection And 6 DOF Pose Estimation In Point Clouds. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Virtual, 25–28 October 2020; pp. 2641–2645. [Google Scholar] [CrossRef]
- Gao, G.; Lauri, M.; Hu, X.; Zhang, J.; Frintrop, S. CloudAAE: Learning 6D Object Pose Regression with On-line Data Synthesis on Point Clouds. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 11081–11087. [Google Scholar] [CrossRef]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327. [Google Scholar] [CrossRef] [PubMed]
- Douillard, B.; Underwood, J.; Kuntz, N.; Vlaskine, V.; Quadros, A.; Morton, P.; Frenkel, A. On the segmentation of 3D LIDAR point clouds. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 2798–2805. [Google Scholar] [CrossRef]
- Rummelhard, L.; Paigwar, A.; Nègre, A.; Laugier, C. Ground estimation and point cloud segmentation using SpatioTemporal Conditional Random Field. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 1105–1110. [Google Scholar] [CrossRef]
- Xu, X.; Dong, S.; Xu, T.; Ding, L.; Wang, J.; Jiang, P.; Song, L.; Li, J. FusionRCNN: LiDAR-Camera Fusion for Two-Stage 3D Object Detection. Remote Sens. 2023, 15, 1839. [Google Scholar] [CrossRef]
- Guo, M.H.; Cai, J.X.; Liu, Z.N.; Mu, T.J.; Martin, R.R.; Hu, S.M. PCT: Point cloud transformer. Comput. Vis. Media 2021, 7, 187–199. [Google Scholar] [CrossRef]
- Engel, N.; Belagiannis, V.; Dietmayer, K. Point transformer. IEEE Access 2021, 9, 134826–134840. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Lai, X.; Liu, J.; Jiang, L.; Wang, L.; Zhao, H.; Liu, S.; Qi, X.; Jia, J. Stratified transformer for 3d point cloud segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8500–8509. [Google Scholar]
- Yan, X.; Zheng, C.; Li, Z.; Wang, S.; Cui, S. PointASNL: Robust point clouds processing using nonlocal neural networks with adaptive sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar] [CrossRef]
- Chen, C.; Chen, Z.; Zhang, J.; Tao, D. Sasa: Semantics-augmented set abstraction for point-based 3d object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; Volume 36, pp. 221–229. [Google Scholar]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar] [CrossRef]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012. [Google Scholar] [CrossRef]
- Shuang, F.; Huang, H.; Li, Y.; Qu, R.; Li, P. AFE-RCNN: Adaptive Feature Enhancement RCNN for 3D Object Detection. Remote Sens. 2022, 14, 1176. [Google Scholar] [CrossRef]
- Nabhani, A.; Sjølie, H.K. TreeSim: An object-oriented individual tree simulator and 3D visualization tool in Python. SoftwareX 2022, 20, 101221. [Google Scholar] [CrossRef]
- Yoo, J.H.; Kim, Y.; Kim, J.; Choi, J.W. 3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-view Spatial Feature Fusion for 3D Object Detection. In Proceedings of the 16th European Conference on Computer Vision (ECCV 2020), Glasgow, UK, 23–28 August 2020; Volume 12372 LNCS. [Google Scholar] [CrossRef]
| Category | Method | Modality | mAP | Car Easy | Car Mod. | Car Hard | Ped. Easy | Ped. Mod. | Ped. Hard | Cyc. Easy | Cyc. Mod. | Cyc. Hard |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Voxel | SECOND [36] | L | 81.80 | 88.07 | 79.37 | 77.95 | 55.10 | 46.27 | 44.76 | 73.67 | 56.04 | 48.78 |
| Voxel | PointPillars [37] | L | 84.76 | 88.35 | 86.10 | 79.83 | 58.66 | 50.23 | 47.19 | 79.19 | 62.25 | 56.00 |
| Voxel | PV-RCNN [39] | L | 87.41 | 92.13 | 87.39 | 82.72 | 54.77 | 46.13 | 42.84 | 82.56 | 67.24 | 60.28 |
| Voxel | VoxelNet [26] | L | 82.00 | 89.35 | 79.26 | 77.39 | 46.13 | 40.74 | 38.11 | 66.70 | 54.76 | 50.55 |
| Parallel | MV3D [14] | C + L | 78.45 | 86.62 | 78.93 | 69.80 | - | - | - | - | - | - |
| Parallel | ContFuse [33] | C + L | 85.10 | 94.07 | 85.35 | 75.88 | - | - | - | - | - | - |
| Parallel | PIXOR++ [35] | L | 83.68 | 89.38 | 83.70 | 77.97 | - | - | - | - | - | - |
| Parallel | MVMM [40] | C + L | 88.78 | 92.17 | 88.70 | 85.47 | 53.75 | 46.84 | 44.87 | 81.84 | 70.17 | 63.84 |
| Point | F-PointNet [17] | C + L | 83.54 | 91.17 | 84.67 | 74.77 | 57.13 | 49.57 | 45.48 | 77.26 | 61.37 | 53.78 |
| Point | AVOD [34] | C + L | 85.14 | 90.99 | 84.82 | 79.62 | - | - | - | - | - | - |
| Point | PointRCNN [21] | L | 87.41 | 92.13 | 87.39 | 82.72 | 54.77 | 46.13 | 42.84 | 82.56 | 67.24 | 60.28 |
| Point | PSIFT + SENet [38] | C + L | 82.99 | 88.80 | 83.96 | 76.21 | - | - | - | - | - | - |
| Point | O2SAT (ours) | L | 89.94 | 93.86 | 89.25 | 84.53 | 54.94 | 47.50 | 45.78 | 84.59 | 72.83 | 65.13 |
| Category | Method | Modality | mAP | Car Easy | Car Mod. | Car Hard | Ped. Easy | Ped. Mod. | Ped. Hard | Cyc. Easy | Cyc. Mod. | Cyc. Hard |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Voxel | SECOND [36] | L | 74.33 | 83.34 | 72.55 | 65.82 | 48.96 | 38.78 | 34.91 | 71.33 | 52.08 | 45.83 |
| Voxel | PointPillars [37] | L | 74.11 | 82.58 | 74.31 | 68.99 | 51.45 | 41.92 | 38.89 | 77.10 | 58.65 | 51.92 |
| Voxel | PV-RCNN [39] | C + L | 76.41 | 84.37 | 74.82 | 70.03 | - | - | - | - | - | - |
| Parallel | VoxelNet [26] | L | 66.77 | 77.47 | 65.11 | 57.73 | 39.48 | 33.69 | 31.50 | 61.22 | 48.36 | 44.37 |
| Parallel | MV3D [14] | C + L | 64.20 | 74.97 | 63.63 | 54.00 | - | - | - | - | - | - |
| Parallel | ContFuse [33] | C + L | 71.38 | 83.68 | 68.78 | 61.67 | - | - | - | - | - | - |
| Parallel | HVPR [41] | L | 79.11 | 86.38 | 77.92 | 73.04 | 53.47 | 43.96 | 40.64 | - | - | - |
| Parallel | MVMM [40] | C + L | 80.08 | 87.59 | 78.87 | 73.78 | 47.54 | 40.49 | 38.36 | 77.82 | 64.81 | 58.79 |
| Point | PSIFT + SENet [38] | C + L | 77.14 | 85.99 | 72.72 | 72.72 | - | - | - | - | - | - |
| Point | PointRCNN [21] | C + L | 77.77 | 86.96 | 75.64 | 70.70 | 47.98 | 39.37 | 36.01 | 74.96 | 58.82 | 52.53 |
| Point | TANet [42] | L | 76.38 | 84.39 | 75.94 | 68.82 | 53.72 | 44.34 | 40.49 | 75.70 | 59.44 | 52.53 |
| Point | F-PointNet [17] | C + L | 70.86 | 82.19 | 69.79 | 60.59 | 50.53 | 42.15 | 38.08 | 72.27 | 56.17 | 49.01 |
| Point | O2SAT (ours) | L | 81.25 | 88.72 | 80.02 | 74.11 | 48.93 | 41.25 | 39.10 | 79.10 | 65.83 | 60.18 |
| Category | Method | Modality | ATS ↑ | AP ↑ | AOS ↑ | ASS ↑ | RODS ↑ |
|---|---|---|---|---|---|---|---|
| Voxel | SECOND [36] | L | 49.50 | 77.22 | 86.49 | 76.33 | 64.76 |
| Voxel | PointPillars [37] | L | 47.37 | 76.95 | 86.22 | 77.94 | 63.87 |
| Voxel | PV-RCNN [39] | C + L | 46.94 | 79.80 | 86.81 | 83.00 | 65.07 |
| Voxel | VoxelNet [26] | L | 50.99 | 78.59 | 86.85 | 78.60 | 66.17 |
| Parallel | MV3D [14] | C + L | 50.15 | 77.37 | 86.46 | 80.34 | 65.77 |
| Parallel | ContFuse [33] | C + L | 51.06 | 78.17 | 86.64 | 77.73 | 65.95 |
| Parallel | HVPR [41] | L | 50.99 | 78.59 | 86.85 | 78.60 | 66.17 |
| Parallel | MVMM [40] | C + L | 46.94 | 79.80 | 86.81 | 83.00 | 65.07 |
| Point | PSIFT + SENet [38] | C + L | 74.12 | 68.47 | 83.99 | 64.38 | 72.20 |
| Point | PointRCNN [21] | C + L | 67.83 | 70.22 | 83.87 | 63.19 | 70.13 |
| Point | TANet [42] | L | 72.01 | 69.23 | 83.33 | 69.12 | 72.94 |
| Point | F-PointNet [17] | L | 74.12 | 68.47 | 83.99 | 64.38 | 72.20 |
| Point | O2SAT (ours) | L | 69.36 | 69.04 | 82.43 | 70.98 | 71.75 |
| Category | Method | Car 3D Test Easy | Car 3D Test Mod. | Car 3D Test Hard | Car 3D Val. Easy | Car 3D Val. Mod. | Car 3D Val. Hard |
|---|---|---|---|---|---|---|---|
| Voxel | SECOND [36] | 83.13 | 73.66 | 66.20 | 87.43 | 76.48 | 69.10 |
| Voxel | PointPillars [37] | 82.58 | 74.31 | 68.99 | - | 77.98 | - |
| Voxel | PV-RCNN [39] | 87.81 | 78.49 | 73.51 | 89.47 | 79.47 | 78.54 |
| Voxel | VoxelNet [26] | 90.25 | 81.43 | 76.82 | 89.35 | 83.69 | 78.70 |
| Parallel | MV3D [14] | - | - | - | 87.72 | 79.48 | 77.17 |
| Parallel | ContFuse [33] | 90.90 | 81.62 | 77.06 | 89.41 | 84.52 | 78.93 |
| Parallel | HVPR [41] | 86.96 | 75.64 | 70.70 | 88.88 | 78.63 | 77.38 |
| Parallel | MVMM [40] | 88.36 | 79.57 | 74.55 | 89.71 | 79.45 | 78.67 |
| Point | PSIFT + SENet [38] | 88.76 | 82.16 | 77.16 | 89.38 | 84.80 | 79.01 |
| Point | PointRCNN [21] | 88.87 | 80.32 | 75.10 | - | 79.57 | - |
| Point | TANet [42] | 86.52 | 78.95 | 73.12 | 88.38 | 83.14 | 77.48 |
| Point | F-PointNet [17] | 87.34 | 79.26 | 73.85 | 88.17 | 82.41 | 78.74 |
| Point | O2SAT (ours) | 88.81 | 80.62 | 75.55 | 89.24 | 84.57 | 78.45 |
| OOS | SFR | R3D | Mod. AP ↑ | 3D AP ↑ | BEV AP ↑ | Mean Error ↓ | Mean Error ↓ |
|---|---|---|---|---|---|---|---|
| - | - | - | 37.45 | 70.12 | 73.78 | 0.16 | 0.44 |
| - | - | ✓ | 71.72 | 66.67 | 82.98 | 0.16 | 0.45 |
| - | ✓ | ✓ | 60.75 | 73.10 | 82.15 | 0.21 | 0.16 |
| ✓ | - | ✓ | 72.48 | 84.27 | 86.57 | 0.13 | 0.06 |
| PointRCNN | PV-RCNN | 2SR | Params (M) | Moderate AP | Car Easy | Car Mod. | Car Hard | Ped. Easy | Ped. Mod. | Ped. Hard | Cyc. Easy | Cyc. Mod. | Cyc. Hard |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ✓ | | | 20 | 76.57 | 86.21 | 78.65 | 76.15 | 62.04 | 40.41 | 38.15 | 76.59 | 63.37 | 57.47 |
| | ✓ | | 20 | 80.68 | 86.95 | 79.65 | 74.39 | 64.30 | 42.52 | 39.40 | 78.81 | 65.48 | 58.48 |
| ✓ | | ✓ | 28 | 79.12 | 88.54 | 78.60 | 74.42 | 63.90 | 41.69 | 39.35 | 78.97 | 65.43 | 59.28 |
| | ✓ | ✓ | 32 | 82.36 | 89.83 | 81.13 | 75.23 | 49.86 | 42.36 | 39.52 | 80.04 | 66.94 | 61.29 |
| Method | OOS | SFR | R3D | ms/Image | FPS | Params (MB) |
|---|---|---|---|---|---|---|
| PointRCNN | | | ✓ | 18 | 65 | 72.8 |
| PointRCNN | | ✓ | ✓ | 24 | 51 | 73.8 |
| PointRCNN | ✓ | | ✓ | 20 | 60 | 72.9 |
| PointRCNN | ✓ | ✓ | ✓ | 21 | 51 | 76.2 |
| PV-RCNN | | | ✓ | 40 | 29 | 64.7 |
| PV-RCNN | | ✓ | ✓ | 58 | 21 | 65.9 |
| PV-RCNN | ✓ | | ✓ | 53 | 22 | 65.6 |
| PV-RCNN | ✓ | ✓ | ✓ | 62 | 21 | 65.8 |
| Method | k-Points | AP3D Easy (%) | AP3D Mod. (%) | AP3D Hard (%) | APBEV Easy (%) | APBEV Mod. (%) | APBEV Hard (%) |
|---|---|---|---|---|---|---|---|
| PointRCNN | 9 | 88.47 | 80.35 | 74.89 | 94.66 | 90.42 | 84.89 |
| PointRCNN | 16 | 88.32 | 79.94 | 74.26 | 93.78 | 89.46 | 84.13 |
| PointRCNN | 32 | 89.47 | 80.53 | 74.85 | 94.66 | 90.15 | 84.29 |
| PV-RCNN | 9 | 87.85 | 79.46 | 73.97 | 93.57 | 89.62 | 83.97 |
| PV-RCNN | 16 | 87.34 | 78.93 | 73.95 | 92.89 | 88.57 | 83.62 |
| PV-RCNN | 32 | 89.56 | 80.16 | 75.27 | 93.87 | 89.54 | 84.29 |
| Abbreviation | Definition | Abbreviation | Definition |
|---|---|---|---|
| AVs | Autonomous vehicles | FPS | Farthest point sampling |
| LiDAR | Light detection and ranging | D-FPS | Distance-based farthest point sampling |
| 3D | Three-dimensional | Feat-FPS | Feature-based farthest point sampling |
| O2SAT | Object-oriented-segmentation-guided spatial-attention network | AP | Average precision |
| OOS | Object-oriented segmentation | FFN | Feed-forward network |
| SFR | Spatial feature reweighting | R3D | Road-aware 3D detection head |
| IoU | Intersection over union | BEV | Bird's-eye view |
| MLPs | Multilayer perceptrons | mAP | Mean average precision |