Secure Grasping Detection of Objects in Stacked Scenes Based on Single-Frame RGB Images
Abstract
1. Introduction
- (1) Analyzing how to use an adjacency matrix to represent an object stack. We used the mathematical properties of the adjacency matrix and post-processing to obtain a secure grasp.
- (2) Using the Hungarian algorithm of Deformable DETR [12] to generate predictions for object queries and the corresponding relationships between objects, and then using these relationships together with the visual features learned by the encoder to generate an adjacency matrix. We analyzed the impact of multi-scale features and the deformable self-attention mechanism on overall model performance. Adding a residual module between the original feature map and the encoder output provides adequate visual features as input to the MLP that generates the adjacency matrix.
- (3) Combining the CSL [14] idea with the one-stage object detection model YOLOv5 [13]. We demonstrated that angle prediction can be transformed from a regression problem into a classification problem using one-hot encoding, and that using a Gaussian function as the window function improves the rationality of the loss calculation (a label-smoothing sketch follows this list).
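As a concrete illustration of the CSL idea in (3), the sketch below converts a continuous grasp angle into a circular classification target: the one-hot bin for the true angle is smoothed with a Gaussian window so that neighboring bins also receive label mass, which keeps near-boundary predictions (e.g., 179° vs. 0°) from being penalized as complete misses. The bin count, window width, and function name here are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def csl_label(angle_deg: float, num_bins: int = 180, sigma: float = 6.0) -> np.ndarray:
    """Circular Smooth Label (sketch): a one-hot angle bin smoothed with a
    Gaussian window so that neighboring bins get partial credit.

    Assumes one bin per degree over [0, 180); all settings are illustrative.
    """
    center = int(round(angle_deg)) % num_bins
    bins = np.arange(num_bins)
    # Circular distance from every bin to the true-angle bin.
    d = np.minimum(np.abs(bins - center), num_bins - np.abs(bins - center))
    # Gaussian window: 1 at the true bin, decaying smoothly with distance.
    return np.exp(-(d ** 2) / (2 * sigma ** 2))

# A 179-degree grasp now shares label mass with the 0-degree bin, so the
# classification loss respects the periodicity of the angle.
print(csl_label(179.0)[[177, 178, 179, 0, 1]])
```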
2. Related Work
2.1. Object Detection
2.2. Stacking Relationship Detection
2.3. Grasping Detection
3. The Method of Grasping in Stacked Scenes
3.1. Initialization with Adjacent Matrix
3.2. GrRN
3.3. CSL-YOLO
4. Experiment and Result Analysis
4.1. Experimental Setup for GrRN
4.2. Experimental Results of GrRN
- For objects $o_i$ and $o_j$, where $o_i$ is placed on $o_j$, $A_{ij} = 1$ and $A_{ji} = 0$.
- For objects $o_i$ and $o_j$ that have no direct stacking relationship, $A_{ij} = 0$ and $A_{ji} = 0$ (a post-processing sketch using this convention follows this list).
- Relationship Recall (RR): The number of correctly detected relationships divided by the total number of correct stacking relationships.
- Relationship Precision (RP): The number of correctly predicted relationships divided by the total number of detected relationships. A detected relationship is considered correct if the tuple $(o_i, o_j, r_{ij})$ is correct, where $o_i$ denotes the $i$-th object and $r_{ij}$ denotes the relationship between the objects with indices $i$ and $j$.
- Image Accuracy (IA): The proportion of test-set images for which RR and RP are both 100% over all objects in the image. The notation IA-x denotes images containing x objects.
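To make the secure-grasp post-processing concrete, the sketch below reduces an adjacency matrix with the convention above ($A_{ij} = 1$ when object $i$ rests directly on object $j$) to a safe grasping order by repeatedly removing objects that have nothing on top of them. This is a plain topological sort, offered as an assumed reading of the post-processing step rather than the paper's exact procedure.

```python
import numpy as np

def secure_grasp_order(A: np.ndarray) -> list:
    """Return one safe grasping order for a stacking adjacency matrix
    where A[i, j] = 1 means object i is placed directly on object j
    (assumed convention; illustrative sketch)."""
    A = A.copy()
    remaining = set(range(len(A)))
    order = []
    while remaining:
        # An object is securely graspable when no other object rests on it,
        # i.e., its column in A sums to zero.
        free = [j for j in remaining if A[:, j].sum() == 0]
        if not free:
            # A cycle means the predicted relationships are inconsistent.
            raise ValueError("inconsistent stacking relationships")
        for j in free:
            order.append(j)
            remaining.remove(j)
            A[j, :] = 0  # removing j frees whatever it was resting on
    return order

# Toy scene: object 0 sits on object 1, which sits on object 2.
A = np.zeros((3, 3), dtype=int)
A[0, 1] = 1
A[1, 2] = 1
print(secure_grasp_order(A))  # [0, 1, 2]: grasp from the top down
```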
4.3. Experimental Setup for CSL-YOLO
4.4. Experimental Results for CSL-YOLO
- IW (image-wise): The entire dataset is shuffled and randomly divided into training and test sets, testing the model's generalization to previously seen objects appearing at new positions and rotation angles.
- OW (object-wise): The dataset is divided by object instance, so objects in the test set never appear in the training set, testing the model's generalization to unseen objects (a splitting sketch follows this list).
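The two protocols differ only in the unit that is held out. Below is a minimal sketch of both splits, using hypothetical (image, object-instance) records rather than the actual dataset format:

```python
import random

# Hypothetical records: (image_id, object_instance_id). Real annotations
# would also carry grasp rectangles and angles.
samples = [(f"img_{o}_{k}", o) for o in range(30) for k in range(5)]

def image_wise_split(samples, test_ratio=0.2, seed=0):
    """IW: shuffle whole images, so test objects were seen in training,
    just at new positions and rotation angles."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def object_wise_split(samples, test_ratio=0.2, seed=0):
    """OW: hold out whole object instances, so test objects never
    appear in training in any pose."""
    rng = random.Random(seed)
    objects = sorted({obj for _, obj in samples})
    rng.shuffle(objects)
    held_out = set(objects[: int(len(objects) * test_ratio)])
    train = [s for s in samples if s[1] not in held_out]
    test = [s for s in samples if s[1] in held_out]
    return train, test
```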
4.5. Experiments in Real-World Scenarios
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Du, G.; Wang, K.; Lian, S.; Zhao, K. Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: A review. Artif. Intell. Rev. 2021, 54, 1677–1734.
- Chen, W.; Jia, X.; Chang, H.J.; Duan, J.; Leonardis, A. G2L-Net: Global to Local Network for Real-Time 6D Pose Estimation with Embedding Vector Features. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4232–4241.
- Sundermeyer, M.; Mousavian, A.; Triebel, R.; Fox, D. Contact-GraspNet: Efficient 6-DoF Grasp Generation in Cluttered Scenes. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 30 May–5 June 2021; pp. 13438–13444.
- Mousavian, A.; Eppner, C.; Fox, D. 6-DOF GraspNet: Variational Grasp Generation for Object Manipulation. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2901–2910.
- Chen, W.; Liang, H.; Chen, Z.; Sun, F.; Zhang, J. Improving Object Grasp Performance via Transformer-Based Sparse Shape Completion. J. Intell. Robot. Syst. 2022, 104, 45.
- Cammarata, A.; Sinatra, R.; Maddio, P.D. Interface reduction in flexible multibody systems using the Floating Frame of Reference Formulation. J. Sound Vib. 2022, 523, 116720.
- Depierre, A.; Dellandréa, E.; Chen, L. Optimizing Correlated Graspability Score and Grasp Regression for Better Grasp Prediction. arXiv 2020, arXiv:2002.00872.
- Morrison, D.; Corke, P.; Leitner, J. Closing the Loop for Robotic Grasping: A Real-Time, Generative Grasp Synthesis Approach. arXiv 2018, arXiv:1804.05172.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Tchuiev, V.; Miron, Y.; Di Castro, D. DUQIM-Net: Probabilistic Object Hierarchy Representation for Multi-View Manipulation. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; pp. 10470–10477.
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv 2020, arXiv:2010.04159.
- Jocher, G. YOLOv5 by Ultralytics, Version 7.0; Computer software; Zenodo: Geneva, Switzerland, 2020.
- Yang, X.; Yan, J.; He, T. On the Arbitrary-Oriented Object Detection: Classification Based Approaches Revisited. Int. J. Comput. Vis. 2022, 130, 1340–1365.
- Zhang, H.; Lan, X.; Zhou, X.; Tian, Z.; Zhang, Y.; Zheng, N. Visual Manipulation Relationship Network for Autonomous Robotics. In Proceedings of the 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids), Beijing, China, 6–9 November 2018; pp. 118–125.
- Jiang, Y.; Moseson, S.; Saxena, A. Efficient grasping from RGBD images: Learning using a new rectangle representation. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, 9–13 May 2011; pp. 3304–3311.
- Girshick, R.B.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R.B. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.B.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.E.; Fu, C.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016.
- Redmon, J.; Divvala, S.K.; Girshick, R.B.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. arXiv 2020, arXiv:2005.12872.
- Zhang, H.; Lan, X.; Bai, S.; Wan, L.; Yang, C.; Zheng, N. A Multi-task Convolutional Neural Network for Autonomous Robotic Grasping in Object Stacking Scenes. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 6435–6442.
- Park, D.; Seo, Y.; Shin, D.; Choi, J.; Chun, S.Y. A Single Multi-Task Deep Neural Network with Post-Processing for Object Detection with Reasoning and Robotic Grasp Detection. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 7300–7306.
- Chi, J.; Wu, X.; Ma, C.; Yu, X.; Wu, C. A Robot Grasp Relationship Detection Network Based on the Fusion of Multiple Features. In Proceedings of the 2021 33rd Chinese Control and Decision Conference (CCDC), Kunming, China, 22–24 May 2021; pp. 1479–1484.
- Maitin-Shepard, J.B.; Cusumano-Towner, M.F.; Lei, J.; Abbeel, P. Cloth grasp point detection based on multiple-view geometric cues with application to robotic towel folding. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation (ICRA), Anchorage, AK, USA, 3–7 May 2010; pp. 2308–2315.
- Bohg, J.; Morales, A.; Asfour, T.; Kragic, D. Data-Driven Grasp Synthesis—A Survey. IEEE Trans. Robot. 2014, 30, 289–309.
- Guo, D.; Sun, F.; Liu, H.; Kong, T.; Fang, B.; Xi, N. A hybrid deep architecture for robotic grasp detection. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 1609–1614.
- Chu, F.; Xu, R.; Vela, P.A. Real-World Multiobject, Multigrasp Detection. IEEE Robot. Autom. Lett. 2018, 3, 3355–3362.
- Dong, M.; Wei, S.; Yu, X.; Yin, J. Mask-GD Segmentation Based Robotic Grasp Detection. Comput. Commun. 2021, 178, 124–130.
| Model | OR (%) | OP (%) |
|---|---|---|
| VMRN | 86.0 | 88.8 |
| VSE | 89.2 | 90.2 |
| Adj-Net | 90.1 | 93.5 |
| Ours | 91.9 | 94.8 |
| Model | RR (%) | RP (%) | IA (%) |
|---|---|---|---|
| VMRN | 86.0 | 88.8 | 67.1 |
| VSE | - | - | 73.7 |
| Adj-Net | 88.9 | 91.5 | 74.4 |
| Ours | 91.2 | 93.1 | 78.0 |
| Model | Total (%) | IA-2 | IA-3 | IA-4 | IA-5 |
|---|---|---|---|---|---|
| VMRN | 67.1 | 57/65 | 134/209 | 60/106 | 51/70 |
| VSE | 73.7 | 57/65 | 146/209 | 75/106 | 54/70 |
| Adj-Net | 74.4 | 56/65 | 155/209 | 74/106 | 50/70 |
| Ours | 78.0 | 60/65 | 160/209 | 79/106 | 52/70 |
| Model | OR (%) | OP (%) | RR (%) | RP (%) | IA (%) |
|---|---|---|---|---|---|
| GrRN-DETR | 86.1 | 88.7 | 86.5 | 89.7 | 71.2 |
| GrRN-Decoder | 92.3 | 95.2 | 54.4 | 59.6 | 30.3 |
| GrRN | 91.9 | 94.8 | 91.2 | 93.1 | 78.0 |
| Model | Grasp Detection Accuracy, IW (%) | Grasp Detection Accuracy, OW (%) |
|---|---|---|
| Guo | 93.2 | 89.1 |
| Chu | 96.0 | 96.1 |
| Dong | 96.4 | 95.5 |
| Ours (…) | 95.1 | 94.9 |
| Ours (…) | 97.7 | 97.2 |
| Ours (…) | 98.0 | 97.4 |
| Ours (…) | 97.3 | 97.1 |