Weakly Supervised Object Detection with Symmetry Context
Abstract
1. Introduction
- We propose two context proposal mining strategies, naive (NCP) and Gaussian-based (GCP), to capture more diverse discriminative information about the objects of interest.
- We introduce a Symmetry Context Module (SCM) that improves the detection accuracy of our two-stream neural network model.
- Experiments on the PASCAL VOC 2007 and 2012 datasets show that our method outperforms the compared state-of-the-art approaches.
2. Related Work
2.1. MIL and WSOD
2.2. Using Contextual Information in WSOD
3. Methodology
3.1. Overall Framework
3.2. Context Proposal Mining
3.2.1. Naive Context Proposal Mining
3.2.2. Gaussian-Based Context Proposal Mining
3.3. Symmetry Context Module
4. Experiments
4.1. Datasets and Experimental Setup
4.2. Ablation Study
4.2.1. Context Proposals Location
4.2.2. Effect of Number of Context Proposals
4.3. Comparison with Other Baselines
5. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations and Notation Description
Abbreviations/Notation | Description |
---|---|
CNN | Convolutional Neural Networks |
WSOD | Weakly Supervised Object Detection |
FSOD | Fully Supervised Object Detection |
MIL | Multiple Instance Learning |
WSDDN | Weakly Supervised Deep Detection Network |
OICR | Online Instance Classifier Refinement |
mAP | Mean Average Precision |
MIST | Multiple Instance Self-Training |
I | input image |
image labels | |
score matrix of localization stream and detection stream in SCM | |
fused context proposal score matrix of localization stream | |
C | number of object classes |
K | the number of refinement stages |
feature vectors of region proposals | |
feature vectors of context proposals | |
image score of a specific class c | |
output score vector of proposal j of the kth instance classifier | |
label for proposal j of the kth instance classifier | |
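The notation above refers to per-proposal score matrices from two streams and to an image score for each class c. As a point of reference only, the following minimal NumPy sketch shows how a WSDDN-style [6] two-stream head turns two (R, C) logit matrices into proposal scores and image scores. It does not reproduce the SCM; the function names `softmax` and `image_scores`, the array shapes, and the random inputs are assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def image_scores(cls_logits, det_logits):
    """WSDDN-style aggregation (illustrative, not the paper's SCM).

    cls_logits, det_logits: arrays of shape (R, C) for R proposals, C classes.
    Returns the (R, C) proposal scores and the (C,) image scores.
    """
    cls = softmax(cls_logits, axis=1)  # compete across classes, per proposal
    det = softmax(det_logits, axis=0)  # compete across proposals, per class
    proposal_scores = cls * det        # element-wise product of the two streams
    return proposal_scores, proposal_scores.sum(axis=0)


# Hypothetical inputs: 5 proposals, 20 classes (the PASCAL VOC class count).
rng = np.random.default_rng(0)
R, C = 5, 20
scores, phi = image_scores(rng.normal(size=(R, C)), rng.normal(size=(R, C)))
```

Because the detection stream is normalised over proposals, each image score lies strictly between 0 and 1, so it can be compared with the binary image label using a cross-entropy loss.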
References
- Wang, H.; Li, H.; Qian, W.; Diao, W.; Zhao, L.; Zhang, J.; Zhang, D. Dynamic Pseudo-Label Generation for Weakly Supervised Object Detection in Remote Sensing Images. Remote Sens. 2021, 13, 1461.
- Huang, Z.; Zou, Y.; Kumar, B.; Huang, D. Comprehensive Attention Self-Distillation for Weakly-Supervised Object Detection. Adv. Neural Inf. Process. Syst. 2020, 33, 16797–16807.
- Xu, C.; Zheng, X.; Lu, X. Multi-Level Alignment Network for Cross-Domain Ship Detection. Remote Sens. 2022, 14, 2389.
- Zheng, J.; Fu, H.; Li, W.; Wu, W.; Zhao, Y.; Dong, R.; Yu, L. Cross-Regional Oil Palm Tree Counting and Detection via a Multi-Level Attention Domain Adaptation Network. ISPRS J. Photogramm. Remote Sens. 2020, 167, 154–177.
- Wan, F.; Liu, C.; Ke, W.; Ji, X.; Jiao, J.; Ye, Q. C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2194–2203.
- Bilen, H.; Vedaldi, A. Weakly Supervised Deep Detection Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2846–2854.
- Tang, P.; Wang, X.; Bai, X.; Liu, W. Multiple Instance Detection Network with Online Instance Classifier Refinement. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 22–25 July 2017; pp. 3059–3067.
- Ren, Z.; Yu, Z.; Yang, X.; Liu, M.Y.; Lee, Y.J.; Schwing, A.G.; Kautz, J. Instance-Aware, Context-Focused, and Memory-Efficient Weakly Supervised Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10595–10604.
- Torralba, A.; Murphy, K.P.; Freeman, W.T.; Rubin, M.A. Context-Based Vision System for Place and Object Recognition. In Proceedings of the IEEE International Conference on Computer Vision, Nice, France, 14–17 October 2003; Volume 1, pp. 273–280.
- Gidaris, S.; Komodakis, N. Object Detection via a Multi-region and Semantic Segmentation-Aware CNN Model. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1134–1142.
- Wei, Y.; Shen, Z.; Cheng, B.; Shi, H.; Xiong, J.; Feng, J.; Huang, T. TS2C: Tight Box Mining with Surrounding Segmentation Context for Weakly Supervised Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 434–450.
- Kantorov, V.; Oquab, M.; Cho, M.; Laptev, I. ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 350–365.
- Zhang, D.; Han, J.; Cheng, G.; Yang, M. Weakly Supervised Object Localization and Detection: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 167, 154–177.
- Huang, X.; Xu, K.; Huang, C.; Wang, C.; Qin, K. Multiple Instance Learning Convolutional Neural Networks for Fine-Grained Aircraft Recognition. Remote Sens. 2021, 13, 5132.
- Han, T.; Wang, L.; Wen, B. The Kernel Based Multiple Instances Learning Algorithm for Object Tracking. Electronics 2018, 7, 97.
- Wu, L.; Liu, Q. Weakly Supervised Object Co-Localization via Sharing Parts Based on a Joint Bayesian Model. Symmetry 2018, 10, 142.
- Ali, M.U.; Sultani, W.; Ali, M. Destruction from Sky: Weakly Supervised Approach for Destruction Detection in Satellite Imagery. ISPRS J. Photogramm. Remote Sens. 2020, 162, 115–124.
- Uijlings, J.R.; Van De Sande, K.E.; Gevers, T.; Smeulders, A.W. Selective Search for Object Recognition. Int. J. Comput. Vis. 2013, 104, 154–171.
- Zitnick, C.L.; Dollár, P. Edge Boxes: Locating Object Proposals from Edges. In Proceedings of the European Conference on Computer Vision (ECCV), Zurich, Switzerland, 8–14 September 2014; pp. 391–405.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Tang, P.; Wang, X.; Bai, S.; Shen, W.; Bai, X.; Liu, W.; Yuille, A. PCL: Proposal Cluster Learning for Weakly Supervised Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 176–191.
- Diba, A.; Sharma, V.; Pazandeh, A.; Pirsiavash, H.; Van Gool, L. Weakly Supervised Cascaded Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5131–5139.
- Zeng, Z.; Liu, B.; Fu, J.; Chao, H.; Zhang, L. WSOD2: Learning Bottom-Up and Top-Down Objectness Distillation for Weakly-Supervised Object Detection. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 8291–8299.
- Bilen, H.; Pedersoli, M.; Tuytelaars, T. Weakly Supervised Object Detection with Posterior Regularization. In Proceedings of the BMVC 2014, Nottingham, UK, 1–5 September 2014; pp. 1–12.
- Dong, B.; Huang, Z.; Guo, Y.; Wang, Q.; Niu, Z.; Zuo, W. Boosting Weakly Supervised Object Detection via Learning Bounding Box Adjusters. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2876–2885.
- Inoue, N.; Furuta, R.; Yamasaki, T.; Aizawa, K. Cross-Domain Weakly-Supervised Object Detection through Progressive Domain Adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5001–5009.
- Li, J.; Zhang, C.; Yang, B. Global Contextual Dependency Network for Object Detection. Future Internet 2022, 14, 27.
- Liang, H.; Zhou, H.; Zhang, Q.; Wu, T. Object Detection Algorithm Based on Context Information and Self-Attention Mechanism. Symmetry 2022, 14, 904.
- Chen, Z.M.; Jin, X.; Zhao, B.R.; Zhang, X.; Guo, Y. HCE: Hierarchical Context Embedding for Region-Based Object Detection. IEEE Trans. Image Process. 2021, 30, 6917–6929.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. In Proceedings of the International Conference on Learning Representations, San Juan, Puerto Rico, 2–4 May 2016.
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. Available online: http://host.robots.ox.ac.uk/pascal/VOC/index.html (accessed on 15 June 2022).
- Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. Available online: http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html (accessed on 15 June 2022).
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Lai, B.; Gong, X. Saliency Guided End-to-End Learning For Weakly Supervised Object Detection. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, IJCAI’17, Melbourne, Australia, 19–25 August 2017; pp. 2053–2059.
- Tang, P.; Wang, X.; Wang, A.; Yan, Y.; Liu, W.; Huang, J.; Yuille, A. Weakly Supervised Region Proposal Network and Object Detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 352–368.
- Li, X.; Kan, M.; Shan, S.; Chen, X. Weakly Supervised Object Detection with Segmentation Collaboration. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 9734–9743.
- Yang, K.; Li, D.; Dou, Y. Towards Precise End-to-End Weakly Supervised Object Detection Network. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019.
- Jin, R.; Lin, G.; Wen, C. Online Active Proposal Set Generation for Weakly Supervised Object Detection. Knowl. Based Syst. 2022, 237, 107726.
- Jiang, W.; Zhao, Z.; Su, F.; Fang, Y. Dynamic Proposal Sampling for Weakly Supervised Object Detection. Neurocomputing 2021, 441, 248–259.
Context Proposal Mining | Distance to Region Proposal Boundary | mAP |
---|---|---|
No context | - | 42.26 |
NCP | 0 | 45.22 |
NCP | 0.9 | 44.16 |
GCP | 0.1 | 43.99 |
GCP | 0.2 | 45.10 |
Context Proposal Mining | Distance to Region Proposal Boundary | Number of Context Proposals per Side | mAP |
---|---|---|---|
GCP | 0.1 | 2 | 43.46 |
GCP | 0.1 | 1 | 43.99 |
Method | Context Proposal Mining | Distance to Region Proposal Boundary | Fusion Method | mAP |
---|---|---|---|---|
OICR (+MIST + Reg.) | No context | - | - | 50.91 |
Ours | GCP | 0.2 | mean | 51.85 |
Ours | GCP | 0.2 | max | 52.38 |
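The table above compares two ways of fusing context proposal scores, mean and max. The sketch below illustrates only the difference between the two reductions, assuming the scores of the N context proposals attached to each region proposal are stacked into an array of shape (N, R, C). The function name `fuse_context_scores`, the array layout, and the toy values are hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_context_scores(context_scores, method="max"):
    """Reduce per-context-proposal scores to one score per region proposal.

    context_scores: array of shape (N, R, C), i.e. N context proposals for
    each of R region proposals and C classes (assumed layout).
    Returns an (R, C) fused score matrix.
    """
    if method == "mean":
        return context_scores.mean(axis=0)  # average over context proposals
    if method == "max":
        return context_scores.max(axis=0)   # keep the strongest context response
    raise ValueError(f"unknown fusion method: {method!r}")


# Toy example: N = 2 context proposals, R = 1 region proposal, C = 2 classes.
ctx = np.array([[[0.2, 0.6]],
                [[0.4, 0.1]]])
fused_mean = fuse_context_scores(ctx, "mean")
fused_max = fuse_context_scores(ctx, "max")
```

Mean fusion dilutes a strong response from one context proposal with weaker ones, whereas max fusion keeps it. This is one plausible reading of why the max row scores higher in the table, although the table alone does not establish the cause.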
Method | Aero | Bike | Bird | Boat | Bottle | Bus | Car | Cat | Chair | Cow | Table | Dog | Horse | Motorbike | Person | Plant | Sheep | Sofa | Train | TV | mAP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ContextLocNet [12] | 57.1 | 52.0 | 31.5 | 7.6 | 11.5 | 55.0 | 53.1 | 34.1 | 1.7 | 33.1 | 49.2 | 42.0 | 47.3 | 56.6 | 15.3 | 12.8 | 24.8 | 48.9 | 44.4 | 47.8 | 36.3 |
Bilen [6] | 46.4 | 58.3 | 35.5 | 25.9 | 14.0 | 66.7 | 53.0 | 39.2 | 8.9 | 41.8 | 26.6 | 38.6 | 44.7 | 59.0 | 10.8 | 17.3 | 40.7 | 49.6 | 56.9 | 50.8 | 39.3 |
OICR [7] | 58.0 | 62.4 | 31.1 | 19.4 | 13.0 | 65.1 | 62.2 | 28.4 | 24.8 | 44.7 | 30.6 | 25.3 | 37.8 | 65.5 | 15.7 | 24.1 | 41.7 | 46.9 | 64.3 | 62.6 | 41.2 |
OICR [7] | 56.1 | 72.7 | 40.9 | 26.7 | 25.7 | 66.6 | 67.1 | 13.0 | 24.2 | 48.4 | 39.5 | 16.4 | 20.3 | 69.4 | 8.1 | 23.9 | 49.2 | 47.5 | 63.9 | 65.8 | 42.3 |
Diba [24] | 49.5 | 60.6 | 38.6 | 29.2 | 16.2 | 70.8 | 56.9 | 42.5 | 10.9 | 44.1 | 29.9 | 42.2 | 47.9 | 64.1 | 13.8 | 23.5 | 45.9 | 54.1 | 60.8 | 54.5 | 42.8 |
SGWSOD [37] | 48.4 | 61.5 | 33.3 | 30.0 | 15.3 | 72.4 | 62.4 | 59.1 | 10.9 | 42.3 | 34.3 | 53.1 | 48.4 | 65.0 | 20.5 | 16.6 | 40.6 | 46.5 | 54.6 | 55.1 | 43.5 |
TS2C [11] | 59.3 | 57.5 | 43.7 | 27.3 | 13.5 | 63.9 | 61.7 | 59.9 | 24.1 | 46.9 | 36.7 | 45.6 | 39.9 | 62.6 | 10.3 | 23.6 | 41.7 | 52.4 | 58.7 | 56.6 | 44.3 |
WSRPN [38] | 57.9 | 70.5 | 37.8 | 5.7 | 21.0 | 66.1 | 69.2 | 59.4 | 3.4 | 57.1 | 57.3 | 35.2 | 64.2 | 68.6 | 32.8 | 28.6 | 50.8 | 49.5 | 41.1 | 30.0 | 45.3 |
PCL [23] | 62.3 | 69.3 | 50.6 | 28.1 | 22.1 | 71.8 | 68.1 | 56.8 | 24.0 | 61.3 | 43.1 | 59.4 | 45.0 | 66.2 | 12.3 | 23.3 | 45.3 | 52.0 | 65.1 | 57.2 | 49.2 |
SDCN [39] | 59.4 | 71.5 | 38.9 | 32.2 | 21.5 | 67.7 | 64.5 | 68.9 | 20.4 | 49.2 | 47.6 | 60.9 | 55.9 | 67.4 | 31.2 | 22.9 | 45.0 | 53.2 | 60.9 | 64.4 | 50.2 |
C-MIL [5] | 62.5 | 58.4 | 49.5 | 32.1 | 19.8 | 70.5 | 66.1 | 63.4 | 20.0 | 60.5 | 52.9 | 53.5 | 57.4 | 68.9 | 8.4 | 24.6 | 51.8 | 58.7 | 66.7 | 63.5 | 50.5 |
Yang et al. [40] | 57.6 | 70.8 | 50.7 | 28.3 | 27.2 | 72.5 | 69.1 | 65.0 | 26.9 | 64.5 | 47.4 | 47.7 | 53.5 | 66.9 | 13.7 | 29.3 | 56.0 | 54.9 | 63.4 | 65.2 | 51.5 |
OPG [41] | 63.0 | 65.3 | 49.2 | 31.7 | 25.3 | 70.9 | 70.9 | 58.1 | 27.4 | 58.6 | 44.7 | 47.0 | 47.2 | 69.8 | 13.1 | 26.1 | 49.9 | 51.8 | 61.7 | 68.2 | 50.0 |
Jiang et al. [42] | 60.1 | 74.5 | 51.9 | 29.6 | 30.2 | 68.8 | 72.6 | 44.6 | 19.8 | 66.0 | 48.8 | 43.7 | 63.2 | 68.2 | 17.7 | 25.1 | 53.7 | 60.8 | 56.1 | 63.1 | 50.9 |
OICR + MIST + Reg. [8] | 67.9 | 78.6 | 55.6 | 25.6 | 29.1 | 69.8 | 75.4 | 50.3 | 27.6 | 67.2 | 39.6 | 28.2 | 50.2 | 72.0 | 15.7 | 26.1 | 62.7 | 52.2 | 68.0 | 56.7 | 50.9 |
Ours | 71.4 | 79.2 | 55.5 | 31.6 | 22.6 | 71.5 | 75.5 | 52.3 | 20.4 | 64.8 | 44.9 | 35.2 | 49.8 | 71.8 | 22.3 | 27.9 | 59.6 | 52.3 | 70.6 | 68.3 | 52.4 |
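The mAP column in the comparison table is the unweighted mean of the 20 per-class AP values in each row. The short check below recomputes it for the last two rows and lists the per-class differences. The AP numbers are copied from the table; the variable and function names are illustrative.

```python
CLASSES = ["Aero", "Bike", "Bird", "Boat", "Bottle", "Bus", "Car", "Cat",
           "Chair", "Cow", "Table", "Dog", "Horse", "Mbike", "Persn",
           "Plant", "Sheep", "Sofa", "Train", "TV"]

# Per-class AP values copied from the comparison table.
AP = {
    "OICR + MIST + Reg. [8]": [67.9, 78.6, 55.6, 25.6, 29.1, 69.8, 75.4, 50.3,
                               27.6, 67.2, 39.6, 28.2, 50.2, 72.0, 15.7, 26.1,
                               62.7, 52.2, 68.0, 56.7],
    "Ours": [71.4, 79.2, 55.5, 31.6, 22.6, 71.5, 75.5, 52.3, 20.4, 64.8,
             44.9, 35.2, 49.8, 71.8, 22.3, 27.9, 59.6, 52.3, 70.6, 68.3],
}


def mean_ap(values):
    """Unweighted mean of per-class AP values."""
    return sum(values) / len(values)


map_baseline = mean_ap(AP["OICR + MIST + Reg. [8]"])
map_ours = mean_ap(AP["Ours"])

# Per-class change of "Ours" relative to the baseline row.
gains = {c: ours - base for c, ours, base
         in zip(CLASSES, AP["Ours"], AP["OICR + MIST + Reg. [8]"])}
best_class = max(gains, key=gains.get)
worst_class = min(gains, key=gains.get)

print(f"baseline mAP: {map_baseline:.1f}, ours mAP: {map_ours:.1f}")
print(f"largest gain: {best_class}, largest drop: {worst_class}")
```

Rounded to one decimal place, the recomputed means match the reported 50.9 and 52.4. The per-class differences show that the overall gain is not uniform: some classes improve substantially while others lose accuracy.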
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gu, X.; Zhang, Q.; Lu, Z. Weakly Supervised Object Detection with Symmetry Context. Symmetry 2022, 14, 1832. https://doi.org/10.3390/sym14091832