Object detection

Objects detected with OpenCV's deep neural network (DNN) module using a YOLOv3 model trained on the COCO dataset, which can detect objects of 80 common classes.

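Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection. Object detection has applications in many areas of computer vision, including image retrieval and video surveillance.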
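In practice a detector reports each object instance as a class label, a confidence score and a bounding box. The sketch below decodes raw YOLO-style output rows into such detections. It assumes the row layout used in OpenCV's YOLOv3 sample code for Darknet models like the one in the caption above: normalized center x, center y, width and height, then an objectness score, then one score per class, with the best class score taken as the confidence. The function name `decode_yolo_rows`, the 0.5 threshold and the sample rows are hypothetical.

```python
from typing import List, Sequence, Tuple

# (class_id, confidence, (left, top, width, height)) in pixel units
Detection = Tuple[int, float, Tuple[int, int, int, int]]


def decode_yolo_rows(rows: Sequence[Sequence[float]],
                     img_w: int, img_h: int,
                     conf_threshold: float = 0.5) -> List[Detection]:
    """Turn YOLO-style rows into pixel-space detections.

    Assumed row layout (as in OpenCV's YOLO samples):
    [cx, cy, w, h, objectness, score_class_0, score_class_1, ...],
    with box values normalized to 0..1 and the best per-class score
    used directly as the detection confidence.
    """
    detections: List[Detection] = []
    for row in rows:
        scores = list(row[5:])
        class_id = max(range(len(scores)), key=scores.__getitem__)
        confidence = float(scores[class_id])
        if confidence < conf_threshold:
            continue  # drop low-confidence candidates
        # Scale the normalized center/size back to pixels.
        cx, cy = row[0] * img_w, row[1] * img_h
        w, h = row[2] * img_w, row[3] * img_h
        left, top = int(cx - w / 2), int(cy - h / 2)
        detections.append((class_id, confidence, (left, top, int(w), int(h))))
    return detections


# Two hypothetical candidate rows for a 3-class model on a 640x480 image.
rows = [
    [0.5, 0.5, 0.25, 0.5, 0.9, 0.1, 0.8, 0.05],  # confident class-1 box
    [0.1, 0.1, 0.05, 0.05, 0.2, 0.1, 0.1, 0.1],  # weak candidate, filtered out
]
print(decode_yolo_rows(rows, 640, 480))  # [(1, 0.8, (240, 120, 160, 240))]
```

With OpenCV, `rows` would come from the arrays returned by `net.forward(net.getUnconnectedOutLayersNames())` after feeding a blob made with `cv2.dnn.blobFromImage`; the overlapping boxes that survive the threshold are normally pruned afterwards with non-maximum suppression.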

Methods

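Methods for object detection generally fall into either machine learning-based approaches or deep learning-based approaches. Machine learning approaches first define features using one of the methods below and then classify them with a technique such as a support vector machine (SVM). Deep learning techniques, by contrast, can perform end-to-end object detection without explicitly defined features, and are typically based on convolutional neural networks (CNNs).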
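The two-stage classical pipeline (hand-crafted features, then a classifier) can be sketched as follows. This is a minimal illustration under stated assumptions: the feature vectors are taken as already extracted (real systems would use descriptors such as HOG), and the SVM is a linear one trained with the Pegasos stochastic sub-gradient method on the hinge loss, which is only one of several ways to fit an SVM. The toy data and all function names are hypothetical.

```python
import random
from typing import List, Sequence


def augment(x: Sequence[float]) -> List[float]:
    """Append a constant 1.0 so that the last weight acts as a bias term."""
    return [float(v) for v in x] + [1.0]


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 100,
                     seed: int = 0) -> List[float]:
    """Fit a linear SVM with Pegasos (stochastic sub-gradient descent on
    lam/2 * ||w||^2 + mean hinge loss). Labels must be +1 or -1."""
    data = [(augment(x), y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    rng = random.Random(seed)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step-size schedule
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularizer
            if margin < 1.0:  # hinge loss active: move towards y * x
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 (object window) or -1 (background window)."""
    score = sum(wj * xj for wj, xj in zip(w, augment(x)))
    return 1 if score >= 0 else -1


# Toy 2-D vectors standing in for real descriptors such as HOG.
xs = [(2, 2), (3, 1), (1, 3), (-2, -2), (-3, -1), (-1, -3)]
ys = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(xs, ys)
print([predict(w, x) for x in xs])  # [1, 1, 1, -1, -1, -1]
```

Appending a constant 1.0 to every feature vector lets the last weight act as a bias, which keeps the update rule uniform at the cost of also regularizing the bias. In a detector, `predict` would be applied to the feature vector of each candidate image window.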

Machine learning approaches:
- Viola–Jones object detection framework based on Haar features
- Scale-invariant feature transform (SIFT)
- Histogram of oriented gradients (HOG) features^[1]
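The Viola–Jones framework makes Haar-like features cheap to evaluate by first computing an integral image (summed-area table), after which the sum over any rectangle takes four lookups. A minimal sketch, assuming a grayscale image stored as a list of rows; the function names and the single left-minus-right two-rectangle feature are illustrative and do not reproduce the framework's full feature set or its boosted cascade.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """Summed-area table padded with a zero row and column:
    ii[y][x] is the sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h rectangle with top-left pixel (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Horizontal two-rectangle Haar-like feature: left half minus right half.
    Assumes w is even."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# A bright-left / dark-right patch, the kind of edge this feature responds to.
img = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 4, 3))          # 60
print(two_rect_feature(ii, 0, 0, 4, 3))  # 48 (54 - 6)
```

Because each rectangle sum costs the same regardless of its size, features can be evaluated at many scales and positions without rescanning the pixels, which is what made the original framework fast enough for real-time face detection.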
Deep learning approaches:
- Region proposals (R-CNN,^[2] Fast R-CNN,^[3] Faster R-CNN,^[4] Cascade R-CNN^[5])
- Single Shot MultiBox Detector (SSD)^[6]
- You Only Look Once (YOLO)^[7]^[8]^[9]^[10]
- Single-Shot Refinement Neural Network for Object Detection (RefineDet)^[11]
- RetinaNet^[12]^[5]
- Deformable convolutional networks^[13]^[14]
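Most of the detectors listed above, including the R-CNN family, SSD and YOLO, produce many overlapping candidate boxes for the same object, which are then pruned with non-maximum suppression (NMS). A minimal greedy NMS sketch, assuming corner-format boxes (x1, y1, x2, y2) and an illustrative IoU threshold of 0.5; real pipelines usually run it separately for each class. OpenCV provides a comparable routine, `cv2.dnn.NMSBoxes`, although it takes different arguments.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the highest-scoring remaining box, discard every
    other box overlapping it by more than iou_threshold, and repeat.
    Returns the indices of the kept boxes, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two near-duplicate boxes on one object plus a separate object.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

Here the first two boxes overlap with an IoU of 81/119 (about 0.68), so the lower-scoring one is suppressed, while the distant third box is kept.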

See also

Outline of object recognition

References

[1] Dalal, Navneet (2005). “Histograms of oriented gradients for human detection” (PDF). Computer Vision and Pattern Recognition. 1. Archived from the original (PDF) on 17 June 2019. Retrieved 15 November 2019.
[2] Girshick, Ross (2014). “Rich feature hierarchies for accurate object detection and semantic segmentation” (PDF). Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE. doi:10.1109/CVPR.2014.81. Archived from the original (PDF) on 2 December 2019. Retrieved 15 November 2019.
[3] Girshick, Ross (2015). “Fast R-CNN” (PDF). Proceedings of the IEEE International Conference on Computer Vision: 1440–1448. arXiv:1504.08083. Archived from the original (PDF) on 31 October 2019. Retrieved 15 November 2019.
[4] Ren, Shaoqing (2015). “Faster R-CNN” (PDF). Advances in Neural Information Processing Systems. arXiv:1506.01497.
[5] Pang, Jiangmiao; Chen, Kai; Shi, Jianping; Feng, Huajun; Ouyang, Wanli; Lin, Dahua (4 April 2019). “Libra R-CNN: Towards Balanced Learning for Object Detection”. arXiv:1904.02701v1 [cs.CV].
[6] Liu, Wei (October 2016). “SSD: Single shot multibox detector”. European Conference on Computer Vision. Lecture Notes in Computer Science 9905. pp. 21–37. arXiv:1512.02325. doi:10.1007/978-3-319-46448-0_2. ISBN 978-3-319-46447-3.
[7] Redmon, Joseph (2016). “You only look once: Unified, real-time object detection”. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. arXiv:1506.02640.
[8] Redmon, Joseph (2017). “YOLO9000: better, faster, stronger”. arXiv:1612.08242 [cs.CV].
[9] Redmon, Joseph (2018). “YOLOv3: An incremental improvement”. arXiv:1804.02767 [cs.CV].
[10] Bochkovskiy, Alexey (2020). “YOLOv4: Optimal Speed and Accuracy of Object Detection”. arXiv:2004.10934 [cs.CV].
[11] Zhang, Shifeng (2018). “Single-Shot Refinement Neural Network for Object Detection”. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 4203–4212. arXiv:1711.06897.
[12] Lin, Tsung-Yi (2020). “Focal Loss for Dense Object Detection”. IEEE Transactions on Pattern Analysis and Machine Intelligence. 42 (2): 318–327. arXiv:1708.02002. Bibcode:2017arXiv170802002L. doi:10.1109/TPAMI.2018.2858826. PMID 30040631. S2CID 47252984.
[13] Zhu, Xizhou (2018). “Deformable ConvNets v2: More Deformable, Better Results”. arXiv:1811.11168 [cs.CV].
[14] Dai, Jifeng (2017). “Deformable Convolutional Networks”. arXiv:1703.06211 [cs.CV].

“Object Class Detection”. Vision.eecs.ucf.edu. Archived from the original on 14 July 2013. Retrieved 9 October 2013.
“ETHZ - Computer Vision Lab: Publications”. Vision.ee.ethz.ch. Archived from the original on 3 June 2013. Retrieved 9 October 2013.
