Occlusion-Free Road Segmentation Leveraging Semantics for Autonomous Vehicles
Abstract
1. Introduction
- We analyze the occlusion problem in road detection and propose the novel task of occlusion-free road segmentation in the semantic domain, which infers occluded road areas from the semantic features of dynamic scenes.
- To complete this task, we build a small but effective dataset based on the popular KITTI dataset, named the KITTI-OFRS dataset; design a lightweight and efficient encoder–decoder fully convolutional network, referred to as OFRSNet; and optimize the cross-entropy loss for the task by adding a spatially-dependent weight that significantly increases accuracy (a sketch of this loss follows the list below).
- We elaborately design the architecture of OFRSNet to obtain a good trade-off between accuracy and runtime. The down-sampling blocks and joint context up-sampling blocks are designed to effectively capture the contextual features that are essential for occlusion reasoning and to increase the generalization ability of the model.
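To make the loss design concrete, here is a minimal PyTorch sketch of a cross-entropy loss with a per-pixel (spatially-dependent) weight. The paper's exact weighting scheme is not reproduced here; `boundary_weight_map`, `w0`, and `sigma` are hypothetical stand-ins that emphasize pixels near the road boundary, in the style of U-Net's distance-based weighting.

```python
import numpy as np
import torch.nn.functional as F
from scipy.ndimage import distance_transform_edt


def spatially_weighted_ce(logits, target, weight_map):
    """Cross-entropy with a per-pixel (spatially-dependent) weight.

    logits:     (N, C, H, W) raw network outputs
    target:     (N, H, W)    integer class labels
    weight_map: (N, H, W)    per-pixel weights
    """
    per_pixel = F.cross_entropy(logits, target, reduction="none")  # (N, H, W)
    return (weight_map * per_pixel).sum() / weight_map.sum()


def boundary_weight_map(road_mask, w0=2.0, sigma=10.0):
    """Hypothetical weight map emphasizing pixels near the road boundary,
    where occlusion reasoning is hardest; road_mask is a 2-D 0/1 array.
    The paper's actual weight may differ."""
    inside = distance_transform_edt(road_mask)       # distance inside the road
    outside = distance_transform_edt(1 - road_mask)  # distance outside the road
    dist = np.minimum(inside, outside)               # distance to the boundary
    return 1.0 + w0 * np.exp(-(dist ** 2) / (2 * sigma ** 2))
```

With a weight map of all ones, this reduces to the standard per-pixel cross-entropy.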
2. Related Works
3. Occlusion-Free Road Segmentation
3.1. Task Definition
3.2. Network Architecture
3.3. Loss Function
4. Experiments
4.1. Datasets
4.2. Evaluation Metrics
4.3. Implementation Details
4.4. Results and Analysis
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Oliveira, G.L.; Burgard, W.; Brox, T. Efficient deep models for monocular road segmentation. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Korea, 9–14 October 2016; pp. 4885–4891.
- Mendes, C.C.T.; Fremont, V.; Wolf, D.F. Exploiting Fully Convolutional Neural Networks for Fast Road Detection. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation, Stockholm, Sweden, 16–21 May 2016; pp. 3174–3179.
- Zhang, X.; Chen, Z.; Wu, Q.M.J.; Cai, L.; Lu, D.; Li, X. Fast Semantic Segmentation for Scene Perception. IEEE Trans. Ind. Inform. 2019, 15, 1183–1192.
- Wang, B.; Fremont, V.; Rodriguez, S.A. Color-based Road Detection and its Evaluation on the KITTI Road Benchmark. In Proceedings of the 2014 IEEE Intelligent Vehicles Symposium, Dearborn, MI, USA, 8–11 June 2014; pp. 31–36.
- Song, X.; Rui, T.; Zhang, S.; Fei, J.; Wang, X. A road segmentation method based on the deep auto-encoder with supervised learning. Comput. Electr. Eng. 2018, 68, 381–388.
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818.
- Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848.
- Mano, K.; Masuzawa, H.; Miura, J.; Ardiyanto, I. Road Boundary Estimation for Mobile Robot Using Deep Learning and Particle Filter; IEEE: New York, NY, USA, 2018; pp. 1545–1550.
- Li, K.; Shao, J.; Guo, D. A Multi-Feature Search Window Method for Road Boundary Detection Based on LIDAR Data. Sensors 2019, 19, 1551.
- Khalilullah, K.M.I.; Jindai, M.; Ota, S.; Yasuda, T. Fast Road Detection Methods on a Large Scale Dataset for Assisting Robot Navigation Using Kernel Principal Component Analysis and Deep Learning; IEEE: New York, NY, USA, 2018; pp. 798–803.
- Son, J.; Yoo, H.; Kim, S.; Sohn, K. Real-time illumination invariant lane detection for lane departure warning system. Expert Syst. Appl. 2015, 42, 1816–1824.
- Li, Q.; Zhou, J.; Li, B.; Guo, Y.; Xiao, J. Robust Lane-Detection Method for Low-Speed Environments. Sensors 2018, 18, 4274.
- Cao, J.; Song, C.; Song, S.; Xiao, F.; Peng, S. Lane Detection Algorithm for Intelligent Vehicles in Complex Road Conditions and Dynamic Environments. Sensors 2019, 19, 3166.
- Liu, X.; Deng, Z. Segmentation of Drivable Road Using Deep Fully Convolutional Residual Network with Pyramid Pooling. Cogn. Comput. 2017, 10, 272–281.
- Cai, Y.; Li, D.; Zhou, X.; Mou, X. Robust Drivable Road Region Detection for Fixed-Route Autonomous Vehicles Using Map-Fusion Images. Sensors 2018, 18, 4158.
- Aly, M. Real time Detection of Lane Markers in Urban Streets. In Proceedings of the Intelligent Vehicles Symposium, Eindhoven, The Netherlands, 4–6 June 2008.
- Laddha, A.; Kocamaz, M.K.; Navarro-Serment, L.E.; Hebert, M. Map-supervised road detection. In Proceedings of the Intelligent Vehicles Symposium, Gothenburg, Sweden, 19–22 June 2016; pp. 118–123.
- Alvarez, J.M.; Salzmann, M.; Barnes, N. Learning Appearance Models for Road Detection. In Proceedings of the Intelligent Vehicles Symposium, Gold Coast, QLD, Australia, 23–26 June 2013.
- Badrinarayanan, V.; Handa, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling. arXiv 2015, arXiv:1505.07293.
- Chen, L.-C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587.
- Suleymanov, T.; Amayo, P.; Newman, P. Inferring Road Boundaries Through and Despite Traffic. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems, Maui, HI, USA, 4–7 November 2018; pp. 409–416.
- Becattini, F.; Berlincioni, L.; Galteri, L.; Seidenari, L.; Del Bimbo, A. Semantic Road Layout Understanding by Generative Adversarial Inpainting. arXiv 2018, arXiv:1805.11746.
- Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. arXiv 2016, arXiv:1606.02147.
- Romera, E.; Álvarez, J.M.; Bergasa, L.M.; Arroyo, R. ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation. IEEE Trans. Intell. Transp. Syst. 2017, 19, 263–272.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local Neural Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803.
- Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond. arXiv 2019, arXiv:1904.11492.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Yang, M.; Yu, K.; Zhang, C.; Li, Z.; Yang, K. DenseASPP for Semantic Segmentation in Street Scenes. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3684–3692.
- Zheng, S.; Jayasumana, S.; Romera-Paredes, B.; Vineet, V.; Su, Z.; Du, D.; Huang, C.; Torr, P.H.S. Conditional Random Fields as Recurrent Neural Networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1529–1537.
- Canny, J. A Computational Approach to Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, 8, 679–698.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223.
- Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- PyTorch. Available online: http://pytorch.org/ (accessed on 1 September 2019).
- Bottou, L. Large-Scale Machine Learning with Stochastic Gradient Descent; Physica-Verlag HD: Heidelberg, Germany, 2010; pp. 177–186.
| Stage | Block Type | Output Size (H × W × C) |
|---|---|---|
| Encoder | Context Down-sampling | 192 × 624 × 16 |
| | Context Down-sampling | 96 × 312 × 32 |
| | Factorized Blocks | 96 × 312 × 32 |
| | Context Down-sampling | 48 × 156 × 64 |
| | Dilated Blocks | 48 × 156 × 64 |
| | Context Down-sampling | 24 × 78 × 128 |
| | Dilated Blocks | 24 × 78 × 128 |
| Decoder | Joint Context Up-sampling | 48 × 156 × 64 |
| | Bottleneck Blocks | 48 × 156 × 64 |
| | Joint Context Up-sampling | 96 × 312 × 32 |
| | Bottleneck Blocks | 96 × 312 × 32 |
| | Joint Context Up-sampling | 192 × 624 × 16 |
| | Bottleneck Blocks | 192 × 624 × 16 |
| | Deconv | 384 × 1248 × 2 |
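As an illustration of the two custom blocks in the table, here is a minimal PyTorch sketch. It assumes an ENet-style concatenation of a strided convolution and a max-pooling branch for down-sampling, a squeeze-and-excitation-style global-context gate, and bilinear up-sampling fused with the encoder skip feature; the paper's actual block internals may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextDownsample(nn.Module):
    """Down-sampling block sketch: strided 3x3 conv and max-pool branches
    concatenated (ENet-style), then a channel-wise global-context gate."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch - in_ch, 3, stride=2, padding=1)
        self.pool = nn.MaxPool2d(2, stride=2)
        self.bn = nn.BatchNorm2d(out_ch)
        self.gate = nn.Sequential(              # squeeze-and-excitation style
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // 4, out_ch, 1), nn.Sigmoid())

    def forward(self, x):
        y = torch.cat([self.conv(x), self.pool(x)], dim=1)
        y = F.relu(self.bn(y))
        return y * self.gate(y)                 # re-weight channels by context


class JointContextUpsample(nn.Module):
    """Up-sampling block sketch: bilinearly up-sample the decoder feature,
    fuse it with the encoder skip feature, and refine jointly."""

    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.fuse = nn.Conv2d(in_ch + skip_ch, out_ch, 1)
        self.refine = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear",
                          align_corners=False)
        return self.refine(self.fuse(torch.cat([x, skip], dim=1)))


# First encoder stage of the table: 3 -> 16 channels, 384 x 1248 -> 192 x 624
x = torch.randn(1, 3, 384, 1248)
print(ContextDownsample(3, 16)(x).shape)  # torch.Size([1, 16, 192, 624])
```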
| Model | Parameters | GFLOPs | FPS | ACC | PRE | REC | F1 | IoU |
|---|---|---|---|---|---|---|---|---|
| ENet | 0.37 M | 3.83 | 52 | 91.8% | 92.1% | 89.3% | 90.7% | 82.9% |
| ERFNet | 2.06 M | 24.43 | 25 | 92.3% | 92.6% | 89.7% | 91.2% | 83.8% |
| SegNet | 29.46 M | 286.03 | 16 | 92.9% | 93.6% | 90.2% | 91.8% | 84.9% |
| ORBNet | 1.91 M | 48.48 | 11.5 | 92.7% | 93.4% | 89.9% | 91.6% | 84.5% |
| OFRSNet | 0.39 M | 2.99 | 46 | 93.2% | 94.2% | 90.3% | 92.2% | 85.5% |
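The quality metrics in the tables, assuming the usual pixel-level definitions over the road class, can be computed from the per-pixel confusion counts; a short NumPy sketch:

```python
import numpy as np


def road_metrics(pred, gt):
    """Pixel-level ACC/PRE/REC/F1/IoU for a binary road mask
    (1 = road, 0 = background)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    tn = np.sum(~pred & ~gt)
    acc = (tp + tn) / (tp + tn + fp + fn)   # overall pixel accuracy
    pre = tp / (tp + fp)                    # precision on the road class
    rec = tp / (tp + fn)                    # recall on the road class
    f1 = 2 * pre * rec / (pre + rec)        # harmonic mean of PRE and REC
    iou = tp / (tp + fp + fn)               # intersection over union
    return {"ACC": acc, "PRE": pre, "REC": rec, "F1": f1, "IoU": iou}
```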
| Model | ACC | PRE | REC | F1 | IoU |
|---|---|---|---|---|---|
| ENet | 90.4% (−1.4%) | 90.5% (−1.6%) | 87.6% (−1.7%) | 89.0% (−1.7%) | 80.2% (−2.7%) |
| ERFNet | 90.5% (−1.8%) | 90.9% (−1.7%) | 87.3% (−2.4%) | 89.1% (−2.1%) | 80.3% (−3.5%) |
| SegNet | 92.1% (−0.8%) | 92.6% (−1.0%) | 89.4% (−0.8%) | 91.0% (−0.8%) | 83.5% (−1.4%) |
| ORBNet | 91.5% (−1.2%) | 92.2% (−1.2%) | 88.4% (−1.5%) | 90.2% (−1.4%) | 82.2% (−2.3%) |
| OFRSNet | 91.7% (−1.5%) | 92.4% (−1.8%) | 88.6% (−1.7%) | 90.5% (−1.7%) | 82.6% (−2.9%) |
| Model | Context | Parameters | GFLOPs | ACC | PRE | REC | F1 | IoU |
|---|---|---|---|---|---|---|---|---|
| OFRSNet | w/o | 0.34 M | 2.96 | 92.7% | 92.8% | 90.4% | 91.6% | 84.5% |
| OFRSNet | w/ | 0.39 M | 2.99 | 93.2% | 94.2% | 90.3% | 92.2% | 85.5% |
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, K.; Yan, F.; Zou, B.; Tang, L.; Yuan, Q.; Lv, C. Occlusion-Free Road Segmentation Leveraging Semantics for Autonomous Vehicles. Sensors 2019, 19, 4711. https://doi.org/10.3390/s19214711