Knowledge Distillation for Semantic Segmentation Using Channel and Spatial Correlations and Adaptive Cross Entropy
Abstract
1. Introduction
- We propose a channel and spatial correlation (CSC) loss function that transfers the full channel-wise and spatial relations of the feature map from a teacher network to a student network.
- We propose an adaptive cross entropy (ACE) loss function that adaptively exploits the ground-truth labels and the prediction results of the teacher network, so that the teacher's errors are not propagated to the student (a minimal sketch of both losses follows this list).
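To make these two losses concrete, the following is a minimal PyTorch-style sketch, assuming teacher and student feature maps of equal shape (B, C, H, W) and per-pixel class logits. The function names `csc_loss` and `adaptive_ce_loss`, the temperature parameter, and the per-pixel gating rule are illustrative assumptions, not the authors' released implementation; the exact formulations are given in Sections 3.1 and 3.2.

```python
# Hedged sketch only: "csc_loss" and "adaptive_ce_loss" are illustrative names,
# not the authors' released code; PyTorch is assumed.
import torch
import torch.nn.functional as F


def csc_loss(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    """Channel and spatial correlation (CSC) transfer, sketched.

    f_s, f_t: student/teacher feature maps of shape (B, C, H, W), assumed to
    have the same shape (a 1x1 convolution can align channels otherwise).
    The student is pushed to mimic the teacher's normalized channel-to-channel
    and position-to-position correlation matrices.
    """
    b, c, h, w = f_s.shape
    s = f_s.reshape(b, c, h * w)
    t = f_t.reshape(b, c, h * w)

    # Channel correlation: (C x C) similarity between channel vectors.
    s_ch = F.normalize(s, dim=2) @ F.normalize(s, dim=2).transpose(1, 2)
    t_ch = F.normalize(t, dim=2) @ F.normalize(t, dim=2).transpose(1, 2)

    # Spatial correlation: (HW x HW) similarity between position vectors.
    s_sp = F.normalize(s, dim=1).transpose(1, 2) @ F.normalize(s, dim=1)
    t_sp = F.normalize(t, dim=1).transpose(1, 2) @ F.normalize(t, dim=1)

    return F.mse_loss(s_ch, t_ch) + F.mse_loss(s_sp, t_sp)


def adaptive_ce_loss(logits_s, logits_t, labels, temperature=1.0, ignore_index=255):
    """Adaptive cross entropy (ACE), sketched.

    Per pixel: if the teacher predicts the ground-truth class, learn from the
    teacher's soft output; otherwise fall back to the hard ground-truth label,
    so the teacher's mistakes are not distilled into the student.
    """
    # Hard cross entropy against the ground truth, kept per pixel.
    hard_ce = F.cross_entropy(logits_s, labels, ignore_index=ignore_index,
                              reduction="none")

    # Soft cross entropy against the teacher's temperature-scaled output.
    log_p_s = F.log_softmax(logits_s / temperature, dim=1)
    p_t = F.softmax(logits_t / temperature, dim=1)
    soft_ce = -(p_t * log_p_s).sum(dim=1)

    valid = labels != ignore_index
    teacher_correct = (logits_t.argmax(dim=1) == labels) & valid
    per_pixel = torch.where(teacher_correct, soft_ce, hard_ce)
    return per_pixel[valid].mean()
```

In a total objective such as the one in Section 3.3, these terms would typically be combined with a weighting hyperparameter, e.g. `adaptive_ce_loss(...) + lambda_csc * csc_loss(...)`.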
2. Related Work
2.1. Semantic Segmentation
2.2. Efficient Semantic Segmentation
2.3. Knowledge Distillation
3. Our Approach
3.1. Channel and Spatial Correlation Loss Function
3.2. ACE Loss Function
3.3. Total Loss Function
4. Experiments
4.1. Dataset
4.2. Training Setup
4.3. Evaluation Metrics
4.4. Ablation Study
4.4.1. Effects of Each Loss Function
4.4.2. Effects of the Number of Channels of the Feature Map
4.4.3. Effects of Architectures of Student Networks
4.5. Comparative Results
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
2. Noh, H.; Hong, S.; Han, B. Learning Deconvolution Network for Semantic Segmentation. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
3. Peng, C.; Zhang, X.; Yu, G.; Luo, G.; Sun, J. Large Kernel Matters—Improve Semantic Segmentation by Global Convolutional Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
4. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
5. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 834–848.
6. Liu, W.; Rabinovich, A.; Berg, A.C. ParseNet: Looking Wider to See Better. arXiv 2015, arXiv:1506.04579.
7. Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv 2015, arXiv:1511.07122.
8. Ghiasi, G.; Fowlkes, C.C. Laplacian Reconstruction and Refinement for Semantic Segmentation. arXiv 2016, arXiv:1605.02264.
9. Lin, G.; Milan, A.; Shen, C.; Reid, I.D. RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation. arXiv 2016, arXiv:1611.06612.
10. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. arXiv 2016, arXiv:1612.01105.
11. Chen, L.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv 2018, arXiv:1802.02611.
12. Yuan, Y.; Wang, J. OCNet: Object Context Network for Scene Parsing. arXiv 2018, arXiv:1809.00916.
13. Lin, C.Y.; Chiu, Y.C.; Ng, H.F.; Shih, T.K.; Lin, K.H. Global-and-Local Context Network for Semantic Segmentation of Street View Images. Sensors 2020, 20, 2907.
14. Ko, T.Y.; Lee, S.H. Novel Method of Semantic Segmentation Applicable to Augmented Reality. Sensors 2020, 20, 1737.
15. Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. arXiv 2016, arXiv:1606.02147.
16. Zhao, H.; Qi, X.; Shen, X.; Shi, J.; Jia, J. ICNet for Real-Time Semantic Segmentation on High-Resolution Images. arXiv 2017, arXiv:1704.08545.
17. Mehta, S.; Rastegari, M.; Caspi, A.; Shapiro, L.; Hajishirzi, H. ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
18. Romera, E.; Álvarez, J.M.; Bergasa, L.M.; Arroyo, R. ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation. IEEE Trans. Intell. Transp. Syst. 2018, 19, 263–272.
19. Kim, J.; Heo, Y.S. Efficient Semantic Segmentation Using Spatio-Channel Dilated Convolutions. IEEE Access 2019, 7, 154239–154252.
20. Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531.
21. Cheng, Y.; Wang, D.; Zhou, P.; Zhang, T. A Survey of Model Compression and Acceleration for Deep Neural Networks. arXiv 2017, arXiv:1710.09282.
22. Romero, A.; Ballas, N.; Kahou, S.E.; Chassang, A.; Gatta, C.; Bengio, Y. FitNets: Hints for Thin Deep Nets. arXiv 2015, arXiv:1412.6550.
23. Yim, J.; Joo, D.; Bae, J.; Kim, J. A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
24. Srinivas, S.; Fleuret, F. Knowledge Transfer with Jacobian Matching. arXiv 2018, arXiv:1803.00443.
25. Wang, X.; Zhang, R.; Sun, Y.; Qi, J. KDGAN: Knowledge Distillation with Generative Adversarial Networks. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), Montréal, QC, Canada, 3–8 December 2018.
26. Zagoruyko, S.; Komodakis, N. Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer. arXiv 2016, arXiv:1612.03928.
27. Xie, J.; Shuai, B.; Hu, J.; Lin, J.; Zheng, W. Improving Fast Segmentation with Teacher-Student Learning. In Proceedings of the 29th British Machine Vision Conference (BMVC), Newcastle, UK, 3–6 September 2018.
28. Liu, Y.; Chen, K.; Liu, C.; Qin, Z.; Luo, Z.; Wang, J. Structured Knowledge Distillation for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
29. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015.
30. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
31. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. arXiv 2016, arXiv:1610.02357.
32. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019.
33. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009.
34. Sandler, M.; Howard, A.G.; Zhu, M.; Zhmoginov, A.; Chen, L. Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation. arXiv 2018, arXiv:1801.04381.
35. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. arXiv 2017, arXiv:1707.01083.
36. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. arXiv 2015, arXiv:1512.00567.
37. Siam, M.; Elkerdawy, S.; Jagersand, M.; Yogamani, S. Deep Semantic Segmentation for Automated Driving: Taxonomy, Roadmap and Challenges. In Proceedings of the 20th IEEE International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017.
38. Iandola, F.N.; Moskewicz, M.W.; Ashraf, K.; Han, S.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-Level Accuracy with 50× Fewer Parameters and <0.5 MB Model Size. arXiv 2016, arXiv:1602.07360.
39. Poudel, R.P.K.; Bonde, U.; Liwicki, S.; Zach, C. ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-Time. arXiv 2018, arXiv:1805.04554.
40. Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. BiSeNet: Bilateral Segmentation Network for Real-Time Semantic Segmentation. In Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
41. Poudel, R.P.; Liwicki, S.; Cipolla, R. Fast-SCNN: Fast Semantic Segmentation Network. arXiv 2019, arXiv:1902.04502.
42. Hu, X.; Wang, H. Efficient Fast Semantic Segmentation Using Continuous Shuffle Dilated Convolutions. IEEE Access 2020, 8, 70913–70924.
43. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
44. Brostow, G.J.; Shotton, J.; Fauqueur, J.; Cipolla, R. Segmentation and Recognition Using Structure from Motion Point Clouds. In Proceedings of the 10th European Conference on Computer Vision (ECCV), Marseille, France, 12–18 October 2008.

Method | Val. mIoU (%) | Train. mIoU (%) | Test. mIoU (%) |
---|---|---|---|
Teacher | 74.43 | 79.23 | 72.55 |
Resnet34 (CE) | 67.94 | 72.35 | 64.87 |
Resnet34 (CSC + CE) | 68.99 | 72.79 | 66.43 |
Resnet34 (ACE) | 72.88 | 76.34 | 71.22 |
Resnet34 (CSC + ACE) | 73.28 | 77.05 | 72.36 |

Method | Val. mIoU (%) | Train. mIoU (%) | Test. mIoU (%) |
---|---|---|---|
Teacher | 74.43 | 79.23 | 72.55 |
Resnet34 (CSC + CE) | 68.99 | 72.79 | 66.43 |
Resnet34 (CSC_Eli + CE) | 67.72 | 72.13 | 64.80 |
Resnet34 (CSC_Pooling + CE) | 68.50 | 72.88 | 66.39 |

Network | #Params (M) | FLOPs (G) | Val. mIoU (%) | Train. mIoU (%) | Test. mIoU (%) |
---|---|---|---|---|---|
ERFNet [18] | 2.067 | 30.18 | 71.5 | n/a | 68.0 |
ICNet [16] | 28.30 | 74.02 | 67.7 | n/a | 69.5 |
ESPNet [17] | 0.36 | 5.55 | 61.4 | n/a | 60.3 |
BiseNet [40] | 5.8 | 30.35 | 74.8 | n/a | 74.7 |
Fast-SCNN [41] | 1.11 | 1.91 | 69.22 | n/a | 68.0 |
SegNet [4] | 29.45 | 326.77 | n/a | n/a | 56.1 |
PSPNet [10] | 49.08 | 369.49 | 78.38 | n/a | 78.4 |
DANet [32] | 68.50 | 552.67 | 81.50 | n/a | 81.5 |
OCNet [12] | 62.54 | 613.15 | 79.58 | n/a | 80.1 |

Network | #Params (M) | FLOPs (G) | Val. mIoU (%) | Train. mIoU (%) | Test. mIoU (%) | Proc. Time (s) | Memory Usage (GB) |
---|---|---|---|---|---|---|---|
Teacher | 41.05 | 104.03 | 74.43 | 79.23 | 72.55 | 0.1116 | 8.19 |
Resnet34 | 22.45 | 69.30 | 67.94 | 72.35 | 64.87 | 0.0382 | 2.09 |
Resnet34 (ours) | | | 73.28 | 77.05 | 72.36 | | |
Resnet18 | 12.34 | 42.66 | 64.84 | 69.66 | 63.10 | 0.0299 | 1.81 |
Resnet18 (ours) | | | 70.65 | 76.49 | 69.70 | | |
Mobilenet-V2 | 2.25 | 15.85 | 58.60 | 62.59 | 57.43 | 0.0292 | 2.41 |
Mobilenet-V2 (ours) | | | 66.30 | 68.20 | 64.71 | | |

Network | Val. mIoU (%) | Train. mIoU (%) | Test. mIoU (%) |
---|---|---|---|
Teacher | 75.05 | 81.11 | 70.73 |
Resnet34 | 62.96 | 65.88 | 57.90 |
Resnet34 (ours) | 70.18 | 76.99 | 65.25 |
Resnet18 | 58.59 | 63.19 | 55.63 |
Resnet18 (ours) | 68.40 | 75.07 | 63.60 |
Mobilenet-V2 | 58.10 | 59.83 | 51.79 |
Mobilenet-V2 (ours) | 68.43 | 73.37 | 60.67 |

Method | Val. mIoU (%) | Train. mIoU (%) | Test. mIoU (%) |
---|---|---|---|
CE | 67.94 | 72.35 | 64.87 |
MIMIC [22] + CE | 68.59 | 72.37 | 65.31 |
Pair-wise [28] + CE | 68.90 | 72.58 | 66.03 |
CSC + CE | 68.99 | 72.79 | 66.43 |
MIMIC [22] + ACE | 73.04 | 76.84 | 71.75 |
Pair-wise [28] + ACE | 73.25 | 77.00 | 72.25 |
CSC + ACE | 73.28 | 77.05 | 72.36 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).