MSSD: multi-scale self-distillation for object detection
Z. Jia, S. Sun, G. Liu, B. Liu. Visual Intelligence, 2024. Springer.
Abstract
Knowledge distillation techniques have been widely used in deep learning, typically by transferring knowledge from a neural network with many parameters and high learning capacity (the teacher model) to a neural network with few parameters and lower learning capacity (the student model). However, this teacher-to-student transfer is inefficient: the student model does not fully learn all the knowledge of the teacher model. We therefore aim to perform knowledge distillation across the layers of a single model, i.e., self-distillation. We apply this idea to the object detection task and propose a multi-scale self-distillation approach, arguing that distilling the information contained in feature maps at different scales helps the model better detect small targets. In addition, we propose a Gaussian mask based on the target region as an auxiliary detection method to improve the accuracy of target localization during distillation. We validate our approach on the KITTI dataset using the single-stage detector YOLO. The results demonstrate a 2.8% improvement in accuracy over the baseline model, without using a teacher model.
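The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of one plausible form of multi-scale self-distillation, in which the deepest feature map serves as an in-network teacher for the shallower scales. The class name, the 1x1 adapter convolutions, the MSE loss, and the choice of the deepest scale as teacher are all assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch: multi-scale self-distillation inside a single detector.
# Assumption: feats[-1] (the deepest scale) acts as the teacher signal.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleSelfDistillation(nn.Module):
    def __init__(self, channels):
        # channels: per-scale channel counts, e.g. [256, 512, 1024]
        super().__init__()
        # 1x1 convs project each shallower scale to the teacher's channel width.
        self.adapters = nn.ModuleList(
            nn.Conv2d(c, channels[-1], kernel_size=1) for c in channels[:-1]
        )

    def forward(self, feats):
        # feats: list of feature maps at decreasing resolution; feats[-1] is deepest.
        teacher = feats[-1].detach()  # no gradients flow into the teacher branch
        loss = feats[0].new_zeros(())
        for adapter, f in zip(self.adapters, feats[:-1]):
            student = adapter(f)
            # Upsample the teacher to the student's spatial size, then
            # penalize the feature discrepancy.
            t = F.interpolate(teacher, size=student.shape[-2:],
                              mode="bilinear", align_corners=False)
            loss = loss + F.mse_loss(student, t)
        return loss

# Usage with dummy FPN-style features:
feats = [torch.randn(2, 256, 64, 64), torch.randn(2, 512, 32, 32),
         torch.randn(2, 1024, 16, 16)]
msd = MultiScaleSelfDistillation([256, 512, 1024])
distill_loss = msd(feats)  # added to the detection loss with some weight
```

In such a setup the distillation term would be summed with the usual detection losses during training; the weighting between the two is a tunable detail the abstract does not specify.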
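The Gaussian mask over the target region could likewise take the following form: a box-centered 2D Gaussian on the feature map that gives object regions higher weight than background. The variance scaling tied to box size (sigma = half the side length here) is an assumed detail, not taken from the paper.

```python
# Hedged sketch: a box-centered Gaussian mask for weighting the loss
# toward target regions during distillation.
import torch

def gaussian_mask(boxes, h, w):
    """boxes: (N, 4) tensor of [x1, y1, x2, y2] pixel coords on an h x w map.
    Returns an (h, w) mask peaking at box centers and decaying outward."""
    ys = torch.arange(h, dtype=torch.float32).view(h, 1)
    xs = torch.arange(w, dtype=torch.float32).view(1, w)
    mask = torch.zeros(h, w)
    for x1, y1, x2, y2 in boxes.tolist():
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        sx = max((x2 - x1) / 2, 1.0)  # std dev tied to box size (assumed scaling)
        sy = max((y2 - y1) / 2, 1.0)
        g = torch.exp(-((xs - cx) ** 2 / (2 * sx ** 2)
                        + (ys - cy) ** 2 / (2 * sy ** 2)))
        mask = torch.maximum(mask, g)  # keep the strongest response per pixel
    return mask

# Example: one 40x40 box on a 128x128 feature map.
m = gaussian_mask(torch.tensor([[20.0, 30.0, 60.0, 70.0]]), 128, 128)
```

A mask like this could multiply the per-pixel distillation loss so that feature matching concentrates on object regions, which is consistent with the abstract's stated goal of improving target localization.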