Joint Multi-Scale Residual and Motion Feature Learning for Action Recognition
L Yang, Z Zhu, C Wang, P Wang, S Hei - Proceedings of the 2022 5th …, 2022 - dl.acm.org
Proceedings of the 2022 5th International Conference on Artificial …, 2022 • dl.acm.org
For action recognition, two-stream networks that combine an RGB stream with an optical-flow stream have been widely used and achieve high recognition accuracy. However, computing optical flow is time-consuming and requires a large amount of storage space, which makes recognition very inefficient. To alleviate this problem, we propose an Adaptive Multi-Scale Residual (AMSR) module and a Long Short Term Motion Squeeze (LSMS) module, which are inserted into a 2D convolutional neural network to improve action recognition accuracy while balancing accuracy and speed. The AMSR module adaptively fuses multi-scale feature maps to make full use of both the semantic information in deep feature maps and the detailed information in shallow feature maps. The LSMS module is a learnable, lightweight motion feature extractor that learns long-term motion features from both adjacent and non-adjacent frames, thereby replacing traditional optical flow and improving recognition accuracy. Experimental results on the UCF-101 and HMDB-51 datasets demonstrate that the proposed method achieves competitive performance compared with state-of-the-art methods, with only a small increase in parameters and computational cost.
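The abstract does not specify the internals of either module, so the following NumPy sketch is only an illustration under stated assumptions. AMSR is approximated as softmax-weighted fusion of a shallow map and an upsampled deep map, wrapped in a residual connection; the two fusion logits stand in for parameters that would be learned. LSMS is approximated with a correlation-plus-soft-argmax scheme similar to the MotionSqueeze approach: a local cost volume between frame t and frame t + k is turned into an expected (dy, dx) displacement per pixel, computed for an adjacent gap (k = 1) and a non-adjacent gap (k = 2). All function names and parameters (`adaptive_multiscale_fusion`, `long_short_term_motion`, `radius`, `gaps`, `temperature`) are hypothetical, and the paper's actual design may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def adaptive_multiscale_fusion(shallow, deep, scale_logits):
    """Hypothetical AMSR-style fusion (an assumption, not the paper's design).

    shallow: (C, H, W) high-resolution map carrying fine detail.
    deep:    (C, H/s, W/s) low-resolution map carrying semantic content.
    scale_logits: two scalars standing in for learned parameters; their
    softmax gives the per-scale fusion weights.
    """
    factor = shallow.shape[1] // deep.shape[1]
    deep_up = upsample_nearest(deep, factor)
    w = softmax(np.asarray(scale_logits, dtype=float))
    fused = w[0] * shallow + w[1] * deep_up
    return shallow + fused  # residual connection around the fusion


def local_correlation(f1, f2, radius):
    """Cost volume of shape (D, H, W), D = (2 * radius + 1) ** 2.

    Entry d at (y, x) is the channel-wise dot product of f1[:, y, x] with
    f2[:, y + dy, x + dx]; positions outside f2 are zero-padded.
    """
    _, H, W = f1.shape
    padded = np.pad(f2, ((0, 0), (radius, radius), (radius, radius)))
    costs = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = padded[:, radius + dy:radius + dy + H, radius + dx:radius + dx + W]
            costs.append((f1 * shifted).sum(axis=0))
    return np.stack(costs)


def soft_argmax_displacement(cost, radius, temperature=1.0):
    """Convert a cost volume into an expected (dy, dx) displacement per pixel."""
    offsets = np.array(
        [(dy, dx) for dy in range(-radius, radius + 1) for dx in range(-radius, radius + 1)],
        dtype=float,
    )  # (D, 2), same order as the loops in local_correlation
    probs = softmax(cost / temperature, axis=0)  # (D, H, W)
    return np.tensordot(offsets.T, probs, axes=([1], [0]))  # (2, H, W)


def long_short_term_motion(frames, radius=1, gaps=(1, 2), temperature=1.0):
    """Hypothetical LSMS-style extractor (an assumption, not the paper's design).

    For every frame t and every gap k, estimate motion from frame t to frame
    t + k (k = 1: adjacent, k = 2: non-adjacent) and stack the resulting
    (dy, dx) fields along the channel axis.
    """
    T = len(frames)
    out = []
    for t in range(T):
        fields = []
        for k in gaps:
            t2 = min(t + k, T - 1)  # clamp at the end of the clip
            cost = local_correlation(frames[t], frames[t2], radius)
            fields.append(soft_argmax_displacement(cost, radius, temperature))
        out.append(np.concatenate(fields, axis=0))  # (2 * len(gaps), H, W)
    return np.stack(out)  # (T, 2 * len(gaps), H, W)


if __name__ == "__main__":
    H = W = 8
    base = np.eye(H * W).reshape(H * W, H, W)  # one-hot "feature" per pixel
    frames = [np.roll(base, t, axis=2) for t in range(4)]  # content moves 1 px right per frame
    # radius=2 so the 2-pixel shift between non-adjacent frames is inside the search window
    motion = long_short_term_motion(frames, radius=2, gaps=(1, 2), temperature=0.05)
    print(motion.shape)  # (4, 4, 8, 8)
```

In the toy example, channel 1 (dx for k = 1) comes out close to 1 and channel 3 (dx for k = 2) close to 2 away from the right border. The soft-argmax keeps the displacement estimate differentiable, so a module of this kind can in principle be trained end to end together with the 2D CNN rather than depending on precomputed optical flow.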
