Crowd counting has gained widespread attention in public safety management, video surveillance, and emergency response. Background interference and variation in head scale remain difficult problems. We propose an attention-injective scale aggregation network (ASANet) to address them. ASANet consists of three parts: a shallow feature attention network (SFAN), a multi-level feature aggregation (MLFA) module, and a density map generation (DMG) network. SFAN suppresses the noise introduced by cluttered backgrounds by cross-injecting attention modules into a truncated VGG16 structure. To fully utilize the multi-scale crowd information embedded in the feature layers at different positions, the MLFA module densely connects the multi-layer feature maps, which addresses the scale variation problem. In addition, to capture large-scale head information, the DMG network uses successive dilated convolutional layers to further expand the receptive field of the model, thereby improving counting accuracy. We conduct extensive experiments on five public datasets (ShanghaiTech Part_A, ShanghaiTech Part_B, UCF_QNRF, UCF_CC_50, JHU-Crowd++). The results show that ASANet outperforms most existing methods in counting accuracy and also handles background noise well across different scenes.
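To illustrate why successive dilated convolutions expand the receptive field, the following minimal sketch computes the receptive field of a stack of stride-1 convolutions using the standard relation RF = 1 + sum((k - 1) * d). The function name `receptive_field`, the 3x3 kernel size, the layer count, and the dilation rate of 2 are assumptions chosen for illustration; the abstract does not state the configuration used in the DMG network.

```python
def receptive_field(layers):
    """Receptive field (in pixels, along one axis) of stacked stride-1 convolutions.

    layers: iterable of (kernel_size, dilation) pairs, applied in order.
    Each layer widens the field by (kernel_size - 1) * dilation.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf


# Hypothetical configurations, not taken from the paper.
plain = [(3, 1)] * 3    # three ordinary 3x3 convolutions
dilated = [(3, 2)] * 3  # three 3x3 convolutions with dilation rate 2

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 13
```

With the same number of layers and parameters, the dilated stack in this sketch covers a 13-pixel span instead of 7, which is the kind of gain that helps a model respond to larger heads.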