Learning in the Frequency Domain

Xu, Kai; Qin, Minghai; Sun, Fei; Wang, Yuhao; Chen, Yen-Kuang; Ren, Fengbo

arXiv:2002.12416 (cs)

[Submitted on 27 Feb 2020 (v1), last revised 31 Mar 2020 (this version, v4)]

Abstract: Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. In practical applications, images are usually large and have to be downsampled to the predetermined input size of the neural network. Although downsampling reduces computation and the required communication bandwidth, it removes both redundant and salient information indiscriminately, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components that can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages the identical structures of well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting frequency-domain information as the input. Experimental results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach while further reducing the input data size. Specifically, for ImageNet classification with the same input size, the proposed method achieves 1.41% and 0.66% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half the input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.
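
The idea of feeding frequency-domain information to a network and keeping only a fixed subset of frequency channels can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the 8x8 block DCT as the transform, the grayscale input, the function names (`dct_matrix`, `dct2`, `to_frequency_channels`, `select_static`), and above all the fixed "lowest frequencies first" rule, which merely stands in for the learning-based selection the abstract describes. The code uses only the Python standard library so that it stays self-contained.

```python
import math

BLOCK = 8  # assumed JPEG-style block size; the abstract does not specify one


def dct_matrix(n=BLOCK):
    """Orthonormal DCT-II basis; row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def dct2(block, basis):
    """2-D DCT of a square block, computed as basis @ block @ basis^T."""
    n = len(basis)
    tmp = [[sum(basis[u][i] * block[i][j] for i in range(n)) for j in range(n)]
           for u in range(n)]
    return [[sum(tmp[u][j] * basis[v][j] for j in range(n)) for v in range(n)]
            for u in range(n)]


def to_frequency_channels(image, n=BLOCK):
    """Turn a grayscale image (list of rows; height and width multiples of n)
    into n*n frequency channels, each of size (H/n) x (W/n).

    Channel (u, v) collects coefficient (u, v) from every block, so each
    channel is a spatially downsampled map of one frequency component.
    """
    basis = dct_matrix(n)
    h, w = len(image), len(image[0])
    channels = {(u, v): [[0.0] * (w // n) for _ in range(h // n)]
                for u in range(n) for v in range(n)}
    for by in range(h // n):
        for bx in range(w // n):
            block = [row[bx * n:(bx + 1) * n]
                     for row in image[by * n:(by + 1) * n]]
            coeffs = dct2(block, basis)
            for u in range(n):
                for v in range(n):
                    channels[(u, v)][by][bx] = coeffs[u][v]
    return channels


def select_static(channels, keep):
    """Keep the `keep` lowest-frequency channels, ranked by u + v.

    Hypothetical fixed rule used here in place of a learned selection.
    """
    ranked = sorted(channels, key=lambda uv: (uv[0] + uv[1], uv))
    return {uv: channels[uv] for uv in ranked[:keep]}


# Usage: a constant 16x16 image has energy only in the DC channel (0, 0).
image = [[10.0] * 16 for _ in range(16)]
channels = to_frequency_channels(image)
selected = select_static(channels, keep=6)
```

In this sketch a 16x16 input becomes 64 channels of size 2x2, and the selection step reduces them to 6, which is how dropping channels shrinks the input data while the spatial extent of each remaining channel is unchanged.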

Comments:	Accepted to CVPR 2020; this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2002.12416 [cs.CV]
	(or arXiv:2002.12416v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2002.12416

Submission history

From: Kai Xu
[v1] Thu, 27 Feb 2020 19:57:55 UTC (4,627 KB)
[v2] Tue, 10 Mar 2020 20:41:05 UTC (4,628 KB)
[v3] Thu, 12 Mar 2020 01:13:45 UTC (4,628 KB)
[v4] Tue, 31 Mar 2020 23:40:51 UTC (4,628 KB)
