Title
Learning in the Frequency Domain
Authors
Abstract
Deep neural networks have achieved remarkable success in computer vision tasks. Existing neural networks mainly operate in the spatial domain with fixed input sizes. For practical applications, images are usually large and must be downsampled to the predetermined input size of the neural network. Even though downsampling reduces computation and the required communication bandwidth, it obliviously removes both redundant and salient information, which results in accuracy degradation. Inspired by digital signal processing theories, we analyze the spectral bias from the frequency perspective and propose a learning-based frequency selection method to identify the trivial frequency components that can be removed without accuracy loss. The proposed method of learning in the frequency domain leverages the identical structures of well-known neural networks, such as ResNet-50, MobileNetV2, and Mask R-CNN, while accepting frequency-domain information as the input. Experimental results show that learning in the frequency domain with static channel selection can achieve higher accuracy than the conventional spatial downsampling approach while further reducing the input data size. Specifically, for ImageNet classification with the same input size, the proposed method achieves 1.41% and 0.66% top-1 accuracy improvements on ResNet-50 and MobileNetV2, respectively. Even with half the input size, the proposed method still improves the top-1 accuracy on ResNet-50 by 1%. In addition, we observe a 0.8% average precision improvement on Mask R-CNN for instance segmentation on the COCO dataset.
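To make the frequency-domain input representation concrete, the sketch below shows one plausible preprocessing step: a large image is split into 8x8 blocks, each block is transformed with a 2-D DCT (JPEG-style), and the coefficients of the same frequency across all blocks are grouped into channels, after which low-energy channels are dropped. This is a minimal single-channel (grayscale) illustration using SciPy; the block size, the energy-based selection heuristic, and the channel count kept are illustrative assumptions, not the paper's learned selection mechanism.

```python
import numpy as np
from scipy.fftpack import dct

def block_dct_channels(img, block=8):
    """Reshape an HxW image into an (H/block, W/block, block*block) tensor
    of frequency channels via a 2-D DCT on each block (JPEG-style)."""
    h, w = img.shape
    assert h % block == 0 and w % block == 0
    # split the image into non-overlapping block x block tiles
    tiles = img.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    # 2-D DCT-II over the last two axes of each tile
    coeffs = dct(dct(tiles, axis=-1, norm='ortho'), axis=-2, norm='ortho')
    # flatten each tile's coefficients: one channel per frequency component
    return coeffs.reshape(h // block, w // block, block * block)

# toy example: a 448x448 input becomes a 56x56x64 frequency tensor,
# whose spatial size matches what a CNN sees after early downsampling
x = np.random.rand(448, 448).astype(np.float32)
f = block_dct_channels(x)
print(f.shape)  # (56, 56, 64)

# static channel selection: keep only the highest-energy channels
# (a simple magnitude heuristic here; the paper *learns* which to keep)
energy = np.abs(f).mean(axis=(0, 1))
keep = np.argsort(energy)[::-1][:24]  # retain 24 of 64 channels
f_selected = f[:, :, keep]
print(f_selected.shape)  # (56, 56, 24)
```

Because the frequency tensor already has the reduced spatial resolution, it can feed a standard backbone (e.g. ResNet-50) with only its first layers adjusted for the channel count, which is why the abstract emphasizes reusing identical network structures.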