Title
Receptive-Field Regularized CNNs for Music Classification and Tagging
Authors
Abstract
Convolutional Neural Networks (CNNs) have been successfully used in various Music Information Retrieval (MIR) tasks, both as end-to-end models and as feature extractors for more complex systems. However, the MIR field is still dominated by variants of the classical VGG-based CNN architecture, often combined with more complex modules such as attention and/or techniques such as pre-training on large datasets. Deeper models such as ResNet, which surpassed VGG by a large margin in other domains, are rarely used in MIR. One of the main reasons for this, as we will show, is that deeper CNNs generalize poorly in the music domain. In this paper, we present a principled way to make deep architectures like ResNet competitive on music-related tasks, based on well-designed regularization strategies. In particular, we analyze the recently introduced Receptive-Field Regularization and Shake-Shake, and show that they significantly improve the generalization of deep CNNs on music-related tasks, and that the resulting deep CNNs can outperform current, more complex models such as CNNs augmented with pre-training and attention. We demonstrate this on two different MIR tasks and two corresponding datasets, thus offering our deep regularized CNNs as a new baseline for these datasets, which can also be used as a feature-extracting module in future, more complex approaches.
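Receptive-Field Regularization is built around the receptive field: the number of input positions (e.g. spectrogram frames or frequency bins along one axis) that a unit in the final layer can see. The sketch below is not the authors' code. It only applies the standard recurrence `rf += (k - 1) * jump; jump *= s` over a stack of `(kernel, stride)` layers. The function name `receptive_field` and both layer stacks are hypothetical. Replacing the last two 3x3 convolutions with 1x1 convolutions is shown as one illustrative way to shrink the receptive field while keeping the depth unchanged.

```python
from typing import List, Tuple


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field along one axis of a stack of (kernel_size, stride) layers.

    Hypothetical helper: assumes no dilation; padding does not affect the size.
    """
    rf = 1    # a single input position only sees itself
    jump = 1  # input-space distance between adjacent units of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each extra kernel tap widens the view by `jump`
        jump *= stride             # striding/pooling spreads later units further apart
    return rf


# Hypothetical ResNet-like stack: 5x5 stride-2 stem, 3x3 convs, 2x2 pooling.
base = [(5, 2), (3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2), (3, 1), (3, 1)]
# Same depth, but the last two 3x3 convolutions are swapped for 1x1 convolutions.
restricted = base[:-2] + [(1, 1), (1, 1)]

print(receptive_field(base))        # 67
print(receptive_field(restricted))  # 35
```

Because a 1x1 convolution contributes `(1 - 1) * jump = 0`, the restricted stack keeps the same number of layers and nonlinearities, while its receptive field stops growing after the second pooling layer.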