Paper Title
Music Genre Classification
Convolution channel separation and frequency sub-bands aggregation for music genre classification
Authors
Abstract
In music, short-term features such as pitch and tempo constitute long-term semantic features such as melody and narrative. A music genre classification (MGC) system should be able to analyze both kinds of features. In this study, we propose a novel framework that extracts and aggregates short- and long-term features hierarchically. Our framework is based on ECAPA-TDNN, in which all the layers that extract short-term features are affected by the layers that extract long-term features because of back-propagation training. To prevent distortion of the short-term features, we devised a convolution channel separation technique that separates short-term features from the long-term feature extraction path. To extract more diverse features, we incorporated a frequency sub-bands aggregation method, which divides the input spectrogram along the frequency axis and processes each sub-band separately. We evaluated our framework on the Melon Playlist dataset, a large-scale dataset containing 600 times more data than GTZAN, a dataset widely used in MGC studies. As a result, our framework achieved 70.4% accuracy, a 16.9% improvement over a conventional framework.
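The frequency sub-bands aggregation step described in the abstract can be illustrated with a minimal sketch. The band count, spectrogram shape, and function name below are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def split_frequency_subbands(spectrogram, num_bands=4):
    """Divide a (freq_bins, time_frames) spectrogram into equal
    sub-bands along the frequency axis.

    Band count and array shapes are hypothetical examples; the
    paper's actual split sizes may differ."""
    freq_bins = spectrogram.shape[0]
    band_size = freq_bins // num_bands
    return [
        spectrogram[i * band_size:(i + 1) * band_size, :]
        for i in range(num_bands)
    ]

# Example: a 128-bin mel spectrogram with 1000 time frames
spec = np.random.rand(128, 1000)
bands = split_frequency_subbands(spec, num_bands=4)
print(len(bands), bands[0].shape)  # 4 (32, 1000)
```

Each sub-band can then be fed to its own processing path before the resulting features are aggregated, which is what allows the framework to extract more diverse features than processing the full-band spectrogram alone.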