Paper Title
Robust Acoustic Scene Classification using a Multi-Spectrogram Encoder-Decoder Framework
Authors
Abstract
This article proposes an encoder-decoder network model for Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. We make use of multiple low-level spectrogram features at the front-end, which are transformed into higher-level features by a well-trained CNN-DNN front-end encoder. The high-level features and their combination (via a trained feature combiner) are then fed into different decoder models, comprising random forest regression, DNNs and a mixture of experts, for back-end classification. We report extensive experiments to evaluate the accuracy of this framework on various ASC datasets, including LITIS Rouen and the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Task 1, 2017 Task 1, 2018 Tasks 1A & 1B and 2019 Tasks 1A & 1B. The experimental results highlight two main contributions: the first is an effective method for high-level feature extraction from multi-spectrogram input via the novel C-DNN architecture encoder network, and the second is the proposed decoder, which enables the framework to achieve competitive results on various datasets. The fact that a single framework is highly competitive across several different challenges is an indicator of its robustness for general ASC tasks.
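
The data flow described in the abstract (several spectrograms, one encoder per spectrogram, a feature combiner, then a decoder) can be illustrated with a minimal NumPy sketch. This is not the paper's model: the encoders, combiner and decoder below are untrained, randomly initialised stand-ins, and the spectrogram names, dimensions and the weighted-sum combiner are all assumptions chosen only to show the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(0)


def encode(spectrogram, weights):
    """Stand-in encoder: mean-pool over time, then one ReLU layer.

    The paper uses a trained CNN-DNN encoder; this is only a placeholder.
    """
    pooled = spectrogram.mean(axis=1)           # shape: (n_freq,)
    return np.maximum(weights @ pooled, 0.0)    # shape: (embed_dim,)


def combine(embeddings, combiner_weights):
    """Stand-in feature combiner: weighted sum of the per-spectrogram embeddings."""
    return sum(w * e for w, e in zip(combiner_weights, embeddings))


def decode(feature, class_weights):
    """Stand-in decoder: linear layer followed by a softmax over scene classes."""
    logits = class_weights @ feature
    exp = np.exp(logits - logits.max())         # subtract max for numerical stability
    return exp / exp.sum()


# Hypothetical sizes and spectrogram names (not taken from the paper).
n_freq, n_frames, embed_dim, n_classes = 128, 64, 32, 10
spectrogram_names = ["spec_a", "spec_b", "spec_c"]

# Random arrays stand in for real low-level spectrograms and trained weights.
spectrograms = {name: rng.random((n_freq, n_frames)) for name in spectrogram_names}
encoders = {name: rng.standard_normal((embed_dim, n_freq)) for name in spectrogram_names}
combiner_weights = np.full(len(spectrogram_names), 1.0 / len(spectrogram_names))
class_weights = rng.standard_normal((n_classes, embed_dim))

# Front-end: one high-level embedding per spectrogram type.
embeddings = [encode(spectrograms[name], encoders[name]) for name in spectrogram_names]

# Combine the embeddings, then classify with the back-end decoder.
combined = combine(embeddings, combiner_weights)
probs = decode(combined, class_weights)
predicted_scene = int(np.argmax(probs))
```

In the framework described above, the decoder stage would instead be one of the back-end models named in the abstract (random forest regression, a DNN, or a mixture of experts), and all weights would be learned rather than sampled.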