A Multi-grained based Attention Network for Semi-supervised Sound Event Detection

Paper title

A Multi-grained based Attention Network for Semi-supervised Sound Event Detection

Authors

Ying Hu, Xiujuan Zhu, Yunlong Li, Hao Huang, Liang He

Abstract

Sound event detection (SED) is an interesting but challenging task because labeled data are scarce and real-life sound events are diverse. This paper presents a multi-grained based attention network (MGA-Net) for semi-supervised sound event detection. To obtain feature representations related to sound events, a residual hybrid convolution (RH-Conv) block is designed to boost the ability of vanilla convolution to extract time-frequency features. Moreover, a multi-grained attention (MGA) module is designed to learn temporal resolution features from coarse level to fine level. With the MGA module, the network can capture the characteristics of target events of short or long duration, and so determine the onset and offset of sound events more accurately. Furthermore, to boost the performance of the Mean Teacher (MT) method, a spatial shift (SS) module is introduced as a data perturbation mechanism to increase the diversity of the data. Experimental results show that MGA-Net outperforms the published state-of-the-art competitors, achieving event-based macro F1 (EB-F1) scores of 53.27% and 56.96% and polyphonic sound detection scores (PSDS) of 0.709 and 0.739 on the validation and public sets, respectively.
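
The abstract builds on the Mean Teacher (MT) method. As a rough illustration of that general technique only, and not of the authors' MGA-Net code, the sketch below shows the two ingredients usually associated with MT: a teacher whose weights are an exponential moving average (EMA) of the student's weights, and a consistency loss between student and teacher predictions. The function names, the dictionary-of-lists weight layout, and the default `alpha` are assumptions made for this example.

```python
from typing import Dict, List


def ema_update(teacher: Dict[str, List[float]],
               student: Dict[str, List[float]],
               alpha: float = 0.999) -> None:
    """Move each teacher weight toward the matching student weight, in place.

    Hypothetical helper: weights are stored as plain lists keyed by name.
    """
    for name, s_values in student.items():
        t_values = teacher[name]
        for i, s in enumerate(s_values):
            # teacher <- alpha * teacher + (1 - alpha) * student
            t_values[i] = alpha * t_values[i] + (1.0 - alpha) * s


def consistency_loss(student_probs: List[float],
                     teacher_probs: List[float]) -> float:
    """Mean squared error between student and teacher predictions."""
    if len(student_probs) != len(teacher_probs):
        raise ValueError("prediction lists must have the same length")
    diffs = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(diffs) / len(diffs)
```

In a typical MT setup the student and teacher receive differently perturbed versions of the same input, which is the role the abstract assigns to the spatial shift (SS) module; how SS is implemented is not described here, so it is left out of the sketch.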
