Paper Title

A Multi-grained based Attention Network for Semi-supervised Sound Event Detection

Authors

Hu, Ying; Zhu, Xiujuan; Li, Yunlong; Huang, Hao; He, Liang

Abstract

Sound event detection (SED) is an interesting but challenging task because labeled data are scarce and real-life sound events are diverse. This paper presents a multi-grained based attention network (MGA-Net) for semi-supervised sound event detection. To obtain feature representations related to sound events, a residual hybrid convolution (RH-Conv) block is designed to boost the vanilla convolution's ability to extract time-frequency features. Moreover, a multi-grained attention (MGA) module is designed to learn temporal resolution features from coarse level to fine level. With the MGA module, the network can capture the characteristics of target events of short or long duration and thus determine the onset and offset of sound events more accurately. Furthermore, to boost the performance of the Mean Teacher (MT) method, a spatial shift (SS) module is introduced as a data perturbation mechanism to increase the diversity of the data. Experimental results show that MGA-Net outperforms the published state-of-the-art competitors, achieving event-based macro F1 (EB-F1) scores of 53.27% and 56.96% and polyphonic sound detection scores (PSDS) of 0.709 and 0.739 on the validation and public sets, respectively.
