Title
Sound Event Detection by Multitask Learning of Sound Events and Scenes with Soft Scene Labels
Authors
Abstract
Sound event detection (SED) and acoustic scene classification (ASC) are major tasks in environmental sound analysis. Considering that sound events and scenes are closely related to each other, some works have addressed the joint analysis of sound events and acoustic scenes based on multitask learning (MTL), in which knowledge of each helps in estimating the other. Conventional MTL-based methods use one-hot scene labels to train the relationship between sound events and scenes; thus, they cannot model the extent to which sound events and scenes are related. However, in real environments, some common sound events may occur in several acoustic scenes, whereas other sound events occur only in a limited number of acoustic scenes. In this paper, we therefore propose a new method for SED based on MTL of SED and ASC using soft labels of acoustic scenes, which enable us to model the extent to which sound events and scenes are related. Experiments conducted using the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets show that the proposed method improves SED performance by 3.80% in F-score compared with conventional MTL-based SED.
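The abstract does not specify the network architecture, how the two task losses are weighted, or how the soft scene labels are produced. The following is therefore only a minimal NumPy sketch of the general idea under stated assumptions. It assumes a frame-level multi-label binary cross-entropy for SED and a cross-entropy against a soft scene distribution for ASC, combined as a weighted sum. The function names, the weight `scene_weight`, and the example label values are hypothetical and are not taken from the paper.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def log_softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def sed_bce_loss(event_logits, event_targets, eps=1e-7):
    """Frame-level multi-label binary cross-entropy for SED.

    event_logits, event_targets: shape (batch, frames, num_events),
    targets are 0/1 activity labels per frame and event class.
    """
    p = np.clip(sigmoid(event_logits), eps, 1.0 - eps)
    bce = -(event_targets * np.log(p) + (1.0 - event_targets) * np.log(1.0 - p))
    return bce.mean()


def asc_soft_ce_loss(scene_logits, scene_soft_labels):
    """Cross-entropy between predicted scene posteriors and a soft target.

    scene_logits: shape (batch, num_scenes).
    scene_soft_labels: same shape, each row sums to 1.
    With one-hot rows this reduces to the usual categorical cross-entropy.
    """
    return -(scene_soft_labels * log_softmax(scene_logits)).sum(axis=-1).mean()


def mtl_loss(event_logits, event_targets, scene_logits, scene_soft_labels,
             scene_weight=1.0):
    """Hypothetical MTL objective: SED loss plus weighted ASC loss.

    scene_weight is an assumed hyperparameter, not a value from the paper.
    """
    return (sed_bce_loss(event_logits, event_targets)
            + scene_weight * asc_soft_ce_loss(scene_logits, scene_soft_labels))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 2 clips, 10 frames, 4 event classes, 3 scene classes (illustrative sizes).
    event_logits = rng.normal(size=(2, 10, 4))
    event_targets = (rng.random((2, 10, 4)) > 0.7).astype(float)
    scene_logits = rng.normal(size=(2, 3))
    one_hot = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    soft = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])  # made-up soft labels
    print("one-hot scene labels:", mtl_loss(event_logits, event_targets, scene_logits, one_hot))
    print("soft scene labels:   ", mtl_loss(event_logits, event_targets, scene_logits, soft))
```

Because the soft-target cross-entropy equals the ordinary categorical cross-entropy when the target is one-hot, the only difference between the conventional one-hot MTL objective and a soft-label one in this sketch is the scene target itself; the SED branch is unchanged.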