A Track-Wise Ensemble Event Independent Network for Polyphonic Sound Event Localization and Detection

Paper Title

A Track-Wise Ensemble Event Independent Network for Polyphonic Sound Event Localization and Detection

Authors

Jinbo Hu, Yin Cao, Ming Wu, Qiuqiang Kong, Feiran Yang, Mark D. Plumbley, Jun Yang

Abstract

Polyphonic sound event localization and detection (SELD) aims to detect the types of sound events together with their temporal activities and spatial locations. This paper proposes a track-wise ensemble event-independent network with a novel data augmentation method. The model builds on our previously proposed Event-Independent Network V2 and extends it with conformer blocks and dense blocks. A track-wise ensemble model is proposed to address a problem that arises when ensembling models with a track-wise output format: track permutation may occur among different models. The data augmentation approach comprises several data augmentation chains, each a random combination of several data augmentation operations. Different models use log-mel spectrograms, intensity vectors, and the Spatial Cues-Augmented Log-Spectrogram (SALSA). We evaluate the proposed method on the SELD task of the L3DAS22 challenge, where it is the top-ranked solution with a location-dependent F-score of 0.699. Source code is released.
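
The track permutation problem mentioned in the abstract can be illustrated with a small toy example. The sketch below is not the paper's track-wise ensemble model, whose details the abstract does not give. It only shows why naively averaging track-wise outputs from two models can fail, and uses a simple brute-force track alignment as a stand-in remedy. All names (`align_tracks`, `average_tracks`, `model_a`, `model_b`) and the toy `[class_probability, azimuth_degrees]` track format are assumptions made for illustration.

```python
from itertools import permutations


def track_distance(a, b):
    """Squared Euclidean distance between two per-track prediction vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_tracks(reference, other):
    """Reorder the tracks of `other` to best match `reference`.

    Brute force over all track permutations; fine for the handful of
    tracks in this toy setting, but factorial in the number of tracks.
    """
    n = len(reference)
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(
            track_distance(reference[i], other[perm[i]]) for i in range(n)
        ),
    )
    return [other[j] for j in best]


def average_tracks(a, b):
    """Element-wise mean of two track-wise outputs, track by track.

    Note: plain averaging of angles ignores wrap-around at +/-180 degrees;
    that is acceptable for this toy example only.
    """
    return [[(x + y) / 2 for x, y in zip(ta, tb)] for ta, tb in zip(a, b)]


# Two hypothetical models, two tracks each.
# Each track is a toy [class_probability, azimuth_degrees] pair.
model_a = [[0.9, 30.0], [0.8, -60.0]]
model_b = [[0.7, -58.0], [0.95, 32.0]]  # same two events, tracks swapped

naive = average_tracks(model_a, model_b)
aligned = average_tracks(model_a, align_tracks(model_a, model_b))

print("naive  :", naive)    # both azimuths collapse to a meaningless midpoint
print("aligned:", aligned)  # azimuths stay near the two original events
```

In this toy case the naive average places both tracks at the same azimuth, halfway between the two events, while the aligned average keeps one track near each event. Post-hoc alignment is only one possible way to handle the issue; it is shown here to make the problem concrete, not to describe the authors' solution.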

Paper Title