ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection

Paper Title

ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection

Authors

Shimada, Kazuki; Koyama, Yuichiro; Takahashi, Naoya; Takahashi, Shusuke; Mitsufuji, Yuki

Abstract

Neural-network (NN)-based methods show high performance in sound event localization and detection (SELD). Conventional NN-based methods use two branches, one for a sound event detection (SED) target and one for a direction-of-arrival (DOA) target. With a single network, the two-branch representation requires deciding how to balance the two objectives during optimization. Using a separate network dedicated to each task instead increases system complexity and network size. To address these problems, we propose an activity-coupled Cartesian DOA (ACCDOA) representation, which assigns a sound event activity to the length of a corresponding Cartesian DOA vector. The ACCDOA representation enables us to solve a SELD task with a single target and has two advantages: it avoids the need to balance the objectives, and it avoids an increase in model size. In experimental evaluations with the DCASE 2020 Task 3 dataset, the ACCDOA representation outperformed the two-branch representation in SELD metrics with a smaller network size. The ACCDOA-based SELD system also performed better than state-of-the-art SELD systems in terms of localization and location-dependent detection.
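The core idea described in the abstract, encoding a class's activity as the length of its Cartesian DOA vector and recovering both from one output, can be sketched per frame in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`doa_unit_vector`, `encode_accdoa`, `decode_accdoa`, `accdoa_mse`) are hypothetical, the azimuth/elevation-to-Cartesian convention is assumed, and the activity threshold of 0.5 is an assumed default. No neural network is involved here; `output` simply stands in for one frame of 3 × C regression outputs.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def doa_unit_vector(azimuth_deg: float, elevation_deg: float) -> Vec3:
    """Convert azimuth/elevation (degrees) to a Cartesian unit vector.

    Assumed convention: x = cos(ele)cos(azi), y = cos(ele)sin(azi), z = sin(ele).
    """
    azi = math.radians(azimuth_deg)
    ele = math.radians(elevation_deg)
    return (math.cos(ele) * math.cos(azi),
            math.cos(ele) * math.sin(azi),
            math.sin(ele))


def encode_accdoa(num_classes: int,
                  events: Sequence[Tuple[int, float, float]]) -> List[Vec3]:
    """Build one frame's ACCDOA target (hypothetical helper).

    events: (class_index, azimuth_deg, elevation_deg) for each active event.
    An active class gets a unit-length DOA vector (activity 1 -> length 1);
    an inactive class keeps the zero vector (activity 0 -> length 0).
    """
    target: List[Vec3] = [(0.0, 0.0, 0.0)] * num_classes
    for cls, azi, ele in events:
        target[cls] = doa_unit_vector(azi, ele)
    return target


def decode_accdoa(output: Sequence[Vec3],
                  threshold: float = 0.5) -> List[Optional[Vec3]]:
    """Recover activity and DOA from one frame of output (hypothetical helper).

    A class is judged active when its vector norm exceeds `threshold`
    (0.5 is an assumed default); its DOA is the vector rescaled to unit length.
    Inactive classes decode as None.
    """
    decoded: List[Optional[Vec3]] = []
    for x, y, z in output:
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > threshold:
            decoded.append((x / norm, y / norm, z / norm))
        else:
            decoded.append(None)
    return decoded


def accdoa_mse(output: Sequence[Vec3], target: Sequence[Vec3]) -> float:
    """Single regression objective: mean squared error over all classes and axes.

    Because activity and direction share one vector, there is no separate
    SED loss and DOA loss to weight against each other.
    """
    total = 0.0
    for o, t in zip(output, target):
        total += sum((oi - ti) ** 2 for oi, ti in zip(o, t))
    return total / (3 * len(target))


if __name__ == "__main__":
    # Three classes; only class 1 is active, at azimuth 90 deg, elevation 0 deg.
    target = encode_accdoa(3, [(1, 90.0, 0.0)])
    # A made-up frame of outputs standing in for a network prediction.
    prediction = [(0.05, 0.1, 0.0), (0.0, 0.8, 0.1), (0.2, 0.1, 0.1)]
    print(decode_accdoa(prediction))  # classes 0 and 2 decode as None
    print(accdoa_mse(prediction, target))
```

In this sketch one squared-error loss covers both subtasks; a two-branch system would instead combine a SED loss and a DOA loss with a weighting that has to be tuned, which is the balancing problem the representation is meant to avoid.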
