Paper Title
SRP-DNN: Learning Direct-Path Phase Difference for Multiple Moving Sound Source Localization
Authors
Abstract
Multiple moving sound source localization in real-world scenarios remains challenging because of interaction between sources, time-varying trajectories, and distorted spatial cues. In this work, we propose to use deep learning techniques to learn competing and time-varying direct-path phase differences for localizing multiple moving sound sources. A causal convolutional recurrent neural network is designed to extract the direct-path phase difference sequence from the signals of each microphone pair. To avoid the assignment ambiguity and the uncertain output dimension encountered when simultaneously predicting multiple targets, the learning target is designed in a weighted sum format, which encodes source activity in the weight and direct-path phase differences in the summed value. The learned direct-path phase differences for all microphone pairs can be directly used to construct the spatial spectrum according to the formulation of steered response power (SRP). This deep neural network (DNN) based SRP method is referred to as SRP-DNN. The locations of sources are estimated by iteratively detecting and removing the dominant source from the spatial spectrum, which reduces the interaction between sources. Experimental results on both simulated and real-world data show the superiority of the proposed method in the presence of noise and reverberation.
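The last two steps of the abstract (building an SRP-style spatial spectrum from per-pair phase differences, then iteratively detecting and removing the dominant source) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a far-field planar model and takes a synthetic weighted sum of phase-difference vectors in place of the network output. It also assumes that the peak value of the spectrum is an adequate estimate of the source weight to subtract. The function names, the array geometry, and the detection threshold are all hypothetical.

```python
import itertools
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def ipd_templates(mic_pos, freqs, azimuths):
    """Theoretical far-field direct-path phase-difference vectors.

    Returns a complex array of shape (n_dirs, n_pairs, n_freqs).
    """
    pairs = list(itertools.combinations(range(len(mic_pos)), 2))
    dirs = np.stack([np.cos(azimuths), np.sin(azimuths)], axis=1)      # (D, 2)
    baselines = np.array([mic_pos[i] - mic_pos[j] for i, j in pairs])  # (P, 2)
    tdoa = dirs @ baselines.T / SPEED_OF_SOUND                         # (D, P)
    return np.exp(-2j * np.pi * freqs[None, None, :] * tdoa[:, :, None])

def srp_spectrum(pred, templates):
    """Correlate predicted phase-difference vectors with every candidate direction."""
    corr = np.sum(templates.conj() * pred[None], axis=(1, 2))
    return np.real(corr) / pred.size

def localize(pred, templates, max_sources=2, threshold=0.3):
    """Iteratively pick the dominant peak and subtract its contribution."""
    residual = pred.copy()
    found = []
    for _ in range(max_sources):
        spec = srp_spectrum(residual, templates)
        k = int(np.argmax(spec))
        if spec[k] < threshold:  # nothing significant left in the spectrum
            break
        found.append(k)
        # Assumption: the peak height approximates the source's weight.
        residual = residual - spec[k] * templates[k]
    return found

# Hypothetical 4-microphone square array (10 cm side) and a 10-degree azimuth grid.
mics = np.array([[0.05, 0.05], [-0.05, 0.05], [-0.05, -0.05], [0.05, -0.05]])
freqs = np.linspace(100.0, 4000.0, 64)
azimuths = np.deg2rad(np.arange(0, 360, 10))
templates = ipd_templates(mics, freqs, azimuths)

# Stand-in for the network output: weighted sum of two sources' vectors.
pred = 1.0 * templates[5] + 0.6 * templates[20]
estimated = localize(pred, templates)
print([int(np.rad2deg(azimuths[k]).round()) for k in estimated])
```

Subtracting the detected source before searching again is what keeps the sidelobes of a stronger source from masking a weaker one, which is the motivation the abstract gives for the iterative scheme.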