Data-Efficient Framework for Real-world Multiple Sound Source 2D Localization

Paper title

Data-Efficient Framework for Real-world Multiple Sound Source 2D Localization

Authors

Guillaume Le Moing, Phongtharin Vinayavekhin, Don Joven Agravante, Tadanobu Inoue, Jayakorn Vongkulbhisal, Asim Munawar, Ryuki Tachibana

Abstract

Deep neural networks have recently led to promising results for the task of multiple sound source localization. Yet, they require a lot of training data to cover a variety of acoustic conditions and microphone array layouts. One can leverage acoustic simulators to inexpensively generate labeled training data. However, models trained on synthetic data tend to perform poorly with real-world recordings due to the domain mismatch. Moreover, learning for different microphone array layouts makes the task more complicated due to the infinite number of possible layouts. We propose to use adversarial learning methods to close the gap between synthetic and real domains. Our novel ensemble-discrimination method significantly improves the localization performance without requiring any label from the real data. Furthermore, we propose a novel explicit transformation layer to be embedded in the localization architecture. It enables the model to be trained with data from specific microphone array layouts while generalizing well to unseen layouts during inference.
