Jointly optimal denoising, dereverberation, and source separation

Paper title

Jointly optimal denoising, dereverberation, and source separation

Authors

Tomohiro Nakatani, Christoph Boeddeker, Keisuke Kinoshita, Rintaro Ikeshita, Marc Delcroix, Reinhold Haeb-Umbach

Abstract

This paper proposes methods that optimize a Convolutional BeamFormer (CBF) to jointly perform denoising, dereverberation, and source separation (DN+DR+SS) in a computationally efficient way. Conventionally, a cascade configuration composed of a Weighted Prediction Error minimization (WPE) dereverberation filter followed by a Minimum Variance Distortionless Response (MVDR) beamformer has been used as the state-of-the-art frontend for far-field speech recognition. However, the overall optimality of this approach is not guaranteed. In the blind signal processing area, an approach for jointly optimizing dereverberation and source separation (DR+SS) has been proposed, but it incurs a huge computing cost and has not been extended to DN+DR+SS. To overcome these limitations, this paper develops new approaches for jointly optimizing DN+DR+SS in a computationally much more efficient way. To this end, we first present an objective function for optimizing a CBF that performs DN+DR+SS, based on maximum likelihood estimation and on the assumption that the steering vectors of the target signals are given or can be estimated, e.g., using a neural network. This paper refers to a CBF optimized by this objective function as a weighted Minimum-Power Distortionless Response (wMPDR) CBF. Then, we derive two algorithms for optimizing a wMPDR CBF, based on two different ways of factorizing a CBF into WPE filters and beamformers. Experiments using noisy reverberant sound mixtures show that the proposed optimization approaches greatly improve speech enhancement performance compared with the conventional cascade configuration, in terms of both signal distortion measures and ASR performance. It is also shown that the proposed approaches greatly reduce the computing cost while improving estimation accuracy compared with the conventional joint optimization approach.
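To make the "weighted minimum-power distortionless response" idea concrete, the following is a minimal NumPy sketch of the beamformer part only, for a single source and a single frequency bin. It is not the paper's algorithm: it omits the convolutional (multi-tap) filter, the WPE factorization, and the iterative estimation. It assumes the objective takes the form of minimizing the power-weighted output energy, sum over t of |w^H x_t|^2 / lam_t, subject to the distortionless constraint w^H v = 1, which has the familiar MPDR closed form w = R^{-1} v / (v^H R^{-1} v). The function name `wmpdr_weights`, the parameters `x`, `v`, `lam`, and the diagonal loading term are illustrative choices, not taken from the source.

```python
import numpy as np

def wmpdr_weights(x, v, lam, eps=1e-8):
    """Single-tap weighted MPDR weights for one frequency bin (illustrative sketch).

    x   : (M, T) complex array, microphone observations over T frames
    v   : (M,)   complex array, steering vector of the target (assumed given)
    lam : (T,)   real array, assumed time-varying power of the target
    """
    M, T = x.shape
    # Power-weighted spatial covariance: R = (1/T) * sum_t x_t x_t^H / lam_t
    R = (x / np.maximum(lam, eps)) @ x.conj().T / T
    # Small diagonal loading for numerical stability (an added assumption)
    R = R + eps * (np.trace(R).real / M) * np.eye(M)
    # Closed-form solution of the constrained minimization
    Rinv_v = np.linalg.solve(R, v)
    return Rinv_v / (v.conj() @ Rinv_v)

# Usage with synthetic data (random values, for shape checking only)
rng = np.random.default_rng(0)
M, T = 4, 200
x = rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T))
v = rng.standard_normal(M) + 1j * rng.standard_normal(M)
lam = rng.uniform(0.5, 2.0, T)
w = wmpdr_weights(x, v, lam)
y = w.conj() @ x  # beamformer output, shape (T,)
```

By construction the weights pass the steering direction undistorted, so `np.vdot(w, v)` is 1 up to rounding error. In the setting the abstract describes, `lam` would not be known in advance and the filter would span multiple past frames; both are left out of this sketch.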
