MLNET: An Adaptive Multiple Receptive-field Attention Neural Network for Voice Activity Detection

Paper Title

MLNET: An Adaptive Multiple Receptive-field Attention Neural Network for Voice Activity Detection

Authors

Zhenpeng Zheng, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao

Abstract

Voice activity detection (VAD) distinguishes speech from non-speech, and its performance is crucial for speech-based services. Recently, deep neural network (DNN)-based VADs have achieved better performance than conventional signal processing methods. Existing DNN-based models typically rely on a handcrafted fixed window of contextual speech information to improve VAD performance. However, a fixed context window cannot handle diverse, unpredictable noise environments, nor can it highlight the speech information that is critical to the VAD task. To address this problem, this paper proposes an adaptive multiple receptive-field attention neural network, called MLNET, for the VAD task. MLNET uses multiple branches to extract multiple kinds of contextual speech information and applies an effective attention block to weight the most crucial parts of the context for the final classification. Experiments in real-world scenarios demonstrate that the proposed MLNET-based model outperforms the other baselines.
