Paper title
Selector-Enhancer: Learning Dynamic Selection of Local and Non-local Attention Operation for Speech Enhancement
Paper authors
Abstract
Attention mechanisms, such as local and non-local attention, play a fundamental role in recent deep-learning-based speech enhancement (SE) systems. However, natural speech contains many fast-changing and relatively brief acoustic events, so capturing the most informative speech features by applying local and non-local attention indiscriminately is challenging. We observe that the noise type and speech features vary within a speech sequence, and that local and non-local operations extract different features from corrupted speech. To leverage this, we propose Selector-Enhancer, a dual-attention convolutional neural network (CNN) with a feature-filter that dynamically selects regions from low-resolution speech features and feeds them to either local or non-local attention operations. In particular, the feature-filter is trained with reinforcement learning (RL) using a difficulty-regulated reward that depends on network performance, model complexity, and "the difficulty of the SE task". The results show that our method achieves comparable or superior performance to existing approaches. In particular, Selector-Enhancer is potentially effective for real-world denoising, where the number and types of noise vary within a single noisy mixture.
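The routing idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. All names here (`route_regions`, `local_attention`, `non_local_attention`, `region_len`, `radius`) are hypothetical, the attention operations are plain dot-product attention without learned projections, and the per-region decisions are supplied by hand where the paper would use its RL-trained feature-filter.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def non_local_attention(region, context):
    # Each frame of the region attends to every frame of the whole utterance.
    scores = region @ context.T / np.sqrt(region.shape[-1])
    return softmax(scores) @ context

def local_attention(region, radius=2):
    # Each frame attends only to frames within `radius` steps inside the region.
    idx = np.arange(region.shape[0])
    mask = np.abs(idx[:, None] - idx[None, :]) <= radius
    scores = region @ region.T / np.sqrt(region.shape[-1])
    scores = np.where(mask, scores, -np.inf)
    return softmax(scores) @ region

def route_regions(features, region_len, choose_non_local):
    # features: (frames, channels) float array of low-resolution speech features.
    # choose_non_local: one boolean per region (assumed stand-in for the
    # decisions of the paper's feature-filter).
    out = np.empty_like(features)
    for r, start in enumerate(range(0, features.shape[0], region_len)):
        region = features[start:start + region_len]
        if choose_non_local[r]:
            out[start:start + region_len] = non_local_attention(region, features)
        else:
            out[start:start + region_len] = local_attention(region)
    return out

# Usage: 12 frames, 8 channels, three regions of 4 frames each.
rng = np.random.default_rng(0)
features = rng.standard_normal((12, 8))
decisions = np.array([True, False, True])
out = route_regions(features, region_len=4, choose_non_local=decisions)
```

In this sketch the decision is made per region, not per utterance, which mirrors the abstract's observation that noise type and speech features change within a single sequence. The cheaper local operation is used wherever the selector judges it sufficient.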