Paper Title
EMGSE: Acoustic/EMG Fusion for Multimodal Speech Enhancement

Paper Authors

Kuan-Chen Wang, Kai-Chun Liu, Hsin-Min Wang, Yu Tsao

Paper Abstract
Multimodal learning has been proven to be an effective method to improve speech enhancement (SE) performance, especially in challenging situations such as low signal-to-noise ratios, speech noise, or unseen noise types. In previous studies, several types of auxiliary data have been used to construct multimodal SE systems, such as lip images, electropalatography, or electromagnetic midsagittal articulography. In this paper, we propose a novel EMGSE framework for multimodal SE, which integrates audio and facial electromyography (EMG) signals. Facial EMG is a biological signal containing articulatory movement information, which can be measured in a non-invasive way. Experimental results show that the proposed EMGSE system can achieve better performance than the audio-only SE system. The benefits of fusing EMG signals with acoustic signals for SE are notable under challenging circumstances. Furthermore, this study reveals that cheek EMG is sufficient for SE.
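The abstract describes fusing facial EMG features with acoustic features for speech enhancement. As a rough illustration of such a multimodal fusion front end (not the authors' actual architecture; the function name, feature dimensions, and frame alignment here are all assumptions), one common approach is to concatenate frame-aligned feature vectors from both streams before feeding them to an enhancement network:

```python
import numpy as np

def fuse_features(audio_feats: np.ndarray, emg_feats: np.ndarray) -> np.ndarray:
    """Concatenate per-frame acoustic and EMG feature vectors.

    audio_feats: shape (T, D_audio), e.g. log-magnitude spectrogram frames
    emg_feats:   shape (T, D_emg), e.g. features from facial EMG channels
    Both streams are assumed to be aligned to the same frame rate (T frames).
    """
    if audio_feats.shape[0] != emg_feats.shape[0]:
        raise ValueError("audio and EMG streams must be frame-aligned")
    # Simple early fusion: stack the two feature vectors per frame.
    return np.concatenate([audio_feats, emg_feats], axis=1)

# Toy example: 100 frames, 257-dim spectrogram + 35-dim EMG features (made-up sizes)
fused = fuse_features(np.zeros((100, 257)), np.zeros((100, 35)))
print(fused.shape)  # (100, 292)
```

Early concatenation is only one of several fusion strategies; gated or attention-based fusion would weight the EMG stream more heavily exactly in the low-SNR conditions where the paper reports the largest gains.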