Paper Title

Audio-visual Speaker Recognition with a Cross-modal Discriminative Network

Paper Authors

Ruijie Tao, Rohan Kumar Das, Haizhou Li

Abstract


Audio-visual speaker recognition was one of the tasks in the recent 2019 NIST speaker recognition evaluation (SRE). Studies in both neuroscience and computer science point to the fact that visual and auditory neural signals interact in the cognitive process. This motivated us to study a cross-modal network, namely the voice-face discriminative network (VFNet), which establishes a general relation between a human voice and face. Experiments show that VFNet provides additional speaker discriminative information. With VFNet, we achieve a 16.54% relative reduction in equal error rate over the score-level fusion audio-visual baseline on the evaluation set of the 2019 NIST SRE.
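The abstract combines systems by score-level fusion and reports a relative reduction in equal error rate (EER). As a rough illustration only (the weights, scores, and EER values below are hypothetical, not the paper's), score-level fusion can be sketched as a weighted sum of per-trial scores from the audio, visual, and cross-modal (VFNet-style) systems:

```python
# Hypothetical sketch: score-level fusion of three speaker-verification
# systems, plus the relative-EER-reduction metric quoted in the abstract.
# All numbers and weights here are illustrative, not from the paper.

def fuse_scores(audio, visual, cross_modal, weights=(0.4, 0.4, 0.2)):
    """Weighted score-level fusion over parallel per-trial score lists."""
    wa, wv, wc = weights
    return [wa * a + wv * v + wc * c
            for a, v, c in zip(audio, visual, cross_modal)]

def relative_reduction(baseline_eer, fused_eer):
    """Relative EER reduction in percent (the paper reports 16.54%)."""
    return 100.0 * (baseline_eer - fused_eer) / baseline_eer

if __name__ == "__main__":
    # Two trials, one score per system per trial.
    fused = fuse_scores([1.2, -0.5], [0.8, -0.1], [0.5, -0.3])
    print(fused)
    # Illustrative EER values only.
    print(relative_reduction(6.0, 5.0))
```

In practice the fusion weights would be tuned on a development set, and the EER of each system is computed from its full score distribution over target and non-target trials.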
