Paper Title

Multi-Scale Speaker Diarization With Neural Affinity Score Fusion

Authors

Tae Jin Park, Manoj Kumar, Shrikanth Narayanan

Abstract

Identifying the identity of the speaker of short segments in human dialogue has been considered one of the most challenging problems in speech signal processing. Speaker representations of short speech segments tend to be unreliable, resulting in poor fidelity of speaker representations in tasks requiring speaker recognition. In this paper, we propose an unconventional method that tackles the trade-off between temporal resolution and the quality of the speaker representations. To find a set of weights that balance the scores from multiple temporal scales of segments, a neural affinity score fusion model is presented. Using the CALLHOME dataset, we show that our proposed multi-scale segmentation and integration approach can achieve a state-of-the-art diarization performance.
