Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis

Paper Title

Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis

Authors

Zhihao Du, Shiliang Zhang, Siqi Zheng, Zhijie Yan

Abstract

Recently, hybrid systems of clustering and neural diarization models have been successfully applied in multi-party meeting analysis. However, current models always treat overlapped speaker diarization as a multi-label classification problem, in which speaker dependency and overlaps are not well considered. To overcome these disadvantages, we reformulate the overlapped speaker diarization task as a single-label prediction problem via the proposed power set encoding (PSE). Through this formulation, speaker dependency and overlaps can be explicitly modeled. To fully leverage this formulation, we further propose the speaker overlap-aware neural diarization (SOND) model, which consists of a context-independent (CI) scorer to model global speaker discriminability, a context-dependent (CD) scorer to model local discriminability, and a speaker combining network (SCN) to combine and reassign speaker activities. Experimental results show that the proposed formulation outperforms state-of-the-art methods based on target speaker voice activity detection, and that performance can be further improved with SOND, resulting in a 6.30% relative diarization error reduction.
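
The reformulation described in the abstract, from multi-label to single-label prediction, can be illustrated with a small sketch. The abstract does not spell out the exact encoding, so everything below is an assumption made for illustration: the function names (build_power_set_classes, encode, decode), the cap on simultaneously active speakers (max_overlap), and the ordering of class indices are all hypothetical. The idea shown is only that each allowed subset of active speakers gets one class label, so a frame with overlapping speech is predicted as a single class rather than as several independent binary decisions.

```python
from itertools import combinations


def build_power_set_classes(num_speakers, max_overlap):
    """Enumerate every speaker subset with at most max_overlap members.

    The enumeration order (by subset size, then lexicographic) is an
    illustrative choice, not necessarily the ordering used in the paper.
    """
    classes = []
    for size in range(max_overlap + 1):
        for subset in combinations(range(num_speakers), size):
            classes.append(frozenset(subset))
    return classes


def encode(activity, class_index):
    """Map a per-speaker 0/1 activity vector to a single class label."""
    active = frozenset(i for i, a in enumerate(activity) if a)
    # Raises KeyError if more speakers are active than max_overlap allows.
    return class_index[active]


def decode(label, classes, num_speakers):
    """Map a class label back to a per-speaker 0/1 activity vector."""
    active = classes[label]
    return [1 if i in active else 0 for i in range(num_speakers)]


# Hypothetical setting: 4 speakers, at most 2 speaking at once.
classes = build_power_set_classes(num_speakers=4, max_overlap=2)
class_index = {subset: i for i, subset in enumerate(classes)}

silence = encode([0, 0, 0, 0], class_index)
single = encode([1, 0, 0, 0], class_index)
overlap = encode([0, 0, 1, 1], class_index)
```

With 4 speakers and at most 2 overlapping, the sketch yields 1 + 4 + 6 = 11 classes, so a classifier over these labels needs one 11-way softmax per frame instead of 4 independent sigmoid outputs. Capping the overlap size keeps the class count from growing as 2^N.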
