Paper title
Collar-aware Training for Streaming Speaker Change Detection in Broadcast Speech
Paper authors
Paper abstract
In this paper, we present a novel training method for speaker change detection models. Speaker change detection is often viewed as a binary sequence labelling problem. This approach faces two main challenges: annotated change points are vague because of the silences between speaker turns, and the data is imbalanced because the majority of frames do not contain a speaker change. Conventional training methods tackle these by artificially increasing the proportion of positive labels in the training data. Instead, the proposed method uses an objective function that encourages the model to predict a single positive label within a specified collar. This is done by marginalizing over all possible subsequences that have exactly one positive label within the collar. Experiments on English and Estonian datasets show large improvements over the conventional training method. Additionally, the model outputs have peaks concentrated on a single frame, removing the need for post-processing to find the exact predicted change point, which is particularly useful for streaming applications.
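To make the marginalization idea concrete, the following is a minimal sketch and not the paper's actual implementation. It assumes the model emits an independent per-frame probability of a speaker change, and that the collar is simply a list of those probabilities around one annotated change point. Under that assumption, summing over all label subsequences with exactly one positive frame gives sum_i p_i * prod_{j != i} (1 - p_j), and the loss is its negative log. The function names `exactly_one_log_prob` and `collar_loss` are hypothetical.

```python
import math


def exactly_one_log_prob(probs, eps=1e-12):
    """Log-probability that exactly one frame in the collar is positive.

    Assumption (not from the paper): frames are independent Bernoulli
    variables, with probs[i] the predicted change probability of frame i.
    """
    # Clamp to avoid log(0) for saturated predictions.
    log_p = [math.log(max(p, eps)) for p in probs]
    log_q = [math.log(max(1.0 - p, eps)) for p in probs]
    total_log_q = sum(log_q)

    # One term per candidate position i of the single positive label:
    # log p_i + sum_{j != i} log(1 - p_j).
    terms = [lp - lq + total_log_q for lp, lq in zip(log_p, log_q)]

    # Log-sum-exp over the candidate positions for numerical stability.
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def collar_loss(probs):
    """Negative log-likelihood of 'exactly one change inside the collar'."""
    return -exactly_one_log_prob(probs)


if __name__ == "__main__":
    peaked = [0.9, 0.05, 0.05]   # mass concentrated on one frame
    spread = [0.33, 0.33, 0.33]  # mass smeared across the collar
    print(collar_loss(peaked), collar_loss(spread))
```

In this sketch a peaked output gets a lower loss than an output smeared across the collar, which is consistent with the abstract's observation that the model outputs concentrate on a single frame. Frames outside any collar would still need an ordinary negative-label term, which is omitted here.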