Paper title
Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks
Paper authors
Paper abstract
We investigate segmenting and clustering speech into low-bitrate phone-like sequences without supervision. We specifically constrain pretrained self-supervised vector-quantized (VQ) neural networks so that blocks of contiguous feature vectors are assigned to the same code, thereby giving a variable-rate segmentation of the speech into discrete units. Two segmentation methods are considered. In the first, features are greedily merged until a prespecified number of segments is reached. The second uses dynamic programming to optimize a squared error with a penalty term to encourage fewer but longer segments. We show that these VQ segmentation methods can be used without alteration across a wide range of tasks: unsupervised phone segmentation, ABX phone discrimination, same-different word discrimination, and as inputs to a symbolic word segmentation algorithm. The penalized dynamic programming method generally performs best. While performance on individual tasks is only comparable to the state of the art in some cases, in all tasks a reasonable competing approach is outperformed at a substantially lower bitrate.
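The penalized dynamic programming idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `penalized_dp_segment`, the constant per-segment `penalty`, and the optional `max_len` limit are assumptions made for illustration, and the abstract does not specify the exact form of the penalty term. Each segment is assigned the single codebook entry that minimizes its summed squared error, and the added penalty per segment favors fewer, longer segments.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def penalized_dp_segment(features, codebook, penalty, max_len=None):
    """Segment `features` so that each segment shares one code.

    Minimizes (summed squared error to the assigned codes)
    + penalty * (number of segments). The constant per-segment
    penalty is an assumption of this sketch.
    Returns a list of (start, end, code_index) with `end` exclusive.
    """
    n = len(features)
    if n == 0:
        return []
    max_len = max_len or n

    # cum[k][t] = total squared distance of frames 0..t-1 to code k,
    # so the cost of segment [s, t) under code k is cum[k][t] - cum[k][s].
    cum = []
    for code in codebook:
        row = [0.0]
        for x in features:
            row.append(row[-1] + squared_distance(x, code))
        cum.append(row)

    best = [0.0] + [float("inf")] * n   # best[t]: optimal cost of frames 0..t-1
    back = [None] * (n + 1)             # back[t]: (segment start, code index)
    for t in range(1, n + 1):
        for s in range(max(0, t - max_len), t):
            costs = [row[t] - row[s] for row in cum]
            k = min(range(len(codebook)), key=costs.__getitem__)
            total = best[s] + costs[k] + penalty
            if total < best[t]:
                best[t] = total
                back[t] = (s, k)

    # Trace back from the final frame to recover the segment boundaries.
    segments = []
    t = n
    while t > 0:
        s, k = back[t]
        segments.append((s, t, k))
        t = s
    segments.reverse()
    return segments


# Toy 1-D example: two frames near code 0, then three frames near code 1.
toy_features = [[0.0], [0.1], [0.9], [1.0], [1.1]]
toy_codebook = [[0.0], [1.0]]
toy_segments = penalized_dp_segment(toy_features, toy_codebook, penalty=0.5)
```

In this sketch, raising `penalty` pushes the solution toward fewer, longer segments and therefore a lower bitrate, while a small penalty lets the segmentation follow the frame-level code assignments more closely.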