Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks

Paper title

Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks

Authors

Herman Kamper, Benjamin van Niekerk

Abstract

We investigate segmenting and clustering speech into low-bitrate phone-like sequences without supervision. We specifically constrain pretrained self-supervised vector-quantized (VQ) neural networks so that blocks of contiguous feature vectors are assigned to the same code, thereby giving a variable-rate segmentation of the speech into discrete units. Two segmentation methods are considered. In the first, features are greedily merged until a prespecified number of segments is reached. The second uses dynamic programming to optimize a squared error with a penalty term to encourage fewer but longer segments. We show that these VQ segmentation methods can be used without alteration across a wide range of tasks: unsupervised phone segmentation, ABX phone discrimination, same-different word discrimination, and as inputs to a symbolic word segmentation algorithm. The penalized dynamic programming method generally performs best. While performance on individual tasks is only comparable to the state-of-the-art in some cases, in all tasks a reasonable competing approach is outperformed at a substantially lower bitrate.
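
The second method in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `penalized_dp_segment`, its parameters, and the use of a constant penalty per segment are assumptions made for illustration only. Each contiguous block of frames is assigned to the single code with the lowest summed squared error, and dynamic programming picks the segment boundaries that minimize the total squared error plus the penalty.

```python
import numpy as np


def penalized_dp_segment(features, codebook, penalty, max_len=None):
    """Segment `features` (T x D) into contiguous blocks, each assigned one
    row of `codebook` (K x D), minimizing summed squared error plus a
    constant `penalty` per segment (hypothetical penalty form).

    Returns a list of (start, end, code) tuples with `end` exclusive.
    """
    features = np.asarray(features, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    num_frames = features.shape[0]
    if max_len is None:
        max_len = num_frames

    # dists[t, k]: squared Euclidean distance between frame t and code k.
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # cum[t, k]: sum of dists[:t, k], so any segment cost is a difference.
    cum = np.vstack([np.zeros((1, codebook.shape[0])), np.cumsum(dists, axis=0)])

    best = np.full(num_frames + 1, np.inf)  # best[t]: lowest cost of frames [0, t)
    best[0] = 0.0
    back = np.zeros(num_frames + 1, dtype=int)     # start of the last segment
    code_at = np.zeros(num_frames + 1, dtype=int)  # code of the last segment

    for end in range(1, num_frames + 1):
        for start in range(max(0, end - max_len), end):
            seg_costs = cum[end] - cum[start]  # cost of frames [start, end) per code
            k = int(np.argmin(seg_costs))
            cost = best[start] + seg_costs[k] + penalty
            if cost < best[end]:
                best[end] = cost
                back[end] = start
                code_at[end] = k

    # Follow the back-pointers from the final frame to recover the segments.
    segments = []
    end = num_frames
    while end > 0:
        start = int(back[end])
        segments.append((start, end, int(code_at[end])))
        end = start
    return segments[::-1]


# Toy example: six 1-D frames and a two-entry codebook.
toy_codebook = [[0.0], [10.0]]
toy_features = [[0.1], [-0.1], [0.0], [9.9], [10.2], [10.0]]
print(penalized_dp_segment(toy_features, toy_codebook, penalty=1.0))
# → [(0, 3, 0), (3, 6, 1)]
```

In this sketch a larger `penalty` yields fewer and longer segments, which matches the trade-off described in the abstract. With `penalty=0` every frame is free to form its own segment.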
