Paper Title
Fully Unsupervised Training of Few-shot Keyword Spotting
Authors
Abstract
Training a few-shot keyword spotting (FS-KWS) model is known to require a large labeled dataset covering a large number of target keywords, so that the model generalizes to arbitrary target keywords from only a few enrollment samples. To avoid the expensive collection and labeling of such data, we propose a novel FS-KWS system trained only on synthetic data. The proposed system is based on metric learning, which enables target keywords to be detected using distance metrics. By exploiting a speech synthesis model that generates speech from pseudo phonemes instead of text, we easily obtain a large collection of multi-view samples with the same semantics. These samples are sufficient for training, because metric learning does not intrinsically require labeled data. None of the components in our framework requires any supervision, which makes our method fully unsupervised. Experimental results on real datasets show that our proposed method is competitive even though it uses no labeled or real data.
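The distance-based detection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoder, the distance metric, how enrollment samples are combined, or the decision threshold, so everything below is an assumption made for illustration: embeddings are taken as given fixed-length vectors, enrollment embeddings are averaged into a single prototype, cosine distance is used, and the threshold of 0.3 and all function names are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_prototype(enrollment_embeddings):
    """Average the normalized enrollment embeddings into one prototype.

    Assumption: prototype averaging is one common choice in metric
    learning; the abstract does not state how enrollment samples are used.
    """
    normalized = [l2_normalize(e) for e in enrollment_embeddings]
    dim = len(normalized[0])
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def cosine_distance(a, b):
    """Return 1 - cosine similarity (0 means same direction)."""
    a, b = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def detect_keyword(query_embedding, prototype, threshold=0.3):
    """Accept the query if it lies within `threshold` of the prototype.

    Assumption: the threshold value is illustrative and would be tuned
    on held-out data in practice.
    """
    return cosine_distance(query_embedding, prototype) <= threshold


# Toy 3-dimensional embeddings standing in for encoder outputs.
enrollment = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.8, 0.2, 0.0]]
prototype = build_prototype(enrollment)

is_match = detect_keyword([0.95, 0.05, 0.05], prototype)
is_non_match = detect_keyword([0.0, 1.0, 0.0], prototype)
```

In this toy setup the first query points in nearly the same direction as the enrollment samples and is accepted, while the second is close to orthogonal and is rejected. In a real system the vectors would come from the trained encoder rather than being written by hand.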