Paper title
Training speaker recognition systems with limited data
Paper authors
Paper abstract
This work considers training neural networks for speaker recognition with a much smaller dataset size compared to contemporary work. We artificially restrict the amount of data by proposing three subsets of the popular VoxCeleb2 dataset. These subsets are restricted to 50\,k audio files (versus over 1\,M files available), and vary on the axis of number of speakers and session variability. We train three speaker recognition systems on these subsets; the X-vector, ECAPA-TDNN, and wav2vec2 network architectures. We show that the self-supervised, pre-trained weights of wav2vec2 substantially improve performance when training data is limited. Code and data subsets are available at https://github.com/nikvaessen/w2v2-speaker-few-samples.
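To make the idea of size-restricted subsets more concrete, below is a minimal, hypothetical sketch of how one might subsample a VoxCeleb2-style file manifest to a fixed budget of audio files while controlling the number of speakers (one of the axes the abstract mentions). This is not the authors' released code; the manifest layout, function names, and file budgets are assumptions for illustration only.

```python
# Illustrative sketch (not the paper's released code): build a fixed-size
# subset of a (audio_path, speaker_id) manifest with a chosen number of
# speakers, keeping roughly the same number of files per speaker.
import collections
import random
from typing import List, Tuple


def subsample_manifest(
    manifest: List[Tuple[str, str]],  # (audio_path, speaker_id) pairs
    num_speakers: int,
    max_files: int = 50_000,  # assumed budget, mirroring the 50k files in the abstract
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Select `num_speakers` speakers and keep at most `max_files` files,
    spread evenly across the chosen speakers."""
    rng = random.Random(seed)

    # group files by speaker
    by_speaker = collections.defaultdict(list)
    for path, speaker in manifest:
        by_speaker[speaker].append(path)

    # randomly pick the requested number of speakers
    chosen = rng.sample(sorted(by_speaker), k=num_speakers)

    # keep an (approximately) equal number of files per chosen speaker
    per_speaker = max_files // num_speakers
    subset = []
    for speaker in chosen:
        files = list(by_speaker[speaker])
        rng.shuffle(files)
        subset.extend((p, speaker) for p in files[:per_speaker])

    return subset


if __name__ == "__main__":
    # toy manifest: 100 speakers with 600 files each
    toy = [
        (f"spk{s:03d}/utt{u:04d}.wav", f"spk{s:03d}")
        for s in range(100)
        for u in range(600)
    ]
    few_speakers = subsample_manifest(toy, num_speakers=50, max_files=30_000)
    many_speakers = subsample_manifest(toy, num_speakers=100, max_files=30_000)
    print(len(few_speakers), len(many_speakers))
```

In this toy setup, the two calls yield subsets of equal size but with different speaker counts, which is the kind of controlled comparison the abstract describes; for the actual subset definitions, see the repository linked above.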