Joint gender and age estimation based on speech signals using x-vectors and transfer learning

Paper title

Joint gender and age estimation based on speech signals using x-vectors and transfer learning

Authors

Kwasny, Damian; Hemmerling, Daria

Abstract

In this paper we extend the x-vector framework for the task of speaker's age estimation and gender classification. In particular, we replace the baseline multilayer-TDNN architecture with QuartzNet, a convolutional architecture that has gained success in the field of speech recognition. We further propose a two-staged transfer learning scheme, utilizing large scale speech datasets: VoxCeleb and Common Voice, and usage of multitask learning to allow for joint age estimation and gender classification with a single system. We train and evaluate the performance on the TIMIT dataset. The proposed transfer learning scheme yields consecutive performance improvements in terms of both age estimation error and gender classification accuracy and the best performing system achieves new state-of-the-art results on the task of age estimation on the TIMIT TEST dataset with MAE of 5.12 and 5.29 years and RMSE of 7.24 and 8.12 years for male and female speakers respectively while maintaining a gender classification accuracy of 99.6%.
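The abstract reports age estimation error as MAE and RMSE, computed separately for male and female speakers. The following is a minimal sketch of how such per-gender metrics could be computed. It is not the authors' evaluation code: the function name `age_errors_by_gender`, the gender labels "M" and "F", and the toy ages are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def age_errors_by_gender(true_ages, predicted_ages, genders):
    """Return {gender: (mae, rmse)} over the speakers of each gender.

    Hypothetical helper for illustration; not taken from the paper.
    """
    abs_errors = defaultdict(list)
    for true_age, predicted_age, gender in zip(true_ages, predicted_ages, genders):
        abs_errors[gender].append(abs(true_age - predicted_age))

    results = {}
    for gender, errors in abs_errors.items():
        # MAE: mean of absolute errors, in years.
        mae = sum(errors) / len(errors)
        # RMSE: square root of the mean squared error, in years.
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results[gender] = (mae, rmse)
    return results


# Made-up toy values, not TIMIT data.
true_ages = [30.0, 40.0, 25.0, 50.0]
predicted_ages = [33.0, 36.0, 25.0, 44.0]
genders = ["M", "M", "F", "F"]

metrics = age_errors_by_gender(true_ages, predicted_ages, genders)
for gender, (mae, rmse) in sorted(metrics.items()):
    print(f"{gender}: MAE={mae:.2f} years, RMSE={rmse:.2f} years")
```

RMSE penalizes large individual errors more heavily than MAE, which is consistent with the reported RMSE values being higher than the MAE values for both groups.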
