Paper Title
A Robust Speaker Clustering Method Based on Discrete Tied Variational Autoencoder
Paper Authors
Abstract
Recently, speaker clustering models based on agglomerative hierarchical clustering (AHC) have become a common approach to two main problems: clustering without a preset number of categories and clustering with a fixed number of categories. In general, such a model takes features such as i-vectors as input to a probabilistic linear discriminant analysis (PLDA) model, which forms the distance matrix in long-speech application scenarios, and the clustering results are then obtained through the clustering model. However, traditional AHC-based speaker clustering methods have long running times and remain sensitive to environmental noise. In this paper, we propose a novel speaker clustering method based on mutual information (MI) and a non-linear model with discrete variables, inspired by the Tied Variational Autoencoder (TVAE), to enhance robustness against noise. The proposed method, named Discrete Tied Variational Autoencoder (DTVAE), substantially shortens the elapsed time. Experimental results show that it outperforms the general model, yielding a relative accuracy (ACC) improvement and a significant reduction in running time.
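The conventional baseline described in the abstract (speaker embeddings, a pairwise distance matrix, then AHC) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: cosine distance stands in for PLDA scoring, the embeddings are synthetic, and the function name `ahc_cluster` and its parameters are hypothetical. It covers both settings mentioned above, a fixed number of categories and no preset number.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def ahc_cluster(embeddings, n_clusters=None, threshold=None):
    """Cluster speaker embeddings with average-linkage AHC.

    Assumption: cosine distance replaces the PLDA-based distance
    used in the pipeline the abstract describes.
    """
    if n_clusters is None and threshold is None:
        raise ValueError("give either n_clusters or threshold")
    # Condensed vector of pairwise distances between all embeddings.
    dists = pdist(embeddings, metric="cosine")
    # Build the merge tree bottom-up (agglomerative).
    tree = linkage(dists, method="average")
    if n_clusters is not None:
        # Fixed number of categories: cut the tree into at most n_clusters groups.
        return fcluster(tree, t=n_clusters, criterion="maxclust")
    # No preset number: stop merging once the linkage distance exceeds threshold.
    return fcluster(tree, t=threshold, criterion="distance")


# Synthetic stand-in for i-vectors: two "speakers", five segments each.
rng = np.random.default_rng(0)
centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
embeddings = np.vstack(
    [c + 0.05 * rng.standard_normal((5, 3)) for c in centers]
)

labels_fixed = ahc_cluster(embeddings, n_clusters=2)
labels_open = ahc_cluster(embeddings, threshold=0.5)
```

Computing all pairwise distances costs time quadratic in the number of segments, which is one plausible source of the long running time the abstract attributes to AHC-based methods.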