Nonlinear ISA with Auxiliary Variables for Learning Speech Representations

Paper title

Nonlinear ISA with Auxiliary Variables for Learning Speech Representations

Authors

Amrith Setlur, Barnabas Poczos, Alan W. Black

Abstract

This paper extends recent work on nonlinear Independent Component Analysis (ICA) by introducing a theoretical framework for nonlinear Independent Subspace Analysis (ISA) in the presence of auxiliary variables. Observed high-dimensional acoustic features such as log Mel spectrograms can be considered surface-level manifestations of nonlinear transformations over individual multivariate sources of information such as speaker characteristics and phonological content. Under the assumptions of energy-based models, we use the theory of nonlinear ISA to propose an algorithm that learns unsupervised speech representations whose subspaces are independent and potentially highly correlated with the original non-stationary multivariate sources. We show how nonlinear ICA with auxiliary variables can be extended to a generic identifiable model for subspaces, and we provide sufficient conditions for the identifiability of these high-dimensional subspaces. Our proposed methodology is generic and can be integrated with standard unsupervised approaches to learn speech representations with subspaces that can theoretically capture independent higher-order speech signals. We evaluate the gains of our algorithm when integrated with the Autoregressive Predictive Coding (APC) model by showing empirical results on speaker verification and phoneme recognition tasks.
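The abstract does not spell out the generative model, so the following is only a toy illustration of the setting it describes, not the authors' algorithm. It samples multivariate sources whose coordinates are dependent within a subspace but independent across subspaces, makes them non-stationary through an auxiliary variable (assumed here to be a segment index), and passes them through an invertible nonlinear mixing. The function name `sample_isa_data`, every parameter, and the choice of a scale-mixture source and leaky-ReLU mixing are assumptions made for illustration.

```python
import numpy as np


def sample_isa_data(n_segments=5, n_per_segment=200, n_subspaces=3,
                    subspace_dim=2, seed=0):
    """Toy nonlinear ISA data (hypothetical setup, not from the paper).

    Returns observations x, latent sources s, and the auxiliary variable u.
    """
    rng = np.random.default_rng(seed)
    dim = n_subspaces * subspace_dim

    # Auxiliary variable u: the segment index of each sample (an assumption).
    u = np.repeat(np.arange(n_segments), n_per_segment)
    n = u.shape[0]

    # Each (segment, subspace) pair gets its own scale, so the source
    # distribution changes with u (non-stationarity).
    scales = rng.uniform(0.5, 3.0, size=(n_segments, n_subspaces))

    sources = np.empty((n, dim))
    for j in range(n_subspaces):
        # A shared random radius couples the coordinates inside subspace j,
        # so they are dependent; a fresh radius per subspace keeps different
        # subspaces independent given u.
        radius = rng.exponential(1.0, size=(n, 1)) * scales[u, j][:, None]
        direction = rng.standard_normal((n, subspace_dim))
        lo, hi = j * subspace_dim, (j + 1) * subspace_dim
        sources[:, lo:hi] = np.sqrt(radius) * direction

    # Invertible nonlinear mixing: orthogonal linear maps (always invertible)
    # followed by leaky ReLU (strictly monotonic, hence invertible).
    x = sources
    for _ in range(2):
        w = np.linalg.qr(rng.standard_normal((dim, dim)))[0]
        x = x @ w
        x = np.where(x > 0, x, 0.2 * x)
    return x, sources, u


if __name__ == "__main__":
    x, s, u = sample_isa_data()
    print(x.shape, s.shape, u.shape)
```

In this toy setting, the goal of a nonlinear ISA method would be to recover each block of `s` from `x` and `u` up to a transformation within the block. In the speech setting of the abstract, `x` would play the role of acoustic features such as log Mel spectrograms.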
