Paper Title

Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition

Authors

Jiachen Lian, Alan W. Black, Louis Goldstein, Gopala Krishna Anumanchipalli

Abstract

Most research on data-driven speech representation learning has focused on raw audio in an end-to-end manner, paying little attention to its internal phonological or gestural structure. This work investigates speech representations derived from articulatory kinematic signals, using a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores. By applying sparsity constraints, the gestural scores leverage the discrete combinatorial properties of phonological gestures. Phoneme recognition experiments were additionally performed to show that the gestural scores do encode phonological information. The proposed work thus builds a bridge between articulatory phonology and deep neural networks to leverage informative, intelligible, interpretable, and efficient speech representations.
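To make the decomposition named in the abstract more concrete, here is a minimal NumPy sketch of the classical convolutive matrix factorization model, in which the data matrix is approximated by a sum of time-shifted products of gesture templates and gestural scores. It is not the authors' neural implementation. The function names, array shapes, and the L1 penalty weight are all assumptions chosen for illustration.

```python
import numpy as np


def shift_right(H, t):
    """Shift the columns of H right by t frames, zero-filling on the left."""
    if t == 0:
        return H.copy()
    out = np.zeros_like(H)
    out[:, t:] = H[:, :-t]
    return out


def reconstruct(W, H):
    """Convolutive reconstruction: X_hat = sum_t W[t] @ shift_right(H, t).

    W: (T, C, K) array, K gesture templates spanning T frames over C channels.
    H: (K, N) array, gestural scores (activations) over N frames.
    Returns an array of shape (C, N).
    """
    return sum(W[t] @ shift_right(H, t) for t in range(W.shape[0]))


def objective(X, W, H, sparsity_weight=0.1):
    """Squared reconstruction error plus an L1 penalty on the scores.

    The L1 term is one common way to encourage sparse activations; the
    weight 0.1 is an arbitrary illustrative value.
    """
    residual = X - reconstruct(W, H)
    return 0.5 * np.sum(residual ** 2) + sparsity_weight * np.sum(np.abs(H))


# Toy example (hypothetical sizes): 2 gestures, 3 frames each, 4 channels.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4, 2))
H = np.zeros((2, 10))
H[1, 5] = 1.0  # activate gesture 1 once, at frame 5
X = reconstruct(W, H)
```

In this toy case a single nonzero entry in the score matrix places one copy of the corresponding gesture template in the data, starting at the activated frame. That is the sense in which sparse scores describe the signal as a small number of discrete gesture activations.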
