Predicting non-native speech perception using the Perceptual Assimilation Model and state-of-the-art acoustic models

Paper title

Predicting non-native speech perception using the Perceptual Assimilation Model and state-of-the-art acoustic models

Authors

Juliette Millet, Ioana Chitoran, Ewan Dunbar

Abstract

Our native language influences the way we perceive speech sounds, affecting our ability to discriminate non-native sounds. We compare two ideas about the influence of the native language on speech perception: the Perceptual Assimilation Model, which appeals to a mental classification of sounds into native phoneme categories, versus the idea that rich, fine-grained phonetic representations tuned to the statistics of the native language are sufficient. We operationalize this idea using representations from two state-of-the-art speech models, a Dirichlet process Gaussian mixture model and the more recent wav2vec 2.0 model. We present a new, open dataset of French- and English-speaking participants' speech perception behaviour for 61 vowel sounds from six languages. We show that phoneme assimilation is a better predictor than fine-grained phonetic modelling, both for the discrimination behaviour as a whole and for predicting differences in discriminability associated with differences in native language background. We also show that wav2vec 2.0, while not good at capturing the effects of native language on speech perception, is complementary to information about native phoneme assimilation, and provides a good model of low-level phonetic representations, supporting the idea that both categorical and fine-grained perception are used during speech perception.
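The abstract does not say how a speech model's representations are turned into a prediction of listeners' discrimination behaviour. One common approach, assumed here and not taken from the paper, is an ABX-style score. Given frame-level features for two sounds A and B and a third token X of the same category as A, a model "answers correctly" if X is closer to A than to B. The minimal sketch below measures closeness with dynamic time warping (DTW) over a cosine frame distance. The function names (`cosine_distance`, `dtw_distance`, `abx_delta`), the distance choice, and the toy two-dimensional features standing in for wav2vec 2.0 or Dirichlet process GMM outputs are all hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical types: a "frame" is one feature vector (e.g. a model's output
# for one time step); a token is a sequence of frames.
Frame = Sequence[float]
Token = Sequence[Frame]


def cosine_distance(u: Frame, v: Frame) -> float:
    """1 minus the cosine similarity of two frames (1.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm > 0 else 1.0


def dtw_distance(x: Token, y: Token) -> float:
    """DTW alignment cost between two tokens, averaged over the alignment path length."""
    if not x or not y:
        raise ValueError("tokens must contain at least one frame")
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j]: cheapest cumulative cost aligning x[:i] with y[:j];
    # steps[i][j]: number of frame pairs on that cheapest path.
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    steps: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Best predecessor among diagonal, vertical and horizontal moves
            # (ties broken in favour of the shorter path).
            best_cost, best_steps = min(
                (cost[i - 1][j - 1], steps[i - 1][j - 1]),
                (cost[i - 1][j], steps[i - 1][j]),
                (cost[i][j - 1], steps[i][j - 1]),
            )
            cost[i][j] = best_cost + cosine_distance(x[i - 1], y[j - 1])
            steps[i][j] = best_steps + 1
    return cost[n][m] / steps[n][m]


def abx_delta(a: Token, b: Token, x: Token) -> float:
    """Positive when X (same category as A) is closer to A than to B under DTW."""
    return dtw_distance(b, x) - dtw_distance(a, x)


# Toy 2-D "features" standing in for real model outputs (hypothetical data).
a = [[1.0, 0.0], [0.9, 0.1]]
b = [[0.0, 1.0], [0.1, 0.9]]
x = [[1.0, 0.1], [0.8, 0.2], [0.9, 0.0]]
print(abx_delta(a, b, x) > 0)  # prints True
```

A positive delta means the model favours the correct answer for that triplet. Scores of this kind could then be compared, as predictors of listeners' responses, with predictors derived from native phoneme assimilation. The abstract does not state the exact comparison procedure the authors used.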
