Relating the fundamental frequency of speech with EEG using a dilated convolutional network

Paper title

Relating the fundamental frequency of speech with EEG using a dilated convolutional network

Authors

Corentin Puffay, Jana Van Canneyt, Jonas Vanthornhout, Hugo Van Hamme, Tom Francart

Abstract

To investigate how speech is processed in the brain, we can model the relation between features of a natural speech signal and the corresponding recorded electroencephalogram (EEG). Usually, linear models are used in regression tasks: either EEG is predicted or speech is reconstructed, and the correlation between the predicted and actual signals is used to measure the brain's decoding ability. However, given the nonlinear nature of the brain, the modeling ability of linear models is limited. Recent studies introduced nonlinear models to relate the speech envelope to EEG. We set out to include other features of speech that are not coded in the envelope, notably the fundamental frequency of the voice (f0). The f0 is a higher-frequency feature primarily coded at the brainstem to midbrain level. We present a dilated-convolutional model to provide evidence of neural tracking of the f0. We show that a combination of f0 and the speech envelope improves the performance of a state-of-the-art envelope-based model. This suggests the dilated-convolutional model can extract non-redundant information from both f0 and the envelope. We also show the ability of the dilated-convolutional model to generalize to subjects not included during training. This latter finding will accelerate f0-based hearing diagnosis.
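The linear baseline that the abstract contrasts with can be sketched roughly as follows: a backward model reconstructs a speech feature from time-lagged EEG by ridge regression and is scored with the Pearson correlation between the reconstructed and actual signals. This is only an illustration under stated assumptions, not the authors' dilated-convolutional model or their pipeline. The data are synthetic, and the helper names (`lagged`, `pearson`), the number of lags, the delay, and the regularization strength are all hypothetical choices.

```python
import numpy as np


def lagged(eeg, n_lags):
    """Stack EEG samples at t, t+1, ..., t+n_lags-1 (the response follows the stimulus)."""
    n, c = eeg.shape
    out = np.zeros((n, c * n_lags))
    for k in range(n_lags):
        # Rows near the end have no future samples and stay zero-padded.
        out[: n - k, k * c:(k + 1) * c] = eeg[k:]
    return out


def pearson(a, b):
    """Pearson correlation between two 1-D signals."""
    return float(np.corrcoef(a, b)[0, 1])


# Synthetic data (assumption): each channel is a delayed, scaled copy of the
# stimulus feature plus unit-variance noise. Real EEG is far noisier.
rng = np.random.default_rng(0)
n_samples, n_channels, delay = 2000, 8, 2
stimulus = rng.standard_normal(n_samples)
weights = np.linspace(0.5, 1.5, n_channels)
noise = rng.standard_normal((n_samples, n_channels))
eeg = np.outer(np.roll(stimulus, delay), weights) + noise

# Split first, then build lag features, so no test samples leak into training.
half = n_samples // 2
X_train, X_test = lagged(eeg[:half], 4), lagged(eeg[half:], 4)
y_train, y_test = stimulus[:half], stimulus[half:]

# Ridge regression in closed form: W = (X'X + lam * I)^-1 X'y
lam = 1.0
gram = X_train.T @ X_train + lam * np.eye(X_train.shape[1])
W = np.linalg.solve(gram, X_train.T @ y_train)

# Score the reconstruction on held-out data.
r_test = pearson(X_test @ W, y_test)
print(f"held-out reconstruction correlation: {r_test:.3f}")
```

The correlation here is high only because the synthetic noise level is low; it says nothing about the values reported in the paper.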

