A cross-corpus study on speech emotion recognition

Title

A cross-corpus study on speech emotion recognition

Authors

Rosanna Milner, Md Asif Jalal, Raymond W. M. Ng, Thomas Hain

Abstract

For speech emotion datasets, it has been difficult to acquire large quantities of reliable data and acted emotions may be over the top compared to less expressive emotions displayed in everyday life. Lately, larger datasets with natural emotions have been created. Instead of ignoring smaller, acted datasets, this study investigates whether information learnt from acted emotions is useful for detecting natural emotions. Cross-corpus research has mostly considered cross-lingual and even cross-age datasets, and difficulties arise from different methods of annotating emotions causing a drop in performance. To be consistent, four adult English datasets covering acted, elicited and natural emotions are considered. A state-of-the-art model is proposed to accurately investigate the degradation of performance. The system involves a bi-directional LSTM with an attention mechanism to classify emotions across datasets. Experiments study the effects of training models in a cross-corpus and multi-domain fashion and results show the transfer of information is not successful. Out-of-domain models, followed by adapting to the missing dataset, and domain adversarial training (DAT) are shown to be more suitable to generalising to emotions across datasets. This shows positive information transfer from acted datasets to those with more natural emotions and the benefits from training on different corpora.
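
The abstract names an attention mechanism on top of a bi-directional LSTM but does not spell out its form. As a rough illustration only, the sketch below shows one common variant, softmax attention pooling, which turns per-frame feature vectors into a single utterance-level vector before classification. The function name `attention_pool` and the idea of passing precomputed `scores` are assumptions made for this example; in a trained model the scores would come from a learned layer, and the paper's actual formulation may differ.

```python
import math


def attention_pool(frames, scores):
    """Collapse a sequence of frame vectors into one utterance vector.

    frames: list of equal-length feature vectors (e.g. BiLSTM outputs).
    scores: one unnormalised relevance score per frame. This is a
            hypothetical input for illustration; a real model would
            produce these scores with a learned scoring layer.
    Returns (pooled_vector, attention_weights).
    """
    if not frames or len(frames) != len(scores):
        raise ValueError("need at least one frame and one score per frame")

    # Softmax over time, shifted by the max score for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the frame vectors, one feature dimension at a time.
    dim = len(frames[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    return pooled, weights
```

With equal scores the weights are uniform and the pooled vector is simply the mean of the frames; a much larger score on one frame makes the pooled vector approach that frame, which is how attention can emphasise the emotionally salient parts of an utterance.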
