Detecting Emotion Primitives from Speech and their use in discerning Categorical Emotions

Paper Title

Detecting Emotion Primitives from Speech and their use in discerning Categorical Emotions

Authors

Vasudha Kowtha, Vikramjit Mitra, Chris Bartels, Erik Marchi, Sue Booker, William Caruso, Sachin Kajarekar, Devang Naik

Abstract

Emotion plays an essential role in human-to-human communication, enabling us to convey feelings such as happiness, frustration, and sincerity. While modern speech technologies rely heavily on speech recognition and natural language understanding to interpret speech content, the investigation of vocal expression is gaining increasing attention. Key considerations for building robust emotion models include characterizing and improving the extent to which a model, given its training data distribution, is able to generalize to unseen data conditions. This work investigated a long short-term memory (LSTM) network and a time convolution-LSTM (TC-LSTM) network to detect primitive emotion attributes such as valence, arousal, and dominance from speech. It was observed that training with multiple datasets and using robust features improved the concordance correlation coefficient (CCC) for valence by 30% with respect to the baseline system. Additionally, this work investigated how emotion primitives can be used to distinguish categorical emotions such as happiness, disgust, contempt, anger, and surprise from neutral speech, and results indicated that arousal, followed by dominance, was a better detector of such emotions.
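The abstract reports results as a concordance correlation coefficient (CCC) but does not spell out how it is computed. As a rough sketch, assuming the standard definition of Lin's CCC (twice the covariance divided by the sum of the two variances plus the squared difference of the means), the metric could be computed as below. The function name `concordance_cc` and the toy values are hypothetical illustrations, not taken from the paper.

```python
from statistics import fmean


def concordance_cc(pred, gold):
    """Lin's concordance correlation coefficient between two sequences.

    Assumption: population (1/n) variance and covariance are used; the
    paper's exact implementation is not given in the abstract.
    """
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(pred)
    mean_p, mean_g = fmean(pred), fmean(gold)

    # Population variances and covariance.
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both inputs are the same constant")
    return 2 * cov / denom


# Hypothetical valence predictions against reference labels.
identical = concordance_cc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = concordance_cc([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
```

Unlike Pearson correlation, CCC penalizes a constant offset between predictions and labels, so the shifted example scores lower than the identical one even though the two are perfectly correlated.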
