Paper Title


Sentiment Word Aware Multimodal Refinement for Multimodal Sentiment Analysis with ASR Errors

Authors

Yang Wu, Yanyan Zhao, Hao Yang, Song Chen, Bing Qin, Xiaohuan Cao, Wenting Zhao

Abstract


Multimodal sentiment analysis has attracted increasing attention, and many models have been proposed. However, the performance of state-of-the-art models drops sharply when they are deployed in the real world. We find that the main reason is that real-world applications can only access text produced by automatic speech recognition (ASR) models, which may contain errors due to limited model capacity. Through further analysis of the ASR outputs, we find that in some cases the sentiment words, the key sentiment elements in the textual modality, are recognized as other words, which changes the sentiment of the text and directly hurts the performance of multimodal sentiment models. To address this problem, we propose the sentiment word aware multimodal refinement model (SWRM), which can dynamically refine erroneous sentiment words by leveraging multimodal sentiment clues. Specifically, we first use a sentiment word position detection module to obtain the most probable position of the sentiment word in the text, and then utilize a multimodal sentiment word refinement module to dynamically refine the sentiment word embeddings. The refined embeddings are taken as the textual inputs of the multimodal feature fusion module to predict the sentiment labels. We conduct extensive experiments on the real-world datasets MOSI-Speechbrain, MOSI-IBM, and MOSI-iFlytek, and the results demonstrate the effectiveness of our model, which surpasses the current state-of-the-art models on all three datasets. Furthermore, our approach can easily be adapted to other multimodal feature fusion models. Data and code are available at https://github.com/albertwy/SWRM.
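To make the three-stage pipeline in the abstract concrete (detect the likely sentiment-word position, refine that embedding with audio/visual clues, then fuse for prediction), below is a minimal PyTorch sketch. Every module's internals here, including the soft-attention position scorer, the gated refinement, and the concat-MLP fusion, are illustrative assumptions of ours, not the authors' implementation; see https://github.com/albertwy/SWRM for the real code.

```python
# Minimal sketch of the SWRM-style pipeline described in the abstract.
# All module internals are assumptions for illustration only.
import torch
import torch.nn as nn


class SWRMSketch(nn.Module):
    def __init__(self, d_text: int, d_audio: int, d_visual: int, n_labels: int = 1):
        super().__init__()
        # Sentiment word position detection: score each token position.
        self.position_scorer = nn.Linear(d_text, 1)
        # Multimodal sentiment word refinement: a gate computed from the
        # selected embedding plus audio/visual cues decides how much of a
        # multimodally proposed candidate replaces the original embedding.
        self.gate = nn.Sequential(
            nn.Linear(d_text + d_audio + d_visual, d_text), nn.Sigmoid()
        )
        self.candidate = nn.Linear(d_audio + d_visual, d_text)
        # Multimodal feature fusion: a simple concat + MLP stand-in here;
        # the paper notes the approach adapts to other fusion models.
        self.fusion = nn.Sequential(
            nn.Linear(d_text + d_audio + d_visual, d_text), nn.ReLU(),
            nn.Linear(d_text, n_labels),
        )

    def forward(self, text, audio, visual):
        # text: (B, T, d_text); audio: (B, d_audio); visual: (B, d_visual)
        # 1) Most probable sentiment-word position (soft attention, so
        #    the sketch stays end-to-end differentiable).
        pos = torch.softmax(self.position_scorer(text).squeeze(-1), dim=-1)  # (B, T)
        word = torch.einsum("bt,btd->bd", pos, text)  # expected embedding
        # 2) Refine the selected embedding with audio/visual clues.
        av = torch.cat([audio, visual], dim=-1)
        g = self.gate(torch.cat([word, av], dim=-1))
        refined = g * self.candidate(av) + (1 - g) * word
        # 3) Write the refined embedding back at the detected position
        #    and fuse all modalities to predict the sentiment label.
        text = text + pos.unsqueeze(-1) * (refined - word).unsqueeze(1)
        pooled = text.mean(dim=1)
        return self.fusion(torch.cat([pooled, audio, visual], dim=-1))


if __name__ == "__main__":
    # Feature sizes are arbitrary example values, not the paper's.
    model = SWRMSketch(d_text=768, d_audio=74, d_visual=47)
    out = model(torch.randn(2, 20, 768), torch.randn(2, 74), torch.randn(2, 47))
    print(out.shape)  # torch.Size([2, 1])
```

The gated update means that when the audio/visual clues disagree with a likely mis-recognized word, the refined embedding leans toward the multimodal candidate; when they agree, the original ASR token embedding is largely preserved.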
