Paper Title

Predicting emotion from music videos: exploring the relative contribution of visual and auditory information to affective responses

Authors

Phoebe Chua, Dimos Makris, Dorien Herremans, Gemma Roig, Kat Agres

Abstract

Although media content is increasingly produced, distributed, and consumed in multiple combinations of modalities, how individual modalities contribute to the perceived emotion of a media item remains poorly understood. In this paper we present MusicVideos (MuVi), a novel dataset for affective multimedia content analysis to study how the auditory and visual modalities contribute to the perceived emotion of media. The data were collected by presenting music videos to participants in three conditions: music, visual, and audiovisual. Participants annotated the music videos for valence and arousal over time, as well as the overall emotion conveyed. We present detailed descriptive statistics for key measures in the dataset and the results of feature importance analyses for each condition. Finally, we propose a novel transfer learning architecture to train Predictive models Augmented with Isolated modality Ratings (PAIR) and demonstrate the potential of isolated modality ratings for enhancing multimodal emotion recognition. Our results suggest that perceptions of arousal are influenced primarily by auditory information, while perceptions of valence are more subjective and can be influenced by both visual and auditory information. The dataset is made publicly available.
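To make the PAIR idea concrete, below is a minimal sketch of one plausible reading of "Predictive models Augmented with Isolated modality Ratings": a multimodal valence/arousal regressor whose training is augmented with auxiliary heads supervised by the music-only and visual-only ratings from the isolated conditions. The abstract does not specify the architecture, so the class name `PAIRSketch`, the feature dimensions, the fusion scheme, and the loss weighting are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical PAIR-style sketch (not the paper's architecture): a multimodal
# backbone with auxiliary heads trained against the isolated-modality
# (music-only and visual-only) valence/arousal ratings.
import torch
import torch.nn as nn

class PAIRSketch(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=512, hidden=64):
        super().__init__()
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, hidden), nn.ReLU())
        self.visual_enc = nn.Sequential(nn.Linear(visual_dim, hidden), nn.ReLU())
        # Auxiliary heads: valence/arousal predicted from each modality alone,
        # supervised by the ratings collected in the isolated conditions.
        self.audio_head = nn.Linear(hidden, 2)
        self.visual_head = nn.Linear(hidden, 2)
        # Fusion head: valence/arousal predicted from both modalities,
        # supervised by the audiovisual-condition ratings.
        self.fusion_head = nn.Linear(2 * hidden, 2)

    def forward(self, audio, visual):
        a, v = self.audio_enc(audio), self.visual_enc(visual)
        return (self.audio_head(a), self.visual_head(v),
                self.fusion_head(torch.cat([a, v], dim=-1)))

def pair_loss(preds, y_audio, y_visual, y_av, aux_weight=0.5):
    """Multimodal regression loss augmented with isolated-modality targets."""
    p_a, p_v, p_av = preds
    mse = nn.functional.mse_loss
    return mse(p_av, y_av) + aux_weight * (mse(p_a, y_audio) + mse(p_v, y_visual))

# Example forward/loss pass with random features (batch of 4 time steps):
model = PAIRSketch()
preds = model(torch.randn(4, 128), torch.randn(4, 512))
loss = pair_loss(preds, torch.zeros(4, 2), torch.zeros(4, 2), torch.zeros(4, 2))
```

The auxiliary supervision is one simple way the isolated-modality ratings could enhance multimodal emotion recognition, as the abstract suggests; the paper's transfer learning architecture may combine the conditions differently.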
