$M^3$T: Multi-Modal Continuous Valence-Arousal Estimation in the Wild

Title

$M^3$T: Multi-Modal Continuous Valence-Arousal Estimation in the Wild

Authors

Zhang, Yuan-Hang; Huang, Rulin; Zeng, Jiabei; Shan, Shiguang; Chen, Xilin

Abstract

This report describes the multi-modal multi-task ($M^3$T) approach underlying our submission to the valence-arousal estimation track of the Affective Behavior Analysis in-the-wild (ABAW) Challenge, held in conjunction with the IEEE International Conference on Automatic Face and Gesture Recognition (FG) 2020. In the proposed $M^3$T framework, we fuse visual features from the videos with acoustic features from the audio tracks to estimate valence and arousal. The spatio-temporal visual features are extracted with a 3D convolutional network and a bidirectional recurrent neural network. Considering the correlations between valence/arousal, emotions, and facial actions, we also explore mechanisms to benefit from other tasks. We evaluated the $M^3$T framework on the validation set provided by ABAW, where it significantly outperforms the baseline method.
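
The abstract does not say how validation performance is scored. The ABAW valence-arousal track is scored with the concordance correlation coefficient (CCC), so the sketch below shows how a per-frame prediction sequence could be compared against its annotations under that assumption. The function name `ccc` and the handling of the constant-sequence case are illustrative choices, not taken from the report.

```python
def ccc(pred, gold):
    """Concordance correlation coefficient between two equal-length sequences.

    Assumed metric: the abstract itself does not name one. Values lie in
    [-1, 1]; 1 means perfect agreement in both correlation and scale/offset.
    """
    if len(pred) != len(gold) or len(pred) < 2:
        raise ValueError("need two sequences of equal length >= 2")

    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n

    # Population (biased) variances and covariance.
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n

    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0.0:
        # Both sequences are the same constant; treated here as full agreement
        # (an illustrative convention).
        return 1.0
    return 2.0 * cov / denom


if __name__ == "__main__":
    # Hypothetical per-frame valence values in [-1, 1].
    annotations = [-0.5, 0.0, 0.5]
    predictions = [0.0, 0.5, 1.0]  # right trend, constant offset
    print(ccc(predictions, annotations))
```

Unlike Pearson correlation, CCC penalizes a prediction that follows the right trend but is shifted or rescaled, which is why the offset predictions in the usage example score below 1.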
