Paper Title

An Open-source Benchmark of Deep Learning Models for Audio-visual Apparent and Self-reported Personality Recognition

Authors

Rongfan Liao, Siyang Song, Hatice Gunes

Abstract

Personality determines a wide variety of human daily and working behaviours, and is crucial for understanding human internal and external states. In recent years, a large number of automatic personality computing approaches have been developed to predict either the apparent personality or the self-reported personality of a subject based on non-verbal audio-visual behaviours. However, the majority of them suffer from complex and dataset-specific pre-processing steps and model training tricks. In the absence of a standardized benchmark with consistent experimental settings, it is not only impossible to fairly compare the real performance of these personality computing models, but it also makes them difficult to reproduce. In this paper, we present the first reproducible audio-visual benchmarking framework to provide a fair and consistent evaluation of eight existing personality computing models (audio, visual, and audio-visual) and seven standard deep learning models on both self-reported and apparent personality recognition tasks. Building upon this set of benchmarked models, we also investigate the impact of two previously used long-term modelling strategies, which summarise short-term/frame-level predictions, on personality computing results. The results lead to the following conclusions: (i) apparent personality traits, inferred from facial behaviours by most benchmarked deep learning models, are more reliably predicted than self-reported ones; (ii) visual models frequently achieved superior performance to audio models on personality recognition; (iii) non-verbal behaviours contribute differently to the prediction of different personality traits; and (iv) our reproduced personality computing models generally achieved worse performance than their originally reported results. Our benchmark is publicly available at \url{https://github.com/liaorongfan/DeepPersonality}.
