Portrait Interpretation and a Benchmark

Paper Title

Portrait Interpretation and a Benchmark

Authors

Yixuan Fan, Zhaopeng Dou, Yali Li, Shengjin Wang

Abstract

We propose a task named Portrait Interpretation and construct a dataset named Portrait250K for it. Current research on portraits, such as human attribute recognition and person re-identification, has achieved many successes, but in general it: 1) may fail to mine the interrelationships between tasks and the benefits they may bring; 2) designs a deep model specifically for each task, which is inefficient; 3) may be unable to meet the need for a unified model and comprehensive perception in real-world scenes. In this paper, the proposed portrait interpretation approaches the perception of humans from a new systematic perspective. We divide the perception of portraits into three aspects, namely Appearance, Posture, and Emotion, and design corresponding sub-tasks for each aspect. Based on a multi-task learning framework, portrait interpretation requires a comprehensive description of both the static attributes and the dynamic states of portraits. To stimulate research on this new task, we construct a new dataset that contains 250,000 images labeled with identity, gender, age, physique, height, expression, and posture of the whole body and arms. Our dataset is collected from 51 movies and therefore covers extensive diversity. Furthermore, we focus on representation learning for portrait interpretation and propose a baseline that reflects our systematic perspective. We also propose an appropriate metric for this task. Our experimental results demonstrate that combining the tasks related to portrait interpretation can yield benefits. Code and dataset will be made public.
