Paper Title

Learning Video-independent Eye Contact Segmentation from In-the-Wild Videos

Authors

Tianyi Wu, Yusuke Sugano

Abstract

Human eye contact is a form of non-verbal communication and can have a great influence on social behavior. Since the location and size of the eye contact targets vary across different videos, learning a generic video-independent eye contact detector is still a challenging task. In this work, we address the task of one-way eye contact detection for videos in the wild. Our goal is to build a unified model that can identify when a person is looking at his gaze targets in an arbitrary input video. Considering that this requires time-series relative eye movement information, we propose to formulate the task as a temporal segmentation. Due to the scarcity of labeled training data, we further propose a gaze target discovery method to generate pseudo-labels for unlabeled videos, which allows us to train a generic eye contact segmentation model in an unsupervised way using in-the-wild videos. To evaluate our proposed approach, we manually annotated a test dataset consisting of 52 videos of human conversations. Experimental results show that our eye contact segmentation model outperforms the previous video-dependent eye contact detector and can achieve 71.88% framewise accuracy on our annotated test set. Our code and evaluation dataset are available at https://github.com/ut-vision/Video-Independent-ECS.
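The 71.88% figure reported above is framewise accuracy: the fraction of frames where the predicted binary eye-contact label matches the annotation. A minimal sketch of how such a metric is computed (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def framewise_accuracy(pred, gt):
    """Fraction of frames where the predicted eye-contact label (1 =
    eye contact, 0 = no contact) matches the ground-truth label."""
    pred = np.asarray(pred)
    gt = np.asarray(gt)
    assert pred.shape == gt.shape, "predictions and labels must cover the same frames"
    return float((pred == gt).mean())

# Toy example: an 8-frame clip with one eye-contact segment (frames 2-4).
gt   = [0, 0, 1, 1, 1, 0, 0, 0]
pred = [0, 1, 1, 1, 0, 0, 0, 0]
print(framewise_accuracy(pred, gt))  # 6 of 8 frames agree -> 0.75
```

Because the task is framed as temporal segmentation, per-frame predictions like these are produced for the whole video and scored frame by frame against the annotation.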
