Paper Title
Knowledge-Spreader: Learning Facial Action Unit Dynamics with Extremely Limited Labels
Paper Authors
Abstract
Recent studies on the automatic detection of facial action units (AUs) have relied extensively on large-scale annotations. However, manual AU labeling is difficult, time-consuming, and costly. Most existing semi-supervised works ignore the informative cues from the temporal domain and are highly dependent on densely annotated videos, making the learning process less efficient. To alleviate these problems, we propose a deep semi-supervised framework, Knowledge-Spreader (KS), which differs from conventional methods in two aspects. First, rather than only encoding human knowledge as constraints, KS also learns spatial-temporal AU correlation knowledge in order to strengthen its out-of-distribution generalization ability. Second, we approach KS by applying consistency regularization and pseudo-labeling across multiple student networks alternately and dynamically. KS spreads spatial knowledge from labeled frames to unlabeled data and completes the temporal information of partially labeled video clips. This design allows KS to learn AU dynamics from video clips with only a single label allocated, which significantly reduces the annotation requirements. Extensive experiments demonstrate that the proposed KS achieves performance competitive with the state of the art while using only 2% of labels on BP4D and 5% of labels on DISFA. In addition, we test it on our newly developed large-scale comprehensive emotion database, which contains a considerable number of samples across well-synchronized and aligned sensor modalities, easing the scarcity of annotations and identities in human affective computing. The new database will be released to the research community.
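The two semi-supervised ingredients the abstract names, pseudo-labeling and consistency regularization, can be illustrated with a minimal toy sketch. This is not the authors' implementation; all function names, thresholds, and numbers below are illustrative assumptions, showing only the generic mechanics of confidence-thresholded pseudo-labels and a consistency penalty between two student predictions.

```python
# Toy sketch (illustrative, not the paper's code) of two generic
# semi-supervised ingredients: pseudo-labeling and consistency
# regularization. AU predictions are modeled as per-frame probabilities.

def pseudo_label(prob, threshold=0.9):
    """Return a hard 0/1 AU label only when the prediction is confident;
    otherwise None (the frame stays unlabeled this round)."""
    if prob >= threshold:
        return 1
    if prob <= 1.0 - threshold:
        return 0
    return None

def consistency_loss(probs_a, probs_b):
    """Mean squared disagreement between two students' AU probabilities
    for the same frames; minimizing it encourages consistent predictions."""
    assert len(probs_a) == len(probs_b)
    return sum((a - b) ** 2 for a, b in zip(probs_a, probs_b)) / len(probs_a)

# One clip with three unlabeled frames: spread labels where confident.
student_preds = [0.95, 0.50, 0.03]
labels = [pseudo_label(p) for p in student_preds]
print(labels)  # [1, None, 0] -> the ambiguous middle frame stays unlabeled

# Consistency term between two student networks on the same clip.
student_a = [0.9, 0.2, 0.1]
student_b = [0.8, 0.3, 0.1]
print(round(consistency_loss(student_a, student_b), 4))  # 0.0067
```

In the paper's setting, the confident pseudo-labels would supervise unlabeled frames while the consistency term couples the multiple student networks; the sketch above only demonstrates the two loss ingredients in isolation.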