Paper Title
End-to-End Zero-Shot HOI Detection via Vision and Language Knowledge Distillation
Paper Authors
Abstract
Most existing Human-Object Interaction (HOI) detection methods rely heavily on full annotations with predefined HOI categories, which are limited in diversity and costly to scale further. We aim to advance zero-shot HOI detection, detecting both seen and unseen HOIs simultaneously. The fundamental challenges are to discover potential human-object pairs and to identify novel HOI categories. To overcome these challenges, we propose a novel end-to-end zero-shot HOI Detection (EoID) framework via vision-language knowledge distillation. We first design an Interactive Score module combined with a Two-stage Bipartite Matching algorithm to distinguish interactive from non-interactive human-object pairs in an action-agnostic manner. Then we transfer the distribution of action probabilities from the pretrained vision-language teacher, together with the seen ground truth, to the HOI model to achieve zero-shot HOI classification. Extensive experiments on the HICO-DET dataset demonstrate that our model discovers potential interactive pairs and enables the recognition of unseen HOIs. Finally, our method outperforms the previous SOTA by 8.92% on unseen mAP and 10.18% on overall mAP under the UA setting, and by 6.02% on unseen mAP and 9.1% on overall mAP under the UC setting. Moreover, our method generalizes to large-scale object detection data to further scale up the action set. The source code will be available at: https://github.com/mrwu-mac/EoID.
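The abstract only summarizes how the teacher's action distribution and the seen ground truth are combined. The following is a minimal sketch of one plausible way to build such a distillation target. It is not the EoID implementation. It assumes that:

- a CLIP-style teacher scores a human-object region embedding against one text-prompt embedding per action using scaled cosine similarity;
- a softmax turns those scores into an action distribution;
- the target keeps the annotated 0/1 label for seen actions and uses the teacher's probability for unseen actions;
- the student is trained with binary cross-entropy on per-action logits.

The function names, the fixed scale, and the toy embeddings are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_logits(region_emb: Sequence[float],
                   action_text_embs: Sequence[Sequence[float]],
                   scale: float = 100.0) -> List[float]:
    """Hypothetical CLIP-style teacher scores: scaled cosine similarity between a
    human-object region embedding and one text-prompt embedding per action."""
    return [scale * cosine(region_emb, t) for t in action_text_embs]


def distillation_target(t_logits: Sequence[float],
                        seen_mask: Sequence[bool],
                        gt_labels: Sequence[int]) -> List[float]:
    """Hypothetical per-action soft target: the annotated 0/1 label for seen
    actions, the teacher's softmax probability for unseen actions.
    Entries of gt_labels at unseen actions are ignored."""
    probs = softmax(t_logits)
    return [float(gt) if seen else p
            for p, seen, gt in zip(probs, seen_mask, gt_labels)]


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def bce_soft(student_logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy between sigmoid(student logits) and soft targets."""
    total = 0.0
    for z, t in zip(student_logits, targets):
        # log(1 - sigmoid(z)) equals log(sigmoid(-z)).
        total -= t * log_sigmoid(z) + (1.0 - t) * log_sigmoid(-z)
    return total / len(targets)


if __name__ == "__main__":
    # Toy setup: 3 actions; actions 0 and 1 are seen, action 2 is unseen.
    region = [1.0, 0.0]
    action_texts = [[1.0, 0.0], [0.0, 1.0], [0.7071, 0.7071]]
    logits = teacher_logits(region, action_texts, scale=10.0)
    target = distillation_target(logits, [True, True, False], [1, 0, 0])
    print("soft target:", target)
    print("loss:", bce_soft([2.0, -2.0, 0.5], target))
```

In this sketch the teacher's softmax treats actions as mutually exclusive, while the student scores each action independently with a sigmoid. This is why the loss is a per-action binary cross-entropy against soft targets rather than a KL divergence between two softmax distributions. Whether EoID makes the same choice is not stated in the abstract.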