Paper Title
ProCC: Progressive Cross-primitive Compatibility for Open-World Compositional Zero-Shot Learning
Authors
Abstract
Open-World Compositional Zero-Shot Learning (OW-CZSL) aims to recognize novel compositions of state and object primitives in images with no priors on the compositional space, which induces a tremendously large output space containing all possible state-object compositions. Existing works either learn a joint compositional state-object embedding or predict simple primitives with separate classifiers. However, the former relies heavily on external word embedding methods, and the latter ignores the interactions between interdependent primitives. In this paper, we revisit the primitive prediction approach and propose a novel method, termed Progressive Cross-primitive Compatibility (ProCC), to mimic the human learning process for OW-CZSL tasks. Specifically, the cross-primitive compatibility module explicitly learns to model the interactions of state and object features with trainable memory units, efficiently acquiring cross-primitive visual attention to reason about high-feasibility compositions without the aid of external knowledge. Moreover, considering the partial-supervision setting (pCZSL) as well as the imbalance issue in multi-task prediction, we design a progressive training paradigm that enables the primitive classifiers to interact and obtain discriminative information in an easy-to-hard manner. Extensive experiments on three widely used benchmark datasets demonstrate that our method outperforms other representative methods in both the OW-CZSL and pCZSL settings by large margins.
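As a rough illustration of the separate-classifier baseline that the abstract contrasts ProCC against, the sketch below enumerates an open-world output space and scores each composition as the product of independent state and object probabilities. It is not the ProCC method: the function names, the toy vocabulary, and the probability values are all hypothetical and chosen only for illustration.

```python
from itertools import product


def open_world_space(states, objects):
    """In the open-world setting every state-object pair is a candidate label."""
    return list(product(states, objects))


def compose_scores(state_probs, object_probs):
    """Score each composition as the product of independent primitive probabilities.

    Assumption: this mirrors a plain "separate classifiers" baseline, with no
    modelling of how states and objects interact.
    """
    return {
        (s, o): ps * po
        for (s, ps), (o, po) in product(state_probs.items(), object_probs.items())
    }


# Hypothetical toy vocabulary; real benchmarks are far larger.
states = ["wet", "dry", "sliced"]
objects = ["apple", "dog"]
space = open_world_space(states, objects)

# Hypothetical classifier outputs for a single image.
state_probs = {"wet": 0.5, "dry": 0.25, "sliced": 0.25}
object_probs = {"apple": 0.25, "dog": 0.75}

scores = compose_scores(state_probs, object_probs)
best = max(scores, key=scores.get)
print(len(space), best, scores[best])
```

Because the two predictions are treated as independent, an implausible pair such as ("sliced", "dog") still receives a nonzero score. This is the kind of ignored cross-primitive interaction the abstract points to, and the output space grows as the number of states times the number of objects.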