Paper title
On Leveraging Variational Graph Embeddings for Open World Compositional Zero-Shot Learning
Paper authors
Paper abstract
Humans can identify and categorize novel compositions of known concepts. The task in Compositional Zero-Shot Learning (CZSL) is to learn compositions of primitive concepts, i.e., objects and states, in such a way that even their novel compositions can be zero-shot classified. In this work, we do not assume any prior knowledge of the feasibility of novel compositions, i.e., the open-world setting, where infeasible compositions dominate the search space. We propose a Compositional Variational Graph Autoencoder (CVGAE) approach for learning the variational embeddings of the primitive concepts (nodes) as well as the feasibility of their compositions (via edges). Such modelling makes CVGAE scalable to real-world application scenarios. This is in contrast to the state-of-the-art (SOTA) method, CGE, which is computationally very expensive; e.g., for the benchmark C-GQA dataset, CGE requires 3.94 x 10^5 nodes, whereas CVGAE requires only 1323 nodes. We learn a mapping of the graph and image embeddings onto a common embedding space. CVGAE adopts a deep metric learning approach and learns a similarity metric in this space via a bi-directional contrastive loss between projected graph and image embeddings. We validate the effectiveness of our approach on three benchmark datasets. We also demonstrate via an image retrieval task that the representations learnt by CVGAE are better suited for compositional generalization.
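To make the bi-directional contrastive loss mentioned in the abstract more concrete, the following is a minimal sketch in plain Python. It assumes a symmetric, InfoNCE-style formulation over a batch of matched image and composition embeddings. The abstract does not give the exact loss, so the function names, the cosine similarity, and the `temperature` value are illustrative assumptions, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def bidirectional_contrastive_loss(image_emb, graph_emb, temperature=0.1):
    """Symmetric contrastive loss over a batch of matched pairs (assumed form).

    image_emb[i] and graph_emb[i] are assumed to describe the same
    state-object composition; every other pairing acts as a negative.
    """
    images = [l2_normalize(v) for v in image_emb]
    graphs = [l2_normalize(v) for v in graph_emb]
    # sim[i][j]: scaled cosine similarity between image i and composition j.
    sim = [[sum(a * b for a, b in zip(img, g)) / temperature for g in graphs]
           for img in images]
    # Transpose to score the composition-to-image direction.
    sim_t = [list(col) for col in zip(*sim)]
    return 0.5 * (cross_entropy_diag(sim) + cross_entropy_diag(sim_t))


# Toy check: matched pairs should score a lower loss than mismatched pairs.
aligned = bidirectional_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                         [[1.0, 0.0], [0.0, 1.0]])
swapped = bidirectional_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                         [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

Averaging the two directions means each image is pulled toward its own composition embedding and each composition embedding toward its own image, which is one common way to realize a bi-directional objective in a shared embedding space.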