Paper Title

Transductive Zero-Shot Learning using Cross-Modal CycleGAN

Paper Authors

Patrick Bordes, Eloi Zablocki, Benjamin Piwowarski, Patrick Gallinari

Paper Abstract

In Computer Vision, Zero-Shot Learning (ZSL) aims at classifying unseen classes -- classes for which no matching training image exists. Most ZSL works learn a cross-modal mapping between images and class labels for seen classes. However, the data distributions of seen and unseen classes might differ, causing a domain shift problem. Following this observation, transductive ZSL (T-ZSL) assumes that unseen classes and their associated images are known during training, but not their correspondence. As current T-ZSL approaches do not scale efficiently when the number of seen classes is high, we tackle this problem with a new model for T-ZSL based upon CycleGAN. Our model jointly (i) projects images on their seen class labels with a supervised objective and (ii) aligns unseen class labels and visual exemplars with adversarial and cycle-consistency objectives. We show the efficiency of our Cross-Modal CycleGAN model (CM-GAN) on the ImageNet T-ZSL task, where we obtain state-of-the-art results. We further validate CM-GAN on a language grounding task, and on a new task that we propose: zero-shot sentence-to-image matching on MS COCO.
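To make the two objectives in the abstract concrete, below is a minimal sketch in PyTorch of a supervised projection loss for seen classes combined with adversarial and cycle-consistency losses for unseen classes. The module names (G_v2t, G_t2v, D_txt, D_img), the linear mappings, the feature dimensions, and the unweighted loss sum are all illustrative assumptions, not the authors' released implementation; discriminator updates are omitted for brevity.

```python
# Sketch of CM-GAN-style training objectives (assumptions, not the
# authors' code): linear cross-modal mappings between a visual feature
# space and a class-label embedding space.
import torch
import torch.nn as nn
import torch.nn.functional as F

feat_dim, emb_dim = 2048, 300   # e.g. CNN image features / word embeddings

G_v2t = nn.Linear(feat_dim, emb_dim)  # visual -> textual (label) space
G_t2v = nn.Linear(emb_dim, feat_dim)  # textual -> visual space
D_txt = nn.Linear(emb_dim, 1)         # real vs. mapped label embeddings
D_img = nn.Linear(feat_dim, 1)        # real vs. mapped image features

def supervised_loss(x_seen, y_seen):
    # (i) Project seen-class images onto their class-label embeddings.
    return F.mse_loss(G_v2t(x_seen), y_seen)

def adversarial_loss(x_unseen, y_unseen):
    # (ii-a) Align unseen images and labels without known correspondences:
    # each generator tries to fool the discriminator of the target modality.
    fool = lambda logits: F.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))
    return fool(D_txt(G_v2t(x_unseen))) + fool(D_img(G_t2v(y_unseen)))

def cycle_loss(x_unseen, y_unseen):
    # (ii-b) Mapping across modalities and back should recover the input.
    return (F.l1_loss(G_t2v(G_v2t(x_unseen)), x_unseen)
            + F.l1_loss(G_v2t(G_t2v(y_unseen)), y_unseen))

# One generator step on a batch of seen and unseen examples (random data
# stands in for real features here).
x_s, y_s = torch.randn(32, feat_dim), torch.randn(32, emb_dim)
x_u, y_u = torch.randn(32, feat_dim), torch.randn(32, emb_dim)
loss = supervised_loss(x_s, y_s) + adversarial_loss(x_u, y_u) + cycle_loss(x_u, y_u)
loss.backward()
```

At test time, a standard ZSL decision rule would then classify an unseen image by mapping its features through G_v2t and retrieving the nearest unseen-class label embedding; this inference step is a common convention and an assumption here, not a detail stated in the abstract.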
