Paper Title
Towards Improving the Generation Quality of Autoregressive Slot VAEs
Authors
Abstract
Unconditional scene inference and generation are challenging to learn jointly with a single compositional model. Despite encouraging progress on models that extract object-centric representations ("slots") from images, unconditional generation of scenes from slots has received less attention. This is primarily because learning the multi-object relations necessary to imagine coherent scenes is difficult. We hypothesize that most existing slot-based models have a limited ability to learn object correlations. We propose two improvements that strengthen object correlation learning. The first is to condition the slots on a global, scene-level variable that captures higher-order correlations between slots. Second, we address the fundamental lack of a canonical order for objects in images by proposing to learn a consistent order to use for the autoregressive generation of scene objects. Specifically, we train an autoregressive slot prior to sequentially generate scene objects following a learned order. Ordered slot inference entails first estimating a randomly ordered set of slots using existing approaches for extracting slots from images, then aligning those slots to ordered slots generated autoregressively with the slot prior. Our experiments across three multi-object environments demonstrate clear gains in unconditional scene generation quality. Detailed ablation studies are also provided that validate the two proposed improvements.
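The alignment step in ordered slot inference can be illustrated with a small sketch. The abstract does not say how the alignment is computed, so the following assumes, purely for illustration, that slots are plain feature vectors and that the best alignment is the permutation minimizing total squared distance to the ordered reference slots. The names `squared_distance` and `align_slots` are hypothetical and do not come from the paper.

```python
from itertools import permutations
from typing import List, Sequence

Slot = Sequence[float]


def squared_distance(a: Slot, b: Slot) -> float:
    """Squared Euclidean distance between two slot vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def align_slots(unordered: List[Slot], reference: List[Slot]) -> List[Slot]:
    """Reorder `unordered` so that slot i best matches `reference[i]`.

    Assumption (not from the paper): the matching cost is squared
    distance, and the optimal permutation is found by exhaustive search.
    """
    k = len(reference)
    if len(unordered) != k:
        raise ValueError("both slot sets must have the same size")

    def total_cost(perm: Sequence[int]) -> float:
        # Cost of assigning unordered[perm[i]] to reference position i.
        return sum(squared_distance(unordered[perm[i]], reference[i])
                   for i in range(k))

    best = min(permutations(range(k)), key=total_cost)
    return [unordered[j] for j in best]


# Ordered slots, e.g. as an autoregressive prior might produce them.
reference_slots = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
# The same objects, estimated in arbitrary order by a slot extractor.
extracted_slots = [[5.1, 4.9], [0.1, 0.0], [0.9, 1.2]]
aligned_slots = align_slots(extracted_slots, reference_slots)
```

Exhaustive search costs O(K!) in the number of slots K, which is only practical for a handful of slots. A polynomial-time assignment solver such as the Hungarian algorithm would be the usual replacement for larger K.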