Paper Title

Towards Self-Supervised Learning of Global and Object-Centric Representations

Authors

Federico Baldassarre, Hossein Azizpour

Abstract

Self-supervision allows learning meaningful representations of natural images, which usually contain one central object. How well does it transfer to multi-entity scenes? We discuss key aspects of learning structured object-centric representations with self-supervision and validate our insights through several experiments on the CLEVR dataset. Regarding the architecture, we confirm the importance of competition for attention-based object discovery, where each image patch is exclusively attended by one object. For training, we show that contrastive losses equipped with matching can be applied directly in a latent space, avoiding pixel-based reconstruction. However, such an optimization objective is sensitive to false negatives (recurring objects) and false positives (matching errors). Careful consideration is thus required around data augmentation and negative sample selection.
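
The competition mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a Slot Attention style layer in which attention logits are normalized with a softmax over the object (slot) axis rather than over the patch axis, so that objects compete for each image patch. The function names, the toy logits, and the final argmax step (used only to show the "one object per patch" outcome as a hard assignment) are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def competitive_attention(logits):
    """Normalize attention over slots for every patch.

    logits[s][p] is the affinity between object slot s and image patch p.
    The returned attn[s][p] sums to 1 over s for each patch p, so a slot
    can only gain attention on a patch at the expense of the other slots.
    """
    num_slots = len(logits)
    num_patches = len(logits[0])
    attn = [[0.0] * num_patches for _ in range(num_slots)]
    for p in range(num_patches):
        column = softmax([logits[s][p] for s in range(num_slots)])
        for s in range(num_slots):
            attn[s][p] = column[s]
    return attn


def hard_assignment(attn):
    """Return, for each patch, the index of the slot with the most attention."""
    num_slots = len(attn)
    num_patches = len(attn[0])
    return [
        max(range(num_slots), key=lambda s: attn[s][p])
        for p in range(num_patches)
    ]


# Toy example (assumed values): 3 object slots, 4 image patches.
logits = [
    [4.0, 3.5, 0.1, 0.0],
    [0.2, 0.1, 5.0, 0.3],
    [0.0, 0.4, 0.2, 2.5],
]
attn = competitive_attention(logits)
winners = hard_assignment(attn)
print(winners)  # [0, 0, 1, 2]
```

Normalizing over patches instead would let several slots attend strongly to the same patch, which removes the competition that the abstract identifies as important for object discovery.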
