Paper Title

Semantic-Guided Inpainting Network for Complex Urban Scenes Manipulation

Authors

Pierfrancesco Ardino, Yahui Liu, Elisa Ricci, Bruno Lepri, Marco De Nadai

Abstract

Manipulating images of complex scenes to reconstruct, insert, and/or remove specific object instances is a challenging task. Complex scenes contain multiple semantics and objects, which are frequently cluttered or ambiguous, thus hampering the performance of inpainting models. Conventional techniques often rely on structural information such as object contours in multi-stage approaches that yield unreliable results and boundaries. In this work, we propose a novel deep learning model to alter a complex urban scene by removing a user-specified portion of the image and coherently inserting a new object (e.g., a car or pedestrian) into that scene. Inspired by recent works on image inpainting, our proposed method leverages semantic segmentation to model the content and structure of the image and to learn the best shape and location of the object to insert. To generate reliable results, we design a new decoder block that combines the semantic segmentation and generation tasks to better guide the generation of new objects and scenes, which must be semantically consistent with the image. Our experiments, conducted on two large-scale datasets of urban scenes (Cityscapes and Indian Driving), show that our proposed approach successfully addresses the problem of semantically guided inpainting of complex urban scenes.
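The abstract describes a decoder block that couples a semantic segmentation branch with the generation branch so that generated content stays consistent with the predicted semantics. As a rough illustration of how such a coupling can be wired (this is a minimal PyTorch sketch under our own assumptions, using SPADE-style modulation; the class name, layer sizes, and design choices are ours, not the paper's specification):

```python
import torch
import torch.nn as nn

class SemanticGuidedDecoderBlock(nn.Module):
    """Illustrative sketch, not the authors' exact architecture: the block
    predicts a segmentation map from its input features and uses that map
    to spatially modulate the image-generation features."""

    def __init__(self, in_channels: int, out_channels: int, num_classes: int):
        super().__init__()
        # Auxiliary head: per-pixel class logits for the segmentation task.
        self.seg_head = nn.Conv2d(in_channels, num_classes, kernel_size=1)
        # Parameter-free normalization; scale/shift come from the semantics.
        self.norm = nn.InstanceNorm2d(in_channels, affine=False)
        # SPADE-style modulation: the predicted segmentation produces
        # spatially varying gamma/beta that re-inject structure.
        self.mlp = nn.Sequential(
            nn.Conv2d(num_classes, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.gamma = nn.Conv2d(128, in_channels, kernel_size=3, padding=1)
        self.beta = nn.Conv2d(128, in_channels, kernel_size=3, padding=1)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.up = nn.Upsample(scale_factor=2, mode="nearest")

    def forward(self, x: torch.Tensor):
        seg_logits = self.seg_head(x)                    # segmentation branch
        hidden = self.mlp(torch.softmax(seg_logits, dim=1))
        modulated = self.norm(x) * (1 + self.gamma(hidden)) + self.beta(hidden)
        out = self.up(self.conv(torch.relu(modulated)))  # generation branch
        return out, seg_logits  # seg_logits would get its own segmentation loss


# Usage: a 64x64 feature map with 256 channels and 20 semantic classes.
block = SemanticGuidedDecoderBlock(in_channels=256, out_channels=128, num_classes=20)
features = torch.randn(1, 256, 64, 64)
out, seg = block(features)
print(out.shape, seg.shape)  # (1, 128, 128, 128) and (1, 20, 64, 64)
```

Returning the segmentation logits alongside the upsampled features lets a training loop supervise both tasks at once, which is one plausible way to realize the joint segmentation-and-generation guidance the abstract describes.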
