Paper Title
Outpainting by Queries
Paper Authors
Paper Abstract
Image outpainting, which has been well studied with Convolutional Neural Network (CNN) based frameworks, has recently drawn more attention in computer vision. However, CNNs rely on inherent inductive biases to achieve effective sample learning, which may degrade the performance ceiling. In this paper, motivated by the flexible self-attention mechanism with minimal inductive biases in transformer architecture, we reframe the generalised image outpainting problem as a patch-wise sequence-to-sequence autoregression problem, enabling query-based image outpainting. Specifically, we propose a novel hybrid vision-transformer-based encoder-decoder framework, named \textbf{Query} \textbf{O}utpainting \textbf{TR}ansformer (\textbf{QueryOTR}), for extrapolating visual context on all sides of a given image. The global modeling capacity of the patch-wise mode allows us to extrapolate images from the query standpoint of the attention mechanism. A novel Query Expansion Module (QEM) is designed to integrate information from the predicted queries based on the encoder's output, hence accelerating the convergence of the pure transformer even with a relatively small dataset. To further enhance the connectivity between patches, the proposed Patch Smoothing Module (PSM) re-allocates and averages the overlapped regions, thus providing seamless predicted images. We experimentally show that QueryOTR generates visually appealing results smoothly and realistically, outperforming state-of-the-art image outpainting approaches.
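
The Patch Smoothing Module described in the abstract averages the regions where predicted patches overlap. The sketch below (PyTorch, not the authors' implementation; the patch size, stride, and tensor layout are illustrative assumptions) shows the basic fold-and-average operation such a module relies on: overlapping patches are summed back onto the image canvas and each pixel is divided by the number of patches covering it, which averages the overlapped regions into a seamless output.

# A minimal sketch, assuming an F.unfold-style patch layout; not the authors' code.
import torch
import torch.nn.functional as F

def smooth_overlapping_patches(patches, out_size, patch_size=16, stride=8):
    """patches: (B, C * patch_size**2, L), as produced by F.unfold."""
    # Sum the contributions of overlapping patches onto the image canvas.
    summed = F.fold(patches, output_size=out_size,
                    kernel_size=patch_size, stride=stride)
    # Count how many patches cover each pixel, then average.
    ones = torch.ones_like(patches)
    coverage = F.fold(ones, output_size=out_size,
                      kernel_size=patch_size, stride=stride)
    return summed / coverage.clamp(min=1.0)

# Usage example: reassemble a 64x64 RGB image from overlapping 16x16 patches.
img = torch.rand(1, 3, 64, 64)
patches = F.unfold(img, kernel_size=16, stride=8)
recon = smooth_overlapping_patches(patches, (64, 64))
assert torch.allclose(recon, img, atol=1e-6)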