Paper Title
Efficient Image Generation with Variadic Attention Heads
Paper Authors
Paper Abstract
While the integration of transformers into vision models has yielded significant improvements on vision tasks, these models still require substantial computation for both training and inference. Restricted attention mechanisms greatly reduce these computational burdens, but at the cost of losing either global or local coherence. We propose a simple yet powerful method to reduce these trade-offs: allow the attention heads of a single transformer to attend to multiple receptive fields. We demonstrate our method using Neighborhood Attention (NA) and integrate it into a StyleGAN-based architecture for image generation. With this work, dubbed StyleNAT, we achieve an FID of 2.05 on FFHQ, a 6% improvement over StyleGAN-XL, while using 28% fewer parameters and with 4$\times$ the throughput. StyleNAT attains the Pareto frontier on FFHQ-256 and demonstrates powerful and efficient image generation on other datasets. Our code and model checkpoints are publicly available at: https://github.com/SHI-Labs/StyleNAT
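To make the core idea of the abstract concrete, the sketch below partitions a transformer's attention heads into groups and gives each group a different local receptive field. This is a minimal illustration under stated assumptions, not the StyleNAT implementation: the names `variadic_attention` and `neighborhood_mask` are hypothetical, and locality is imposed here with a simple distance mask over the token grid, whereas the paper's actual Neighborhood Attention (via the NATTEN library) clamps windows at image borders and avoids the dense O(n^2) score matrix computed here.

```python
# Minimal sketch (NOT the official StyleNAT code) of "variadic" attention heads:
# heads are split into groups, and each group attends within a different
# kernel_size x kernel_size neighborhood of a flattened H x W token grid.
import torch


def neighborhood_mask(h, w, kernel_size, device=None):
    """Boolean (h*w, h*w) mask: True where key j lies inside the
    kernel_size x kernel_size window centered on query i."""
    ys, xs = torch.meshgrid(
        torch.arange(h, device=device),
        torch.arange(w, device=device),
        indexing="ij",
    )
    ys, xs = ys.flatten(), xs.flatten()  # (h*w,) grid coordinates per token
    r = kernel_size // 2
    return (ys[:, None] - ys[None, :]).abs().le(r) & (
        xs[:, None] - xs[None, :]).abs().le(r)


def variadic_attention(q, k, v, h, w, kernel_sizes):
    """q, k, v: (batch, heads, h*w, dim). Heads are split evenly across
    kernel_sizes, e.g. kernel_sizes=(3, 7) gives half the heads a 3x3
    receptive field and the other half a 7x7 one."""
    b, n_heads, n_tokens, d = q.shape
    assert n_heads % len(kernel_sizes) == 0
    group = n_heads // len(kernel_sizes)

    scores = q @ k.transpose(-2, -1) / d**0.5  # (b, heads, n, n)
    for i, ks in enumerate(kernel_sizes):
        mask = neighborhood_mask(h, w, ks, device=q.device)  # (n, n)
        # Mask out keys outside this head group's neighborhood; the query
        # token itself is always inside the window, so no row is all -inf.
        scores[:, i * group:(i + 1) * group].masked_fill_(~mask, float("-inf"))
    return scores.softmax(dim=-1) @ v


# Tiny usage example: a 16x16 feature map, 4 heads in two receptive-field groups.
b, heads, h, w, d = 2, 4, 16, 16, 32
q = torch.randn(b, heads, h * w, d)
k = torch.randn(b, heads, h * w, d)
v = torch.randn(b, heads, h * w, d)
out = variadic_attention(q, k, v, h, w, kernel_sizes=(3, 7))
print(out.shape)  # torch.Size([2, 4, 256, 32])
```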