Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression

Paper Title

Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression

Authors

Koyuncu, A. Burakhan; Gao, Han; Boev, Atanas; Gaikov, Georgii; Alshina, Elena; Steinbach, Eckehard

Abstract

Entropy modeling is a key component for high-performance image compression algorithms. Recent developments in autoregressive context modeling helped learning-based methods to surpass their classical counterparts. However, the performance of those models can be further improved due to the underexploited spatio-channel dependencies in latent space, and the suboptimal implementation of context adaptivity. Inspired by the adaptive characteristics of the transformers, we propose a transformer-based context model, named Contextformer, which generalizes the de facto standard attention mechanism to spatio-channel attention. We replace the context model of a modern compression framework with the Contextformer and test it on the widely used Kodak, CLIC2020, and Tecnick image datasets. Our experimental results show that the proposed model provides up to 11% rate savings compared to the standard Versatile Video Coding (VVC) Test Model (VTM) 16.2, and outperforms various learning-based models in terms of PSNR and MS-SSIM.

Paper Title