Paper Title


GOLLIC: Learning Global Context beyond Patches for Lossless High-Resolution Image Compression

Authors

Yuan Lan, Liang Qin, Zhaoyi Sun, Yang Xiang, Jie Sun

Abstract


Neural-network-based approaches have recently emerged in the field of data compression and have already led to significant progress in image compression, especially in achieving higher compression ratios. In the lossless image compression scenario, however, existing methods often struggle to learn a probability model of full-size high-resolution images due to limited computational resources. The current strategy is to crop high-resolution images into multiple non-overlapping patches and process them independently. This strategy ignores long-term dependencies beyond patches, thus limiting modeling performance. To address this problem, we propose a hierarchical latent variable model with a global context to capture the long-term dependencies of high-resolution images. Besides the latent variables unique to each patch, we introduce shared latent variables between patches to construct the global context. The shared latent variables are extracted by a self-supervised clustering module inside the model's encoder. This clustering module assigns each patch a confidence of belonging to each cluster. Shared latent variables are then learned from the patch latent variables and these confidences, which reflects the similarity of patches in the same cluster and benefits global context modeling. Experimental results show that our global context model improves the compression ratio compared to engineered codecs and deep learning models on three benchmark high-resolution image datasets: DIV2K, CLIC.pro, and CLIC.mobile.
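The abstract does not specify the exact form of the clustering module, so the following is only a minimal sketch of one plausible mechanism consistent with the description: patch latents are softly assigned to learned cluster centroids, and each cluster's shared latent is the confidence-weighted mean of its member patches' latents. The function names, the centroid parameterization, and the distance-based softmax assignment are all illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def shared_latents(patch_latents, centroids):
    """Softly assign each patch latent to clusters, then form one shared
    latent per cluster as the confidence-weighted mean of patch latents.

    patch_latents: (N, d) array, one latent vector per patch (assumed)
    centroids:     (K, d) array, hypothetical learned cluster centroids
    returns: shared (K, d), confidence (N, K)
    """
    # squared distance from each patch latent to each centroid: (N, K)
    d2 = ((patch_latents[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    # confidence of each patch belonging to each cluster; rows sum to 1
    confidence = softmax(-d2, axis=1)
    # normalize per cluster so each shared latent is a weighted average
    weights = confidence / confidence.sum(axis=0, keepdims=True)
    shared = weights.T @ patch_latents            # (K, d)
    return shared, confidence

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))   # 8 patch latents, 4-dimensional (toy sizes)
c = rng.normal(size=(3, 4))   # 3 hypothetical cluster centroids
shared, conf = shared_latents(z, c)
```

In a trained model, the centroids would be learned jointly with the encoder rather than fixed; the confidence-weighted averaging is what lets patches in the same cluster contribute to a common global context.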
