FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos

Paper title

FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos

Authors

Joo Chan Lee, Daniel Rho, Jong Hwan Ko, Eunbyung Park

Abstract

Neural fields, also known as coordinate-based or implicit neural representations, have shown a remarkable capability for representing, generating, and manipulating various forms of signals. For video representations, however, mapping pixel-wise coordinates to RGB colors has shown relatively low compression performance and slow convergence and inference speed. Frame-wise video representation, which maps a temporal coordinate to its entire frame, has recently emerged as an alternative method to represent videos, improving compression rates and encoding speed. While promising, it has still failed to reach the performance of state-of-the-art video compression algorithms. In this work, we propose FFNeRV, a novel method for incorporating flow information into frame-wise representations to exploit the temporal redundancy across the frames in videos, inspired by standard video codecs. Furthermore, we introduce a fully convolutional architecture, enabled by one-dimensional temporal grids, improving the continuity of spatial features. Experimental results show that FFNeRV yields the best performance for video compression and frame interpolation among the methods using frame-wise representations or neural fields. To reduce the model size even further, we devise a more compact convolutional architecture using group and pointwise convolutions. With model compression techniques, including quantization-aware training and entropy coding, FFNeRV outperforms widely used standard video codecs (H.264 and HEVC) and performs on par with state-of-the-art video compression algorithms.
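
To give a sense of why replacing a standard convolution with group and pointwise convolutions makes the architecture more compact, the sketch below counts weights for both variants. It is only an illustration, not the authors' implementation: the channel widths, kernel size, and group count are hypothetical values chosen for the example, the ordering (a grouped k x k convolution followed by a 1 x 1 pointwise convolution) is an assumption, and bias terms are ignored.

```python
def conv_params(c_in: int, c_out: int, kernel: int, groups: int = 1) -> int:
    """Weight count of a 2-D convolution with a square kernel (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * kernel * kernel


def compact_block_params(c_in: int, c_out: int, kernel: int, groups: int) -> int:
    """Grouped k x k convolution followed by a 1 x 1 pointwise convolution.

    The ordering of the two layers is an assumption made for this sketch.
    """
    grouped = conv_params(c_in, c_out, kernel, groups)
    pointwise = conv_params(c_out, c_out, 1)  # mixes information across groups
    return grouped + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes, not taken from the paper.
    standard = conv_params(64, 64, 3)
    compact = compact_block_params(64, 64, 3, groups=8)
    print(f"standard 3x3 conv: {standard} weights")
    print(f"grouped + pointwise: {compact} weights")
    print(f"reduction factor: {standard / compact:.2f}x")
```

With these example sizes the grouped convolution alone cuts the weight count by the number of groups, and the pointwise layer adds back a comparatively small cost while restoring the channel mixing that grouping removes.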
