On the Impact of Lossy Image and Video Compression on the Performance of Deep Convolutional Neural Network Architectures

Paper Title

On the Impact of Lossy Image and Video Compression on the Performance of Deep Convolutional Neural Network Architectures

Authors

Matt Poyser, Amir Atapour-Abarghouei, Toby P. Breckon

Abstract

Recent advances in generalized image understanding have seen a surge in the use of deep convolutional neural networks (CNN) across a broad range of image-based detection, classification and prediction tasks. Whilst the reported performance of these approaches is impressive, this study investigates the hitherto unexplored question of how commonplace image and video compression techniques affect the performance of such deep learning architectures. Focusing on JPEG and H.264 (MPEG-4 AVC) as a representative proxy for contemporary lossy image/video compression techniques that are in common use within network-connected image/video devices and infrastructure, we examine the impact on performance across five discrete tasks: human pose estimation, semantic segmentation, object detection, action recognition, and monocular depth estimation. As such, within this study we include a variety of network architectures and domains spanning end-to-end convolution, encoder-decoder, region-based CNN (R-CNN), dual-stream, and generative adversarial networks (GAN). Our results show a non-linear and non-uniform relationship between network performance and the level of lossy compression applied. Notably, performance decreases significantly below a JPEG quality (quantization) level of 15% and an H.264 Constant Rate Factor (CRF) of 40. However, retraining said architectures on pre-compressed imagery conversely recovers network performance by up to 78.4% in some cases. Furthermore, there is a correlation between architectures employing an encoder-decoder pipeline and those that demonstrate resilience to lossy image compression. The characteristics of the relationship between input compression and output task performance can be used to inform design decisions within future image/video devices and infrastructure.
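
To give a feel for what a JPEG quality (quantization) level of 15% means in practice, the sketch below scales a base quantization table the way the widely used IJG libjpeg encoder does. This is an illustration under assumptions, not the authors' pipeline: the abstract does not say which encoder was used, the table is the example luminance table from Annex K of the JPEG standard, and the function names `ijg_scale_factor` and `scaled_table` are hypothetical.

```python
from typing import List

# Illustrative sketch only: assumes the IJG/libjpeg quality-scaling convention,
# which the abstract does not name as the encoder actually used.

# Example luminance quantization table from Annex K of the JPEG standard.
BASE_LUMA_TABLE: List[List[int]] = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def ijg_scale_factor(quality: int) -> int:
    """Map a 1-100 quality setting to a percentage applied to the base table."""
    quality = max(1, min(100, quality))
    return 5000 // quality if quality < 50 else 200 - 2 * quality


def scaled_table(quality: int) -> List[List[int]]:
    """Scale every base step, round, and clamp to the 8-bit baseline range."""
    scale = ijg_scale_factor(quality)
    return [
        [max(1, min(255, (step * scale + 50) // 100)) for step in row]
        for row in BASE_LUMA_TABLE
    ]


if __name__ == "__main__":
    for q in (95, 75, 50, 15, 5):
        table = scaled_table(q)
        print(f"quality={q:3d}  DC step={table[0][0]:3d}  "
              f"largest step={max(max(row) for row in table):3d}")
```

Under this convention the scaling is itself non-linear in the quality setting: at quality 50 the base table is used unchanged, while at quality 15 the DC step grows from 16 to 53 and the largest entries saturate at the 8-bit ceiling of 255, so most high-frequency detail is quantized away.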
