Efficient Visual Computing with Camera RAW Snapshots

Paper Title

Efficient Visual Computing with Camera RAW Snapshots

Authors

Li, Zhihao; Lu, Ming; Zhang, Xu; Feng, Xin; Asif, M. Salman; Ma, Zhan

Abstract

Conventional cameras capture image irradiance on a sensor and convert it to RGB images using an image signal processor (ISP). The images can then be used for photography or visual computing tasks in a variety of applications, such as public safety surveillance and autonomous driving. One can argue that since RAW images contain all the captured information, converting RAW to RGB with an ISP is not necessary for visual computing. In this paper, we propose a novel $ρ$-Vision framework that performs high-level semantic understanding and low-level compression directly on RAW images, without the ISP subsystem that has been used for decades. Considering the scarcity of available RAW image datasets, we first develop an unpaired CycleR2R network, based on unsupervised CycleGAN, to train modular unrolled ISP and inverse ISP (invISP) models using unpaired RAW and RGB images. We can then flexibly generate simulated RAW images (simRAW) from any existing RGB image dataset and fine-tune different models originally trained for the RGB domain to process real-world camera RAW images. We demonstrate object detection and image compression in the RAW domain using a RAW-domain YOLOv3 and a RAW image compressor (RIC) on snapshots from various cameras. Quantitative results reveal that RAW-domain task inference provides better detection accuracy and compression efficiency than RGB-domain processing. Furthermore, the proposed $ρ$-Vision generalizes across various camera sensors and different task-specific models. Because it eliminates the ISP, $ρ$-Vision also offers potential reductions in computation and processing time.
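
The abstract describes a modular ISP that maps RAW to RGB and an inverse ISP (invISP) that maps RGB back to simulated RAW (simRAW). The sketch below is only a toy illustration of that idea, under stated assumptions: it replaces the learned CycleR2R modules with three fixed, hand-picked stages (per-channel white balance, a 3×3 color correction matrix, and a plain power-law gamma of 2.2 approximating the sRGB curve), works on a single already-demosaicked pixel, and omits demosaicing, denoising, and tone mapping. All names and parameter values (`WB_GAINS`, `CCM`, `GAMMA`, `isp`, `inv_isp`, `l1`) are hypothetical and not taken from the paper. It also computes an L1 cycle error, the kind of consistency between the two directions that CycleGAN-style training encourages.

```python
"""Toy modular ISP / inverse ISP (invISP), for illustration only.

Assumption: fixed, hand-picked parameters stand in for the modules that
rho-Vision learns with CycleR2R; real ISPs also include demosaicing,
denoising, and tone mapping, which are omitted here.
"""

# Hypothetical camera parameters (not taken from the paper).
WB_GAINS = (2.0, 1.0, 1.6)  # per-channel white-balance gains (R, G, B)
CCM = [                     # camera RGB -> linear display RGB; each row sums to 1
    [1.6, -0.4, -0.2],
    [-0.3, 1.5, -0.2],
    [0.0, -0.5, 1.5],
]
GAMMA = 2.2                 # simple power-law approximation of the sRGB curve


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a length-3 vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def inv3(m):
    """Invert a 3x3 matrix via the adjugate (cofactor) formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


CCM_INV = inv3(CCM)


def isp(raw):
    """RAW (linear camera RGB in [0, 1]) -> display RGB, one module at a time."""
    x = [v * g for v, g in zip(raw, WB_GAINS)]   # 1. white balance
    x = mat_vec(CCM, x)                           # 2. color correction
    x = [min(max(v, 0.0), 1.0) for v in x]        # 3. clip to [0, 1] (lossy)
    return [v ** (1.0 / GAMMA) for v in x]        # 4. gamma encoding


def inv_isp(rgb):
    """Display RGB -> simulated RAW (simRAW): undo the modules in reverse order."""
    x = [v ** GAMMA for v in rgb]                 # undo 4: gamma decoding
    x = mat_vec(CCM_INV, x)                       # undo 2: inverse color correction
    x = [v / g for v, g in zip(x, WB_GAINS)]      # undo 1: inverse white balance
    return [min(max(v, 0.0), 1.0) for v in x]     # keep simRAW in sensor range


def l1(a, b):
    """Mean absolute error, the distance used in CycleGAN-style cycle losses."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    raw = [0.20, 0.35, 0.25]                      # a made-up demosaicked RAW pixel
    rgb = isp(raw)
    sim_raw = inv_isp(rgb)
    print("RGB:", [round(v, 4) for v in rgb])
    print("RAW -> RGB -> RAW cycle error:", l1(raw, sim_raw))
```

Because each stage is explicit, the inverse simply undoes the stages in reverse order, and for an unsaturated pixel the RAW to RGB to RAW cycle error is essentially zero. The clipping step in `isp` discards information for saturated pixels (for example `[0.9, 0.9, 0.9]`), so an exact closed-form inverse does not exist in general, which hints at why a learned, approximate invISP trained on unpaired data can be useful.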