Enabling ISP-less Low-Power Computer Vision

Title

Enabling ISP-less Low-Power Computer Vision

Authors

Gourav Datta, Zeyu Liu, Zihan Yin, Linyu Sun, Akhilesh R. Jaiswal, Peter A. Beerel

Abstract

To deploy current computer vision (CV) models on resource-constrained low-power devices, recent works have proposed in-sensor and in-pixel computing approaches that try to partly or fully bypass the image signal processor (ISP) and yield significant bandwidth reduction between the image sensor and the CV processing unit by downsampling the activation maps in the initial convolutional neural network (CNN) layers. However, direct inference on raw images degrades test accuracy because the covariance of the raw images captured by image sensors differs from that of the ISP-processed images used for training. Moreover, it is difficult to train deep CV models on raw images, because most (if not all) large-scale open-source datasets consist of RGB images. To mitigate this concern, we propose to invert the ISP pipeline, which can convert the RGB images of any dataset to their raw counterparts and enable model training on raw images. We release the raw version of the COCO dataset, a large-scale benchmark for generic high-level vision tasks. For ISP-less CV systems, training on these raw images results in a 7.1% increase in test accuracy on the Visual Wake Words (VWW) dataset compared to relying on training with traditional ISP-processed RGB datasets. To further improve the accuracy of ISP-less CV models and to increase the energy and bandwidth benefits obtained by in-sensor/in-pixel computing, we propose an energy-efficient form of analog in-pixel demosaicing that may be coupled with in-pixel CNN computations. When evaluated on raw images captured by real sensors from the PASCALRAW dataset, our approach results in an 8.1% increase in mAP. Lastly, we demonstrate a further 20.5% increase in mAP by using a novel application of few-shot learning with thirty shots each for the novel PASCALRAW dataset, constituting 3 classes.
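To make the idea of converting RGB images back to raw more concrete, here is a minimal Python sketch of a generic, simplified inverse ISP. It is not the pipeline proposed in the paper, whose stages the abstract does not spell out. The gamma value, the white-balance gains, the RGGB Bayer layout, and the function names `inverse_gamma` and `rgb_to_raw` are all assumptions chosen for illustration.

```python
# Illustrative only: a generic, simplified inverse ISP, NOT the paper's pipeline.
# GAMMA, WB_GAINS, and the RGGB layout below are assumed values.

GAMMA = 2.2                  # assumed display gamma
WB_GAINS = (2.0, 1.0, 1.5)   # assumed (R, G, B) white-balance gains


def inverse_gamma(value):
    """Undo gamma encoding for an intensity in [0, 1]."""
    return value ** GAMMA


def rgb_to_raw(rgb):
    """Map an H x W image of (r, g, b) tuples in [0, 1] to an H x W
    single-channel mosaic with an assumed RGGB Bayer pattern."""
    raw = []
    for y, row in enumerate(rgb):
        out_row = []
        for x, pixel in enumerate(row):
            # RGGB: R at (even, even), B at (odd, odd), G elsewhere.
            if y % 2 == 0 and x % 2 == 0:
                channel = 0
            elif y % 2 == 1 and x % 2 == 1:
                channel = 2
            else:
                channel = 1
            linear = inverse_gamma(pixel[channel])        # back to linear light
            out_row.append(linear / WB_GAINS[channel])    # undo white balance
        raw.append(out_row)
    return raw


if __name__ == "__main__":
    white = [[(1.0, 1.0, 1.0)] * 2 for _ in range(2)]
    print(rgb_to_raw(white))
```

Applied to every image of an RGB dataset, a transform of this kind yields raw-like training data. A realistic inversion would likely also need to account for stages such as color correction, tone mapping, and sensor noise, which this sketch omits.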
