Paper Title

Why is the video analytics accuracy fluctuating, and what can we do about it?

Paper Authors

Sibendu Paul, Kunal Rao, Giuseppe Coviello, Murugan Sankaradas, Oliver Po, Y. Charlie Hu, Srimat Chakradhar

Paper Abstract

It is a common practice to think of a video as a sequence of images (frames), and to re-use deep neural network models that are trained only on images for similar analytics tasks on videos. In this paper, we show that this leap of faith, that deep learning models which work well on images will also work well on videos, is actually flawed. We show that even when a video camera is viewing a scene that is not changing in any human-perceptible way, and we control for external factors like video compression and environment (lighting), the accuracy of video analytics applications fluctuates noticeably. These fluctuations occur because successive frames produced by the video camera may look similar visually, but are perceived quite differently by the video analytics applications. We observed that the root cause of these fluctuations is the dynamic camera parameter changes that a video camera automatically makes in order to capture and produce a visually pleasing video. The camera thus acts as an unintentional adversary because these slight changes in the image pixel values in consecutive frames, as we show, have a noticeably adverse impact on the accuracy of insights from video analytics tasks that re-use image-trained deep learning models. To address this inadvertent adversarial effect from the camera, we explore the use of transfer learning techniques to improve learning in video analytics tasks through the transfer of knowledge from learning on image analytics tasks. In particular, we show that our newly trained Yolov5 model reduces fluctuation in object detection across frames, which leads to better tracking of objects (40% fewer mistakes in tracking). Our paper also provides new directions and techniques to mitigate the camera's adversarial effect on deep learning models used for video analytics applications.
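The frame-to-frame fluctuation the abstract describes can be made concrete with a small metric. The sketch below counts how often detections appear or disappear between consecutive frames of an otherwise static scene; the greedy IoU matching, the 0.5 threshold, and the `flicker_rate` name are illustrative assumptions, not the paper's exact evaluation protocol.

```python
# Minimal sketch: quantify detection "flicker" across consecutive frames of a
# static scene. Boxes are (x1, y1, x2, y2); the greedy matching rule and the
# 0.5 IoU threshold are illustrative assumptions, not the paper's metric.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

def flicker_rate(dets_per_frame, thr=0.5):
    """Fraction of detections that appear or disappear between consecutive
    frames; 0.0 means perfectly stable detections."""
    changes, total = 0, 0
    for prev, curr in zip(dets_per_frame, dets_per_frame[1:]):
        unmatched_prev = set(range(len(prev)))
        unmatched_curr = set(range(len(curr)))
        # Greedily pair each current box with the best remaining previous box.
        for j, box in enumerate(curr):
            best_i, best = None, thr
            for i in unmatched_prev:
                v = iou(prev[i], box)
                if v >= best:
                    best_i, best = i, v
            if best_i is not None:
                unmatched_prev.discard(best_i)
                unmatched_curr.discard(j)
        changes += len(unmatched_prev) + len(unmatched_curr)
        total += len(prev) + len(curr)
    return changes / total if total else 0.0

# Example: an object detected in frames 0 and 2 but missed in frame 1,
# even though the scene did not change.
frames = [
    [(10, 10, 50, 50)],   # frame 0: one detection
    [],                   # frame 1: the detector misses it (flicker)
    [(11, 11, 51, 51)],   # frame 2: nearly the same box again
]
print(flicker_rate(frames))  # non-zero: detections fluctuate across frames
```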

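To illustrate the transfer-learning direction the abstract mentions, here is a minimal PyTorch sketch that freezes an image-pretrained backbone and fine-tunes only the detection head on labeled video frames. The paper trains a Yolov5 model; torchvision's Faster R-CNN is used here purely as a self-contained stand-in, and the dummy frame and target are placeholders for real labeled video data, so the exact recipe may differ from the paper's.

```python
import torch
import torchvision

# Start from a detector pretrained on images (COCO).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Transfer learning: freeze the image-pretrained backbone so only the
# detection head adapts to the camera's video frames.
for p in model.backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9)

model.train()

# Dummy stand-in for one labeled video frame so the sketch runs end to end;
# in practice these would come from consecutive frames of the captured videos.
frames = [torch.rand(3, 240, 320)]
targets = [{"boxes": torch.tensor([[30.0, 30.0, 120.0, 120.0]]),
            "labels": torch.tensor([1])}]

loss_dict = model(frames, targets)   # dict of detection losses in train mode
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Freezing the backbone keeps the image-learned features intact while the head learns to be robust to the small pixel-level shifts that the camera's automatic parameter changes introduce between frames.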