A Deep Moving-camera Background Model

Paper Title

A Deep Moving-camera Background Model

Authors

Guy Erez, Ron Shapira Weber, Oren Freifeld

Abstract

In video analysis, background models have many applications such as background/foreground separation, change detection, anomaly detection, tracking, and more. However, while learning such a model from a video captured by a static camera is a fairly well-solved task, in the case of a Moving-camera Background Model (MCBM), success has been far more modest due to the algorithmic and scalability challenges that camera motion introduces. Thus, existing MCBMs are limited in their scope and in the camera-motion types they support. These hurdles have also impeded the use of end-to-end solutions based on deep learning (DL) for this unsupervised task. Moreover, existing MCBMs usually model the background either on the domain of a typically large panoramic image or in an online fashion. Unfortunately, the former creates several problems, including poor scalability, while the latter prevents the recognition and leveraging of cases where the camera revisits previously seen parts of the scene. This paper proposes a new method, called DeepMCBM, that eliminates all the aforementioned issues and achieves state-of-the-art results. Concretely, we first identify the difficulties associated with joint alignment of video frames in general and in a DL setting in particular. Next, we propose a new joint-alignment strategy that lets us use a spatial transformer net with neither regularization nor any form of specialized (and non-differentiable) initialization. Coupled with an autoencoder conditioned on unwarped robust central moments (obtained from the joint alignment), this yields an end-to-end, regularization-free MCBM that supports a broad range of camera motions and scales gracefully. We demonstrate DeepMCBM's utility on a variety of videos, including ones beyond the scope of other methods. Our code is available at https://github.com/BGU-CS-VIL/DeepMCBM.
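The core data flow named in the abstract (jointly align the frames, compute robust per-pixel statistics in the aligned domain, then unwarp those statistics back into each frame's own coordinates) can be illustrated with a toy NumPy sketch. This is not the DeepMCBM implementation: it assumes purely integer, wrap-around translations with known shifts in place of the learned spatial transformer net, uses the per-pixel median and median absolute deviation (MAD) as the robust central moments, and replaces the conditioned autoencoder with a simple threshold. The helper names `align`, `unwarp`, `robust_moments`, and `foreground_masks` and the parameters `tau` and `k` are hypothetical.

```python
import numpy as np


def align(frame, shift):
    """Hypothetical warp: move a frame into the common (aligned) domain
    by an integer (dy, dx) translation. np.roll wraps around at the borders,
    which is a toy simplification, not a real camera model."""
    return np.roll(frame, shift, axis=(0, 1))


def unwarp(image, shift):
    """Inverse of align: bring an aligned-domain image back into the
    coordinates of the frame that has the given shift."""
    return np.roll(image, (-shift[0], -shift[1]), axis=(0, 1))


def robust_moments(frames, shifts):
    """Per-pixel median and MAD over the jointly aligned frame stack.
    The median ignores a minority of outliers (e.g. moving objects) at each
    aligned pixel, so it serves as a robust background estimate."""
    aligned = np.stack([align(f, s) for f, s in zip(frames, shifts)])
    median = np.median(aligned, axis=0)
    mad = np.median(np.abs(aligned - median), axis=0)
    return median, mad


def foreground_masks(frames, shifts, tau=0.5, k=3.0):
    """Threshold each frame against its unwarped robust moments.
    A pixel is foreground if it deviates from the unwarped median by more
    than tau + k * (unwarped MAD). This threshold stands in for the
    autoencoder that DeepMCBM conditions on the unwarped moments."""
    median, mad = robust_moments(frames, shifts)
    masks = []
    for f, s in zip(frames, shifts):
        bg = unwarp(median, s)                 # background in this frame's coordinates
        thresh = tau + k * unwarp(mad, s)      # per-pixel tolerance
        masks.append(np.abs(f - bg) > thresh)
    return masks
```

Because the median is taken in the aligned domain, a foreground object that lands on different aligned pixels in different frames does not contaminate the background estimate, and a region the camera revisits is pooled with its earlier observations rather than modeled afresh.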