Paper Title
Two-stream Multi-dimensional Convolutional Network for Real-time Violence Detection
Paper Authors
Paper Abstract
The increasing number of surveillance cameras and growing security concerns have made automatic violent activity detection from surveillance footage an active area of research. Modern deep learning methods have achieved good accuracy in violence detection and proved to be successful because of their applicability in intelligent surveillance systems. However, these models are computationally expensive and large in size because of their inefficient feature extraction. This work presents a novel architecture for violence detection called Two-stream Multi-dimensional Convolutional Network (2s-MDCN), which uses RGB frames and optical flow to detect violence. Our proposed method extracts temporal and spatial information independently through 1D, 2D, and 3D convolutions. Despite combining multi-dimensional convolutional networks, our models are lightweight and efficient due to reduced channel capacity, yet they learn to extract meaningful spatial and temporal information. Additionally, combining RGB frames with optical flow yields 2.2% higher accuracy than a single RGB stream. Despite their lower complexity, our models obtained state-of-the-art accuracy of 89.7% on the largest violence detection benchmark dataset.
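To make the high-level description above concrete, below is a minimal PyTorch sketch of a two-stream network in which each stream combines 2D (spatial), 1D (temporal), and 3D (spatio-temporal) convolutions with a small channel width, and the RGB and optical-flow scores are fused by averaging. The module names, channel widths, and fusion rule are illustrative assumptions, not the authors' exact 2s-MDCN configuration.

import torch
import torch.nn as nn


class MDCNStream(nn.Module):
    """One stream operating on a clip of shape (B, C, T, H, W)."""

    def __init__(self, in_channels: int, num_classes: int = 2, width: int = 16):
        super().__init__()
        # 2D convolution applied frame-by-frame for spatial features.
        self.spatial = nn.Sequential(
            nn.Conv2d(in_channels, width, kernel_size=3, padding=1),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),          # -> (B*T, width, 1, 1)
        )
        # 1D convolution over the temporal axis of the pooled spatial features.
        self.temporal_1d = nn.Sequential(
            nn.Conv1d(width, width, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # 3D convolution capturing joint spatio-temporal patterns.
        self.spatiotemporal_3d = nn.Sequential(
            nn.Conv3d(in_channels, width, kernel_size=3, padding=1),
            nn.BatchNorm3d(width),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),          # -> (B, width, 1, 1, 1)
        )
        self.classifier = nn.Linear(2 * width, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        b, c, t, h, w = clip.shape
        # 2D branch: fold time into the batch dimension.
        frames = clip.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        spatial = self.spatial(frames).reshape(b, t, -1).permute(0, 2, 1)  # (B, width, T)
        temporal = self.temporal_1d(spatial).mean(dim=2)                   # (B, width)
        # 3D branch on the raw clip.
        volumetric = self.spatiotemporal_3d(clip).flatten(1)               # (B, width)
        return self.classifier(torch.cat([temporal, volumetric], dim=1))


class TwoStreamMDCN(nn.Module):
    """Late fusion of an RGB stream and an optical-flow stream."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.rgb_stream = MDCNStream(in_channels=3, num_classes=num_classes)
        self.flow_stream = MDCNStream(in_channels=2, num_classes=num_classes)

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # Average the per-stream class scores (one simple late-fusion choice).
        return 0.5 * (self.rgb_stream(rgb) + self.flow_stream(flow))


if __name__ == "__main__":
    model = TwoStreamMDCN()
    rgb = torch.randn(1, 3, 16, 112, 112)    # (B, C, T, H, W) RGB clip
    flow = torch.randn(1, 2, 16, 112, 112)   # stacked x/y optical-flow fields
    print(model(rgb, flow).shape)            # torch.Size([1, 2])

The small channel width (16 here) is what keeps such a model lightweight; the abstract's "reduced channel capacity" refers to this kind of design choice, though the paper's actual widths and fusion stage may differ.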