SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection

Paper Title

SSMTL++: Revisiting Self-Supervised Multi-Task Learning for Video Anomaly Detection

Authors

Antonio Barbalau, Radu Tudor Ionescu, Mariana-Iuliana Georgescu, Jacob Dueholm, Bharathkumar Ramachandra, Kamal Nasrollahi, Fahad Shahbaz Khan, Thomas B. Moeslund, Mubarak Shah

Abstract

A self-supervised multi-task learning (SSMTL) framework for video anomaly detection was recently introduced in the literature. Due to its highly accurate results, the method attracted the attention of many researchers. In this work, we revisit the self-supervised multi-task learning framework, proposing several updates to the original method. First, we study alternative detection methods, e.g. methods based on detecting high-motion regions using optical flow or background subtraction, since we believe the currently used pre-trained YOLOv3 is suboptimal: for example, objects in motion or objects from unknown classes are never detected. Second, we modernize the 3D convolutional backbone by introducing multi-head self-attention modules, inspired by the recent success of vision transformers. To this end, we alternatively introduce 2D and 3D convolutional vision transformer (CvT) blocks. Third, in our attempt to further improve the model, we study additional self-supervised learning tasks, such as predicting segmentation maps through knowledge distillation, solving jigsaw puzzles, estimating body pose through knowledge distillation, predicting masked regions (inpainting), and adversarial learning with pseudo-anomalies. We conduct experiments to assess the performance impact of the introduced changes. Upon finding more promising configurations of the framework, dubbed SSMTL++v1 and SSMTL++v2, we extend our preliminary experiments to more data sets, demonstrating that our performance gains are consistent across all data sets. In most cases, our results on Avenue, ShanghaiTech and UBnormal raise the state-of-the-art performance bar to a new level.
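The abstract names background subtraction as one alternative to YOLOv3 for finding high-motion regions. As a rough illustration only, the sketch below assumes grayscale frames stored as NumPy arrays, an exponential running-average background model, and a single tight box around all moving pixels. The function names, the `alpha` and `threshold` values, and the single-box simplification are hypothetical and are not taken from the SSMTL++ paper or its code.

```python
import numpy as np


def update_background(background: np.ndarray, frame: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Exponential running average (hypothetical model): B <- (1 - alpha) * B + alpha * F."""
    return (1.0 - alpha) * background + alpha * frame.astype(np.float64)


def motion_mask(background: np.ndarray, frame: np.ndarray, threshold: float = 25.0) -> np.ndarray:
    """Boolean mask of pixels whose absolute difference from the background exceeds the threshold."""
    return np.abs(frame.astype(np.float64) - background) > threshold


def bounding_box(mask: np.ndarray):
    """Tight box (y0, x0, y1, x1) around all foreground pixels, with y1 and x1 exclusive.

    Returns None when no pixel is marked as moving.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()) + 1, int(xs.max()) + 1


# Toy example: an empty 10x10 scene in which a bright 3x3 patch appears.
background = np.zeros((10, 10))
frame = np.zeros((10, 10))
frame[2:5, 5:8] = 200.0

mask = motion_mask(background, frame)
box = bounding_box(mask)  # box == (2, 5, 5, 8)
background = update_background(background, frame)
```

A single box is enough to show the idea; a more realistic region detector would split the mask into connected components (for instance with `scipy.ndimage.label`) to obtain one box per moving object before cropping the object-centric inputs.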
