Learning Pixel-Level Distinctions for Video Highlight Detection

Paper Title

Learning Pixel-Level Distinctions for Video Highlight Detection

Authors

Wei, Fanyue; Wang, Biao; Ge, Tiezheng; Jiang, Yuning; Li, Wen; Duan, Lixin

Abstract

The goal of video highlight detection is to select the most attractive segments from a long video to depict its most interesting parts. Existing methods typically focus on modeling the relationships between different video segments in order to learn a model that can assign highlight scores to these segments; however, these approaches do not explicitly consider the contextual dependency within individual segments. To this end, we propose to learn pixel-level distinctions to improve video highlight detection. This pixel-level distinction indicates whether or not each pixel in a video belongs to an interesting section. Modeling such fine-level distinctions has two advantages. First, it allows us to exploit the temporal and spatial relations of the content in a video, since the distinction of a pixel in one frame depends heavily on both the content before this frame and the content around this pixel within the frame. Second, learning the pixel-level distinction also helps explain the video highlight task, revealing which content in a highlight segment is attractive to people. We design an encoder-decoder network to estimate the pixel-level distinction, in which we leverage 3D convolutional neural networks to exploit temporal context information and further take advantage of visual saliency to model the spatial distinction. State-of-the-art performance on three public benchmarks clearly validates the effectiveness of our framework for video highlight detection.
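The abstract gives no layer-level details, so the following is only a toy NumPy sketch of the core idea it describes. Each pixel's distinction depends on the frames up to the current one and on the pixel's spatial neighbourhood, so it can be produced by a 3D convolution followed by a sigmoid. This is not the authors' encoder-decoder: there is no learned decoder and no saliency branch, and the kernel is hand-set. The causal temporal window (chosen to mirror the abstract's "content before this frame") is an assumption. So is the mean pooling of the pixel map into segment scores. The function names `causal_conv3d`, `pixel_distinction`, and `segment_scores` are hypothetical.

```python
import numpy as np


def causal_conv3d(video: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Correlate a (T, H, W) clip with a (kt, kh, kw) kernel (kh, kw odd).

    Assumption (toy): the response at (t, y, x) uses only frames t-kt+1 .. t
    (causal in time) and the kh x kw neighbourhood of (y, x); positions
    outside the clip count as zeros.
    """
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Zero-pad kt-1 frames in the past only, and ph/pw pixels on each spatial border.
    padded = np.pad(video, ((kt - 1, 0), (ph, ph), (pw, pw)))
    out = np.empty((T, H, W))
    for t in range(T):
        for y in range(H):
            for x in range(W):
                window = padded[t:t + kt, y:y + kh, x:x + kw]
                out[t, y, x] = np.sum(window * kernel)
    return out


def pixel_distinction(video, kernel, bias=0.0):
    """Toy per-pixel probability (0..1) that a pixel belongs to an interesting part."""
    logits = causal_conv3d(video, kernel) + bias
    return 1.0 / (1.0 + np.exp(-logits))


def segment_scores(distinction, segment_len):
    """Hypothetical pooling: mean distinction over each run of segment_len frames."""
    T = distinction.shape[0]
    return np.array([distinction[s:s + segment_len].mean()
                     for s in range(0, T, segment_len)])


# Toy clip: 8 dim frames of 6x6 noise; a bright 2x2 object appears in frames 4-7.
rng = np.random.default_rng(0)
video = rng.random((8, 6, 6)) * 0.1
video[4:, 2:4, 2:4] += 1.0
# Hand-set kernel: uniform average over 2 frames x 3x3 pixels (not learned).
kernel = np.full((2, 3, 3), 1.0 / 18)
dist = pixel_distinction(video, kernel, bias=-0.2)
scores = segment_scores(dist, segment_len=4)
print(dist.shape, scores)
```

With these settings, pixels around the bright object receive higher distinction values from frame 4 onward, so the second segment (frames 4-7) scores above the first. Because the temporal window is causal, editing a later frame cannot change the map for earlier frames. In a trained model the kernel weights would be learned, and the spatial term would also draw on a saliency estimate, which this sketch omits.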