Paper Title

Character-focused Video Thumbnail Retrieval

Paper Authors

Shervin Ardeshir, Nagendra Kamath, Hossein Taghavi

Paper Abstract

We explore retrieving character-focused video frames as candidates for being video thumbnails. To evaluate each frame of the video based on the character(s) present in it, characters (faces) are evaluated in two aspects: Facial expression: We train a CNN model to measure whether a face has an acceptable facial expression for being in a video thumbnail. This model is trained to distinguish faces extracted from artworks/thumbnails from faces extracted from random frames of videos. Prominence and interactions: Character(s) in the thumbnail should be important character(s) in the video, to prevent the algorithm from suggesting non-representative frames as candidates. We use face clustering to identify the characters in the video, and form a graph in which the prominence (frequency of appearance) of the character(s) and their interactions (co-occurrence) are captured. We use this graph to infer the relevance of the characters present in each candidate frame. Once every face is scored based on the two criteria above, we infer frame-level scores by combining the scores for all the faces within a frame.
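The abstract describes two per-face signals, a CNN expression score and a graph of character prominence and co-occurrence, that are combined into a frame-level score. The sketch below illustrates one way the graph-based part could be implemented, assuming face clustering has already mapped each detected face to a character ID; the specific relevance definition and the weighting of expression score by relevance are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of prominence/co-occurrence scoring, assuming per-frame lists
# of character IDs (from face clustering) and per-face expression scores from
# the CNN classifier. All names here are illustrative, not from the paper.
from collections import Counter
from itertools import combinations

def build_character_graph(frames):
    """frames: list of lists of character IDs appearing in each frame.
    Returns appearance counts per character and co-occurrence counts
    per unordered character pair."""
    prominence = Counter()
    co_occurrence = Counter()
    for chars in frames:
        unique = set(chars)
        prominence.update(unique)
        for a, b in combinations(sorted(unique), 2):
            co_occurrence[(a, b)] += 1
    return prominence, co_occurrence

def character_relevance(char, prominence, co_occurrence, total_frames):
    """Relevance of a character: its own appearance frequency plus the
    strength of its interactions with other characters (one possible
    reading of the graph-based relevance in the abstract)."""
    freq = prominence[char] / total_frames
    interaction = sum(c for (a, b), c in co_occurrence.items()
                      if char in (a, b)) / total_frames
    return freq + interaction

def frame_score(chars, expression_scores, prominence, co_occurrence, total_frames):
    """Combine per-face scores into a frame-level score; a simple sum of
    expression score weighted by character relevance is assumed here."""
    return sum(expression_scores[c] *
               character_relevance(c, prominence, co_occurrence, total_frames)
               for c in set(chars))

# Toy usage: three frames, two characters, hypothetical CNN expression scores.
frames = [["A"], ["A", "B"], ["B"]]
prominence, co_occurrence = build_character_graph(frames)
scores = {"A": 0.9, "B": 0.4}
for i, chars in enumerate(frames):
    print(i, frame_score(chars, scores, prominence, co_occurrence, len(frames)))
```

Under this reading, frames containing prominent, frequently interacting characters with strong expression scores rank highest as thumbnail candidates.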
