Audio visual character profiles for detecting background characters in entertainment media

Title

Audio visual character profiles for detecting background characters in entertainment media

Authors

Rahul Sharma, Shrikanth Narayanan

Abstract

An essential goal of computational media intelligence is to support understanding of how media stories (whether news, commercial, or entertainment media) represent and reflect society, and how these portrayals are perceived. People are a central element of media stories. This paper focuses on understanding the representation and depiction of background characters in media, primarily movies and TV shows. We define background characters as those who do not participate vocally in any scene throughout the movie, and we address the problem of localizing background characters in videos. We use an active speaker localization system to extract high-confidence face-speech associations and generate audio-visual profiles for the talking characters in a movie by automatically clustering them. Using a face verification system, we then prune all face-tracks that match any of the generated character profiles, leaving the background character face-tracks. We curate a background character dataset that provides annotations of background characters for a set of TV shows, and use it to evaluate the performance of the background character detection framework.
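
The pruning step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes face embeddings have already been computed for every face-track and that the active speaker localization stage has already picked out the high-confidence speaking tracks. The function names (`build_profiles`, `background_tracks`), the cosine-similarity threshold of 0.7, and the greedy clustering are all hypothetical stand-ins for the clustering and face verification systems the abstract mentions but does not specify.

```python
import numpy as np


def _normalize(x):
    """Scale vectors to unit length so dot products are cosine similarities."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def build_profiles(speaker_embeddings, sim_threshold=0.7):
    """Greedily cluster embeddings of high-confidence speaking face-tracks.

    Hypothetical stand-in for the paper's automatic clustering: each
    embedding joins the most similar existing cluster if the cosine
    similarity reaches sim_threshold, otherwise it starts a new cluster.
    Returns one unit-length centroid per talking-character profile.
    """
    clusters = []  # each entry is a list of unit-length embeddings
    for emb in _normalize(speaker_embeddings):
        best, best_sim = None, sim_threshold
        for i, members in enumerate(clusters):
            centroid = _normalize(np.mean(members, axis=0))
            sim = float(centroid @ emb)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            clusters.append([emb])
        else:
            clusters[best].append(emb)
    return np.stack([_normalize(np.mean(m, axis=0)) for m in clusters])


def background_tracks(track_embeddings, profiles, sim_threshold=0.7):
    """Return indices of face-tracks that match no talking-character profile.

    Thresholded cosine similarity is an assumed proxy for a face
    verification system.
    """
    sims = _normalize(track_embeddings) @ profiles.T
    return [i for i, row in enumerate(sims) if row.max() < sim_threshold]


# Toy 3-D embeddings (assumed data, for illustration only).
speakers = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
tracks = [[1.0, 0.05, 0.0], [0.0, 0.0, 1.0], [0.1, 1.0, 0.0]]

profiles = build_profiles(speakers)
background = background_tracks(tracks, profiles)
```

In this toy setup the first two speaking tracks merge into one profile and the third forms a second, so only the track that resembles neither profile is kept as a background character candidate. In practice the threshold would have to be tuned on annotated data such as the dataset the abstract describes.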
