Paper Title
Contextual Sense Making by Fusing Scene Classification, Detections, and Events in Full Motion Video
Authors
Abstract
With the proliferation of imaging sensors, the volume of multi-modal imagery far exceeds the ability of human analysts to adequately consume and exploit it. Full motion video (FMV) poses the additional challenge of containing large amounts of redundant temporal data. We aim to help human analysts consume and exploit aerial FMV data. We have investigated and designed a system capable of detecting events and activities of interest that deviate from the baseline patterns of observation in FMV feeds. We have divided the problem into three tasks: (1) context awareness, (2) object cataloging, and (3) event detection. The goal of context awareness is to constrain the problem of visual search and detection in video data. A custom image classifier categorizes the scene with one or more labels to identify the operating context and environment. This step helps reduce the semantic search space of downstream tasks in order to increase their accuracy. The second step is object cataloging, in which an ensemble of object detectors locates and labels any known objects found in the scene (people, vehicles, boats, planes, buildings, etc.). Finally, context information and detections are sent to the event detection engine, which monitors for certain behaviors. A series of analytics monitors the scene by tracking object counts and object interactions. If an object interaction is not declared to be commonly observed in the current scene, the system reports, geolocates, and logs the event. Events of interest include a gathering of people that constitutes a meeting and/or a crowd, boats unloading cargo on a beach, an increase in the number of people entering a building, and people getting into and/or out of vehicles of interest. We have applied our methods to data from different sensors at different resolutions in a variety of geographical areas.
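The final stage described in the abstract, where an interaction is reported only if it is not declared common for the current scene, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the paper: the `BASELINE` table and its entries, the `Detection` and `Event` types, the `detect_events` function, the choice to take the union of baselines when a scene has several labels, and the use of the midpoint of the two detections as the event location. How interactions are recognized upstream is out of scope.

```python
from dataclasses import dataclass

# Hypothetical baseline: for each scene label, the object interactions that
# are declared commonly observed there. Entries are illustrative only.
BASELINE = {
    "beach": {("person", "person"), ("boat", "water")},
    "urban": {("person", "vehicle"), ("person", "building")},
}


@dataclass(frozen=True)
class Detection:
    label: str   # object class from the (assumed) detector ensemble
    lat: float   # geolocated position of the detection
    lon: float


@dataclass(frozen=True)
class Event:
    interaction: tuple   # pair of object labels involved
    scene_labels: tuple  # scene context in which it was observed
    lat: float
    lon: float


def detect_events(scene_labels, interactions):
    """Return an Event for every interaction outside the scene baseline.

    `interactions` is a list of (Detection, Detection) pairs that an upstream
    component has already judged to be interacting.
    """
    # Assumption: with several scene labels, an interaction is "common"
    # if any one of the labels declares it so.
    allowed = set()
    for label in scene_labels:
        allowed |= BASELINE.get(label, set())

    events = []
    for a, b in interactions:
        key = (a.label, b.label)
        # Treat interactions as unordered pairs.
        if key in allowed or key[::-1] in allowed:
            continue
        # Assumption: geolocate the event at the midpoint of the two objects.
        events.append(
            Event(key, tuple(scene_labels), (a.lat + b.lat) / 2, (a.lon + b.lon) / 2)
        )
    return events
```

In this sketch, a boat interacting with cargo in a scene labeled `"beach"` yields an event because that pair is absent from the hypothetical beach baseline, while two people interacting on the same beach do not. A scene label with no baseline entry allows nothing, so every interaction in it is reported.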