Paper Title
Batteries, camera, action! Learning a semantic control space for expressive robot cinematography
Paper Authors
Paper Abstract
Aerial vehicles are revolutionizing the way film-makers can capture shots of actors by composing novel aerial and dynamic viewpoints. However, despite great advancements in autonomous flight technology, generating expressive camera behaviors is still a challenge and requires non-technical users to edit a large number of unintuitive control parameters. In this work, we develop a data-driven framework that enables editing of these complex camera positioning parameters in a semantic space (e.g., calm, enjoyable, establishing). First, we generate a database of video clips with a diverse range of shots in a photo-realistic simulator, and use hundreds of participants in a crowd-sourcing framework to obtain scores for a set of semantic descriptors for each clip. Next, we analyze correlations between descriptors and build a semantic control space based on cinematography guidelines and human perception studies. Finally, we learn a generative model that can map a set of desired semantic video descriptors into low-level camera trajectory parameters. We evaluate our system by demonstrating that our model successfully generates shots that are rated by participants as having the expected degrees of expression for each descriptor. We also show that our model generalizes to different scenes in both simulation and real-world experiments. Data and videos are available at: https://sites.google.com/view/robotcam.
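The core step described in the abstract is a learned mapping from desired semantic descriptor scores to low-level camera trajectory parameters. The following minimal PyTorch sketch illustrates that idea only; it is not the paper's implementation, and the descriptor list, the number of trajectory parameters, and the simple regression-style network are all assumptions made for illustration.

```python
# Hypothetical sketch (not the authors' code): map semantic descriptor scores
# to low-level camera trajectory parameters with a small neural network.
# Descriptor names, parameter count, and architecture are illustrative assumptions.
import torch
import torch.nn as nn

SEMANTIC_DESCRIPTORS = ["calm", "enjoyable", "establishing"]  # example descriptors from the abstract
N_TRAJECTORY_PARAMS = 6  # assumed: e.g., distance, height, angle, speed, lead, smoothness


class DescriptorToTrajectory(nn.Module):
    """Maps a vector of desired semantic scores to camera trajectory parameters."""

    def __init__(self, n_descriptors: int, n_params: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_descriptors, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_params),
        )

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # scores: (batch, n_descriptors) in [0, 1]; output: (batch, n_params)
        return self.net(scores)


# Usage: request a shot that should feel calm and establishing, less so enjoyable.
model = DescriptorToTrajectory(len(SEMANTIC_DESCRIPTORS), N_TRAJECTORY_PARAMS)
desired = torch.tensor([[0.9, 0.4, 0.8]])  # scores for calm, enjoyable, establishing
params = model(desired)  # trajectory parameters to hand to the drone's shot planner
```

In the paper this mapping is learned from the crowd-sourced descriptor scores collected for each simulated clip; the sketch above only shows the shape of such a model, not its training procedure or the generative formulation the authors use.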