Paper title
Audiovisual Database with 360 Video and Higher-Order Ambisonics Audio for Perception, Cognition, Behavior, and QoE Evaluation Research
Paper authors
Paper abstract
Research into multi-modal perception, human cognition, behavior, and attention can benefit from high-fidelity content that may recreate lifelike scenes when rendered on head-mounted displays. Moreover, aspects of audiovisual perception, cognitive processes, and behavior may complement questionnaire-based Quality of Experience (QoE) evaluation of interactive virtual environments. Currently, there is a lack of high-quality open-source audiovisual databases that can be used to evaluate such aspects or to evaluate systems capable of reproducing high-quality content. With this paper, we provide a publicly available audiovisual database consisting of twelve scenes capturing real-life nature and urban environments with a video resolution of 7680x3840 at 60 frames per second and with 4th-order Ambisonics audio. These 360 video sequences, with an average duration of 60 seconds, represent real-life settings for systematically evaluating various dimensions of uni-/multi-modal perception, cognition, behavior, and QoE. The paper provides details of the scene requirements, recording approach, and scene descriptions. The database provides high-quality reference material with a balanced focus on auditory and visual sensory information. The database will be continuously updated with additional scenes and further metadata such as human ratings and saliency information.