Paper Title
Robust and Resource-efficient Machine Learning Aided Viewport Prediction in Virtual Reality
Paper Authors
Paper Abstract
360-degree panoramic videos have gained considerable attention in recent years due to the rapid development of head-mounted displays (HMDs) and panoramic cameras. One major problem in streaming panoramic videos is that they are much larger than traditional videos. Moreover, user devices typically operate in wireless environments with limited battery, computation power, and bandwidth. To reduce resource consumption, researchers have proposed predicting each user's viewport so that only part of the video needs to be transmitted from the server. However, the robustness of such prediction approaches has been overlooked in the literature: it is usually assumed that a few models, pre-trained on past users' experiences, can be applied to all users. We observe that these pre-trained models can perform poorly for some users, because those users may behave drastically differently from the majority and the models cannot capture the features of unseen videos. In this work, we propose a novel meta-learning-based viewport prediction paradigm that alleviates the worst-case prediction performance and ensures the robustness of viewport prediction. The paradigm uses two machine learning models: the first predicts the viewing direction, and the second predicts the minimum video prefetch size that can include the actual viewport. We first train two meta-models so that they are sensitive to new training data, and then quickly adapt them to each user while the user is watching a video. Evaluation results reveal that the meta-models adapt quickly to each user and significantly increase prediction accuracy, especially for the worst-performing predictions.
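To make the "train a sensitive meta-model, then adapt it per user" idea concrete, below is a minimal, self-contained Python sketch. It assumes a Reptile-style first-order meta-update and a toy linear predictor that maps a short history of yaw angles to the next yaw; the function names, data setup, and hyperparameters are illustrative assumptions, not the paper's actual implementation, and the second model (minimum prefetch size) would be meta-trained in the same way.

    import numpy as np

    def sgd_adapt(w, X, y, lr=0.05, steps=20):
        # A few gradient steps of least-squares regression on one user's samples.
        w = w.copy()
        for _ in range(steps):
            grad = 2.0 * X.T @ (X @ w - y) / len(y)
            w -= lr * grad
        return w

    def meta_train(user_traces, dim, meta_lr=0.1, epochs=50):
        # Learn an initialization w0 that adapts quickly to any single user's trace.
        w0 = np.zeros(dim)
        for _ in range(epochs):
            for X, y in user_traces:              # one "task" = one past user
                w_adapted = sgd_adapt(w0, X, y)
                w0 += meta_lr * (w_adapted - w0)  # Reptile-style meta-update
        return w0

    rng = np.random.default_rng(0)

    def make_user(delta):
        # Toy trace: a 4-sample history of yaw angles predicts the next yaw,
        # with a user-specific deviation 'delta' from the average behavior.
        X = rng.normal(size=(40, 4))
        return X, X @ (np.array([0.1, 0.2, 0.3, 0.4]) + delta)

    train_users = [make_user(rng.normal(scale=0.3, size=4)) for _ in range(10)]
    w0 = meta_train(train_users, dim=4)

    # New user at streaming time: adapt from only the first few observed samples.
    X_new, y_new = make_user(rng.normal(scale=0.3, size=4))
    w_user = sgd_adapt(w0, X_new[:8], y_new[:8])
    mse = np.mean((X_new[8:] @ w_user - y_new[8:]) ** 2)
    print(f"post-adaptation MSE on this user's unseen samples: {mse:.4f}")

The key point the sketch captures is that the meta-initialization is optimized for fast adaptation, so a handful of online samples from a new (possibly atypical) user is enough to specialize the predictor, which is what mitigates the worst-case behavior described in the abstract.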