Paper Title
Video in 10 Bits: Few-Bit VideoQA for Efficiency and Privacy
Paper Authors
Paper Abstract
In Video Question Answering (VideoQA), answering general questions about a video requires its visual information. Yet, a video often contains redundant information irrelevant to the VideoQA task. For example, if the task is only to answer questions like "Is someone laughing in the video?", then all other information can be discarded. This paper investigates how many bits are really needed from the video in order to do VideoQA by introducing a novel Few-Bit VideoQA problem, where the goal is to accomplish VideoQA with few bits of video information (e.g., 10 bits). We propose a simple yet effective task-specific feature compression approach to solve this problem. Specifically, we insert a lightweight Feature Compression Module (FeatComp) into a VideoQA model, which learns to extract task-specific tiny features, as few as 10 bits, that are optimal for answering certain types of questions. We demonstrate more than 100,000-fold storage efficiency over MPEG4-encoded videos and 1,000-fold over regular floating-point features, with just a 2.0-6.6% absolute loss in accuracy, which is a surprising and novel finding. Finally, we analyze what the learned tiny features capture, demonstrate that they have eliminated most of the non-task-specific information, and introduce a Bit Activation Map to visualize what information is being stored. This decreases the privacy risk of the data by providing k-anonymity and robustness to feature-inversion techniques, which can benefit the machine learning community by allowing data to be stored with privacy guarantees while still performing the task effectively.
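To make the compression idea concrete, here is a minimal sketch of mapping a high-dimensional video feature down to a handful of bits. In the paper, the projection is learned end-to-end inside the VideoQA model (FeatComp); in this illustration a fixed random projection followed by sign binarization stands in for the learned module, so the function name and all parameters are assumptions, not the authors' implementation.

```python
import numpy as np

def compress_to_bits(features: np.ndarray, n_bits: int = 10, seed: int = 0) -> np.ndarray:
    """Project a feature vector down to n_bits and binarize by sign.

    Stand-in for a learned compression module: a real FeatComp would train
    this projection jointly with the VideoQA objective so the surviving
    bits carry only task-specific information.
    """
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((features.shape[-1], n_bits))
    # Each output bit is the sign of one projection of the feature vector.
    return (features @ proj > 0).astype(np.uint8)

# A 512-dim float32 clip feature occupies 512 * 32 = 16,384 bits;
# the compressed code occupies only 10 bits.
clip_feature = np.random.default_rng(1).standard_normal(512).astype(np.float32)
code = compress_to_bits(clip_feature, n_bits=10)
print(code.shape)  # (10,)
print(clip_feature.nbytes * 8 / 10)  # bits saved per feature, ~1638x here
```

The 1,000-fold figure in the abstract follows the same arithmetic: a regular floating-point feature of a few hundred dimensions costs tens of kilobits, while the task-specific code costs ~10 bits. The binarized code also gives only coarse, task-level information, which is the source of the k-anonymity and inversion-robustness claims.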