Paper Title
Federated Self-supervised Learning for Video Understanding
Paper Authors
Paper Abstract
The ubiquity of camera-enabled mobile devices has led to large amounts of unlabelled video data being produced at the edge. Although various self-supervised learning (SSL) methods have been proposed to harvest the latent spatio-temporal representations of such data for task-specific training, practical challenges, including privacy concerns and communication costs, prevent SSL from being deployed at large scale. To mitigate these issues, we propose applying Federated Learning (FL) to the task of video SSL. In this work, we evaluate the performance of current state-of-the-art (SOTA) video-SSL techniques and identify their shortcomings when integrated into a large-scale FL setting simulated with the Kinetics-400 dataset. We then propose a novel federated SSL framework for video, dubbed FedVSSL, that integrates different aggregation strategies and partial weight updating. Extensive experiments demonstrate the effectiveness and significance of FedVSSL: it outperforms the centralized SOTA on the downstream retrieval task by 6.66% on UCF-101 and 5.13% on HMDB-51.
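The abstract does not say which aggregation strategies FedVSSL uses or which weights are partially updated. As a generic illustration only, the sketch below shows standard FedAvg-style aggregation, where the server averages client parameters weighted by local sample counts. It also shows one hypothetical form of partial weight updating, in which a client overwrites only the layers whose names match given prefixes and keeps the rest local. The dict-of-lists model representation, the parameter names (`backbone.w`, `head.w`), and the prefix-based selection are assumptions made for this example. They are not the paper's implementation.

```python
from typing import Dict, List

# Hypothetical model representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], num_samples: List[int]) -> Params:
    """FedAvg-style aggregation: average each parameter across clients,
    weighting every client by its number of local training samples."""
    total = sum(num_samples)
    aggregated: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        aggregated[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, num_samples)) / total
            for i in range(length)
        ]
    return aggregated


def partial_update(local: Params, global_params: Params, prefixes: List[str]) -> Params:
    """Return a copy of `local` in which only parameters whose names start with
    one of `prefixes` are replaced by their aggregated global values.
    The choice of prefixes is an illustrative assumption, not FedVSSL's rule."""
    updated = {name: list(value) for name, value in local.items()}
    for name, value in global_params.items():
        if any(name.startswith(prefix) for prefix in prefixes):
            updated[name] = list(value)
    return updated


# Usage: two hypothetical clients holding 1 and 3 samples respectively.
clients = [
    {"backbone.w": [1.0, 2.0], "head.w": [0.0]},
    {"backbone.w": [3.0, 4.0], "head.w": [1.0]},
]
global_params = fedavg(clients, num_samples=[1, 3])
# global_params == {"backbone.w": [2.5, 3.5], "head.w": [0.75]}
new_local = partial_update(clients[0], global_params, prefixes=["backbone."])
# new_local == {"backbone.w": [2.5, 3.5], "head.w": [0.0]}
```

Weighting by sample count lets clients with more local video data contribute proportionally more to the global model. The prefix filter is only one simple way to decide which layers a client takes from the server; the paper may use a different criterion.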