Paper title
Federated Self-supervised Speech Representations: Are We There Yet?
Paper authors
Paper abstract
The ubiquity of microphone-enabled devices has led to large amounts of unlabelled audio data being produced at the edge. Integrating self-supervised learning (SSL) and federated learning (FL) into one coherent system can potentially offer data privacy guarantees while also advancing the quality and robustness of speech representations. In this paper, we provide a first-of-its-kind systematic study of the feasibility and complexity of training speech SSL models under FL scenarios from the perspective of algorithms, hardware, and system limits. Despite the high potential of their combination, we find that existing system constraints and algorithmic behaviour make SSL and FL systems nearly impossible to build today. Yet critically, our results indicate specific performance bottlenecks and research opportunities that would allow this situation to be reversed. Although our analysis suggests that, given existing trends in hardware, hybrid SSL and FL speech systems will not be viable until 2027, we believe this study can act as a roadmap to accelerate work towards reaching this milestone much earlier.