Characterizing Impacts of Heterogeneity in Federated Learning upon Large-Scale Smartphone Data

Paper Title

Characterizing Impacts of Heterogeneity in Federated Learning upon Large-Scale Smartphone Data

Authors

Yang, Chengxu; Wang, Qipeng; Xu, Mengwei; Chen, Zhenpeng; Bian, Kaigui; Liu, Yunxin; Liu, Xuanzhe

Abstract

Federated learning (FL) is an emerging, privacy-preserving machine learning paradigm that has drawn tremendous attention in both academia and industry. A unique characteristic of FL is heterogeneity, which resides in the varied hardware specifications and dynamic states of the participating devices. In theory, heterogeneity can exert a huge influence on the FL training process, e.g., by making a device unavailable for training or unable to upload its model updates. Unfortunately, these impacts have never been systematically studied and quantified in the existing FL literature. In this paper, we carry out the first empirical study to characterize the impacts of heterogeneity in FL. We collect large-scale data from 136k smartphones that faithfully reflect heterogeneity in real-world settings. We also build a heterogeneity-aware FL platform that complies with the standard FL protocol while taking heterogeneity into account. Based on the data and the platform, we conduct extensive experiments to compare the performance of state-of-the-art FL algorithms under heterogeneity-aware and heterogeneity-unaware settings. Results show that heterogeneity causes non-trivial performance degradation in FL, including up to a 9.2% accuracy drop, 2.32x longer training time, and undermined fairness. Furthermore, we analyze potential impact factors and find that device failure and participant bias are two potential factors for performance degradation. Our study provides insightful implications for FL practitioners. On the one hand, our findings suggest that FL algorithm designers should take heterogeneity into account during evaluation. On the other hand, our findings urge system providers to design specific mechanisms to mitigate the impacts of heterogeneity.
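To make the contrast between heterogeneity-aware and heterogeneity-unaware settings concrete, the following is a minimal toy sketch, not the authors' platform. It assumes a FedAvg-style weighted average, a fixed round deadline, and a per-device failure probability; the function `fedavg_round`, the client fields (`fail_prob`, `train_time_s`, `target`, `n_samples`), and the stand-in "local training" step are all hypothetical names chosen for illustration.

```python
import random


def fedavg_round(global_w, clients, deadline_s, rng, aware=True):
    """Simulate one FedAvg-style round on a toy model (a list of floats).

    In the heterogeneity-aware setting, a client is dropped if it fails
    (with probability fail_prob) or cannot finish before the deadline.
    In the unaware setting, every selected client is assumed to report.
    All field names here are illustrative assumptions.
    """
    updates = []
    for c in clients:
        if aware and (rng.random() < c["fail_prob"]
                      or c["train_time_s"] > deadline_s):
            continue  # device failure or straggler: no update uploaded
        # Stand-in for local training: move halfway toward the
        # client's own optimum ("target").
        local_w = [w + 0.5 * (t - w) for w, t in zip(global_w, c["target"])]
        updates.append((c["n_samples"], local_w))

    if not updates:
        return list(global_w), 0  # nobody reported; keep the old model

    total = sum(n for n, _ in updates)
    new_w = [
        sum(n * w[i] for n, w in updates) / total
        for i in range(len(global_w))
    ]
    return new_w, len(updates)


# Three hypothetical devices: one fast and reliable, one too slow for
# the deadline, one that always fails.
clients = [
    {"fail_prob": 0.0, "train_time_s": 10.0, "target": [1.0], "n_samples": 10},
    {"fail_prob": 0.0, "train_time_s": 100.0, "target": [-1.0], "n_samples": 10},
    {"fail_prob": 1.0, "train_time_s": 10.0, "target": [-1.0], "n_samples": 10},
]
rng = random.Random(0)
unaware_w, unaware_k = fedavg_round([0.0], clients, 30.0, rng, aware=False)
aware_w, aware_k = fedavg_round([0.0], clients, 30.0, rng, aware=True)
print(unaware_k, unaware_w)
print(aware_k, aware_w)
```

In this toy setup, only the fast, reliable device contributes in the aware setting, so the aggregated model is pulled toward that device's data. This is one simple way to picture how device failure can lead to participant bias.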
