Paper Title

Applied Federated Learning: Architectural Design for Robust and Efficient Learning in Privacy Aware Settings

Authors

Branislav Stojkovic, Jonathan Woodbridge, Zhihan Fang, Jerry Cai, Andrey Petrov, Sathya Iyer, Daoyu Huang, Patrick Yau, Arvind Sastha Kumar, Hitesh Jawa, Anamita Guha

Abstract

The classical machine learning paradigm requires the aggregation of user data in a central location, where machine learning practitioners can preprocess data, calculate features, tune models, and evaluate performance. The advantages of this approach include leveraging high-performance hardware (such as GPUs) and the ability of machine learning practitioners to perform in-depth data analysis to improve model performance. However, these advantages may come at a cost to data privacy: user data is collected, aggregated, and stored on centralized servers for model development. Centralization of data poses risks, including a heightened risk of internal and external security incidents as well as accidental data misuse. Federated learning with differential privacy is designed to avoid the server-side centralization pitfall by bringing the ML learning step to users' devices. Learning is done in a federated manner in which each mobile device runs a training loop on a local copy of the model. Updates from on-device models are sent to the server via encrypted communication, with differential privacy applied, to improve the global model. In this paradigm, users' personal data remains on their devices. Surprisingly, model training in this manner comes with fairly minimal degradation in model performance. However, federated learning brings many other challenges due to its distributed nature, heterogeneous compute environments, and lack of data visibility. This paper explores those challenges and outlines an architectural design solution that we are exploring and testing to productionize federated learning at Meta scale.
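
The abstract sketches the core loop: each device trains on a local copy of the model, its update is clipped and noised for differential privacy, and the server aggregates the updates into the global model. Below is a minimal, self-contained Python sketch of one such DP federated-averaging round, offered only as an illustration of that loop under simplifying assumptions, not as the paper's implementation; every name and constant in it (local_training, server_round, CLIP_NORM, NOISE_MULT, the toy linear model) is a hypothetical stand-in.

```python
import numpy as np

# Toy sketch of federated averaging with differential privacy.
# Assumptions (not from the paper): a flat linear model trained by
# gradient descent, a Gaussian mechanism for noise, full participation.

CLIP_NORM = 1.0      # L2 bound on each device's model update
NOISE_MULT = 0.5     # Gaussian noise multiplier for differential privacy
NUM_CLIENTS = 10
DIM = 4              # toy model: a flat parameter vector

def local_training(global_weights, local_data, lr=0.1, steps=5):
    """Each device runs a training loop on a local copy of the model.

    The 'model' here is a linear least-squares fit, standing in for
    whatever on-device model is actually trained.
    """
    w = global_weights.copy()
    X, y = local_data
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # mean-squared-error gradient
        w -= lr * grad
    return w - global_weights  # send the update (delta), not raw data

def clip_update(update, clip_norm=CLIP_NORM):
    """Clip the update's L2 norm so one device's contribution is bounded."""
    norm = np.linalg.norm(update)
    return update * min(1.0, clip_norm / (norm + 1e-12))

def server_round(global_weights, client_datasets):
    """Aggregate clipped updates and add calibrated Gaussian noise."""
    updates = [clip_update(local_training(global_weights, d))
               for d in client_datasets]
    mean_update = np.mean(updates, axis=0)
    # Standard Gaussian-mechanism scale for a sum of clipped updates,
    # divided by the number of participating clients.
    noise = np.random.normal(
        0.0, NOISE_MULT * CLIP_NORM / len(client_datasets), size=DIM)
    return global_weights + mean_update + noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = rng.normal(size=DIM)
    clients = []
    for _ in range(NUM_CLIENTS):
        X = rng.normal(size=(32, DIM))
        y = X @ true_w + 0.1 * rng.normal(size=32)
        clients.append((X, y))  # user data never leaves the 'device'

    w = np.zeros(DIM)
    for _ in range(20):
        w = server_round(w, clients)
    print("distance to true weights:", np.linalg.norm(w - true_w))
```

The privacy-relevant design choice mirrored here is that only bounded, noised model deltas ever leave a device; the raw (X, y) pairs stay local, matching the paradigm the abstract describes.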
