Paper Title
Learnings from Federated Learning in the Real world
Paper Authors
Abstract
适用于现实世界数据的联合学习(FL)可能会遭受几种特质。这样的特质之一就是跨设备的数据分布。可以分配跨设备的数据,以便有一些“重型设备”,其中有大量数据,而只有许多数据点有许多“光用户”。跨设备的数据也存在异质性。在这项研究中,我们评估了这种特质对使用FL训练的自然语言理解(NLU)模型的影响。我们对从大规模NLU系统获得的数据进行实验,该系统为数千个设备提供服务,并根据每轮FL训练的相互作用数量来表明简单的非均匀设备选择可以提高模型的性能。在连续的时间段内,在连续的FL中进一步扩大了该好处,在此期间,不均匀的采样可以立即使用所有数据迅速赶上FL方法。
Federated Learning (FL) applied to real world data may suffer from several idiosyncrasies. One such idiosyncrasy is the data distribution across devices. Data across devices could be distributed such that there are some "heavy devices" with large amounts of data while there are many "light users" with only a handful of data points. There also exists heterogeneity of data across devices. In this study, we evaluate the impact of such idiosyncrasies on Natural Language Understanding (NLU) models trained using FL. We conduct experiments on data obtained from a large scale NLU system serving thousands of devices and show that simple non-uniform device selection based on the number of interactions at each round of FL training boosts the performance of the model. This benefit is further amplified in continual FL on consecutive time periods, where non-uniform sampling manages to swiftly catch up with FL methods using all data at once.
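The abstract describes selecting devices non-uniformly, with selection based on each device's number of interactions per round. As one illustrative way to realize such weighted selection (a sketch, not the authors' exact procedure; the function and variable names are assumptions), the Efraimidis-Spirakis key method draws k distinct devices with probability increasing in their weights:

```python
import random

def weighted_sample_without_replacement(interaction_counts, k):
    """Pick k distinct devices for an FL round, favoring devices with
    more interactions (illustrative; not the paper's exact sampler).

    interaction_counts: dict mapping device id -> interaction count (> 0).
    Uses Efraimidis-Spirakis keys: each device gets key u**(1/w) for
    uniform u in (0, 1); the k largest keys form the sample.
    """
    keyed = [(random.random() ** (1.0 / w), device)
             for device, w in interaction_counts.items()]
    keyed.sort(reverse=True)
    return [device for _, device in keyed[:k]]

# Example: "heavy" devices with many interactions are more likely
# to be picked than "light" devices with only a handful.
counts = {"dev_a": 500, "dev_b": 3, "dev_c": 120, "dev_d": 7}
round_cohort = weighted_sample_without_replacement(counts, 2)
```

Setting all weights equal recovers the uniform selection baseline the study compares against.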