Paper Title

Federated Few-Shot Learning for Mobile NLP

Paper Authors

Dongqi Cai, Shangguang Wang, Yaozong Wu, Felix Xiaozhu Lin, Mengwei Xu

Paper Abstract

Natural language processing (NLP) sees rich mobile applications. To support various language understanding tasks, a foundation NLP model is often fine-tuned in a federated, privacy-preserving setting (FL). This process currently relies on at least hundreds of thousands of labeled training samples from mobile clients; yet mobile users often lack the willingness or knowledge to label their data. Such an inadequacy of data labels is known as a few-shot scenario, and it has become the key blocker for mobile NLP applications. For the first time, this work investigates federated NLP in the few-shot scenario (FedFSL). By retrofitting algorithmic advances in pseudo labeling and prompt learning, we first establish a training pipeline that delivers competitive accuracy when only 0.05% (fewer than 100) of the training samples are labeled and the rest are unlabeled. To instantiate the workflow, we further present a system, FeS, that addresses the high execution cost with novel designs: (1) curriculum pacing, which injects pseudo labels into the training workflow at a rate commensurate with the learning progress; (2) representational diversity, a mechanism that selects the most learnable unlabeled data and generates pseudo labels only for them; (3) co-planning of a model's training depth and layer capacity. Together, these designs reduce training delay, client energy, and network traffic by up to 46.0$\times$, 41.2$\times$, and 3000.0$\times$, respectively. Through algorithm/system co-design, FeS demonstrates that FL can apply to challenging settings where most training samples are unlabeled.
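
The abstract only names the two data-side mechanisms. The minimal Python sketch below illustrates one plausible reading of them: a diversity-based filter that picks mutually distant samples in representation space, and a pacing schedule that admits more pseudo labels as training matures. The farthest-point heuristic, the linear pacing schedule, and all function names (`select_diverse_unlabeled`, `pacing_budget`) are our assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def select_diverse_unlabeled(embeddings, k):
    """Pick k unlabeled samples whose representations are mutually distant.

    Greedy farthest-point selection over client-side encoder outputs
    (shape (N, D)); a stand-in for the paper's 'representational
    diversity' filter, whose exact criterion is not given in the abstract.
    """
    # Seed with the sample farthest from the centroid.
    center_dist = np.linalg.norm(embeddings - embeddings.mean(axis=0), axis=1)
    chosen = [int(np.argmax(center_dist))]
    while len(chosen) < k:
        # Distance from every sample to its nearest already-chosen sample.
        d = np.linalg.norm(
            embeddings[:, None, :] - embeddings[chosen][None, :, :], axis=-1
        )
        chosen.append(int(np.argmax(d.min(axis=1))))
    return chosen

def pacing_budget(round_idx, probe_acc, pool_size, base=8):
    """Curriculum pacing: grow the pseudo-label budget with training progress.

    probe_acc is accuracy on the few labeled samples; the linear schedule
    here is illustrative, not the paper's actual pacing function.
    """
    return min(pool_size, int(base * (round_idx + 1) * max(probe_acc, 0.1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(500, 32))  # stand-in for encoder outputs
    budget = pacing_budget(round_idx=3, probe_acc=0.6, pool_size=500)
    picks = select_diverse_unlabeled(emb, budget)
    print(f"round budget={budget}, selected {len(picks)} samples for pseudo labeling")
```

The intended effect of coupling the two pieces is that early rounds pseudo-label only a handful of diverse, easy-to-learn samples, and the budget expands only as the model's accuracy on the scarce labeled data improves, which is how the abstract motivates the reduction in training delay and client energy.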
