Paper Title

Federated Few-Shot Learning for Mobile NLP

Paper Authors

Dongqi Cai, Shangguang Wang, Yaozong Wu, Felix Xiaozhu Lin, Mengwei Xu

Paper Abstract

Natural language processing (NLP) sees rich mobile applications. To support various language understanding tasks, a foundation NLP model is often fine-tuned in a federated, privacy-preserving setting (FL). This process currently relies on at least hundreds of thousands of labeled training samples from mobile clients; yet mobile users often lack the willingness or knowledge to label their data. Such an inadequacy of data labels is known as a few-shot scenario, and it has become the key blocker for mobile NLP applications. For the first time, this work investigates federated NLP in the few-shot scenario (FedFSL). By retrofitting algorithmic advances in pseudo labeling and prompt learning, we first establish a training pipeline that delivers competitive accuracy when only 0.05% (fewer than 100) of the training samples are labeled and the rest are unlabeled. To instantiate the workflow, we further present a system, FeS, that addresses the high execution cost with novel designs: (1) curriculum pacing, which injects pseudo labels into the training workflow at a rate commensurate with the learning progress; (2) representational diversity, a mechanism that selects the most learnable unlabeled data and generates pseudo labels only for them; (3) co-planning of a model's training depth and layer capacity. Together, these designs reduce training delay, client energy, and network traffic by up to 46.0$\times$, 41.2$\times$, and 3000.0$\times$, respectively. Through algorithm/system co-design, FeS demonstrates that FL can apply to challenging settings where most training samples are unlabeled.
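
The abstract only names the two data-side mechanisms. The minimal Python sketch below illustrates one plausible reading of them: a diversity-based filter that picks mutually distant samples in representation space, and a pacing schedule that admits more pseudo labels as training matures. The farthest-point heuristic, the linear pacing schedule, and all function names (`select_diverse_unlabeled`, `pacing_budget`) are our assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def select_diverse_unlabeled(embeddings, k):
    """Pick k unlabeled samples whose representations are mutually distant.

    Greedy farthest-point selection over client-side encoder outputs
    (shape (N, D)); a stand-in for the paper's 'representational
    diversity' filter, whose exact criterion is not given in the abstract.
    """
    # Seed with the sample farthest from the centroid.
    center_dist = np.linalg.norm(embeddings - embeddings.mean(axis=0), axis=1)
    chosen = [int(np.argmax(center_dist))]
    while len(chosen) < k:
        # Distance from every sample to its nearest already-chosen sample.
        d = np.linalg.norm(
            embeddings[:, None, :] - embeddings[chosen][None, :, :], axis=-1
        )
        chosen.append(int(np.argmax(d.min(axis=1))))
    return chosen

def pacing_budget(round_idx, probe_acc, pool_size, base=8):
    """Curriculum pacing: grow the pseudo-label budget with training progress.

    probe_acc is accuracy on the few labeled samples; the linear schedule
    here is illustrative, not the paper's actual pacing function.
    """
    return min(pool_size, int(base * (round_idx + 1) * max(probe_acc, 0.1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(500, 32))  # stand-in for encoder outputs
    budget = pacing_budget(round_idx=3, probe_acc=0.6, pool_size=500)
    picks = select_diverse_unlabeled(emb, budget)
    print(f"round budget={budget}, selected {len(picks)} samples for pseudo labeling")
```

The intended effect of coupling the two pieces is that early rounds pseudo-label only a handful of diverse, easy-to-learn samples, and the budget expands only as the model's accuracy on the scarce labeled data improves, which is how the abstract motivates the reduction in training delay and client energy.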
