Paper Title

Group Knowledge Transfer: Federated Learning of Large CNNs at the Edge

Authors

Chaoyang He, Murali Annavaram, Salman Avestimehr

Abstract

Scaling up the convolutional neural network (CNN) size (e.g., width, depth, etc.) is known to effectively improve model accuracy. However, the large model size impedes training on resource-constrained edge devices. For instance, federated learning (FL) may place undue burden on the compute capability of edge nodes, even though there is a strong practical need for FL due to its privacy and confidentiality properties. To address the resource-constrained reality of edge devices, we reformulate FL as a group knowledge transfer training algorithm, called FedGKT. FedGKT designs a variant of the alternating minimization approach to train small CNNs on edge nodes and periodically transfer their knowledge by knowledge distillation to a large server-side CNN. FedGKT consolidates several advantages into a single framework: reduced demand for edge computation, lower communication bandwidth for large CNNs, and asynchronous training, all while maintaining model accuracy comparable to FedAvg. We train CNNs designed based on ResNet-56 and ResNet-110 using three distinct datasets (CIFAR-10, CIFAR-100, and CINIC-10) and their non-I.I.D. variants. Our results show that FedGKT can obtain comparable or even slightly higher accuracy than FedAvg. More importantly, FedGKT makes edge training affordable. Compared to the edge training using FedAvg, FedGKT demands 9 to 17 times less computational power (FLOPs) on edge devices and requires 54 to 105 times fewer parameters in the edge CNN. Our source code is released at FedML (https://fedml.ai).
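
The core mechanism described above, periodically exchanging soft predictions between the small edge-side CNNs and the large server-side CNN via knowledge distillation, can be illustrated with a standard distillation loss. The sketch below is a minimal illustration only; the function name `kd_loss` and the `temperature`/`alpha` weighting are assumptions for exposition, and the exact objective, feature exchange, and alternating training loop are defined in the paper and the FedML release.

```python
# Minimal PyTorch sketch of a bidirectional knowledge-distillation loss in the
# spirit of FedGKT: each side fits the ground-truth labels while also matching
# the other side's softened predictions (the knowledge-transfer term).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Supervised term on the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    # Distillation term: KL divergence toward the other side's soft predictions,
    # scaled by T^2 as in standard knowledge distillation.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return ce + alpha * kl

# Edge side: the small CNN would optimize kd_loss against server logits
# returned for its local samples; server side: the large CNN would optimize
# kd_loss against the logits uploaded by each edge client.
```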
