Paper Title

lo-fi: distributed fine-tuning without communication

Paper Authors

Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos

Paper Abstract

When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node is fine-tuned independently without any communication. Then, the weights are averaged across nodes at the conclusion of fine-tuning. When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step. We also observe that lo-fi matches the baseline's performance when fine-tuning OPT language models (up to 1.3B parameters) on Common Crawl. By removing the communication requirement, lo-fi reduces resource barriers for fine-tuning large models and enables fine-tuning in settings with prohibitive communication cost.

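A minimal sketch, in PyTorch, of the final weight-averaging step the abstract describes, assuming each node has already saved its independently fine-tuned model as a checkpoint on shared storage; the file names and node count below are hypothetical, not from the paper:

```python
import torch

# Hypothetical checkpoint paths: one per node, each produced by an
# independent (communication-free) fine-tuning run of the same model.
checkpoint_paths = [f"node_{i}_finetuned.pt" for i in range(4)]

# Load every node's fine-tuned weights onto the CPU.
state_dicts = [torch.load(path, map_location="cpu") for path in checkpoint_paths]

# Average each parameter element-wise across nodes.
# (Non-floating-point buffers, e.g. BatchNorm step counters, would need
# separate handling in a real implementation.)
averaged_state_dict = {
    key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    for key in state_dicts[0]
}

# The averaged weights can then be loaded into a single model for evaluation:
# model.load_state_dict(averaged_state_dict)
```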