Paper Title

lo-fi: distributed fine-tuning without communication

Paper Authors

Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael Rabbat, Ari S. Morcos

Paper Abstract

When fine-tuning large neural networks, it is common to use multiple nodes and to communicate gradients at each optimization step. By contrast, we investigate completely local fine-tuning, which we refer to as lo-fi. During lo-fi, each node is fine-tuned independently without any communication. Then, the weights are averaged across nodes at the conclusion of fine-tuning. When fine-tuning DeiT-base and DeiT-large on ImageNet, this procedure matches accuracy in-distribution and improves accuracy under distribution shift compared to the baseline, which observes the same amount of data but communicates gradients at each step. We also observe that lo-fi matches the baseline's performance when fine-tuning OPT language models (up to 1.3B parameters) on Common Crawl. By removing the communication requirement, lo-fi reduces resource barriers for fine-tuning large models and enables fine-tuning in settings with prohibitive communication cost.

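A minimal sketch, in PyTorch, of the final weight-averaging step the abstract describes, assuming each node has already saved its independently fine-tuned model as a checkpoint on shared storage; the file names and node count below are hypothetical, not from the paper:

```python
import torch

# Hypothetical checkpoint paths: one per node, each produced by an
# independent (communication-free) fine-tuning run of the same model.
checkpoint_paths = [f"node_{i}_finetuned.pt" for i in range(4)]

# Load every node's fine-tuned weights onto the CPU.
state_dicts = [torch.load(path, map_location="cpu") for path in checkpoint_paths]

# Average each parameter element-wise across nodes.
# (Non-floating-point buffers, e.g. BatchNorm step counters, would need
# separate handling in a real implementation.)
averaged_state_dict = {
    key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    for key in state_dicts[0]
}

# The averaged weights can then be loaded into a single model for evaluation:
# model.load_state_dict(averaged_state_dict)
```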