Paper Title

FoundationLayerNorm: Scaling BERT and GPT to 1,000 Layers

Authors

Shen, Dezhou

Abstract

Mainstream BERT/GPT models contain only 10 to 20 layers, and little literature discusses the training of deep BERT/GPT. This paper proposes a simple yet effective method to stabilize BERT and GPT training. We successfully scale BERT and GPT up to 1,000 layers, which is an order of magnitude deeper than previous BERT and GPT models. The proposed method, FoundationLayerNorm, enables efficient training of deep neural networks and is validated at the 1,000-layer scale.
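The abstract names FoundationLayerNorm but does not spell out its update rule. As a rough illustration only, the sketch below shows a generic scaled-residual post-LayerNorm block of the kind commonly used to stabilize very deep Transformers (DeepNorm-style); the class name ScaledResidualLayerNorm, the choice of alpha, and the dimensions are assumptions for illustration, not the paper's specification.

```python
import torch
import torch.nn as nn


class ScaledResidualLayerNorm(nn.Module):
    """Post-LayerNorm residual block with a residual scaling factor.

    The abstract does not give the FoundationLayerNorm formula, so this
    sketch follows the general scaled-residual pattern used to stabilize
    very deep Transformers: y = LayerNorm(alpha * x + sublayer(x)).
    `alpha` is a hypothetical depth-dependent constant, not a value
    taken from the paper.
    """

    def __init__(self, d_model: int, alpha: float):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.alpha = alpha  # residual scale, e.g. a function of total depth

    def forward(self, x: torch.Tensor, sublayer: nn.Module) -> torch.Tensor:
        # Scale the residual branch before normalizing, which keeps the
        # per-layer update magnitude bounded as the number of layers grows.
        return self.norm(self.alpha * x + sublayer(x))


if __name__ == "__main__":
    d_model, n_layers = 64, 1000
    alpha = (2 * n_layers) ** 0.25  # DeepNorm-style choice; an assumption here
    block = ScaledResidualLayerNorm(d_model, alpha)
    ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                        nn.Linear(4 * d_model, d_model))
    x = torch.randn(2, 8, d_model)
    print(block(x, ffn).shape)  # torch.Size([2, 8, 64])
```

In this kind of scheme, each attention and feed-forward sublayer of the deep encoder or decoder is wrapped in such a block, so that residual scaling (rather than learning-rate warmup alone) controls training stability at the 1,000-layer scale.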
