Paper Title
Finnish Language Modeling with Deep Transformer Models
Paper Authors
Paper Abstract
Transformers have recently taken center stage in language modeling, after LSTMs were considered the dominant model architecture for a long time. In this project, we investigate the performance of two Transformer architectures, BERT and Transformer-XL, on the language modeling task. We use a sub-word model setting with the Finnish language and compare it to the previous state-of-the-art (SOTA) LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which is, as far as we know, the first such measure achieved. Transformer-XL improves the perplexity score to 73.58, which is 27\% better than the LSTM model.
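Pseudo-perplexity for a masked language model like BERT is typically computed by masking each token in turn, reading off the model's probability for the true token, and exponentiating the negative mean log-probability. A minimal sketch of that final aggregation step, with hypothetical per-token probabilities standing in for actual masked-LM outputs:

```python
import math

def pseudo_perplexity(token_log_probs):
    """Pseudo-perplexity: exp of the negative mean per-token log-probability.

    Each entry is the masked LM's log-probability for the true token when
    that single token is masked out of the sentence.
    """
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# Hypothetical masked-token probabilities for a 4-token sentence;
# a real pipeline would obtain these from one forward pass per mask.
log_probs = [math.log(p) for p in [0.9, 0.8, 0.95, 0.7]]
print(round(pseudo_perplexity(log_probs), 4))  # → 1.2022
```

Lower scores indicate the model finds the held-out tokens more predictable; note that pseudo-perplexity values from a masked LM are not directly comparable to the autoregressive perplexity reported for Transformer-XL and the LSTM baseline.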