Paper Title
Finnish Language Modeling with Deep Transformer Models
Paper Authors
Paper Abstract
Transformers have recently taken center stage in language modeling, after LSTMs were considered the dominant model architecture for a long time. In this project, we investigate the performance of two Transformer architectures, BERT and Transformer-XL, on the language modeling task. We use a sub-word model setting with the Finnish language and compare it to the previous state-of-the-art (SOTA) LSTM model. BERT achieves a pseudo-perplexity score of 14.5, which is, as far as we know, the first such measure achieved. Transformer-XL improves the perplexity score to 73.58, which is 27\% better than the LSTM model.
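Pseudo-perplexity for a masked language model like BERT is typically computed by masking each token in turn, reading off the model's probability for the true token, and exponentiating the negative mean log-probability. A minimal sketch of that final aggregation step, with hypothetical per-token probabilities standing in for actual masked-LM outputs:

```python
import math

def pseudo_perplexity(token_log_probs):
    """Pseudo-perplexity: exp of the negative mean per-token log-probability.

    Each entry is the masked LM's log-probability for the true token when
    that single token is masked out of the sentence.
    """
    n = len(token_log_probs)
    return math.exp(-sum(token_log_probs) / n)

# Hypothetical masked-token probabilities for a 4-token sentence;
# a real pipeline would obtain these from one forward pass per mask.
log_probs = [math.log(p) for p in [0.9, 0.8, 0.95, 0.7]]
print(round(pseudo_perplexity(log_probs), 4))  # → 1.2022
```

Lower scores indicate the model finds the held-out tokens more predictable; note that pseudo-perplexity values from a masked LM are not directly comparable to the autoregressive perplexity reported for Transformer-XL and the LSTM baseline.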