Paper Title
What the [MASK]? Making Sense of Language-Specific BERT Models
Paper Authors
Paper Abstract
Recently, Natural Language Processing (NLP) has witnessed impressive progress in many areas, due to the advent of novel, pretrained contextual representation models. In particular, Devlin et al. (2019) proposed a model called BERT (Bidirectional Encoder Representations from Transformers), which enables researchers to obtain state-of-the-art performance on numerous NLP tasks by fine-tuning the representations on their data set and task, without the need to develop and train highly specific architectures. The authors also released multilingual BERT (mBERT), a model trained on a corpus of 104 languages, which can serve as a universal language model. This model obtained impressive results on a zero-shot cross-lingual natural language inference task. Driven by the potential of BERT models, the NLP community has started to investigate and release a large number of BERT models trained on a particular language and tested on a specific data domain and task. This allows us to evaluate the true potential of mBERT as a universal language model by comparing its performance to that of these more specific models. This paper presents the current state of the art in language-specific BERT models, providing an overall picture with respect to different dimensions (i.e., architectures, data domains, and tasks). Our aim is to provide an immediate and straightforward overview of the commonalities and differences between language-specific BERT models and mBERT. We also provide an interactive and constantly updated website that can be used to explore the information we have collected, at https://bertlang.unibocconi.it.
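
To illustrate the fine-tuning workflow the abstract refers to, the following is a minimal sketch (not from the paper) that loads mBERT and a language-specific BERT with the Hugging Face transformers library and adds a classification head on top of each pretrained encoder. The model identifiers, the example sentence, and the number of labels are illustrative assumptions.

```python
# Minimal sketch (illustrative, not from the paper): comparing mBERT with a
# language-specific BERT under the same downstream classification setup,
# using the Hugging Face `transformers` library. Model names, the example
# sentence, and `num_labels` are assumptions for demonstration only.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoints = [
    "bert-base-multilingual-cased",   # mBERT, trained on 104 languages
    "dbmdz/bert-base-italian-cased",  # example of a language-specific BERT
]

for name in checkpoints:
    tokenizer = AutoTokenizer.from_pretrained(name)
    # A classification head is added on top of the pretrained encoder;
    # in practice this model would be fine-tuned on the target data set and task.
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
    inputs = tokenizer("Un esempio di frase.", return_tensors="pt")
    outputs = model(**inputs)
    print(name, outputs.logits.shape)  # torch.Size([1, 2]) classification logits
```

Both checkpoints expose the same interface, which is what makes a direct comparison between mBERT and its language-specific counterparts on a given task straightforward.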