Paper Title

RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark

Paper Authors

Tatiana Shavrina, Alena Fenogenova, Anton Emelyanov, Denis Shevelev, Ekaterina Artemova, Valentin Malykh, Vladislav Mikhailov, Maria Tikhonova, Andrey Chertok, Andrey Evlampiev

Paper Abstract

In this paper, we introduce an advanced Russian general language understanding evaluation benchmark -- RussianSuperGLUE. Recent advances in the field of universal language models and transformers require the development of a methodology for their broad diagnostics and testing for general intellectual skills -- detection of natural language inference, commonsense reasoning, and the ability to perform simple logical operations regardless of text subject or lexicon. For the first time, a benchmark of nine tasks, collected and organized analogously to the SuperGLUE methodology, was developed from scratch for the Russian language. We provide baselines, human-level evaluation, an open-source framework for evaluating models (https://github.com/RussianNLP/RussianSuperGLUE), and an overall leaderboard of transformer models for the Russian language. In addition, we present the first results of comparing multilingual models on the adapted diagnostic test set and offer first steps toward further expanding or assessing state-of-the-art models independently of language.
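As a rough illustration of how a trivial baseline can be scored against one of the nine tasks, the sketch below computes a majority-class baseline for TERRa, the RTE-style textual entailment task of the benchmark. This is not the authors' official evaluation code (which lives in the linked repository); the local file paths and the JSON-lines fields used here ("label") are assumptions about the distributed data format.

```python
# A minimal sketch, assuming each task split is distributed as a JSON-lines
# file with a "label" field per example; paths below are hypothetical.
import json
from collections import Counter
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Read one benchmark split stored as JSON lines (one example per line)."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def majority_baseline_accuracy(train_path: Path, val_path: Path) -> float:
    """Predict the most frequent training label for every validation example."""
    train = read_jsonl(train_path)
    val = read_jsonl(val_path)

    majority_label, _ = Counter(ex["label"] for ex in train).most_common(1)[0]
    hits = sum(ex["label"] == majority_label for ex in val)
    return hits / len(val)


if __name__ == "__main__":
    # Hypothetical local paths to the downloaded TERRa splits.
    acc = majority_baseline_accuracy(Path("TERRa/train.jsonl"),
                                     Path("TERRa/val.jsonl"))
    print(f"Majority-class baseline accuracy on TERRa validation: {acc:.3f}")
```

The same pattern extends to the other classification-style tasks of the benchmark; trainable transformer baselines and the human-level estimates reported in the paper are evaluated through the open-source framework referenced in the abstract.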
