Paper Title

Universal and Independent: Multilingual Probing Framework for Exhaustive Model Interpretation and Evaluation

Authors

Oleg Serikov, Vitaly Protasov, Ekaterina Voloshina, Viktoria Knyazkova, Tatiana Shavrina

Abstract

Linguistic analysis of language models is one way to explain and describe their reasoning, weaknesses, and limitations. Within the probing branch of model interpretability research, studies concern individual languages and individual linguistic structures. This raises a question: are the detected regularities linguistically coherent, or do they instead diverge at the typological scale? Moreover, most studies address a fixed set of languages and linguistic structures, leaving knowledge of actual typological diversity out of scope. In this paper, we present and apply a GUI-assisted framework that allows us to easily probe a massive number of languages for all the morphosyntactic features present in the Universal Dependencies data. We show that, reflecting the Anglo-centric trend in NLP over the past years, most of the regularities revealed in the mBERT model are typical of Western European languages. Our framework can be integrated with existing probing toolboxes, model cards, and leaderboards, allowing practitioners to use and share their standard probing methods to interpret multilingual models. We thus propose a toolkit to systematize the multilingual flaws in multilingual models, providing a reproducible experimental setup for 104 languages and 80 morphosyntactic features. https://github.com/AIRI-Institute/Probing_framework
