Paper Title
The Massively Multilingual Natural Language Understanding 2022 (MMNLU-22) Workshop and Competition
Paper Authors
Paper Abstract
Despite recent progress in Natural Language Understanding (NLU), creating multilingual NLU systems remains a challenge. NLU systems are commonly limited to a subset of languages because of a lack of available data, and their performance often varies widely. We launch a three-phase approach to address these limitations and help propel NLU technology to new heights. First, we release a 52-language dataset called the Multilingual Amazon SLU resource package (SLURP) for Slot-filling, Intent classification, and Virtual assistant Evaluation, or MASSIVE, in an effort to address parallel data availability for voice assistants. Second, we organize the Massively Multilingual NLU 2022 Challenge to provide a competitive environment and push the state of the art in the transferability of models to other languages. Finally, we host the first Massively Multilingual NLU workshop, which brings these components together. The MMNLU workshop seeks to advance the science behind multilingual NLU by providing a platform for presenting new research in the field and connecting teams working in this research direction. This paper summarizes the dataset, the workshop, and the competition, along with the findings of each phase.