VALUE: Understanding Dialect Disparity in NLU

Paper title

VALUE: Understanding Dialect Disparity in NLU

Authors

Caleb Ziems, Jiaao Chen, Camille Harris, Jessica Anderson, Diyi Yang

Abstract

English Natural Language Understanding (NLU) systems have achieved great performances and even outperformed humans on benchmarks like GLUE and SuperGLUE. However, these benchmarks contain only textbook Standard American English (SAE). Other dialects have been largely overlooked in the NLP community. This leads to biased and inequitable NLU systems that serve only a sub-population of speakers. To understand disparities in current models and to facilitate more dialect-competent NLU systems, we introduce the VernAcular Language Understanding Evaluation (VALUE) benchmark, a challenging variant of GLUE that we created with a set of lexical and morphosyntactic transformation rules. In this initial release (V.1), we construct rules for 11 features of African American Vernacular English (AAVE), and we recruit fluent AAVE speakers to validate each feature transformation via linguistic acceptability judgments in a participatory design manner. Experiments show that these new dialectal features can lead to a drop in model performance. To run the transformation code and download both synthetic and gold-standard dialectal GLUE benchmarks, see https://github.com/SALT-NLP/value
