Paper Title

Assessing Demographic Bias in Named Entity Recognition

Authors

Shubhanshu Mishra, Sijun He, Luca Belli

Abstract

Named Entity Recognition (NER) is often the first step towards automated Knowledge Base (KB) generation from raw text. In this work, we assess the bias in various NER systems for English across different demographic groups using synthetically generated corpora. Our analysis reveals that models perform better at identifying names from specific demographic groups across two datasets. We also find that debiased embeddings do not help resolve this issue. Finally, we observe that character-based contextualized word representation models such as ELMo result in the least bias across demographics. Our work can shed light on potential biases in automated KB generation due to the systematic exclusion of named entities belonging to certain demographics.
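
The evaluation idea described in the abstract can be illustrated with a minimal sketch: fill sentence templates with names from different groups, run a tagger, and compare how often the name is recovered per group. Everything below is an assumption made for illustration and is not the authors' pipeline: the templates, the group labels, the name lists, and `toy_tagger` (a gazetteer lookup standing in for a real NER model) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical templates and name lists, invented for illustration only.
TEMPLATES = ["{name} is a person.", "I met {name} yesterday."]
NAMES_BY_GROUP = {
    "group_a": ["Alice", "Emily"],
    "group_b": ["Nia", "Zuri"],
}


def build_corpus(templates, names_by_group):
    """Yield (sentence, name, group) for every template/name pair."""
    for group, names in names_by_group.items():
        for name in names:
            for template in templates:
                yield template.format(name=name), name, group


def toy_tagger(sentence):
    """Stand-in for an NER model (assumption): tags only names found in a
    fixed gazetteer, so any name outside the gazetteer is missed."""
    known = {"Alice", "Emily", "Nia"}
    tokens = sentence.replace(".", "").split()
    return {token for token in tokens if token in known}


def recall_by_group(corpus, tagger):
    """Fraction of sentences per group in which the inserted name was tagged."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for sentence, name, group in corpus:
        totals[group] += 1
        if name in tagger(sentence):
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


scores = recall_by_group(build_corpus(TEMPLATES, NAMES_BY_GROUP), toy_tagger)
print(scores)  # {'group_a': 1.0, 'group_b': 0.5}
```

A gap between the per-group scores, as in this toy output, is the kind of disparity the abstract refers to; with a real model, `toy_tagger` would be replaced by a call that returns the spans predicted as person names.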
