Paper Title
ValNorm Quantifies Semantics to Reveal Consistent Valence Biases Across Languages and Over Centuries
Paper Authors
Paper Abstract
Word embeddings learn implicit biases from linguistic regularities captured by word co-occurrence statistics. By extending methods that quantify human-like biases in word embeddings, we introduce ValNorm, a novel intrinsic evaluation task and method to quantify the valence dimension of affect in human-rated word sets from social psychology. We apply ValNorm on static word embeddings from seven languages (Chinese, English, German, Polish, Portuguese, Spanish, and Turkish) and from historical English text spanning 200 years. ValNorm achieves consistently high accuracy in quantifying the valence of non-discriminatory, non-social group word sets. Specifically, ValNorm achieves a Pearson correlation of r = 0.88 with human judgment scores of valence for 399 words collected to establish pleasantness norms in English. In contrast, we measure gender stereotypes using the same set of word embeddings and find that social biases vary across languages. Our results indicate that valence associations of non-discriminatory, non-social group words represent widely shared associations, in seven languages and over 200 years.
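The abstract describes ValNorm only at a high level. The sketch below shows one plausible shape for such a pipeline, under stated assumptions: each target word gets a single-category WEAT (SC-WEAT) style effect size, computed as the difference of its mean cosine similarity to a pleasant attribute list and to an unpleasant attribute list, divided by the sample standard deviation of all those similarities. The per-word scores are then compared with human valence ratings using Pearson's r. The function names, the attribute word lists, and the 2-D toy vectors are hypothetical illustrations, not the authors' code, word sets, or data.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sc_weat_effect_size(w, pleasant, unpleasant):
    """Assumed SC-WEAT-style score: mean similarity to pleasant attribute
    vectors minus mean similarity to unpleasant ones, normalized by the
    sample standard deviation over all attribute similarities."""
    sims_a = [cosine(w, a) for a in pleasant]
    sims_b = [cosine(w, b) for b in unpleasant]
    spread = statistics.stdev(sims_a + sims_b)
    return (statistics.mean(sims_a) - statistics.mean(sims_b)) / spread


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def valnorm_score(embeddings, human_ratings, pleasant_words, unpleasant_words):
    """Hypothetical helper: correlate model valence scores with human
    ratings for every rated word that has an embedding."""
    pleasant = [embeddings[w] for w in pleasant_words]
    unpleasant = [embeddings[w] for w in unpleasant_words]
    words = [w for w in human_ratings if w in embeddings]
    model_scores = [sc_weat_effect_size(embeddings[w], pleasant, unpleasant)
                    for w in words]
    human_scores = [human_ratings[w] for w in words]
    return pearson_r(model_scores, human_scores)


# Toy 2-D "embeddings" for illustration only (not real data).
toy = {
    "good": [1.0, 0.1], "happy": [0.9, 0.2],   # pleasant attribute words
    "bad": [0.1, 1.0], "awful": [0.2, 0.9],    # unpleasant attribute words
    "sunshine": [0.9, 0.2], "chair": [0.5, 0.5], "disease": [0.2, 0.9],
}
ratings = {"sunshine": 8.0, "chair": 5.0, "disease": 1.5}  # made-up ratings
r = valnorm_score(toy, ratings, ["good", "happy"], ["bad", "awful"])
print(round(r, 3))  # close to 1.0 on this deliberately separable toy data
```

With real embeddings, the dictionary would map each vocabulary word to its vector from a pretrained model, and the ratings would come from a human-rated valence lexicon. The same cosine-based machinery is what WEAT-style tests reuse when comparing two target sets, such as gendered words, against attribute sets.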