Paper title
The SAME score: Improved cosine based bias score for word embeddings
Paper authors
Abstract
With the enormous popularity of large language models, many researchers have raised ethical concerns regarding social biases incorporated in such models. Several methods to measure social bias have been introduced, but these methods do not necessarily agree on the presence or severity of bias. Furthermore, some works have shown theoretical issues or severe limitations with certain bias measures. For that reason, we introduce SAME, a novel bias score for semantic bias in embeddings. We conduct a thorough theoretical analysis as well as experiments to show its benefits compared to similar bias scores from the literature. We further highlight a substantial relation between semantic bias measured by SAME and downstream bias, a connection that has recently been argued to be negligible. Instead, we show that SAME is capable of measuring semantic bias and identifying potential causes of social bias in downstream tasks.