Paper Title
Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias
Paper Authors
Paper Abstract
Generated texts from large language models (LLMs) have been shown to exhibit a variety of harmful, human-like biases against various demographics. These findings motivate research efforts aiming to understand and measure such effects. This paper introduces a causal formulation for bias measurement in generative language models. Based on this theoretical foundation, we outline a list of desiderata for designing robust bias benchmarks. We then propose OccuGender, a benchmark with a bias-measuring procedure for investigating occupational gender bias. We test several state-of-the-art open-source LLMs on OccuGender, including Llama, Mistral, and their instruction-tuned versions. The results show that these models exhibit substantial occupational gender bias. Lastly, we discuss prompting strategies for bias mitigation and an extension of our causal formulation to illustrate the generalizability of our framework. Our code and data are available at https://github.com/chenyuen0103/gender-bias.
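While the paper's full bias-measuring procedure is defined on the OccuGender benchmark, a minimal sketch of the general probing idea might look like the following: given a fill-in-the-blank style prompt about an occupation, compare the probabilities the model assigns to gendered continuations. The model choice, prompt template, occupation list, and pronoun pair below are illustrative assumptions, not the authors' setup.

```python
# Illustrative sketch only: NOT the paper's exact OccuGender procedure.
# It probes occupational gender bias by comparing the next-token probabilities
# a causal LM assigns to gendered pronouns after an occupation prompt.
# Model, prompt template, occupations, and pronoun pair are all assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # small stand-in; the paper evaluates Llama and Mistral variants

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def pronoun_probs(occupation: str) -> dict:
    """Next-token probabilities of ' he' vs. ' she' after an occupation prompt."""
    prompt = f"The {occupation} went home early because"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    probs = torch.softmax(logits, dim=-1)
    result = {}
    for word in (" he", " she"):  # leading space matters for BPE tokenizers
        token_id = tokenizer.encode(word, add_special_tokens=False)[0]
        result[word.strip()] = probs[token_id].item()
    return result

for occ in ("nurse", "engineer", "teacher", "mechanic"):
    p = pronoun_probs(occ)
    male_share = p["he"] / (p["he"] + p["she"])  # mass on "he" within the pair
    print(f"{occ:10s} P(he)={p['he']:.4f} P(she)={p['she']:.4f} male-share={male_share:.2f}")
```

Comparing probabilities within a minimal pronoun pair holds the prompt context fixed while varying only the gendered word, which loosely mirrors the intervention-style reasoning behind the paper's causal formulation.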