Paper Title
Gender Biases and Where to Find Them: Exploring Gender Bias in Pre-Trained Transformer-based Language Models Using Movement Pruning
Paper Authors
Abstract
Language model debiasing has emerged as an important field of study in the NLP community. Numerous debiasing techniques have been proposed, but bias ablation remains an unaddressed issue. We demonstrate a novel framework for inspecting bias in pre-trained transformer-based language models via movement pruning. Given a model and a debiasing objective, our framework finds a subset of the model containing less bias than the original model. We implement our framework by pruning the model while fine-tuning it on the debiasing objective. Only the pruning scores are optimized: parameters coupled with the model's weights that act as gates. We experiment with pruning attention heads, an important building block of transformers: we prune square blocks, and we establish a new way of pruning entire heads. Lastly, we demonstrate the usage of our framework on gender bias, and based on our findings, we propose an improvement to an existing debiasing method. Additionally, we re-discover a bias-performance trade-off: the better the model performs, the more bias it contains.
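The gating mechanism described above can be illustrated with a minimal sketch. This is not the paper's implementation; it only shows the core idea under simplified assumptions: each attention head gets one learnable pruning score, and a hard threshold on the scores produces a binary mask that zeroes out entire heads while leaving the model's weights untouched. All names (`gate_heads`, the toy dimensions, the example score values) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multi-head attention output: 4 heads, each producing an 8-dim vector.
n_heads, head_dim = 4, 8
head_outputs = rng.normal(size=(n_heads, head_dim))

# One learnable pruning score per head. During fine-tuning on the
# debiasing objective, only these scores would be optimized; the
# underlying weights stay frozen.
scores = np.array([1.2, -0.5, 0.3, -2.0])

def gate_heads(head_outputs, scores, threshold=0.0):
    """Zero out the output of every head whose score falls below the threshold."""
    mask = (scores > threshold).astype(head_outputs.dtype)  # shape: (n_heads,)
    return head_outputs * mask[:, None]

gated = gate_heads(head_outputs, scores)
```

In a real movement-pruning setup, the hard threshold would be paired with a straight-through estimator so gradients reach the scores, and heads whose scores "move toward zero" during debiasing are the ones identified as carrying bias.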