Paper Title
Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization
Paper Authors
Paper Abstract
Warning: this paper contains model outputs exhibiting offensiveness and biases. Recently, pre-trained language models (PLMs) have prospered in various natural language generation (NLG) tasks due to their ability to generate fairly fluent text. Nevertheless, these models are observed to capture and reproduce harmful content in their training corpora, typically toxic language and social biases, raising serious ethical concerns. Prior work on ethical NLG tackles detoxifying and debiasing separately, which is problematic: we find that debiased models still exhibit toxicity, while detoxified ones even exacerbate social biases. To address this challenge, we propose the first unified detoxifying and debiasing framework, called UDDIA, which jointly formalizes these two problems as rectifying the output space. We theoretically interpret our framework as learning a text distribution that mixes weighted attributes. Moreover, UDDIA adaptively optimizes only a few parameters during decoding, based on a parameter-efficient tuning scheme, without any training data. This leads to minimal loss of generation quality and improved rectification performance at acceptable computational cost. Experimental results demonstrate that, compared with several strong baselines, UDDIA achieves debiasing and detoxifying simultaneously and better balances efficiency and effectiveness, taking a further step towards practical ethical NLG.
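
To make the decoding-time mechanism concrete, below is a minimal sketch of inference-time adaptive optimization under a parameter-efficient (bias-only) tuning scheme. This is an illustration under assumptions, not the authors' implementation: the base model (Hugging Face GPT-2), the choice of tunable parameters (bias vectors in the upper two transformer blocks), the fixed number of rectification steps per token, and the attribute signal (a hypothetical list of undesired tokens standing in for a toxicity/bias objective) are all placeholders; the fluency-preserving anchor to the frozen model is noted but omitted for brevity.

```python
# Sketch: rectify a PLM's output space at inference time by optimizing
# only a few bias parameters during decoding (BitFit-style tuning).
# Requires: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

# Freeze everything except bias vectors in the upper blocks
# (a parameter-efficient tuning choice assumed here for illustration).
tuned_params = []
for name, p in model.named_parameters():
    if name.endswith(".bias") and any(f"h.{i}." in name for i in (10, 11)):
        p.requires_grad_(True)
        tuned_params.append(p)
    else:
        p.requires_grad_(False)

optimizer = torch.optim.Adam(tuned_params, lr=1e-3)

# Hypothetical "undesired attribute" tokens standing in for a real
# toxicity/bias signal; a real system would use classifier feedback.
bad_ids = tokenizer.convert_tokens_to_ids(["Ġhate", "Ġstupid"])

prompt = "The new neighbors are"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

for _ in range(20):  # generate 20 tokens
    # A few rectification steps before emitting each token (adaptive in
    # the paper; a fixed count here for simplicity).
    for _ in range(2):
        log_probs = model(input_ids).logits[0, -1].log_softmax(-1)
        # Push down the total probability of undesired tokens. A KL term
        # against the frozen model would preserve fluency; omitted here.
        loss = log_probs[bad_ids].exp().sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]
        next_id = logits.argmax().unsqueeze(0).unsqueeze(0)
    input_ids = torch.cat([input_ids, next_id], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Because only a handful of bias vectors are updated and no training data is touched, the per-token overhead is a few extra forward/backward passes, which is the efficiency/effectiveness trade-off the abstract alludes to.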