Paper Title

A Systematic Study of Bias Amplification

Authors

Melissa Hall, Laurens van der Maaten, Laura Gustafson, Maxwell Jones, Aaron Adcock

Abstract

Recent research suggests that predictions made by machine-learning models can amplify biases present in the training data. When a model amplifies bias, it makes certain predictions at a higher rate for some groups than expected based on training-data statistics. Mitigating such bias amplification requires a deep understanding of the mechanics in modern machine learning that give rise to that amplification. We perform the first systematic, controlled study into when and how bias amplification occurs. To enable this study, we design a simple image-classification problem in which we can tightly control (synthetic) biases. Our study of this problem reveals that the strength of bias amplification is correlated to measures such as model accuracy, model capacity, model overconfidence, and amount of training data. We also find that bias amplification can vary greatly during training. Finally, we find that bias amplification may depend on the difficulty of the classification task relative to the difficulty of recognizing group membership: bias amplification appears to occur primarily when it is easier to recognize group membership than class membership. Our results suggest best practices for training machine-learning models that we hope will help pave the way for the development of better mitigation strategies. Code can be found at https://github.com/facebookresearch/cv_bias_amplification.
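
To make the abstract's definition concrete, here is a minimal sketch of one way to quantify amplification for a binary task: compare how often a model predicts the positive class for a group with how often that class actually occurs for the group in the training data. The function names (`positive_rate`, `amplification_gap`) and the toy data are hypothetical, and this simple rate difference is an illustrative assumption, not necessarily the metric used in the paper or in its released code.

```python
def positive_rate(groups, outcomes, group):
    """Fraction of members of `group` whose outcome (label or prediction) is 1."""
    selected = [o for g, o in zip(groups, outcomes) if g == group]
    if not selected:
        raise ValueError(f"no examples for group {group!r}")
    return sum(selected) / len(selected)


def amplification_gap(train_groups, train_labels, eval_groups, eval_preds, group):
    """Predicted positive rate for `group` minus its positive rate in training data.

    Illustrative assumption: a gap that pushes the rate further from parity
    than the training data did is read as amplification of the existing bias.
    """
    predicted = positive_rate(eval_groups, eval_preds, group)
    expected = positive_rate(train_groups, train_labels, group)
    return predicted - expected


# Toy data (hypothetical): group "a" is positive 75% of the time in training,
# group "b" only 25% of the time.
train_groups = ["a"] * 4 + ["b"] * 4
train_labels = [1, 1, 1, 0, 0, 0, 0, 1]

# A model that leans on group membership predicts positive for every "a"
# and negative for every "b".
eval_groups = ["a"] * 4 + ["b"] * 4
eval_preds = [1, 1, 1, 1, 0, 0, 0, 0]

gap_a = amplification_gap(train_groups, train_labels, eval_groups, eval_preds, "a")
gap_b = amplification_gap(train_groups, train_labels, eval_groups, eval_preds, "b")
print(gap_a, gap_b)  # 0.25 -0.25
```

In this toy case the model predicts the positive class for group "a" at a rate 0.25 above the training rate and for group "b" at a rate 0.25 below it, so the skew already present in the training data grows in both directions.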
