Paper Title
An experiment on the mechanisms of racial bias in ML-based credit scoring in Brazil
Paper Authors
Paper Abstract
We dissect an experimental credit scoring model developed with real data and demonstrate - without access to protected attributes - how the use of location information introduces racial bias. We analyze the tree gradient boosting model with the aid of a game-theory-inspired machine learning explainability technique, counterfactual experiments, and Brazilian census data. By exposing algorithmic racial bias and explaining the trained machine learning model's inner mechanisms, this experiment constitutes an interesting artifact to aid the endeavor of theoretically understanding the emergence of racial bias in machine learning systems. Without access to individuals' racial categories, we show how classification parity measures computed over geographically defined groups can carry information about model racial bias. The experiment testifies to the need for methods and language that do not presuppose access to protected attributes when auditing ML models, the importance of considering regional specifics when addressing racial issues, and the central role of census data in the AI research community. To the best of our knowledge, this is the first documented case of algorithmic racial bias in ML-based credit scoring in Brazil, the country with the second largest Black population in the world.
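The abstract combines two techniques: a game-theory-inspired explainability method (Shapley-value attributions) and counterfactual experiments on the location feature. As a minimal illustration of the idea - not the paper's actual model or code - the sketch below computes exact Shapley values by brute force for a tiny hypothetical scorer in which a neighborhood code carries a penalty, the kind of proxy channel through which location can encode bias. The `score` function, its features, and the baseline instance are all invented for this example.

```python
from itertools import permutations

def score(income, neighborhood):
    # Hypothetical credit scorer: the neighborhood code shifts the score,
    # acting as a proxy channel through which location can carry bias.
    return 0.5 * income + (20 if neighborhood == "A" else -20)

def shapley(f, instance, baseline):
    """Exact Shapley attributions for f at `instance` relative to `baseline`.

    For every ordering of the features, switch features one at a time from
    the baseline value to the instance value and record each marginal
    contribution; the Shapley value is the average over all orderings.
    """
    names = list(instance)
    phi = {n: 0.0 for n in names}
    orders = list(permutations(names))
    for order in orders:
        x = dict(baseline)
        prev = f(**x)
        for n in order:
            x[n] = instance[n]      # switch this feature to the instance value
            cur = f(**x)
            phi[n] += cur - prev    # marginal contribution in this ordering
            prev = cur
    return {n: phi[n] / len(orders) for n in names}

inst = {"income": 100, "neighborhood": "B"}
base = {"income": 80, "neighborhood": "A"}
phi = shapley(score, inst, base)

# Efficiency property: attributions sum to f(instance) - f(baseline).
assert abs(sum(phi.values()) - (score(**inst) - score(**base))) < 1e-9

# Counterfactual experiment in miniature: flip only the location feature.
delta = score(inst["income"], "A") - score(inst["income"], "B")
print(phi, delta)
```

Because the toy scorer is additive, the attribution for `neighborhood` equals the counterfactual location swap exactly; in a real tree gradient boosting model with feature interactions the two analyses can diverge, which is why the paper uses both.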