Paper Title

Gender Stereotyping Impact in Facial Expression Recognition

Authors

Iris Dominguez-Catena, Daniel Paternain, Mikel Galar

Abstract

Facial Expression Recognition (FER) uses images of faces to identify the emotional state of users, allowing for a closer interaction between humans and autonomous systems. Unfortunately, as the images naturally integrate some demographic information, such as apparent age, gender, and race of the subject, these systems are prone to demographic bias issues. In recent years, machine learning-based models have become the most popular approach to FER. These models require training on large datasets of facial expression images, and their generalization capabilities are strongly related to the characteristics of the dataset. In publicly available FER datasets, apparent gender representation is usually mostly balanced, but their representation within individual labels is not, embedding social stereotypes into the datasets and generating a potential for harm. Although this type of bias has been overlooked so far, it is important to understand the impact it may have in the context of FER. To do so, we use a popular FER dataset, FER+, to generate derivative datasets with different amounts of stereotypical bias by altering the gender proportions of certain labels. We then proceed to measure the discrepancy between the performance of the models trained on these datasets for the apparent gender groups. We observe a discrepancy in the recognition of certain emotions between genders of up to $29\%$ under the worst bias conditions. Our results also suggest a safety range for stereotypical bias in a dataset that does not appear to produce stereotypical bias in the resulting model. Our findings support the need for a thorough bias analysis of public datasets in problems like FER, where a global balance of demographic representation can still hide other types of bias that harm certain demographic groups.
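The experimental setup described in the abstract can be illustrated with a short sketch. The following is a minimal illustration, not the authors' implementation: it assumes a pandas DataFrame with hypothetical `emotion` and `apparent_gender` columns (apparent-gender annotations are not part of FER+ itself). One function subsamples a single emotion label to a chosen apparent-gender ratio, producing a derivative dataset with induced stereotypical bias while leaving the other labels untouched; the other measures the per-gender recall gap for that emotion given a model's predictions.

```python
import numpy as np
import pandas as pd

def induce_label_bias(df, label, target_female_ratio, seed=0):
    """Subsample one emotion label so that `target_female_ratio` of its
    images show apparently female subjects, leaving all other labels
    untouched. Assumes columns 'emotion' and 'apparent_gender'
    ('female'/'male'), which are hypothetical annotations for this sketch."""
    subset = df[df["emotion"] == label]
    female = subset[subset["apparent_gender"] == "female"]
    male = subset[subset["apparent_gender"] == "male"]
    # Keep as many images as the target ratio allows, shrinking only the
    # over-represented gender group.
    eps = 1e-9
    n_total = min(len(female) / max(target_female_ratio, eps),
                  len(male) / max(1.0 - target_female_ratio, eps))
    n_female = min(int(round(n_total * target_female_ratio)), len(female))
    n_male = min(int(round(n_total * (1.0 - target_female_ratio))), len(male))
    biased_subset = pd.concat([
        female.sample(n=n_female, random_state=seed),
        male.sample(n=n_male, random_state=seed),
    ])
    return pd.concat([df[df["emotion"] != label], biased_subset],
                     ignore_index=True)

def recognition_gap(df, preds, label):
    """Absolute difference in per-gender recall for one emotion, given
    predicted labels aligned with the rows of `df`."""
    correct = np.asarray(preds) == df["emotion"].to_numpy()
    recalls = {}
    for g in ("female", "male"):
        mask = (df["emotion"] == label) & (df["apparent_gender"] == g)
        recalls[g] = correct[mask.to_numpy()].mean()
    return abs(recalls["female"] - recalls["male"])
```

For example, `induce_label_bias(train_df, "happiness", 0.9)` would yield a training set in which 90% of the "happiness" images come from the apparently female group, and `recognition_gap(test_df, model_preds, "happiness")` would report the resulting per-gender discrepancy on a held-out test set.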
