Paper Title
To train or not to train adversarially: A study of bias mitigation strategies for speaker recognition
Paper Authors
Paper Abstract
Speaker recognition is increasingly used in several everyday applications, including smart speakers, customer care centers, and other speech-driven analytics. It is crucial to accurately evaluate and mitigate biases present in machine learning (ML) based speech technologies, such as speaker recognition, to ensure their inclusive adoption. ML fairness studies with respect to various demographic factors in modern speaker recognition systems lag behind those for other human-centered applications such as face recognition. Existing studies on fairness in speaker recognition systems are largely limited to evaluating biases at specific operating points of the systems, which can lead to false expectations of fairness. Moreover, only a handful of bias mitigation strategies have been developed for speaker recognition systems. In this paper, we systematically evaluate the biases with respect to gender present in speaker recognition systems across a range of system operating points. We also propose adversarial and multi-task learning techniques to improve the fairness of these systems. We show through quantitative and qualitative evaluations that the proposed methods improve the fairness of automatic speaker verification (ASV) systems over baseline methods trained using data balancing techniques. We also present a fairness-utility trade-off analysis to jointly examine fairness and overall system performance. We show that although systems trained using adversarial techniques improve fairness, they are prone to reduced utility. In contrast, multi-task methods can improve fairness while retaining utility. These findings can inform the choice of bias mitigation strategies in the field of speaker recognition.
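To make the two strategies concrete, below is a minimal PyTorch sketch of how an adversarial head and a multi-task head differ when attached to a speaker embedding network. This is an illustration under common assumptions, not the authors' implementation: the gradient-reversal formulation of adversarial debiasing, the toy `encoder`, and names such as `FairSpeakerModel` and the `lambd` scale are all assumed for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates and scales gradients on the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversed gradient flows back into the encoder; no gradient for lambd.
        return -ctx.lambd * grad_output, None


class FairSpeakerModel(nn.Module):
    """Speaker embedding network with an auxiliary gender head (hypothetical sketch).

    adversarial=True  -> gradient reversal: the encoder is pushed to *remove*
                         gender information from the embedding.
    adversarial=False -> multi-task: the gender loss is simply added, so the
                         encoder shares features without erasing demographic structure.
    """

    def __init__(self, encoder, emb_dim, n_speakers, adversarial=True, lambd=1.0):
        super().__init__()
        self.encoder = encoder
        self.speaker_head = nn.Linear(emb_dim, n_speakers)
        self.gender_head = nn.Linear(emb_dim, 2)
        self.adversarial = adversarial
        self.lambd = lambd

    def forward(self, x):
        emb = self.encoder(x)
        spk_logits = self.speaker_head(emb)
        gender_in = GradReverse.apply(emb, self.lambd) if self.adversarial else emb
        gen_logits = self.gender_head(gender_in)
        return spk_logits, gen_logits


if __name__ == "__main__":
    # Toy stand-in encoder; a real ASV system would use an x-vector/ResNet network.
    encoder = nn.Sequential(nn.Linear(40, 128), nn.ReLU())
    model = FairSpeakerModel(encoder, emb_dim=128, n_speakers=100, adversarial=True)

    feats = torch.randn(8, 40)            # batch of 8 utterance-level features
    spk_labels = torch.randint(100, (8,))
    gen_labels = torch.randint(2, (8,))

    spk_logits, gen_logits = model(feats)
    loss = F.cross_entropy(spk_logits, spk_labels) + F.cross_entropy(gen_logits, gen_labels)
    loss.backward()
```

Flipping the single `adversarial` switch is what separates the two strategies, which mirrors the trade-off the abstract reports: the adversarial variant actively strips the sensitive attribute (improving fairness at some cost to utility), while the multi-task variant only adds a shared objective and thus tends to retain utility.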