Paper Title


How Much Can We Really Trust You? Towards Simple, Interpretable Trust Quantification Metrics for Deep Neural Networks

Authors

Alexander Wong, Xiao Yu Wang, Andrew Hryniowski

Abstract

A critical step to building trustworthy deep neural networks is trust quantification, where we ask the question: How much can we trust a deep neural network? In this study, we take a step towards simple, interpretable metrics for trust quantification by introducing a suite of metrics for assessing the overall trustworthiness of deep neural networks based on their behaviour when answering a set of questions. We conduct a thought experiment and explore two key questions about trust in relation to confidence: 1) How much trust do we have in actors who give wrong answers with great confidence? and 2) How much trust do we have in actors who give right answers hesitantly? Based on insights gained, we introduce the concept of question-answer trust to quantify trustworthiness of an individual answer based on confident behaviour under correct and incorrect answer scenarios, and the concept of trust density to characterize the distribution of overall trust for an individual answer scenario. We further introduce the concept of trust spectrum for representing overall trust with respect to the spectrum of possible answer scenarios across correctly and incorrectly answered questions. Finally, we introduce NetTrustScore, a scalar metric summarizing overall trustworthiness. The suite of metrics aligns with past social psychology studies that study the relationship between trust and confidence. Leveraging these metrics, we quantify the trustworthiness of several well-known deep neural network architectures for image recognition to get a deeper understanding of where trust breaks down. The proposed metrics are by no means perfect, but the hope is to push the conversation towards better metrics to help guide practitioners and regulators in producing, deploying, and certifying deep learning solutions that can be trusted to operate in real-world, mission-critical scenarios.
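The abstract's core idea — reward confident correct answers, penalize confident wrong ones, and summarize with a scalar NetTrustScore — can be sketched in code. The function names, the reward/penalty exponents `alpha`/`beta`, and the simple averaging below are illustrative assumptions for intuition, not the paper's exact definitions:

```python
def question_answer_trust(confidence, correct, alpha=1.0, beta=1.0):
    """Trust placed in a single answer (illustrative form).

    Confident correct answers earn high trust; confident wrong answers
    earn low trust; hesitant correct answers earn only partial trust.
    alpha and beta control how strongly confidence is rewarded or penalized.
    """
    if correct:
        return confidence ** alpha          # hesitant right answers earn less trust
    return (1.0 - confidence) ** beta       # confident wrong answers earn less trust


def net_trust_score(confidences, corrects, alpha=1.0, beta=1.0):
    """Scalar summary of overall trustworthiness: here, simply the mean
    question-answer trust over all answered questions (an assumed
    aggregation; the paper's trust density/spectrum machinery is richer).
    """
    scores = [question_answer_trust(c, ok, alpha, beta)
              for c, ok in zip(confidences, corrects)]
    return sum(scores) / len(scores)
```

For example, a model that answers one question correctly with confidence 0.9 and another incorrectly with confidence 0.9 would, under this sketch, receive per-answer trusts of 0.9 and 0.1, illustrating why confidently wrong behaviour drags the overall score down.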
