Interpretable Machine Learning for Self-Service High-Risk Decision-Making

Paper title

Interpretable Machine Learning for Self-Service High-Risk Decision-Making

Authors

Charles Recaido, Boris Kovalerchuk

Abstract

This paper contributes to interpretable machine learning via visual knowledge discovery in general line coordinates (GLC). The concepts of hyperblocks as interpretable dataset units and general line coordinates are combined to create a visual self-service machine learning model. The DSC1 and DSC2 lossless multidimensional coordinate systems are proposed. DSC1 and DSC2 can map multiple dataset attributes to a single two-dimensional (X, Y) Cartesian plane using a graph construction algorithm. Hyperblock analysis was used to determine visually appealing dataset attribute orders and to reduce line occlusion. It is shown that hyperblocks can generalize decision tree rules and that a series of DSC1 or DSC2 plots can visualize a decision tree. The DSC1 and DSC2 plots were tested on benchmark datasets from the UCI ML repository, where they allowed for visual classification of data. Additionally, areas of hyperblock impurity were discovered and used to establish dataset splits that highlight the upper estimate of worst-case model accuracy, to guide model selection for high-risk decision-making. The major benefit of DSC1 and DSC2 is their highly interpretable nature: they allow domain experts to control or establish new machine learning models through visual pattern discovery.
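The abstract does not spell out the DSC1 and DSC2 graph construction. As a hedged illustration of what a lossless mapping of n attributes onto one (X, Y) plane can look like, the sketch below uses a generic GLC-style construction, which is an assumption and not necessarily DSC1 or DSC2: each attribute, min-max normalized to [0, 1], becomes a vector at a fixed angle, and the vectors are chained end to end from the origin. Because the attribute order and angles are fixed, each value can be read back as a segment length, which is what makes the mapping lossless. The attribute order is a free parameter here, the kind of choice the abstract says hyperblock analysis was used to make. The function names and the angle values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each attribute (column) to [0, 1]; a constant column maps to 0.0."""
    n_attrs = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_attrs)]
    highs = [max(r[j] for r in rows) for j in range(n_attrs)]
    return [
        [
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_attrs)
        ]
        for r in rows
    ]


def to_polyline(values: Sequence[float], angles_deg: Sequence[float]) -> List[Point]:
    """Chain one vector per attribute end to end, starting at the origin.

    Attribute i becomes a segment of length values[i] drawn at angle angles_deg[i]
    (a hypothetical fixed angle), so an n-attribute case becomes a polyline with
    n + 1 vertices in a single X-Y plane.
    """
    if len(values) != len(angles_deg):
        raise ValueError("need exactly one angle per attribute")
    x, y = 0.0, 0.0
    points: List[Point] = [(x, y)]
    for v, a in zip(values, angles_deg):
        rad = math.radians(a)
        x += v * math.cos(rad)
        y += v * math.sin(rad)
        points.append((x, y))
    return points


def from_polyline(points: Sequence[Point]) -> List[float]:
    """Read the attribute values back as segment lengths.

    This inverts to_polyline (up to floating-point rounding) when every value is >= 0,
    which holds after min-max normalization.
    """
    return [
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
```

For example, `to_polyline([0.5, 1.0, 0.25], [0, 90, 45])` returns four vertices beginning with (0.0, 0.0) and (0.5, 0.0), and `from_polyline` applied to that result gives approximately [0.5, 1.0, 0.25].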
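The abstract states that hyperblocks can generalize decision tree rules. One way to see why: a root-to-leaf path in a decision tree is a conjunction of threshold tests such as x0 ≤ 0.5 and x1 > 0.5, and each test only narrows one attribute's interval, so the path selects an axis-aligned box. The sketch below assumes the common definition of a hyperblock as per-attribute [min, max] bounds. The rule format (attribute_index, operator, threshold), the names `Hyperblock`, `rule_to_hyperblock` and `impurity`, and defining impurity as the fraction of other-class cases inside a block are illustrative assumptions, not the paper's code. It needs Python 3.9+ for `math.nextafter`.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical rule format: (attribute_index, operator, threshold), operator "<=" or ">".
Condition = Tuple[int, str, float]


@dataclass
class Hyperblock:
    """Axis-aligned n-D box: one closed interval [lows[i], highs[i]] per attribute."""
    lows: List[float]
    highs: List[float]

    def contains(self, case: Sequence[float]) -> bool:
        return all(lo <= v <= hi for lo, v, hi in zip(self.lows, case, self.highs))


def rule_to_hyperblock(
    rule: Sequence[Condition], lows: Sequence[float], highs: Sequence[float]
) -> Hyperblock:
    """Start from the data's global bounds and tighten them with each path condition."""
    lo, hi = list(lows), list(highs)
    for attr, op, t in rule:
        if op == "<=":
            hi[attr] = min(hi[attr], t)
        elif op == ">":
            # Smallest float above t, so the closed interval encodes the strict test v > t.
            lo[attr] = max(lo[attr], math.nextafter(t, math.inf))
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    return Hyperblock(lo, hi)


def impurity(
    block: Hyperblock,
    cases: Sequence[Sequence[float]],
    labels: Sequence[str],
    target: str,
) -> float:
    """Fraction of the cases inside the block whose label is not `target` (0.0 if empty)."""
    inside = [lab for case, lab in zip(cases, labels) if block.contains(case)]
    if not inside:
        return 0.0
    return sum(lab != target for lab in inside) / len(inside)
```

For instance, with cases [[0.2, 0.9], [0.4, 0.1], [0.7, 0.8], [0.5, 0.6]] labeled A, A, B, B and the path x0 ≤ 0.5 and x1 > 0.5 over global bounds [0, 1], the resulting block contains the first and last cases, so its impurity with respect to class A is 0.5. Blocks with nonzero impurity mark the mixed regions that the abstract relates to worst-case accuracy estimates.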
