Explainable Mixed Data Representation and Lossless Visualization Toolkit for Knowledge Discovery

Paper title

Explainable Mixed Data Representation and Lossless Visualization Toolkit for Knowledge Discovery

Authors

Boris Kovalerchuk, Elijah McCoy

Abstract

Developing Machine Learning (ML) algorithms for heterogeneous/mixed data is a longstanding problem. Many ML algorithms cannot be applied to mixed data, which include numeric and non-numeric data such as text and graphs, to generate interpretable models. Another longstanding problem is developing algorithms for lossless visualization of multidimensional mixed data. Further progress in ML depends heavily on the success of interpretable ML algorithms for mixed data and on lossless interpretable visualization of multidimensional data. The latter allows end-users to develop interpretable ML models through visual knowledge discovery, bringing valuable domain knowledge that is absent from the training data. The challenges for mixed data include: (1) generating numeric coding schemes that encode non-numeric attributes for numeric ML algorithms, so that the resulting ML models are accurate and interpretable, and (2) generating methods for lossless visualization of n-D non-numeric data and for visual rule discovery in these visualizations. This paper presents a classification of mixed data types, analyzes their importance for ML, and presents the experimental toolkit developed to deal with mixed data. It combines the Data Types Editor and the VisCanvas data visualization and rule discovery system, which is available on GitHub.
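
Challenge (1) can be illustrated with a minimal sketch. The abstract does not describe the paper's actual coding schemes, so the example below assumes two common ones: rank codes for an ordered (ordinal) attribute and indicator (one-hot) vectors for an unordered (nominal) attribute. The function names `ordinal_codes` and `one_hot_codes` and the toy attribute values are hypothetical and are not taken from the toolkit.

```python
from typing import Dict, List, Sequence


def ordinal_codes(values: Sequence[str], order: Sequence[str]) -> List[int]:
    """Map an ordered attribute to integer ranks that preserve the given order."""
    rank: Dict[str, int] = {label: i for i, label in enumerate(order)}
    return [rank[v] for v in values]


def one_hot_codes(values: Sequence[str]) -> List[List[int]]:
    """Map an unordered attribute to indicator vectors, one column per category.

    Columns follow the alphabetical order of the distinct categories.
    """
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical toy attributes (assumed for illustration only).
sizes = ["small", "large", "medium"]
colors = ["red", "green", "red"]

size_codes = ordinal_codes(sizes, order=["small", "medium", "large"])
color_codes = one_hot_codes(colors)

print(size_codes)   # [0, 2, 1]
print(color_codes)  # [[0, 1], [1, 0], [0, 1]]
```

The choice of scheme matters for interpretability: assigning arbitrary integers to a nominal attribute such as color would impose an order that a numeric ML algorithm may treat as meaningful, while indicator vectors avoid that at the cost of extra dimensions.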
