Towards Practical Explainability with Cluster Descriptors

Paper title

Towards Practical Explainability with Cluster Descriptors

Authors

Xiaoyuan Liu, Ilya Tyagin, Hayato Ushijima-Mwesigwa, Indradeep Ghosh, Ilya Safro

Abstract

With the rapid development of machine learning, improving its explainability has become a crucial research goal. We study the problem of making clusters more explainable by investigating cluster descriptors. We are given a set of objects $S$, a clustering $\pi$ of these objects, and a set of tags $T$ that did not participate in the clustering algorithm; each object in $S$ is associated with a subset of $T$. The goal is to find a representative set of tags for each cluster, referred to as its cluster descriptor, subject to the constraint that the descriptors are pairwise disjoint, while minimizing the total size of all descriptors. In general, this problem is NP-hard. We propose a novel explainability model that strengthens previous models so that tags that do not contribute to explainability, or that do not sufficiently distinguish between clusters, are not added to the optimal descriptors. The proposed model is formulated as a quadratic unconstrained binary optimization (QUBO) problem, which makes it suitable for solving on modern optimization hardware accelerators. We experimentally demonstrate how the proposed explainability model can be solved on the Fujitsu Digital Annealer, specialized hardware for accelerating combinatorial optimization, using real-life Twitter and PubMed datasets as use cases.
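To make the problem statement concrete, the following is a minimal brute-force sketch on a toy instance. It is not the QUBO model proposed in the paper and does not use any annealing hardware. It assumes that "representative" means every object in a cluster carries at least one tag from that cluster's descriptor; this coverage rule is an assumption, not something the abstract states. The function names, the `max_size` bound, and the toy data are all hypothetical.

```python
from itertools import combinations, product


def candidate_descriptors(cluster, tags_of, max_size):
    """Yield tag sets of size <= max_size that cover every object in the cluster.

    Assumption: a tag set "covers" an object if they share at least one tag.
    """
    pool = sorted(set().union(*(tags_of[obj] for obj in cluster)))
    for k in range(1, max_size + 1):
        for combo in combinations(pool, k):
            chosen = set(combo)
            if all(tags_of[obj] & chosen for obj in cluster):
                yield frozenset(chosen)


def min_disjoint_descriptors(clusters, tags_of, max_size=3):
    """Exhaustively search for pairwise disjoint descriptors of minimum total size.

    Returns one descriptor per cluster, or None if no feasible choice exists.
    Exponential time: only meant for tiny illustrative instances.
    """
    options = [list(candidate_descriptors(c, tags_of, max_size)) for c in clusters]
    best, best_size = None, None
    for choice in product(*options):
        total = sum(len(d) for d in choice)
        # Descriptors are pairwise disjoint iff their union has no repeated tag.
        if len(frozenset().union(*choice)) != total:
            continue
        if best is None or total < best_size:
            best, best_size = choice, total
    return best


# Hypothetical toy data: four objects, two clusters, four tags.
tags_of = {
    "a": {"cat", "pet"},
    "b": {"cat"},
    "c": {"dog", "pet"},
    "d": {"dog", "walk"},
}
clusters = [["a", "b"], ["c", "d"]]

result = min_disjoint_descriptors(clusters, tags_of)
print(result)
```

In this toy instance the shared tag "pet" cannot describe the first cluster on its own, because object "b" does not carry it, so the search settles on one distinguishing tag per cluster. Exhaustive search like this scales exponentially, which is consistent with the abstract's remark that the problem is NP-hard in general.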
