Separate and conquer heuristic allows robust mining of contrast sets in classification, regression, and survival data

Title

Separate and conquer heuristic allows robust mining of contrast sets in classification, regression, and survival data

Authors

Adam Gudyś, Marek Sikora, Łukasz Wróbel

Abstract

Identifying differences between groups is one of the most important knowledge discovery problems. The procedure, also known as contrast set mining, is applied in a wide range of areas such as medicine, industry, and economics. In this paper, we present RuleKit-CS, an algorithm for contrast set mining based on separate and conquer, a well-established heuristic for decision rule induction. Multiple passes accompanied by an attribute penalization scheme provide contrast sets that describe the same examples with different attributes, which distinguishes the presented approach from standard separate and conquer. The algorithm was also generalized to regression and survival data, allowing the identification of contrast sets whose label attribute/survival prognosis is consistent with the label/prognosis of the predefined contrast groups. This feature, not provided by existing approaches, further extends the usability of RuleKit-CS. Experiments on over 130 data sets from various areas, together with a detailed analysis of selected cases, confirmed RuleKit-CS to be a useful tool for discovering differences between defined groups. The algorithm was implemented as part of the RuleKit suite, available on GitHub under the GNU AGPL 3 licence (https://github.com/adaa-polsl/RuleKit).

Keywords: contrast sets, separate and conquer, regression, survival
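The abstract names the ingredients (separate-and-conquer rule induction plus multiple passes with attribute penalization) without spelling them out. The following is a minimal, generic Python sketch of that idea for nominal classification data only. It is not RuleKit-CS and not the RuleKit API. Several details are assumptions made for illustration: rule quality is plain precision, the penalty is multiplicative with a hypothetical additive increment `step` per pass, and the helper names (`covers`, `grow_rule`, `separate_and_conquer`, `mine_contrast_sets`) and the toy data are hypothetical. The regression and survival generalizations are not covered.

```python
from typing import Dict, List, Optional, Tuple

Example = Dict[str, str]     # attribute name -> nominal value
Condition = Tuple[str, str]  # (attribute, value) equality test
Rule = List[Condition]       # conjunction of conditions (a candidate contrast set)


def covers(rule: Rule, ex: Example) -> bool:
    """A rule covers an example when every attribute == value test holds."""
    return all(ex[a] == v for a, v in rule)


def grow_rule(pos: List[Example], neg: List[Example], penalty: Dict[str, float]) -> Rule:
    """Greedily add the condition with the best penalized precision until the
    rule covers no negatives or no unused attribute remains (assumed measure)."""
    rule: Rule = []
    cur_pos, cur_neg = pos, neg
    while cur_neg:
        used = {a for a, _ in rule}
        candidates = sorted({(a, ex[a]) for ex in cur_pos for a in ex if a not in used})
        best: Optional[Condition] = None
        best_q = -1.0
        for a, v in candidates:
            p = sum(ex[a] == v for ex in cur_pos)  # >= 1, candidates come from cur_pos
            n = sum(ex[a] == v for ex in cur_neg)
            # Precision, scaled down for attributes penalized in earlier passes.
            q = p / (p + n) * (1.0 - penalty.get(a, 0.0))
            if q > best_q:
                best, best_q = (a, v), q
        if best is None:
            break
        rule.append(best)
        cur_pos = [ex for ex in cur_pos if covers(rule, ex)]
        cur_neg = [ex for ex in cur_neg if covers(rule, ex)]
    return rule


def separate_and_conquer(pos: List[Example], neg: List[Example],
                         penalty: Dict[str, float]) -> List[Rule]:
    """Learn a rule, remove the positives it covers, and repeat on the rest."""
    rules: List[Rule] = []
    remaining = list(pos)
    while remaining:
        rule = grow_rule(remaining, neg, penalty)
        if not rule:
            break
        rules.append(rule)
        remaining = [ex for ex in remaining if not covers(rule, ex)]
    return rules


def mine_contrast_sets(examples: List[Example], group_attr: str, group: str,
                       passes: int = 2, step: float = 0.5) -> List[Rule]:
    """Run several passes over all group examples; after each pass, raise the
    penalty of every attribute used, so later passes prefer other attributes."""
    def strip(ex: Example) -> Example:
        return {k: v for k, v in ex.items() if k != group_attr}

    pos = [strip(ex) for ex in examples if ex[group_attr] == group]
    neg = [strip(ex) for ex in examples if ex[group_attr] != group]
    penalty: Dict[str, float] = {}
    contrast_sets: List[Rule] = []
    for _ in range(passes):
        rules = separate_and_conquer(pos, neg, penalty)
        for rule in rules:
            if rule not in contrast_sets:
                contrast_sets.append(rule)
            for a, _ in rule:
                penalty[a] = min(1.0, penalty.get(a, 0.0) + step)
    return contrast_sets


# Hypothetical toy data: both attributes separate "case" from "control".
data = [
    {"mutation": "yes", "marker": "high", "group": "case"},
    {"mutation": "yes", "marker": "high", "group": "case"},
    {"mutation": "no", "marker": "low", "group": "control"},
    {"mutation": "no", "marker": "low", "group": "control"},
]
result = mine_contrast_sets(data, "group", "case")
print(result)  # [[('marker', 'high')], [('mutation', 'yes')]]
```

In a single standard pass, the sketch would stop after finding `marker = high`, because that rule already covers every "case" example. The penalty applied in the second pass pushes the search toward `mutation = yes`, giving a second description of the same examples with a different attribute. This mirrors, in simplified form, the property the abstract attributes to the multi-pass scheme.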