Paper Title

Active Feature Selection for the Mutual Information Criterion

Authors

Shachar Schnapp, Sivan Sabato

Abstract

We study active feature selection, a novel feature selection setting in which unlabeled data is available, but the budget for labels is limited, and the examples to label can be actively selected by the algorithm. We focus on feature selection using the classical mutual information criterion, which selects the $k$ features with the largest mutual information with the label. In the active feature selection setting, the goal is to use significantly fewer labels than the data set size and still find $k$ features whose mutual information with the label based on the \emph{entire} data set is large. We explain and experimentally study the choices that we make in the algorithm, and show that they lead to a successful algorithm, compared to other more naive approaches. Our design draws on insights that relate the problem of active feature selection to the study of pure-exploration multi-armed bandit settings. While we focus here on mutual information, our general methodology can be adapted to other feature-quality measures as well. The code is available at the following URL: https://github.com/ShacharSchnapp/ActiveFeatureSelection.
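For readers unfamiliar with the criterion the abstract refers to, the following is a minimal sketch of the classical, fully labeled version: score each feature by its empirical mutual information with the label and keep the $k$ highest-scoring features. It is not the active algorithm from the paper and is not taken from the linked repository. The function names `mutual_information` and `top_k_features` are hypothetical, and the sketch assumes discrete-valued features and a simple plug-in estimate in nats.

```python
import numpy as np


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for two discrete 1-D arrays."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    # Map arbitrary discrete values to integer codes 0..n-1.
    _, x_codes = np.unique(x, return_inverse=True)
    _, y_codes = np.unique(y, return_inverse=True)
    # Build the empirical joint distribution as a contingency table.
    joint = np.zeros((x_codes.max() + 1, y_codes.max() + 1))
    np.add.at(joint, (x_codes, y_codes), 1.0)
    joint /= joint.sum()
    # Marginals of X (column vector) and Y (row vector).
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    # Sum only over cells with positive probability (0 * log 0 := 0).
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def top_k_features(X, y, k):
    """Return indices of the k features with the largest estimated
    mutual information with y, together with all feature scores."""
    X = np.asarray(X)
    scores = np.array(
        [mutual_information(X[:, j], y) for j in range(X.shape[1])]
    )
    # argsort is ascending, so reverse it and keep the first k indices.
    top = np.argsort(scores)[::-1][:k]
    return top, scores


if __name__ == "__main__":
    # Toy data (an assumption for illustration): feature 0 copies the
    # label, feature 1 is constant, feature 2 is independent of the label.
    y = np.array([0, 1, 0, 1, 0, 1, 0, 1])
    X = np.column_stack([y, np.zeros(8, dtype=int), [0, 0, 1, 1, 0, 0, 1, 1]])
    selected, scores = top_k_features(X, y, k=1)
    print(selected, scores)
```

This baseline needs a label for every example. The setting described in the abstract aims for a similar selection while querying far fewer labels.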
