Active metric learning and classification using similarity queries

Paper title

Active metric learning and classification using similarity queries

Authors

Namrata Nadagouda, Austin Xu, Mark A. Davenport

Abstract

Active learning is commonly used to train label-efficient models by adaptively selecting the most informative queries. However, most active learning strategies are designed either to learn a representation of the data (e.g., embedding or metric learning) or to perform well on a task (e.g., classification) on the data. Many machine learning tasks, by contrast, combine representation learning with a task-specific goal. Motivated by this, we propose a novel unified query framework that can be applied to any problem in which a key component is learning a representation of the data that reflects similarity. Our approach builds on similarity or nearest neighbor (NN) queries, which seek to select samples that result in improved embeddings. Each query consists of a reference and a set of objects, with an oracle selecting the object most similar (i.e., nearest) to the reference. To reduce the number of solicited queries, they are chosen adaptively according to an information-theoretic criterion. We demonstrate the effectiveness of the proposed strategy on two tasks, active metric learning and active classification, using a variety of synthetic and real-world datasets. In particular, we demonstrate that actively selected NN queries outperform recently developed active triplet selection methods in a deep metric learning setting. Further, we show that in classification, actively selecting class labels can be reformulated as a process of selecting the most informative NN query, allowing direct application of our method.
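
To make the query structure described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a query is a pair (reference index, list of candidate indices) and that the oracle answers with the candidate nearest to the reference under a ground-truth embedding. The abstract does not specify the information-theoretic criterion, so the sketch substitutes a simple stand-in: the entropy of a hypothetical softmax response model computed from the current embedding estimate. All function names and the `scale` parameter are illustrative assumptions.

```python
import numpy as np

def nn_oracle(X_true, ref, candidates):
    """Answer an NN query: return the candidate nearest to the reference
    under the (normally hidden) ground-truth embedding X_true."""
    d = np.linalg.norm(X_true[candidates] - X_true[ref], axis=1)
    return candidates[int(np.argmin(d))]

def response_probs(X_est, ref, candidates, scale=1.0):
    """Hypothetical response model (an assumption, not from the paper):
    softmax over negative squared distances in the current estimate."""
    d2 = np.sum((X_est[candidates] - X_est[ref]) ** 2, axis=1)
    logits = -scale * d2
    logits = logits - logits.max()  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def select_query(X_est, queries):
    """Stand-in selection rule: pick the query whose predicted oracle
    response is most uncertain under the current embedding estimate."""
    scores = [entropy(response_probs(X_est, ref, cands)) for ref, cands in queries]
    return queries[int(np.argmax(scores))]

# Toy data: four points on a line; points 1 and 3 are nearly equidistant from point 0.
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [1.1, 0.0]])
queries = [(0, [1, 2]), (0, [1, 3])]
chosen = select_query(X, queries)
answer = nn_oracle(X, chosen[0], chosen[1])
```

In this toy setup the second query is selected, because the model cannot confidently tell which of the two close candidates is nearer, while the first query has an almost certain answer. A full method would update the embedding estimate after each oracle response and repeat the selection.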

Paper title