Active Imitation Learning with Noisy Guidance

Paper Title

Active Imitation Learning with Noisy Guidance

Authors

Kianté Brantley, Amr Sharaf, Hal Daumé III

Abstract

Imitation learning algorithms provide state-of-the-art results on many structured prediction tasks by learning near-optimal search policies. Such algorithms assume training-time access to an expert that can provide the optimal action at any queried state; unfortunately, the number of such queries is often prohibitive, frequently rendering these approaches impractical. To combat this query complexity, we consider an active learning setting in which the learning algorithm has additional access to a much cheaper noisy heuristic that provides noisy guidance. Our algorithm, LEAQI, learns a difference classifier that predicts when the expert is likely to disagree with the heuristic, and queries the expert only when necessary. We apply LEAQI to three sequence labeling tasks, demonstrating significantly fewer queries to the expert and comparable (or better) accuracies over a passive approach.
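
The "query the expert only when necessary" idea in the abstract can be illustrated with a small sketch. This is not the authors' LEAQI implementation: the count-based `DifferenceClassifier`, its `threshold` parameter, the `collect_labels` helper, and the toy name-tagging task are all hypothetical stand-ins chosen for readability. The sketch only shows the control flow, in which a difference classifier decides per state whether to pay for an expert query or accept the cheap heuristic's label.

```python
from collections import defaultdict


class DifferenceClassifier:
    """Toy count-based predictor of expert/heuristic disagreement (hypothetical)."""

    def __init__(self, threshold=0.2):
        self.threshold = threshold          # assumed cutoff on the disagreement rate
        self.seen = defaultdict(int)        # expert queries observed per feature
        self.disagreed = defaultdict(int)   # disagreements observed per feature

    def predicts_disagreement(self, feature):
        if self.seen[feature] == 0:
            return True                     # no evidence yet: ask the expert
        return self.disagreed[feature] / self.seen[feature] >= self.threshold

    def update(self, feature, disagreed):
        self.seen[feature] += 1
        self.disagreed[feature] += int(disagreed)


def collect_labels(states, feature_fn, heuristic, expert, clf):
    """Label each state, querying the expert only when disagreement is predicted."""
    labels, n_queries = [], 0
    for state in states:
        guess = heuristic(state)
        feature = feature_fn(state)
        if clf.predicts_disagreement(feature):
            truth = expert(state)           # the expensive call
            n_queries += 1
            clf.update(feature, truth != guess)
            labels.append(truth)
        else:
            labels.append(guess)            # trust the cheap heuristic
    return labels, n_queries


if __name__ == "__main__":
    # Toy task: tag names. Each state is (word, is_sentence_initial).
    states = [("The", True), ("dog", False), ("saw", False), ("Bob", False),
              ("She", True), ("met", False), ("Alice", False), ("in", False),
              ("Paris", False)]
    names = {"Bob", "Alice", "Paris"}
    expert = lambda s: "NAME" if s[0] in names else "O"
    heuristic = lambda s: "NAME" if s[0][0].isupper() else "O"  # noisy rule
    feature_fn = lambda s: (s[0][0].isupper(), s[1])
    labels, n_queries = collect_labels(states, feature_fn, heuristic, expert,
                                       DifferenceClassifier())
    print(labels, n_queries)
```

One limitation of this toy version: the classifier receives feedback only on states where it queries the expert, so once it stops querying for a feature it never learns whether that decision was wrong. A practical method has to account for this one-sided feedback.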
