Paper Title
Learning Heuristic Selection with Dynamic Algorithm Configuration
Paper Authors
Paper Abstract
A key challenge in satisficing planning is to use multiple heuristics within one heuristic search. An aggregation of multiple heuristic estimates, for example by taking the maximum, has the disadvantage that bad estimates of a single heuristic can negatively affect the whole search. Since the performance of a heuristic varies from instance to instance, approaches such as algorithm selection can be successfully applied. In addition, alternating between multiple heuristics during the search makes it possible to use all heuristics equally and improve performance. However, all these approaches ignore the internal search dynamics of a planning system, which can help to select the most useful heuristics for the current expansion step. We show that dynamic algorithm configuration can be used for dynamic heuristic selection which takes into account the internal search dynamics of a planning system. Furthermore, we prove that this approach generalizes over existing approaches and that it can exponentially improve the performance of the heuristic search. To learn dynamic heuristic selection, we propose an approach based on reinforcement learning and show empirically that domain-wise learned policies, which take the internal search dynamics of a planning system into account, can exceed existing approaches.
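To make the idea of per-step heuristic selection concrete, the following is a minimal sketch, not the authors' implementation. It assumes a greedy best-first search that keeps one open list per heuristic and asks a policy, before every expansion, which open list to pop from. The function and policy names (`greedy_search`, `single_policy`, `alternation_policy`, `min_h_policy`) and the toy line-world domain are hypothetical and chosen only for illustration. In particular, `min_h_policy` is a hand-written stand-in for a learned policy, not a reinforcement learning agent. Fixing one heuristic and alternating between heuristics both appear as special cases of such a policy.

```python
import heapq
from itertools import count


def greedy_search(start, is_goal, successors, heuristics, policy):
    """Greedy best-first search with one open list per heuristic.

    policy(step, open_lists) returns the index of the open list to pop
    from at this step (hypothetical interface, for illustration only).
    Returns (plan, expansions), or (None, expansions) if no plan exists.
    """
    tie = count()  # tie-breaker so the heap never compares states directly
    open_lists = [[] for _ in heuristics]

    def push(state):
        # Every generated state is inserted into every open list.
        for h, open_list in zip(heuristics, open_lists):
            heapq.heappush(open_list, (h(state), next(tie), state))

    parent = {start: None}
    closed = set()
    push(start)
    step = 0
    expansions = 0
    while any(open_lists):
        i = policy(step, open_lists)
        step += 1
        if not open_lists[i]:
            # Fall back to the first non-empty open list.
            i = next(j for j, ol in enumerate(open_lists) if ol)
        _, _, state = heapq.heappop(open_lists[i])
        if state in closed:
            continue  # already expanded via another open list
        closed.add(state)
        expansions += 1
        if is_goal(state):
            plan = []
            while state is not None:
                plan.append(state)
                state = parent[state]
            return plan[::-1], expansions
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                push(nxt)
    return None, expansions


def single_policy(step, open_lists):
    # Static choice: always use heuristic 0.
    return 0


def alternation_policy(step, open_lists):
    # Round-robin over all heuristics, ignoring the search state.
    return step % len(open_lists)


def min_h_policy(step, open_lists):
    # State-dependent choice (assumed feature: best h-value per open list).
    best = [ol[0][0] if ol else float("inf") for ol in open_lists]
    return min(range(len(open_lists)), key=best.__getitem__)


# Toy domain (assumption): integers in [-20, 20], start 0, goal 10.
def successors(s):
    return [n for n in (s + 1, s - 1) if -20 <= n <= 20]


def is_goal(s):
    return s == 10


heuristics = [lambda s: abs(10 - s),   # informative heuristic
              lambda s: abs(-10 - s)]  # misleading heuristic

plan_single, exp_single = greedy_search(0, is_goal, successors, heuristics, single_policy)
plan_alt, exp_alt = greedy_search(0, is_goal, successors, heuristics, alternation_policy)
plan_dyn, exp_dyn = greedy_search(0, is_goal, successors, heuristics, min_h_policy)
print(exp_single, exp_alt, exp_dyn)
```

In this toy setting, alternation spends every second expansion following the misleading heuristic, whereas a policy that inspects the open lists can avoid those expansions. A learned policy would replace `min_h_policy` with a function of richer search-state features.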