论文标题
TRENSPATH:通过变压器学习基于网格的探路的启发式方法
TransPath: Learning Heuristics For Grid-Based Pathfinding via Transformers
论文作者
论文摘要
启发式搜索算法,例如A*是用于网格上的探路的常用工具,即广泛用来代表机器人,视频游戏等环境等的常规结构图等。曼哈顿的距离,不要考虑障碍,因此,在障碍富裕的环境中,这种启发式方法的搜索效果很差。为此,我们建议学习依赖实例的启发式代理,这些代理应该显着提高搜索效率。我们建议学习的第一个启发式代理是校正因子,即独立成本估算与完美的估计值之间的比率(在培训阶段计算得出)。与学习成本到启发式功能的绝对价值不同,以前是以前知道的,当学习校正因素时,使用了独立的启发式启发式的知识。第二个启发式代理是路径概率,它表明网格单元位于最短路径上的可能性。这种启发式词可以在焦点搜索框架中用作次要启发式,使我们能够保留解决方案有限的亚临时性的保证。我们通过包含注意力障碍(变形金刚)的最先进的神经网络以有监督的方式学习了启发式方法。我们对计划任务的全面数据集进行了彻底的经验评估,这表明建议的技术i)在生产解决方案的同时将A*的计算工作减少到$ 4 $ X,而最佳解决方案的成本超过了最佳解决方案的成本低于$ 0.3 $ 0.3美元; ii)优于竞争对手,其中包括启发式搜索的传统技术,即加权A*以及最先进的可学习计划者。
Heuristic search algorithms, e.g. A*, are the commonly used tools for pathfinding on grids, i.e. graphs of regular structure that are widely employed to represent environments in robotics, video games etc. Instance-independent heuristics for grid graphs, e.g. Manhattan distance, do not take the obstacles into account and, thus, the search led by such heuristics performs poorly in the obstacle-rich environments. To this end, we suggest learning the instance-dependent heuristic proxies that are supposed to notably increase the efficiency of the search. The first heuristic proxy we suggest to learn is the correction factor, i.e. the ratio between the instance independent cost-to-go estimate and the perfect one (computed offline at the training phase). Unlike learning the absolute values of the cost-to-go heuristic function, which was known before, when learning the correction factor the knowledge of the instance-independent heuristic is utilized. The second heuristic proxy is the path probability, which indicates how likely the grid cell is lying on the shortest path. This heuristic can be utilized in the Focal Search framework as the secondary heuristic, allowing us to preserve the guarantees on the bounded sub-optimality of the solution. We learn both suggested heuristics in a supervised fashion with the state-of-the-art neural networks containing attention blocks (transformers). We conduct a thorough empirical evaluation on a comprehensive dataset of planning tasks, showing that the suggested techniques i) reduce the computational effort of the A* up to a factor of $4$x while producing the solutions, which costs exceed the costs of the optimal solutions by less than $0.3$% on average; ii) outperform the competitors, which include the conventional techniques from the heuristic search, i.e. weighted A*, as well as the state-of-the-art learnable planners.