混合整数分支和绑定的学习搜索近似研究：SCIP中的节点选择

论文标题

混合整数分支和绑定的学习搜索近似研究：SCIP中的节点选择

A Study of Learning Search Approximation in Mixed Integer Branch and Bound: Node Selection in SCIP

论文作者

Yilmaz, Kaan, Yorke-Smith, Neil

论文摘要

与使用机器学习来帮助解决组合优化问题的不断增长的趋势相一致，一个有前途的想法是通过使用学习的策略来改善混合整数编程（MIP）分支机构和结合树中的节点选择。使用模仿学习的先前工作表明，通过学习自适应节点搜索顺序来获取节点选择策略的可行性。相比之下，我们的模仿学习政策仅专注于学习节点的孩子要选择哪个。我们提出了一种离线方法，可以在两个环境中学习这样的政策：一种通过承诺修剪节点来组成启发式方法；一个精确的和从叶子回溯，以确保找到最佳整数解决方案。前者的设置对应于跌落过程中的儿童选择器，而后者类似于潜水启发式。我们在启发式和精确的设置中都将该策略应用于流行的开源求解器SCIP。五个MIP数据集的经验结果表明，与文献中最新的先例相比，我们的节点选择策略更快地导致解决方案。尽管我们没有在求解精确解决方案的时间方面击败高度优化的SCIP先进的基线节点选择器，但如果预测模型的准确性就足够，我们的启发式策略比所有基线都具有比所有基础线更好的最佳差距。此外，结果还表明，当应用时间限制时，我们的启发式方法比大多数测试问题中的所有基准都找到了更好的解决方案。我们通过表明学识渊博的政策模仿了SCIP基线来解释结果，但没有后者的早期暴跌。我们的建议是，尽管对文献有明显的改进，但使用MIP分支和结合树的决策中的学习，在更广泛的方法中可以更好地看到这种MIP儿童选择器。

In line with the growing trend of using machine learning to help solve combinatorial optimisation problems, one promising idea is to improve node selection within a mixed integer programming (MIP) branch-and-bound tree by using a learned policy. Previous work using imitation learning indicates the feasibility of acquiring a node selection policy, by learning an adaptive node searching order. In contrast, our imitation learning policy is focused solely on learning which of a node's children to select. We present an offline method to learn such a policy in two settings: one that comprises a heuristic by committing to pruning of nodes; one that is exact and backtracks from a leaf to guarantee finding the optimal integer solution. The former setting corresponds to a child selector during plunging, while the latter is akin to a diving heuristic. We apply the policy within the popular open-source solver SCIP, in both heuristic and exact settings. Empirical results on five MIP datasets indicate that our node selection policy leads to solutions significantly more quickly than the state-of-the-art precedent in the literature. While we do not beat the highly-optimised SCIP state-of-practice baseline node selector in terms of solving time on exact solutions, our heuristic policies have a consistently better optimality gap than all baselines, if the accuracy of the predictive model is sufficient. Further, the results also indicate that, when a time limit is applied, our heuristic method finds better solutions than all baselines in the majority of problems tested. We explain the results by showing that the learned policies have imitated the SCIP baseline, but without the latter's early plunge abort. Our recommendation is that, despite the clear improvements over the literature, this kind of MIP child selector is better seen in a broader approach using learning in MIP branch-and-bound tree decisions.

下载PDF全文

下载文献需遵守相关版权规定

论文标题