Paper Title

A Deep Reinforcement Learning Framework For Column Generation

Authors

Cheng Chi, Amine Mohamed Aboussalah, Elias B. Khalil, Juyoung Wang, Zoha Sherkat-Masoumi

Abstract

Column Generation (CG) is an iterative algorithm for solving linear programs (LPs) with an extremely large number of variables (columns). CG is the workhorse for tackling large-scale integer linear programs, which rely on CG to solve LP relaxations within a branch-and-price algorithm. Two canonical applications are the Cutting Stock Problem (CSP) and the Vehicle Routing Problem with Time Windows (VRPTW). In VRPTW, for example, each binary variable represents the decision to include or exclude a route, of which there are exponentially many; CG incrementally grows the subset of columns being used, ultimately converging to an optimal solution. We propose RLCG, the first Reinforcement Learning (RL) approach for CG. Unlike typical column selection rules, which myopically select a column based on local information at each iteration, we treat CG as a sequential decision-making problem: the column selected in a given iteration affects subsequent column selections. This perspective lends itself to a Deep Reinforcement Learning approach that uses Graph Neural Networks (GNNs) to represent the variable-constraint structure of the LP of interest. We perform an extensive set of experiments using the publicly available BPPLIB benchmark for CSP and the Solomon benchmark for VRPTW. RLCG converges faster and reduces the number of CG iterations by 22.4% for CSP and 40.9% for VRPTW on average, compared to a commonly used greedy policy. Our code is available at https://github.com/chichengmessi/reinforcement-learning-for-column-generation.git.
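To make the loop structure the abstract refers to concrete, below is a minimal, self-contained sketch of column generation on a toy Cutting Stock instance, using scipy.optimize.linprog for the restricted master problem and brute-force pricing. All names, sizes, and parameters here are illustrative assumptions, not the paper's code; the greedy select_column function is the myopic baseline that RLCG would replace with a GNN-based policy trained by RL.

```python
# Minimal column generation sketch for a toy Cutting Stock Problem.
# NOT the authors' RLCG implementation; the greedy `select_column` hook below
# is the baseline rule that a learned (GNN/RL) column-selection policy replaces.
from itertools import product

import numpy as np
from scipy.optimize import linprog

roll_length = 10
piece_sizes = np.array([3, 4, 5])   # item lengths (toy data)
demands = np.array([30, 20, 10])    # required copies of each item

# Start with trivial columns: one cutting pattern per item size.
columns = [np.eye(len(piece_sizes))[i] * (roll_length // piece_sizes[i])
           for i in range(len(piece_sizes))]

def solve_restricted_master(cols):
    """Solve the LP relaxation over the current column subset; return duals."""
    A = np.column_stack(cols)
    # minimize number of rolls subject to A x >= demands  (written as -A x <= -d)
    res = linprog(c=np.ones(A.shape[1]), A_ub=-A, b_ub=-demands,
                  bounds=(0, None), method="highs")
    return res.fun, -res.ineqlin.marginals      # dual prices of demand rows

def pricing(duals):
    """Return all feasible patterns with negative reduced cost.
    (Brute-force enumeration for this tiny instance; real pricing is a knapsack.)"""
    candidates = []
    max_counts = [roll_length // s for s in piece_sizes]
    for counts in product(*[range(m + 1) for m in max_counts]):
        counts = np.array(counts)
        if counts @ piece_sizes <= roll_length:
            reduced_cost = 1.0 - duals @ counts
            if reduced_cost < -1e-9:
                candidates.append((reduced_cost, counts))
    return candidates

def select_column(candidates):
    """Greedy baseline: pick the most negative reduced cost.
    RLCG replaces this myopic rule with a GNN policy over the
    variable-constraint (bipartite) graph, trained by RL."""
    return min(candidates, key=lambda rc_col: rc_col[0])[1]

for it in range(50):
    obj, duals = solve_restricted_master(columns)
    candidates = pricing(duals)
    if not candidates:                # no improving column: LP optimum reached
        print(f"converged after {it} iterations, LP objective = {obj:.2f}")
        break
    columns.append(select_column(candidates))
```

The point of the sequential-decision view is visible in this loop: the column appended at each iteration changes the next restricted master problem and hence its duals, so a selection rule that looks only at the current reduced costs can take many more iterations than one that anticipates its effect on future iterations.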
