Deep Controlled Learning for Inventory Control

Title

Deep Controlled Learning for Inventory Control

Authors

Temizöz, Tarkan; Imdahl, Christina; Dijkman, Remco; Lamghari-Idrissi, Douniel; van Jaarsveld, Willem

Abstract

The application of Deep Reinforcement Learning (DRL) to inventory management is an emerging field. However, traditional DRL algorithms, originally developed for diverse domains such as game-playing and robotics, may not be well-suited for the specific challenges posed by inventory management. Consequently, these algorithms often fail to outperform established heuristics; for instance, no existing DRL approach consistently surpasses the capped base-stock policy in lost sales inventory control. This highlights a critical gap in the practical application of DRL to inventory management: the highly stochastic nature of inventory problems requires tailored solutions. In response, we propose Deep Controlled Learning (DCL), a new DRL algorithm designed for highly stochastic problems. DCL is based on approximate policy iteration and incorporates an efficient simulation mechanism, combining Sequential Halving with Common Random Numbers. Our numerical studies demonstrate that DCL consistently outperforms state-of-the-art heuristics and DRL algorithms across various inventory settings, including lost sales, perishable inventory systems, and inventory systems with random lead times. DCL achieves lower average costs in all test cases while maintaining an optimality gap of no more than 0.2%. Remarkably, this performance is achieved using the same hyperparameter set across all experiments, underscoring the robustness and generalizability of our approach. These findings contribute to the ongoing exploration of tailored DRL algorithms for inventory management, providing a foundation for further research and practical application in this area.
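The abstract names the two ingredients of DCL's simulation mechanism without showing how they fit together. The Python sketch below illustrates one way to combine Sequential Halving with Common Random Numbers (CRN) to pick an order quantity in a single state of a lost-sales system. It is not the authors' implementation: the Poisson demand, the holding and lost-sales costs, the use of a capped base-stock policy as the rollout policy, and every name and parameter value (`rollout_cost`, `sequential_halving`, `S`, `cap`, `budget`, `horizon`) are hypothetical choices made for illustration. In one common form of approximate policy iteration, actions selected this way in many sampled states would serve as training targets for the next policy; that outer loop and any neural network are omitted here.

```python
import math
import random

# Illustrative sketch only: model, names, and parameters are assumptions, not the paper's code.


def poisson(rng: random.Random, mean: float) -> int:
    """Sample a Poisson variate with Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def step(on_hand, pipeline, order, demand, h, p):
    """One lost-sales period (assumes lead time >= 1): receive the oldest order, place `order`, serve demand."""
    on_hand += pipeline[0]
    pipeline = pipeline[1:] + [order]  # new order arrives after len(pipeline) periods
    sold = min(on_hand, demand)
    lost = demand - sold  # unmet demand is lost, not backlogged
    on_hand -= sold
    return on_hand, pipeline, h * on_hand + p * lost


def capped_base_stock(on_hand, pipeline, S, cap):
    """Order up to S based on inventory position, but never more than `cap` units."""
    position = on_hand + sum(pipeline)
    return min(max(S - position, 0), cap)


def rollout_cost(state, first_action, seed, horizon, params):
    """Total cost of taking `first_action`, then following the capped base-stock policy."""
    rng = random.Random(seed)  # the seed fixes the demand path: common random numbers
    on_hand, pipeline = state[0], list(state[1])
    action, total = first_action, 0.0
    for _ in range(horizon):
        demand = poisson(rng, params["mean"])  # one draw per period, independent of the action
        on_hand, pipeline, cost = step(on_hand, pipeline, action, demand, params["h"], params["p"])
        total += cost
        action = capped_base_stock(on_hand, pipeline, params["S"], params["cap"])
    return total


def sequential_halving(state, actions, budget, horizon, params):
    """Return the action that survives halving (lowest estimated cost); arms share seeds per round."""
    remaining = list(actions)
    rounds = math.ceil(math.log2(len(remaining)))
    next_seed = 0
    for _ in range(rounds):
        pulls = max(1, budget // (len(remaining) * rounds))
        seeds = range(next_seed, next_seed + pulls)  # shared by all surviving actions this round
        next_seed += pulls
        mean_cost = {
            a: sum(rollout_cost(state, a, s, horizon, params) for s in seeds) / pulls
            for a in remaining
        }
        remaining.sort(key=lambda a: mean_cost[a])
        remaining = remaining[: math.ceil(len(remaining) / 2)]  # keep the cheaper half
    return remaining[0]


# Illustrative usage with made-up parameters (lead time 2, so two orders in transit).
params = {"mean": 4.0, "h": 1.0, "p": 4.0, "S": 12, "cap": 6}
state = (2, [3, 1])
best = sequential_halving(state, range(9), budget=400, horizon=30, params=params)
print("selected order quantity:", best)
```

Because every surviving action is evaluated on the same demand seeds within a round, their cost estimates are positively correlated, which typically reduces the variance of the differences between them. That is what allows Sequential Halving to discard the weaker half of the actions after relatively few rollouts.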
