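Reinforcement Learning and Mixed-Integer Programming for Power Plant Scheduling in Low Carbon Systems: Comparison and Hybridisation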

Paper title


Reinforcement Learning and Mixed-Integer Programming for Power Plant Scheduling in Low Carbon Systems: Comparison and Hybridisation

Authors

Cormac O'Malley, Patrick de Mars, Luis Badesa, Goran Strbac

Abstract


Decarbonisation is driving dramatic growth in renewable power generation. This increases uncertainty in the load to be served by power plants and makes their efficient scheduling, known as the unit commitment (UC) problem, more difficult. UC is solved in practice by mixed-integer programming (MIP) methods; however, there is growing interest in emerging data-driven methods, including reinforcement learning (RL). In this paper, we extensively test two MIP (deterministic and stochastic) and two RL (model-free and with lookahead) scheduling methods over a large set of test days and problem sizes, for the first time comparing the state of the art of these two approaches on a level playing field. We find that deterministic and stochastic MIP consistently produce lower-cost UC schedules than RL, with better reliability and better scalability with problem size. For a 50-generator test case, the average operating costs of RL are more than twice those of stochastic MIP, and 13 times as large in the worst instance. However, the key strength of RL is its ability to produce solutions practically instantly, irrespective of problem size. We leverage this advantage to produce various initial solutions for warm-starting concurrent stochastic MIP solves. By producing several near-optimal solutions simultaneously and then evaluating them using Monte Carlo methods, we exploit the differences between the true cost function and the discrete approximation required to formulate the MIP. The resulting hybrid technique outperforms both the RL and MIP methods individually, reducing total operating costs by 0.3% on average.
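The selection step of the hybrid described in the abstract can be sketched in a few lines of Python. In that step, several candidate commitment schedules are scored by Monte Carlo simulation against the operating cost rather than against the MIP's discrete approximation, and the cheapest is kept. This is a hypothetical toy, not the authors' code. It assumes the following:

- a single bus;
- merit-order dispatch with no minimum stable generation, ramping or reserve constraints;
- independent Gaussian demand forecast errors in each hour;
- made-up generator costs and a made-up value of lost load (`VOLL`).

All names (`Generator`, `dispatch_cost`, `startup_costs`, `expected_cost`, `select_best`) are illustrative. In the paper, the candidates would come from RL-warm-started stochastic MIP solves; here they are written by hand.

```python
import random
from dataclasses import dataclass
from typing import Sequence

# Hypothetical value of lost load ($/MWh): unmet demand is priced at this rate.
VOLL = 10_000.0


@dataclass(frozen=True)
class Generator:
    capacity_mw: float    # maximum output (MW); minimum stable generation ignored
    marginal_cost: float  # $/MWh
    no_load_cost: float   # $/h incurred whenever the unit is committed
    startup_cost: float   # $ per off -> on transition


def dispatch_cost(gens: Sequence[Generator], committed: Sequence[int], demand: float) -> float:
    """One-hour merit-order dispatch of the committed units; any shortfall costs VOLL."""
    remaining = max(demand, 0.0)
    cost = 0.0
    online = [g for g, on in zip(gens, committed) if on]
    for g in sorted(online, key=lambda g: g.marginal_cost):
        output = min(g.capacity_mw, remaining)
        cost += output * g.marginal_cost + g.no_load_cost
        remaining -= output
    return cost + remaining * VOLL


def startup_costs(gens: Sequence[Generator], schedule, initial) -> float:
    """Deterministic start-up cost of a commitment schedule (T rows x G units)."""
    total, prev = 0.0, initial
    for row in schedule:
        total += sum(g.startup_cost for g, was, now in zip(gens, prev, row) if now and not was)
        prev = row
    return total


def expected_cost(gens, schedule, forecast, sigma, n_scenarios, rng, initial) -> float:
    """Monte Carlo estimate: start-up cost plus mean dispatch cost over demand scenarios."""
    total = 0.0
    for _ in range(n_scenarios):
        for t, row in enumerate(schedule):
            # Assumption: independent Gaussian forecast error in every hour.
            demand = forecast[t] + rng.gauss(0.0, sigma)
            total += dispatch_cost(gens, row, demand)
    return startup_costs(gens, schedule, initial) + total / n_scenarios


def select_best(gens, candidates, forecast, sigma, n_scenarios=2000, seed=0, initial=None):
    """Score every candidate schedule; return (index of the cheapest, all scores)."""
    initial = initial if initial is not None else [0] * len(gens)
    scores = []
    for schedule in candidates:
        # Re-seeding per candidate gives common random numbers: every schedule
        # is judged on the same demand scenarios, reducing comparison noise.
        rng = random.Random(seed)
        scores.append(expected_cost(gens, schedule, forecast, sigma, n_scenarios, rng, initial))
    best = min(range(len(candidates)), key=scores.__getitem__)
    return best, scores


# Usage: a cheap baseload unit and a more expensive peaker over three hours.
gens = [Generator(100.0, 20.0, 200.0, 1000.0), Generator(50.0, 60.0, 100.0, 500.0)]
forecast = [80.0, 120.0, 90.0]
candidates = [
    [[1, 0], [1, 0], [1, 0]],  # baseload only: short of capacity at the peak
    [[1, 0], [1, 1], [1, 0]],  # peaker committed only for the peak hour
    [[1, 1], [1, 1], [1, 1]],  # peaker kept on all day as a margin against forecast error
]
best, scores = select_best(gens, candidates, forecast, sigma=10.0)
print(best, [round(s) for s in scores])
```

In this toy, the schedule that looks cheapest against the point forecast (peaker only at the peak hour) loses to the one that keeps the peaker online. This is because hours 1 and 3 occasionally exceed the baseload capacity once forecast error is sampled. That gap between a point-forecast cost and the simulated cost is the kind of difference the Monte Carlo evaluation step is meant to expose.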
