A Deep Reinforcement Learning Framework for Continuous Intraday Market Bidding

Title

A Deep Reinforcement Learning Framework for Continuous Intraday Market Bidding

Authors

Boukas, Ioannis; Ernst, Damien; Théate, Thibaut; Bolland, Adrien; Huynen, Alexandre; Buchwald, Martin; Wynants, Christelle; Cornélusse, Bertrand

Abstract

The large-scale integration of variable energy resources is expected to shift a large part of energy exchanges closer to real time, when more accurate forecasts are available. In this context, short-term electricity markets, and in particular the intraday market, are considered a suitable trading floor for these exchanges. A key component for the successful integration of renewable energy sources is energy storage. In this paper, we propose a novel modelling framework for the strategic participation of energy storage in the European continuous intraday market, where exchanges occur through a centralized order book. The goal of the storage device operator is to maximize the profits received over the entire trading horizon while taking into account the operational constraints of the unit. The sequential decision-making problem of trading in the intraday market is modelled as a Markov Decision Process. An asynchronous distributed version of the fitted Q iteration algorithm is chosen to solve this problem because of its sample efficiency. The large and variable number of existing orders in the order book motivates the use of high-level actions and an alternative state representation. Historical data are used to generate a large number of artificial trajectories in order to address exploration issues during the learning process. The resulting policy is back-tested and compared against a benchmark strategy that is the current industrial standard. Results indicate that the agent converges to a policy that achieves, on average, higher total revenues than the benchmark strategy.
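
The abstract names fitted Q iteration (FQI) as the solver for the trading MDP. As a rough illustration only, the sketch below runs plain FQI on a toy storage-arbitrage problem under hypothetical assumptions: a single unit trading one energy unit per step against a made-up price series (PRICES), a small capacity (CAPACITY), and a per-cell averaging regressor standing in for whatever function approximator the authors actually use. It does not model the order book, the high-level actions, or the asynchronous distributed variant; random-policy trajectories only loosely mirror the idea of generating artificial trajectories for exploration.

```python
import random
from collections import defaultdict

# Hypothetical prices per step (EUR/MWh); illustrative values, NOT data from the paper.
PRICES = [30.0, 25.0, 20.0, 45.0, 60.0, 35.0]
T = len(PRICES)        # trading horizon (number of decision steps)
CAPACITY = 2           # hypothetical storage capacity in energy units
ACTIONS = (-1, 0, 1)   # discharge/sell one unit, idle, charge/buy one unit

def step(t, soc, a):
    """Toy transition enforcing the operational constraint 0 <= soc <= CAPACITY."""
    if not 0 <= soc + a <= CAPACITY:
        a = 0                      # infeasible action: the unit stays idle
    reward = -PRICES[t] * a        # pay when charging, earn when discharging
    return reward, (t + 1, soc + a)

def generate_transitions(n_traj, seed=0):
    """Batch of (x, u, r, x') samples collected with a uniformly random policy."""
    rng = random.Random(seed)
    batch = []
    for _ in range(n_traj):
        state = (0, rng.randint(0, CAPACITY))
        while state[0] < T:
            a = rng.choice(ACTIONS)
            r, nxt = step(state[0], state[1], a)
            batch.append((state, a, r, nxt))
            state = nxt
    return batch

def fitted_q_iteration(batch, n_iter):
    """Plain synchronous FQI; the 'regressor' averages targets per (state, action) cell."""
    q = {}  # Q_0 is zero everywhere
    for _ in range(n_iter):
        sums, counts = defaultdict(float), defaultdict(int)
        for x, u, r, x_next in batch:
            # States at t == T are terminal: no future value (undiscounted, finite horizon).
            if x_next[0] >= T:
                future = 0.0
            else:
                future = max(q.get((x_next, b), 0.0) for b in ACTIONS)
            sums[(x, u)] += r + future
            counts[(x, u)] += 1
        q = {k: sums[k] / counts[k] for k in sums}
    return q

def greedy_action(q, state):
    """Pick the action with the highest fitted Q-value (unseen pairs are never chosen)."""
    return max(ACTIONS, key=lambda b: q.get((state, b), float("-inf")))

def rollout(q, soc0=0):
    """Back-test the greedy policy on the same toy price series."""
    total, state = 0.0, (0, soc0)
    while state[0] < T:
        r, state = step(state[0], state[1], greedy_action(q, state))
        total += r
    return total

if __name__ == "__main__":
    batch = generate_transitions(2000)
    q = fitted_q_iteration(batch, n_iter=T)
    print(rollout(q))
```

Because each state includes the time step and the horizon is finite, T iterations are enough for the averaging regressor to recover the exact finite-horizon Q-values once every state-action pair appears in the batch. With a richer approximator (tree ensembles are a common choice for FQI) the same loop structure applies.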
