Paper Title
Deep Reinforcement Learning Microgrid Optimization Strategy Considering Priority Flexible Demand Side
Paper Authors
Paper Abstract
As an efficient way to integrate multiple distributed energy resources (DERs) and the user side, a microgrid mainly faces the problems of the small scale, volatility, uncertainty, and intermittency of DERs, as well as demand-side uncertainty. The traditional microgrid takes a single form and cannot support flexible energy dispatch between a complex demand side and the microgrid. In response to this problem, an overall environment comprising wind power, thermostatically controlled loads (TCLs), energy storage systems (ESSs), price-responsive loads, and the main grid is proposed. Secondly, centralized control of microgrid operation is convenient for regulating the reactive power and voltage of the distributed power supply and adjusting the grid frequency. However, a problem remains: flexible loads aggregate and generate new peaks during electricity-price valleys. Existing research accounts for the power constraints of the microgrid but fails to ensure a sufficient supply of electric energy for each individual flexible load. This paper considers the response priority of each unit component of the TCLs and ESSs, on the basis of the overall operating environment of the microgrid, so as to guarantee the power supply of the microgrid's flexible loads and to minimize the power input cost. Finally, the simulation optimization of the environment can be expressed as a Markov decision process (MDP). The training process combines two stages, offline and online operation. Multiple threads learning from scarce historical data lead to low learning efficiency, so an asynchronous advantage actor-critic (A3C) with an experience replay memory is added to resolve the data-correlation and non-stationary-distribution problems during training.
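To make the priority mechanism concrete, below is a minimal Python sketch of priority-ordered dispatch under the setting the abstract describes: flexible demand is first covered by wind output, then by ESS units in priority order, and any residual is purchased from the main grid. The `EssUnit` class, the lower-number-responds-first convention, the unit parameters, and the implicit one-hour step are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of priority-ordered response, not the paper's code.
# Unit parameters, the priority convention (lower value responds first),
# and the implicit 1-hour step are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class EssUnit:
    priority: int          # lower value responds first (assumed convention)
    soc: float             # stored energy available this step, kWh
    max_discharge: float   # discharge limit per step, kWh


def dispatch(demand: float, wind: float, ess_units: list[EssUnit],
             grid_price: float) -> float:
    """Serve demand by wind, then ESS units in priority order;
    buy the residual from the main grid and return the purchase cost."""
    residual = max(demand - wind, 0.0)
    for unit in sorted(ess_units, key=lambda u: u.priority):
        if residual <= 0.0:
            break
        drawn = min(unit.max_discharge, unit.soc, residual)
        unit.soc -= drawn
        residual -= drawn
    return grid_price * residual  # cost of energy imported from the grid


if __name__ == "__main__":
    units = [EssUnit(priority=2, soc=30.0, max_discharge=10.0),
             EssUnit(priority=1, soc=15.0, max_discharge=5.0)]
    cost = dispatch(demand=40.0, wind=12.0, ess_units=units,
                    grid_price=0.15)
    print(f"grid purchase cost: {cost:.2f}")  # 0.15 * (28 - 5 - 10) = 1.95
```

Because the units are sorted before dispatch, adding or re-ranking a TCL or ESS component only changes its `priority` value, which mirrors how a per-unit response ordering keeps each flexible load supplied before grid energy is purchased.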
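The last two sentences describe the learning machinery: an asynchronous advantage actor-critic whose training is augmented with an experience replay memory. The PyTorch sketch below shows a single worker performing one advantage actor-critic update on a replayed minibatch; the network sizes, replay layout, and hyperparameters are assumptions, the importance-sampling corrections a fully off-policy variant would use are omitted, and a full A3C would run several such workers asynchronously against shared parameters.

```python
# A single-worker sketch of an advantage actor-critic update drawn from an
# experience replay memory; shapes and hyperparameters are assumptions.
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F


class ActorCritic(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.pi = nn.Linear(hidden, n_actions)  # policy head
        self.v = nn.Linear(hidden, 1)           # value head

    def forward(self, s):
        h = self.body(s)
        return F.log_softmax(self.pi(h), dim=-1), self.v(h).squeeze(-1)


def update_from_replay(net, opt, replay, batch_size=32, gamma=0.99):
    """One actor-critic step on a minibatch sampled from the replay pool,
    which breaks the temporal correlation of consecutive transitions."""
    batch = random.sample(list(replay), batch_size)
    s = torch.tensor([b[0] for b in batch], dtype=torch.float32)
    a = torch.tensor([b[1] for b in batch])
    r = torch.tensor([b[2] for b in batch], dtype=torch.float32)
    s2 = torch.tensor([b[3] for b in batch], dtype=torch.float32)
    done = torch.tensor([b[4] for b in batch], dtype=torch.float32)

    logp, v = net(s)
    with torch.no_grad():                 # bootstrap target for the critic
        _, v_next = net(s2)
        target = r + gamma * (1.0 - done) * v_next
    advantage = target - v
    # policy gradient weighted by the detached advantage, plus a value-fit
    # term; off-policy importance corrections are omitted in this sketch
    chosen_logp = logp.gather(1, a.unsqueeze(1)).squeeze(1)
    loss = (-(chosen_logp * advantage.detach()).mean()
            + 0.5 * advantage.pow(2).mean())
    opt.zero_grad()
    loss.backward()
    opt.step()


if __name__ == "__main__":
    net = ActorCritic(state_dim=4, n_actions=3)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    replay = deque(maxlen=10_000)         # the experience replay pool
    for _ in range(200):                  # random transitions to exercise it
        s = [random.random() for _ in range(4)]
        s2 = [random.random() for _ in range(4)]
        replay.append((s, random.randrange(3), random.random(), s2, False))
    update_from_replay(net, opt, replay)
    print("one replayed actor-critic update completed")
```

Sampling uniformly from the pool rather than consuming trajectories in order is what decorrelates the minibatch, which is the role the abstract assigns to the replay memory during the offline and online training stages.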