Fleet Rebalancing for Expanding Shared e-Mobility Systems: A Multi-agent Deep Reinforcement Learning Approach

Paper Title

Fleet Rebalancing for Expanding Shared e-Mobility Systems: A Multi-agent Deep Reinforcement Learning Approach

Authors

Luo, Man; Du, Bowen; Zhang, Wenzhe; Song, Tianyou; Li, Kun; Zhu, Hongming; Birkin, Mark; Wen, Hongkai

Abstract

The electrification of shared mobility has become popular across the globe. Many cities have deployed new shared e-mobility systems, with coverage continuously expanding from central areas to the city edges. A key challenge in the operation of these systems is fleet rebalancing, i.e., how EVs should be repositioned to better satisfy future demand. This is particularly challenging in the context of expanding systems, because i) the range of the EVs is limited while charging time is typically long, both of which constrain the viable rebalancing operations; and ii) the EV stations in the system are dynamically changing, i.e., the legitimate targets for rebalancing operations can vary over time. We tackle these challenges by first investigating rich sets of data collected from a real-world shared e-mobility system over one year, analyzing the operation model, usage patterns and expansion dynamics of this new mobility mode. With the learned knowledge, we design a high-fidelity simulator that abstracts key operation details of EV sharing at fine granularity. We then model the rebalancing task for shared e-mobility systems under continuous expansion as a Multi-Agent Reinforcement Learning (MARL) problem, which directly takes the range and charging properties of the EVs into account. We further propose a novel policy optimization approach with action cascading, which is able to cope with the expansion dynamics and solve the formulated MARL problem. We evaluate the proposed approach extensively, and experimental results show that it outperforms the state of the art, offering significant performance gains in both satisfied demand and net revenue.
