Paper title
Distributed Adaptive Reinforcement Learning: A Method for Optimal Routing
Authors
Abstract
This paper presents a learning-based optimal transportation algorithm for autonomous taxis and ridesharing vehicles. The goal is to design a mechanism that solves the routing problem for multiple autonomous vehicles and multiple customers so as to maximize the transportation company's profit. Accordingly, each vehicle selects the customer whose request maximizes the company's long-run profit. To solve this problem, the system is modeled as a Markov Decision Process (MDP) using past customer data. Solving this MDP yields a centralized high-level planning recommendation, and this offline solution serves as the initial value for real-time learning. A distributed SARSA reinforcement learning algorithm is then proposed to capture model errors and changes in the environment, such as variations in the customer distribution in each area, traffic, and fares, thereby providing optimal routing policies in real time. Vehicles, or agents, use only their local information and interactions, such as current passenger requests and estimates of their neighbors' tasks and optimal actions, to obtain the optimal policies in a distributed fashion. An optimal adaptive rate is introduced so that the distributed SARSA algorithm can adapt to changes in the environment and track the time-varying optimal policies. Furthermore, a game-theory-based task assignment algorithm is proposed, in which each agent uses the optimal policies and their values from distributed SARSA to select its customer from the set of locally available requests in a distributed manner. Finally, customer data provided by the City of Chicago is used to validate the proposed algorithms.
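For readers unfamiliar with SARSA, the core update that the abstract builds on can be sketched as follows. This is a minimal single-agent, tabular illustration and not the paper's distributed algorithm: the zone names, actions, reward, and warm-start values are hypothetical, and the constant step size `alpha` merely stands in for the optimal adaptive rate mentioned above. The table `q` is initialized to fixed values to mimic, loosely, the idea of starting real-time learning from an offline MDP solution.

```python
import random


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha, gamma):
    """One on-policy SARSA step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * Q(s', a') - Q(s, a)).
    """
    td_target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


# Hypothetical toy problem: two city zones and two routing actions.
zones = ["A", "B"]
actions = ["stay", "move"]

# Warm start (assumption): made-up constants standing in for values
# that an offline MDP solution would provide.
q = {(z, a): 1.0 for z in zones for a in actions}

# One observed transition: a vehicle in zone A moves to zone B,
# earns a fare of 5.0, and will next choose "stay" in zone B.
sarsa_update(q, "A", "move", reward=5.0,
             s_next="B", a_next="stay", alpha=0.5, gamma=0.9)

# With exploration switched off, the agent now prefers "move" in zone A.
best = epsilon_greedy(q, "A", actions, epsilon=0.0, rng=random.Random(0))
```

Because SARSA is on-policy, the update uses the action the agent will actually take next (`a_next`) rather than the greedy maximum, which is what distinguishes it from Q-learning.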