Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions

Paper Title

Decentralized Reinforcement Learning: Global Decision-Making via Local Economic Transactions

Authors

Michael Chang, Sidhant Kaushik, S. Matthew Weinberg, Thomas L. Griffiths, Sergey Levine

Abstract

This paper seeks to establish a framework for directing a society of simple, specialized, self-interested agents to solve what traditionally are posed as monolithic single-agent sequential decision problems. What makes it challenging to use a decentralized approach to collectively optimize a central objective is the difficulty in characterizing the equilibrium strategy profile of non-cooperative games. To overcome this challenge, we design a mechanism for defining the learning environment of each agent for which we know that the optimal solution for the global objective coincides with a Nash equilibrium strategy profile of the agents optimizing their own local objectives. The society functions as an economy of agents that learn the credit assignment process itself by buying and selling to each other the right to operate on the environment state. We derive a class of decentralized reinforcement learning algorithms that are broadly applicable not only to standard reinforcement learning but also for selecting options in semi-MDPs and dynamically composing computation graphs. Lastly, we demonstrate the potential advantages of a society's inherent modular structure for more efficient transfer learning.
