Paper Title
Reward Maximisation through Discrete Active Inference
Paper Authors
Paper Abstract
Active inference is a probabilistic framework for modelling the behaviour of biological and artificial agents, which derives from the principle of minimising free energy. In recent years, this framework has successfully been applied to a variety of situations where the goal was to maximise reward, offering comparable and sometimes superior performance to alternative approaches. In this paper, we clarify the connection between reward maximisation and active inference by demonstrating how and when active inference agents perform actions that are optimal for maximising reward. Precisely, we show the conditions under which active inference produces the optimal solution to the Bellman equation, a formulation that underlies several approaches to model-based reinforcement learning and control. On partially observed Markov decision processes, the standard active inference scheme can produce Bellman optimal actions for planning horizons of 1, but not beyond. In contrast, a recently developed recursive active inference scheme (sophisticated inference) can produce Bellman optimal actions on any finite temporal horizon. We append the analysis with a discussion of the broader relationship between active inference and reinforcement learning.
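For reference, the Bellman equation the abstract refers to can be written in its standard finite-horizon form for a Markov decision process; the notation below is generic rather than taken from the paper, and in the partially observed setting the same recursion runs over beliefs (posterior distributions over states) rather than states:

$$
V_t^{*}(s) \;=\; \max_{a}\Big[\, R(s,a) \;+\; \sum_{s'} P(s' \mid s, a)\, V_{t+1}^{*}(s') \Big],
\qquad V_T^{*}(s) = 0,
$$

where an action at time $t$ is Bellman optimal if it attains the maximum on the right-hand side.

To make the horizon-1 versus finite-horizon distinction concrete, here is a minimal, hypothetical sketch (plain dynamic programming, not the paper's active inference scheme): a myopic planner that only maximises immediate reward, analogous to a horizon-1 criterion, is compared with backward induction on the Bellman recursion, analogous to the recursive evaluation performed by sophisticated inference. All states, actions and rewards are invented for illustration.

```python
# Illustrative sketch, not the paper's algorithm: a tiny finite-horizon MDP where
# myopic (one-step) planning is suboptimal, while backward induction on the
# Bellman equation recovers the optimal policy. All quantities are made up.
import numpy as np

n_states, n_actions, horizon = 3, 2, 2
P = np.zeros((n_actions, n_states, n_states))  # P[a, s, s'] = transition probability
R = np.zeros((n_states, n_actions))            # R[s, a]     = expected reward

P[0, 0, 0] = 1.0; R[0, 0] = 1.0   # action 0 ("safe"):   stay in state 0, reward 1
P[1, 0, 1] = 1.0; R[0, 1] = 0.0   # action 1 ("detour"): move 0 -> 1, no reward yet
P[:, 1, 2] = 1.0; R[1, :] = 5.0   # from state 1 any action reaches 2, reward 5
P[:, 2, 2] = 1.0                  # state 2 is absorbing, no further reward

# Backward induction: V_t(s) = max_a [ R(s,a) + sum_{s'} P(s'|s,a) V_{t+1}(s') ]
V = np.zeros((horizon + 1, n_states))
policy = np.zeros((horizon, n_states), dtype=int)
for t in reversed(range(horizon)):
    Q = R + np.einsum('asn,n->sa', P, V[t + 1])  # action values at time t
    V[t] = Q.max(axis=1)
    policy[t] = Q.argmax(axis=1)

# A myopic (horizon-1) planner only maximises the immediate reward R[s, a].
myopic_action = int(R[0].argmax())

print("Bellman-optimal first action from state 0:", policy[0, 0])  # 1 ("detour")
print("Myopic first action from state 0:        ", myopic_action)  # 0 ("safe")
print("Optimal two-step return from state 0:    ", V[0, 0])        # 5.0 (vs 2.0 myopic)
```

In this toy example the myopic planner collects a return of 2 over two steps while the Bellman-optimal policy collects 5, because only the recursive evaluation credits the delayed reward behind the "detour" action.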