Avoiding Side Effects By Considering Future Tasks

Paper Title

Avoiding Side Effects By Considering Future Tasks

Authors

Victoria Krakovna, Laurent Orseau, Richard Ngo, Miljan Martic, Shane Legg

Abstract

Designing reward functions is difficult: the designer has to specify what to do (what it means to complete the task) as well as what not to do (side effects that should be avoided while completing the task). To alleviate the burden on the reward designer, we propose an algorithm to automatically generate an auxiliary reward function that penalizes side effects. This auxiliary objective rewards the ability to complete possible future tasks, which decreases if the agent causes side effects during the current task. The future task reward can also give the agent an incentive to interfere with events in the environment that make future tasks less achievable, such as irreversible actions by other agents. To avoid this interference incentive, we introduce a baseline policy that represents a default course of action (such as doing nothing), and use it to filter out future tasks that are not achievable by default. We formally define interference incentives and show that the future task approach with a baseline policy avoids these incentives in the deterministic case. Using gridworld environments that test for side effects and interference, we show that our method avoids interference and is more effective for avoiding side effects than the common approach of penalizing irreversible actions.
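
The mechanism described in the abstract can be illustrated on a tiny deterministic world. The following is a minimal Python sketch under simplifying assumptions and is not the authors' formulation. Each candidate future task is assumed to be "reach goal state g", with value gamma**steps from the current state (0 if g is unreachable). The auxiliary reward is assumed to be beta times a uniformly weighted sum of these values. The baseline policy is represented only by the state it would lead to (baseline_state), and tasks that are unreachable from that state are filtered out. All names (shortest_steps, future_task_reward, TRANSITIONS, the state labels) are hypothetical.

```python
from collections import deque


def shortest_steps(transitions, start):
    """Breadth-first search over a deterministic transition graph.

    Returns a dict mapping every state reachable from `start` to the
    minimum number of steps needed to reach it.
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in transitions.get(state, ()):
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return dist


def future_task_reward(transitions, state, goals, gamma=0.9, beta=1.0,
                       baseline_state=None):
    """Hypothetical auxiliary reward for the ability to complete future tasks.

    Each task is "reach goal g"; its value from `state` is gamma**steps,
    or 0 if g is unreachable. Tasks get uniform weight 1/len(goals).
    If `baseline_state` is given (the state a default policy would reach),
    tasks that are unreachable from it are dropped from the sum.
    """
    if not goals:
        return 0.0
    steps = shortest_steps(transitions, state)
    if baseline_state is None:
        kept = list(goals)
    else:
        achievable_by_default = shortest_steps(transitions, baseline_state)
        kept = [g for g in goals if g in achievable_by_default]
    total = sum(gamma ** steps[g] for g in kept if g in steps)
    return beta / len(goals) * total


# Toy world: from "intact_right" an action breaks a vase ("broken_right"),
# and no transition leads back to any intact state.
TRANSITIONS = {
    "intact_left": ["intact_right"],
    "intact_right": ["intact_left", "broken_right"],
    "broken_right": ["broken_left"],
    "broken_left": ["broken_right"],
}
GOALS = list(TRANSITIONS)  # every state is a candidate future task

# gamma=1.0 turns each task value into plain reachability (1 or 0).
print(future_task_reward(TRANSITIONS, "intact_right", GOALS, gamma=1.0))  # 1.0
print(future_task_reward(TRANSITIONS, "broken_right", GOALS, gamma=1.0))  # 0.5

# Suppose another agent breaks the vase by default, so the baseline
# state is "broken_right": only tasks achievable from there are kept.
print(future_task_reward(TRANSITIONS, "intact_right", GOALS, gamma=1.0,
                         baseline_state="broken_right"))  # 0.5
print(future_task_reward(TRANSITIONS, "broken_right", GOALS, gamma=1.0,
                         baseline_state="broken_right"))  # 0.5
```

In this toy case, without a baseline the broken-vase state scores lower (0.5 versus 1.0), so the agent is discouraged from causing the side effect, but it would equally be rewarded for stopping another agent that breaks the vase by default. With the baseline state "broken_right" both outcomes score 0.5, so that interference incentive disappears. If the default outcome instead leaves the vase intact (baseline_state="intact_right"), no task is filtered and breaking the vase is still penalized.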
