Paper Title
Learning Task-Driven Control Policies via Information Bottlenecks
Paper Authors
Paper Abstract
This paper presents a reinforcement learning approach to synthesizing task-driven control policies for robotic systems equipped with rich sensory modalities (e.g., vision or depth). Standard reinforcement learning algorithms typically produce policies that tightly couple control actions to the entirety of the system's state and its rich sensor observations. As a consequence, the resulting policies are often sensitive to changes in task-irrelevant portions of the state or observations (e.g., changing background colors). In contrast, the approach presented here learns a task-driven representation that is then used to compute control actions. Formally, this is achieved by deriving a policy-gradient-style algorithm that creates an information bottleneck between the states and the task-driven representation; this constrains the actions to depend only on task-relevant information. We demonstrate our approach in a thorough set of simulation results on multiple examples, including a grasping task that uses depth images and a ball-catching task that uses RGB images. Comparisons with a standard policy gradient approach demonstrate that the task-driven policies produced by our algorithm are often significantly more robust to sensor noise and to task-irrelevant changes in the environment.
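To make the core idea concrete, the sketch below shows one common way to impose such a bottleneck in a policy-gradient setting: a stochastic encoder maps the state to a latent representation, and a KL term to a fixed Gaussian prior gives a variational upper bound on the mutual information between state and representation (the standard variational-information-bottleneck construction). This is a minimal illustration under those assumptions, not the paper's exact derivation; all names here (BottleneckPolicy, encoder, policy_head, beta) are illustrative, not taken from the paper.

```python
# Minimal sketch (not the paper's exact algorithm): a REINFORCE-style
# objective plus a variational information-bottleneck penalty that
# upper-bounds I(state; representation) via KL to a unit Gaussian prior.
import torch
import torch.nn as nn

class BottleneckPolicy(nn.Module):
    def __init__(self, state_dim, latent_dim, action_dim):
        super().__init__()
        # Stochastic encoder q(z|s): outputs mean and log-variance of a Gaussian.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, 2 * latent_dim),
        )
        # Policy head pi(a|z): actions depend on the state only through z.
        self.policy_head = nn.Linear(latent_dim, action_dim)

    def forward(self, state):
        mu, log_var = self.encoder(state).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()  # reparameterization trick
        # KL(q(z|s) || N(0, I)) is a variational upper bound on I(s; z).
        kl = 0.5 * (mu.pow(2) + log_var.exp() - log_var - 1.0).sum(dim=-1)
        return torch.distributions.Categorical(logits=self.policy_head(z)), kl

def bottleneck_pg_loss(policy, states, actions, returns, beta=1e-3):
    # REINFORCE surrogate plus the beta-weighted bottleneck penalty.
    dist, kl = policy(states)
    pg_term = -(dist.log_prob(actions) * returns).mean()
    return pg_term + beta * kl.mean()

# Example usage with hypothetical shapes: a batch of 32 rollout steps,
# 10-dimensional states, 3 discrete actions, Monte Carlo returns.
policy = BottleneckPolicy(state_dim=10, latent_dim=4, action_dim=3)
states, actions, returns = torch.randn(32, 10), torch.randint(0, 3, (32,)), torch.randn(32)
bottleneck_pg_loss(policy, states, actions, returns).backward()
```

The coefficient beta trades task reward against how much state information the representation may carry: larger values force the encoder to discard more of the state, which is what drives the robustness to task-irrelevant variation described in the abstract.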