Paper Title

Pathfinding in Random Partially Observable Environments with Vision-Informed Deep Reinforcement Learning

Paper Authors

Dowling, Anthony

Paper Abstract

Deep reinforcement learning is a technique for solving problems in a variety of environments, ranging from Atari video games to stock trading. This method leverages deep neural network models to make decisions based on observations of a given environment, with the goal of maximizing a reward function that can incorporate costs and rewards for reaching goals. For pathfinding, reward conditions can include reaching a specified target area along with costs for movement. In this work, multiple Deep Q-Network (DQN) agents are trained to operate in a partially observable environment with the goal of reaching a target zone in minimal travel time. Each agent operates based on a visual representation of its surroundings, and thus has a restricted capability to observe the environment. A comparison between DQN, DQN-GRU, and DQN-LSTM is performed to examine each model's capabilities with two different types of input. Through this evaluation, it has been shown that with equivalent training and analogous model architectures, a DQN model is able to outperform its recurrent counterparts.
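
The abstract describes a reward that combines a bonus for reaching the target zone with a cost for movement, so that maximizing the return also minimizes travel time. Below is a minimal Python sketch of such a reward; the constants GOAL_REWARD and STEP_COST, the grid-coordinate positions, and the function pathfinding_reward are illustrative assumptions, not values or code from the paper.

import numpy as np

# Illustrative reward shaping for goal-reaching with a movement cost.
# GOAL_REWARD and STEP_COST are assumed constants, not the paper's values.
GOAL_REWARD = 1.0   # bonus paid once the agent enters the target zone
STEP_COST = -0.01   # small penalty charged for every movement step

def pathfinding_reward(agent_pos, goal_pos):
    """Return (reward, done) for one transition on a grid."""
    reached = np.array_equal(agent_pos, goal_pos)
    reward = GOAL_REWARD if reached else STEP_COST
    return reward, reached

# Example: a step that misses the goal, then one that reaches it.
print(pathfinding_reward(np.array([2, 3]), np.array([5, 5])))  # (-0.01, False)
print(pathfinding_reward(np.array([5, 5]), np.array([5, 5])))  # (1.0, True)

Under this kind of shaping, every extra step lowers the return, so an agent trained to maximize it is pushed toward the shortest route to the target zone.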
