Paper Title

Pathfinding in Random Partially Observable Environments with Vision-Informed Deep Reinforcement Learning

Paper Authors

Dowling, Anthony

Paper Abstract

Deep reinforcement learning is a technique for solving problems in a variety of environments, ranging from Atari video games to stock trading. This method leverages deep neural network models to make decisions based on observations of a given environment, with the goal of maximizing a reward function that can incorporate costs and rewards for reaching goals. For pathfinding, reward conditions can include reaching a specified target area along with costs for movement. In this work, multiple Deep Q-Network (DQN) agents are trained to operate in a partially observable environment with the goal of reaching a target zone in minimal travel time. Each agent operates based on a visual representation of its surroundings, and thus has a restricted capability to observe the environment. A comparison between DQN, DQN-GRU, and DQN-LSTM is performed to examine each model's capabilities with two different types of input. Through this evaluation, it has been shown that with equivalent training and analogous model architectures, a DQN model is able to outperform its recurrent counterparts.
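
The abstract describes a reward that combines a bonus for reaching the target zone with a cost for movement, so that maximizing the return also minimizes travel time. Below is a minimal Python sketch of such a reward; the constants GOAL_REWARD and STEP_COST, the grid-coordinate positions, and the function pathfinding_reward are illustrative assumptions, not values or code from the paper.

import numpy as np

# Illustrative reward shaping for goal-reaching with a movement cost.
# GOAL_REWARD and STEP_COST are assumed constants, not the paper's values.
GOAL_REWARD = 1.0   # bonus paid once the agent enters the target zone
STEP_COST = -0.01   # small penalty charged for every movement step

def pathfinding_reward(agent_pos, goal_pos):
    """Return (reward, done) for one transition on a grid."""
    reached = np.array_equal(agent_pos, goal_pos)
    reward = GOAL_REWARD if reached else STEP_COST
    return reward, reached

# Example: a step that misses the goal, then one that reaches it.
print(pathfinding_reward(np.array([2, 3]), np.array([5, 5])))  # (-0.01, False)
print(pathfinding_reward(np.array([5, 5]), np.array([5, 5])))  # (1.0, True)

Under this kind of shaping, every extra step lowers the return, so an agent trained to maximize it is pushed toward the shortest route to the target zone.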
