Paper Title
Context-Aware Deep Q-Network for Decentralized Cooperative Reconnaissance by a Robotic Swarm
Paper Authors
Paper Abstract
One of the crucial problems in robotic swarm-based operations is to search for and neutralize heterogeneous targets in an unknown and uncertain environment, without any communication within the swarm. Here, some targets can be neutralized by a single robot, while others require multiple robots acting in a particular sequence. The complexity of the problem arises from scalability and information uncertainty, which restrict each robot's awareness of the swarm and the target distribution. In this paper, this problem is addressed by proposing a novel Context-Aware Deep Q-Network (CA-DQN) framework to obtain communication-free cooperation between the robots in the swarm. Each robot maintains an adaptive grid representation of its vicinity with context information embedded into it, keeping the swarm intact while searching for and neutralizing targets. The problem formulation uses a reinforcement learning framework in which two Deep Q-Networks (DQNs) handle 'conflict' and 'conflict-free' scenarios separately. A self-play-based approach is used to determine the optimal policy for the DQNs. Monte Carlo simulations and comparison studies with a state-of-the-art coalition formation algorithm are performed to verify the performance of CA-DQN under varying environmental parameters. The results show that the approach is invariant to the number of detected targets and the number of robots in the swarm. The paper also presents a real-time implementation of CA-DQN for different scenarios using ground robots in a laboratory environment, demonstrating the operation of CA-DQN on low-power computing devices.
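The abstract describes an architecture in which each robot queries one of two Q-networks depending on whether its local context indicates a conflict with another robot over a target. The following is a minimal sketch (not the authors' code) of that action-selection idea; the grid size, action set, network widths, and the `in_conflict` flag are illustrative assumptions, not details given in the paper.

```python
# Sketch of the two-DQN selection scheme suggested by the abstract:
# one Q-network for 'conflict' situations and one for 'conflict-free'
# situations, each acting on a flattened local context grid.
import random
import torch
import torch.nn as nn


class QNet(nn.Module):
    """Small MLP mapping a flattened local context grid to action values."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class CADQNAgent:
    """Picks which DQN to query from a per-step conflict flag, then acts
    epsilon-greedily on that network's Q-values."""
    def __init__(self, obs_dim: int, n_actions: int, epsilon: float = 0.05):
        self.conflict_dqn = QNet(obs_dim, n_actions)
        self.conflict_free_dqn = QNet(obs_dim, n_actions)
        self.n_actions = n_actions
        self.epsilon = epsilon

    @torch.no_grad()
    def act(self, obs: torch.Tensor, in_conflict: bool) -> int:
        if random.random() < self.epsilon:  # exploration
            return random.randrange(self.n_actions)
        qnet = self.conflict_dqn if in_conflict else self.conflict_free_dqn
        return int(qnet(obs).argmax().item())


# Usage with assumed dimensions: a 5x5 local grid with 3 context channels
# and 5 motion primitives.
agent = CADQNAgent(obs_dim=5 * 5 * 3, n_actions=5)
obs = torch.rand(5 * 5 * 3)  # flattened local context grid
action = agent.act(obs, in_conflict=False)
```

In this sketch, the conflict flag simply routes the observation to one of two independently trained networks, which mirrors the abstract's statement that 'conflict' and 'conflict-free' scenarios are handled by separate DQNs; how the flag is computed from the adaptive grid is left to the paper itself.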