Paper Title
Collective Conditioned Reflex: A Bio-Inspired Fast Emergency Reaction Mechanism for Designing Safe Multi-Robot Systems
Paper Authors
Abstract
A multi-robot system (MRS) is a group of coordinated robots designed to cooperate with each other and accomplish given tasks. Because of uncertainties in the operating environment, the system may encounter emergencies such as unobserved obstacles, moving vehicles, and extreme weather. Animal groups such as bee colonies exhibit collective emergency reaction behaviors, for example bypassing obstacles and avoiding predators. These behaviors resemble a conditioned muscle reflex, which organizes local muscles to avoid a hazard as a first response, without the delay of routing the signal through the brain. Inspired by this, we develop a similar collective conditioned reflex mechanism that lets multi-robot systems respond to emergencies. In this study, Collective Conditioned Reflex (CCR), a bio-inspired emergency reaction mechanism, is developed based on animal collective behavior analysis and multi-agent reinforcement learning (MARL). The algorithm uses a physical model to determine whether the robots are experiencing an emergency. The rewards of the robots involved in the emergency are then augmented with corresponding heuristic rewards, which evaluate the magnitude and consequences of the emergency and decide which local robots participate. CCR is validated on three typical emergency scenarios: turbulence, strong wind, and hidden obstacles. Simulation results demonstrate that, compared with baseline methods, CCR improves the emergency reaction capability of robot teams, with faster reactions and safer trajectory adjustments.
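The two-step idea in the abstract (detect an emergency with a physical model, then augment the rewards of the robots involved) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the physical model or the heuristic reward, so the distance-based `emergency_magnitude`, the linear penalty, and all names and parameters (`Robot`, `hazard_xy`, `hazard_radius`, `weight`) are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Robot:
    """Planar robot position (hypothetical minimal state)."""
    x: float
    y: float


def emergency_magnitude(robot, hazard_xy, hazard_radius):
    """Assumed stand-in for the physical model.

    Returns 0.0 when the robot is outside the hazard's radius of
    influence and rises linearly to 1.0 at the hazard center.
    """
    dist = math.hypot(robot.x - hazard_xy[0], robot.y - hazard_xy[1])
    return max(0.0, 1.0 - dist / hazard_radius)


def augmented_rewards(task_rewards, robots, hazard_xy, hazard_radius, weight=1.0):
    """Add an assumed heuristic penalty to each robot's task reward.

    Only robots with a nonzero emergency magnitude are affected, so the
    magnitude also decides which local robots take part in the reaction.
    """
    rewards = []
    for r_task, robot in zip(task_rewards, robots):
        magnitude = emergency_magnitude(robot, hazard_xy, hazard_radius)
        rewards.append(r_task - weight * magnitude)
    return rewards


# One robot is near the hazard, the other is far away.
robots = [Robot(0.0, 0.0), Robot(10.0, 0.0)]
result = augmented_rewards([1.0, 1.0], robots, hazard_xy=(1.0, 0.0), hazard_radius=2.0)
print(result)
```

In this sketch only the first robot, which lies inside the hazard's radius, has its reward reduced. The second robot keeps its task reward unchanged, mirroring the local participation described in the abstract.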