Paper Title
Model-Free Prediction of Partially Observable Spatiotemporal Chaotic Systems
Towards Comprehensive Testing on the Robustness of Cooperative Multi-agent Reinforcement Learning
Paper Authors
Paper Abstract
While deep neural networks (DNNs) have strengthened the performance of cooperative multi-agent reinforcement learning (c-MARL), the agent policy can be easily perturbed by adversarial examples. Considering the safety-critical applications of c-MARL, such as traffic management, power management and unmanned aerial vehicle control, it is crucial to test the robustness of c-MARL algorithms before they are deployed in the real world. Existing adversarial attacks for MARL could be used for testing, but each is limited to a single robustness aspect (e.g., reward, state, action), while a c-MARL model could be attacked from any aspect. To overcome this challenge, we propose MARLSafe, the first robustness testing framework for c-MARL algorithms. First, motivated by the Markov decision process (MDP), MARLSafe considers the robustness of c-MARL algorithms comprehensively from three aspects, namely state robustness, action robustness and reward robustness. Any c-MARL algorithm must simultaneously satisfy these robustness aspects to be considered secure. Second, due to the scarcity of c-MARL attacks, we propose c-MARL attacks as robustness testing algorithms from multiple aspects. Experiments on the \textit{SMAC} environment reveal that many state-of-the-art c-MARL algorithms are of low robustness in all aspects, pointing out the urgent need to test and enhance the robustness of c-MARL algorithms.