Paper Title
Meta-Cognition. An Inverse-Inverse Reinforcement Learning Approach for Cognitive Radars
Authors
Abstract
This paper considers meta-cognitive radars in an adversarial setting. A cognitive radar optimally adapts its waveform (response) to the maneuvers (probes) of a possibly adversarial moving target. A meta-cognitive radar is aware of the adversarial nature of the target and seeks to mitigate the adversarial target. How should the meta-cognitive radar choose its responses to sufficiently confuse an adversary trying to estimate the radar's utility function? This paper abstracts the radar's meta-cognition problem in terms of the spectra (eigenvalues) of the state and observation noise covariance matrices, and embeds the algebraic Riccati equation into an economics-based utility maximization setup. The adversarial target is an inverse reinforcement learner. By observing a noisy sequence of the radar's responses (waveforms), the adversarial target uses a statistical hypothesis test to detect whether the radar is a utility maximizer. In turn, the meta-cognitive radar deliberately chooses sub-optimal responses that increase the Type-I error probability of the adversary's detector. We call this counter-adversarial step taken by the meta-cognitive radar inverse inverse reinforcement learning (I-IRL). We illustrate the meta-cognition results of this paper via simple numerical examples. Our approach to meta-cognition is based on revealed preference theory in micro-economics and is inspired by results in differential privacy and adversarial obfuscation in machine learning.
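To make the revealed-preference idea concrete, the following minimal sketch shows how an observer could check whether a finite set of probe/response pairs is consistent with utility maximization. It is an illustration under stated assumptions, not the detector described in the abstract: it assumes noise-free observations and a linear budget constraint (probe · response bounded), and it uses the textbook Generalized Axiom of Revealed Preference (GARP) check. The function name `satisfies_garp` and the toy data are hypothetical.

```python
def satisfies_garp(probes, responses, tol=1e-9):
    """Return True if noise-free (probe, response) pairs pass the GARP check.

    Assumption: response k maximizes some utility subject to a linear
    budget defined by probe k. This is a textbook simplification.
    """
    n = len(probes)

    def dot(a, b):
        return sum(ai * bi for ai, bi in zip(a, b))

    # cost[k][j]: cost of response j evaluated at probe k.
    cost = [[dot(probes[k], responses[j]) for j in range(n)] for k in range(n)]

    # Direct revealed preference: response k was chosen although
    # response j was affordable at probe k.
    pref = [[cost[k][k] >= cost[k][j] - tol for j in range(n)] for k in range(n)]

    # Transitive closure (Warshall's algorithm).
    for m in range(n):
        for k in range(n):
            for j in range(n):
                pref[k][j] = pref[k][j] or (pref[k][m] and pref[m][j])

    # GARP violation: k is revealed preferred to j, yet at probe j
    # response k was strictly cheaper than the chosen response j.
    for k in range(n):
        for j in range(n):
            if pref[k][j] and cost[j][j] > cost[j][k] + tol:
                return False
    return True


# Toy data (hypothetical): two probes, two candidate response sequences.
probes = [(1.0, 2.0), (2.0, 1.0)]
consistent = [(4.0, 1.0), (1.0, 4.0)]    # no revealed-preference cycle
inconsistent = [(1.0, 4.0), (4.0, 1.0)]  # each response is revealed preferred to the other

print(satisfies_garp(probes, consistent))
print(satisfies_garp(probes, inconsistent))
```

In the setting of the abstract the adversary sees noisy responses, so a deterministic check like this one would be replaced by a statistical hypothesis test. The meta-cognitive radar's deliberately sub-optimal responses are chosen to raise that test's Type-I error probability.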