Paper Title

A Reinforcement Learning Approach for Dynamic Information Flow Tracking Games for Detecting Advanced Persistent Threats

Authors

Dinuka Sahabandu, Shana Moothedath, Joey Allen, Linda Bushnell, Wenke Lee, Radha Poovendran

Abstract

Advanced Persistent Threats (APTs) are stealthy attacks that threaten the security and privacy of sensitive information. Interactions of APTs with a victim system introduce information flows that are recorded in the system logs. Dynamic Information Flow Tracking (DIFT) is a promising mechanism for detecting APTs. DIFT taints information flows originating at system entities that are susceptible to attack, tracks the propagation of the tainted flows, and authenticates the tainted flows at certain system components according to a pre-defined security policy. Deployment of DIFT to defend against APTs in cyber systems is limited by the heavy resource and performance overhead associated with DIFT. In this paper, we propose a resource-efficient model for DIFT that incorporates the security costs, false positives, and false negatives associated with DIFT. Specifically, we develop a game-theoretic framework and provide an analytical model of DIFT that enables the study of the trade-off between resource efficiency and detection effectiveness. Our game model is a nonzero-sum, infinite-horizon, average-reward stochastic game. The model captures the information asymmetry between the players: DIFT cannot distinguish malicious flows from benign flows, and the APT does not know the locations where DIFT performs security analysis. Additionally, the game has incomplete information, as the transition probabilities (false-positive and false-negative rates) are unknown. We propose a multiple-timescale stochastic approximation algorithm to learn an equilibrium solution of the game. We prove that our algorithm converges to an average-reward Nash equilibrium. We evaluate our proposed model and algorithm on a real-world ransomware dataset and validate the effectiveness of the proposed approach.
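
The taint-track-authenticate pipeline the abstract describes can be pictured as reachability over an information-flow graph built from system logs. The sketch below is a minimal illustration of that idea only; the graph, node names, source set, and trap placement are hypothetical and are not the paper's model.

```python
# Minimal sketch of DIFT-style taint propagation (hypothetical graph/policy).
# Taint starts at attack-susceptible sources, propagates along information-flow
# edges, and triggers a security check when it reaches a policy-defined trap.
from collections import deque

# Hypothetical information-flow graph: node -> downstream nodes it flows into.
flow_graph = {
    "net_socket": ["browser", "downloader"],
    "downloader": ["tmp_file"],
    "tmp_file":   ["shell"],
    "browser":    ["cache"],
    "shell":      ["registry"],
}
sources = {"net_socket"}           # entities susceptible to attack
traps = {"shell", "registry"}      # where the security policy authenticates flows

tainted, frontier = set(sources), deque(sources)
while frontier:                    # track propagation of tainted flows (BFS)
    node = frontier.popleft()
    for succ in flow_graph.get(node, []):
        if succ not in tainted:
            tainted.add(succ)
            frontier.append(succ)
            if succ in traps:
                print(f"security check triggered at {succ!r}")
```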
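To give a flavor of the learning scheme, here is a toy two-timescale stochastic-approximation loop for a two-player average-reward game. It is a rough sketch of the general technique, not the authors' algorithm: the state space, rewards, step-size exponents, and the simulated false-negative rate are all invented stand-ins, and the transition model is hidden from the learners just as the (unknown) false-positive/false-negative rates are in the paper.

```python
# Two-timescale stochastic approximation on a toy defender-vs-attacker game.
# Players update softmax policies from sampled rewards; distinct step-size
# schedules give the multiple-timescale structure. All numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
N_STATES = 4                      # hypothetical flow locations
COST_ANALYZE, REWARD_DETECT, REWARD_ADVANCE = 0.2, 1.0, 0.5

def step(s, a_def, a_apt):
    """Toy transition kernel; the 0.2 'false-negative rate' is hidden from
    the learners, who only see sampled rewards and next states."""
    detected = (a_def == 1 and a_apt == 1 and rng.random() > 0.2)
    r_def = -COST_ANALYZE * a_def + (REWARD_DETECT if detected else 0.0)
    r_apt = REWARD_ADVANCE if (a_apt == 1 and not detected) else 0.0
    s_next = 0 if detected else (s + a_apt) % N_STATES
    return s_next, r_def, r_apt

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

theta_def = np.zeros((N_STATES, 2))   # per-state action preferences
theta_apt = np.zeros((N_STATES, 2))
rho_def = rho_apt = 0.0               # running average-reward estimates

s = 0
for n in range(1, 200_000):
    a_fast = 1.0 / n**0.6    # faster timescale: defender policy
    a_slow = 1.0 / n**0.85   # slower timescale: attacker policy
    a_avg = 1.0 / n**0.5     # average-reward tracking

    p_def, p_apt = softmax(theta_def[s]), softmax(theta_apt[s])
    a_d, a_a = rng.choice(2, p=p_def), rng.choice(2, p=p_apt)
    s_next, r_d, r_a = step(s, a_d, a_a)

    # Track long-run average rewards (the average-reward criterion).
    rho_def += a_avg * (r_d - rho_def)
    rho_apt += a_avg * (r_a - rho_apt)

    # Policy-gradient-style updates on differential rewards:
    # grad of log softmax is onehot(action) - policy.
    grad_d = -p_def; grad_d[a_d] += 1.0
    grad_a = -p_apt; grad_a[a_a] += 1.0
    theta_def[s] += a_fast * (r_d - rho_def) * grad_d
    theta_apt[s] += a_slow * (r_a - rho_apt) * grad_a
    s = s_next

print("avg rewards (defender, attacker):", round(rho_def, 3), round(rho_apt, 3))
print("defender analyze-prob per state:", np.round([softmax(t)[1] for t in theta_def], 2))
```

The key design point is the separation of step sizes: because the attacker's updates decay faster, the defender effectively learns against a quasi-static opponent, which is what makes convergence arguments for multiple-timescale schemes tractable.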
