Domain-Embeddings Based DGA Detection with Incremental Training Method

Paper title

Domain-Embeddings Based DGA Detection with Incremental Training Method

Authors

Fang, Xin; Sun, Xiaoqing; Yang, Jiahai; Liu, Xinran

Abstract

DGA-based botnets, which use Domain Generation Algorithms (DGAs) to evade supervision, have become one of the most destructive threats to network security. Over the past decades, a wealth of defense mechanisms focusing on domain features has emerged to address the problem. Nonetheless, DGA detection remains a daunting and challenging task, both because of the big-data nature of Internet traffic and because linguistic features extracted solely from domain names may be insufficient and can easily be forged by adversaries to disrupt detection. In this paper, we propose a novel DGA detection system that employs an incremental word-embeddings method to capture the interactions between end hosts and domains, characterize the time-series patterns of DNS queries for each IP address, and thereby explore temporal similarities between domains. We carefully modify the Word2Vec algorithm and leverage it to automatically learn dynamic and discriminative feature representations for over 1.9 million domains, and we develop a simple classifier to distinguish malicious domains from benign ones. Because it can identify the temporal patterns of domains and update its models incrementally, the proposed scheme makes progress toward adapting to the changing and evolving strategies of DGA domains. We evaluate our system against the state-of-the-art system FANCI and two deep-learning methods, CNN and LSTM, using data from TUNET, a large university network. The results suggest that our system outperforms these strong competitors by a large margin on multiple metrics while achieving a remarkable speed-up in model updating.
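The abstract does not spell out how DNS traffic is turned into Word2Vec input, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each DNS log record is a hypothetical `(timestamp, client_ip, domain)` tuple, treats each host's time-ordered list of queried domains as a "sentence", and emits skip-gram `(center, context)` pairs so that domains queried close together by the same host share context. The helper names `build_host_sentences`, `skipgram_pairs` and `extend_vocab` are hypothetical, and `extend_vocab` only illustrates one plausible way to support incremental updates (stable ids for known domains, new ids appended for unseen ones). It is not the authors' modified Word2Vec, and the embedding trainer and classifier are not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical DNS log record layout: (timestamp_seconds, client_ip, queried_domain).
Query = Tuple[float, str, str]


def build_host_sentences(queries: Iterable[Query]) -> Dict[str, List[str]]:
    """Group queries by client IP and order each host's domains by timestamp.

    Each host's time-ordered domain list acts as one Word2Vec "sentence".
    """
    per_host: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for ts, ip, domain in queries:
        # Normalize case and drop a trailing root dot ("example.com." -> "example.com").
        per_host[ip].append((ts, domain.lower().rstrip(".")))
    # Sort by timestamp only; Python's sort is stable, so ties keep log order.
    return {
        ip: [domain for _, domain in sorted(events, key=lambda e: e[0])]
        for ip, events in per_host.items()
    }


def skipgram_pairs(sentence: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Emit (center, context) pairs for every neighbor within `window` positions."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, sentence[j]))
    return pairs


def extend_vocab(vocab: Dict[str, int], sentences: Iterable[List[str]]) -> List[str]:
    """Assign ids to unseen domains while keeping existing ids unchanged.

    Stable ids let previously learned embedding rows be reused when a new
    batch of traffic arrives. Returns the domains added in this call.
    """
    added: List[str] = []
    for sentence in sentences:
        for domain in sentence:
            if domain not in vocab:
                vocab[domain] = len(vocab)
                added.append(domain)
    return added


if __name__ == "__main__":
    batch = [
        (10.0, "10.0.0.1", "example.com"),
        (12.0, "10.0.0.1", "qxkzjv.net"),
        (11.0, "10.0.0.1", "mail.example.com"),
        (5.0, "10.0.0.2", "example.com."),
    ]
    sentences = build_host_sentences(batch)
    print(sentences["10.0.0.1"])  # ['example.com', 'mail.example.com', 'qxkzjv.net']
    vocab: Dict[str, int] = {}
    print(extend_vocab(vocab, sentences.values()))  # ['example.com', 'mail.example.com', 'qxkzjv.net']
    print(skipgram_pairs(sentences["10.0.0.1"], window=1))
```

In a fuller pipeline, the pairs would feed a skip-gram trainer (for example with negative sampling), and only rows for newly added ids would need fresh initialization on each update. That is one way the speed-up in model updating could arise, but this is an assumption rather than a description of the paper's method.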
