Paper Title
DeepVulSeeker: A Novel Vulnerability Identification Framework via Code Graph Structure and Pre-training Mechanism
Authors
Abstract
Software vulnerabilities can cause severe harm to a computing system. They can lead to system crashes, privacy leakage, or even physical damage. Correctly and promptly identifying vulnerabilities in large bodies of software code is an essential prerequisite for patching them. Unfortunately, current vulnerability identification methods, both classic and deep-learning-based, have several critical drawbacks that leave them unable to meet the present-day demands of the software industry. To overcome these drawbacks, in this paper we propose DeepVulSeeker, a novel, fully automated vulnerability identification framework that leverages both code graph structures and semantic features with the help of the recently advanced Graph Representation Self-Attention and pre-training mechanisms. Our experiments show that DeepVulSeeker not only reaches an accuracy as high as 0.99 on traditional CWE datasets, but also outperforms all other existing methods on two highly complicated datasets. We also tested DeepVulSeeker on three case studies and found that it is able to understand the implications of the vulnerabilities. We have fully implemented DeepVulSeeker and open-sourced it for future follow-up research.