Paper title
ModX: Binary Level Partial Imported Third-Party Library Detection through Program Modularization and Semantic Matching
Paper authors
Paper abstract
With the rapid growth of software, the use of third-party libraries (TPLs) has become increasingly popular. Widespread library usage gives software engineers many ways to facilitate and speed up program development. Unfortunately, it also poses great challenges, because managing a large volume of libraries becomes much more difficult. Many studies have sought to detect and understand the TPLs used in software. However, most existing approaches rely on syntactic features, which are not robust when those features are changed or deliberately hidden by adversarial parties. Moreover, these approaches typically model each imported library as a whole and therefore cannot be applied to scenarios where the host software uses only part of the library code. To detect both fully and partially imported TPLs at the semantic level, we propose ModX, a framework that leverages a novel program modularization technique to decompose a program into fine-grained, functionality-based modules. By extracting both syntactic and semantic features, it measures the distance between modules to detect reuse of similar library modules in the program. Experimental results show that ModX outperforms other modularization tools by identifying more coherent program modules, with 353% higher module quality scores, and beats other TPL detection tools with, on average, 17% better precision and 8% better recall.
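To make the idea of module-level matching concrete, the following is a minimal sketch, not the ModX implementation. It assumes each module has already been reduced to a numeric feature vector and reports a library as present if any one of its modules lies within a distance threshold of some module in the target program, which is what allows a partially imported library to be detected. The names `cosine_distance` and `detect_libraries`, the use of cosine distance, and the threshold value are all illustrative assumptions; the abstract does not specify the distance measure.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def detect_libraries(target_modules, library_modules, threshold=0.1):
    """Match target modules against library modules one by one.

    target_modules:  {module_name: feature_vector}
    library_modules: {library_name: {module_name: feature_vector}}
    Returns {library_name: [(target_module, library_module, distance), ...]}.
    A library appears in the result if at least one of its modules matches,
    so a partial import is still reported.
    """
    hits = {}
    for t_name, t_vec in target_modules.items():
        for lib, modules in library_modules.items():
            for m_name, m_vec in modules.items():
                distance = cosine_distance(t_vec, m_vec)
                if distance <= threshold:
                    hits.setdefault(lib, []).append((t_name, m_name, distance))
    return hits


# Hypothetical feature vectors, invented purely for illustration.
library_modules = {
    "libz": {"inflate": [1, 0, 2, 0], "deflate": [0, 3, 0, 1]},
    "libpng": {"decode": [0, 0, 1, 4]},
}
target_modules = {"mod_a": [2, 0, 4, 0], "mod_b": [5, 5, 5, 5]}

hits = detect_libraries(target_modules, library_modules)
```

In this toy input only one module of `libz` resembles a target module, yet `libz` is still reported. A whole-library model would have to compare the full library against the program and could miss this case.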