Paper Title

Cornucopia: A Framework for Feedback Guided Generation of Binaries

Authors

Vidush Singhal, Akul Abhilash Pillai, Charitha Saumya, Milind Kulkarni, Aravind Machiry

Abstract

Binary analysis is an important capability required for many security and software engineering applications. Consequently, there are many binary analysis techniques and tools with varied capabilities. However, testing these tools requires a large, varied binary dataset with corresponding source-level information. In this paper, we present Cornucopia, an architecture-agnostic automated framework that can generate a plethora of binaries from the corresponding program source by exploiting compiler optimizations and feedback-guided learning. Our evaluation shows that Cornucopia was able to generate 309K binaries across four architectures (x86, x64, ARM, MIPS), with an average of 403 binaries for each program, and that it outperforms Bintuner, a similar technique. Our experiments revealed issues with the LLVM optimization scheduler that resulted in compiler crashes ($\sim$300). Our evaluation of four popular binary analysis tools, Angr, Ghidra, Idapro, and Radare, using Cornucopia-generated binaries revealed various issues with these tools. Specifically, we found 263 crashes in Angr and one memory corruption issue in Idapro. Our differential testing on the analysis results revealed various semantic bugs in these tools. We also tested machine learning tools that claim to capture binary semantics, namely Asmvec, Safe, and Debin, and show that they perform poorly on Cornucopia-generated binaries (for instance, the Debin F1 score dropped to 12.9% from the reported 63.1%). In summary, our exhaustive evaluation shows that Cornucopia is an effective mechanism for generating binaries to test binary analysis techniques.
