Paper Title
UNIFUZZ: A Holistic and Pragmatic Metrics-Driven Platform for Evaluating Fuzzers
Paper Authors
Paper Abstract
A flurry of fuzzing tools (fuzzers) have been proposed in the literature, aiming at detecting software vulnerabilities effectively and efficiently. To date, however, it is still challenging to compare fuzzers due to the inconsistency of the benchmarks, performance metrics, and/or environments used for evaluation, which obscures useful insights and thus impedes the discovery of promising fuzzing primitives. In this paper, we design and develop UNIFUZZ, an open-source and metrics-driven platform for assessing fuzzers in a comprehensive and quantitative manner. Specifically, UNIFUZZ to date has incorporated 35 usable fuzzers, a benchmark of 20 real-world programs, and six categories of performance metrics. We first systematically study the usability of existing fuzzers, find and fix a number of flaws, and integrate them into UNIFUZZ. Based on this study, we propose a collection of pragmatic performance metrics to evaluate fuzzers from six complementary perspectives. Using UNIFUZZ, we conduct in-depth evaluations of several prominent fuzzers, including AFL [1], AFLFast [2], Angora [3], Honggfuzz [4], MOPT [5], QSYM [6], T-Fuzz [7], and VUzzer64 [8]. We find that none of them outperforms the others across all the target programs, and that using a single metric to assess the performance of a fuzzer may lead to one-sided conclusions, which demonstrates the significance of comprehensive metrics. Moreover, we identify and investigate previously overlooked factors that may significantly affect a fuzzer's performance, including instrumentation methods and crash analysis tools. Our empirical results show that they are critical to the evaluation of a fuzzer. We hope that our findings can shed light on reliable fuzzing evaluation, so that we can discover promising fuzzing primitives to effectively facilitate fuzzer designs in the future.
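The abstract's point that a single metric can yield one-sided conclusions can be illustrated with a minimal sketch. The fuzzer names and numbers below are entirely hypothetical (not UNIFUZZ results); they merely show how two fuzzers can swap ranks depending on which metric is consulted:

```python
# Hypothetical, made-up results for two imaginary fuzzers on one target.
# These are NOT measurements from the paper; they only illustrate how
# rankings can flip across metrics.
results = {
    "fuzzer_A": {"unique_bugs": 12, "branch_coverage": 0.61, "execs_per_sec": 950},
    "fuzzer_B": {"unique_bugs": 9,  "branch_coverage": 0.74, "execs_per_sec": 1400},
}

def rank_by(metric):
    """Return fuzzer names sorted best-first under a single metric."""
    return sorted(results, key=lambda f: results[f][metric], reverse=True)

print(rank_by("unique_bugs"))      # fuzzer_A looks best by bugs found
print(rank_by("branch_coverage"))  # fuzzer_B looks best by coverage
```

Under the bug-count metric fuzzer_A "wins", while under coverage fuzzer_B does, which is exactly why the paper argues for evaluating along multiple complementary dimensions.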