On the Efficiency of Test Suite based Program Repair: A Systematic Assessment of 16 Automated Repair Systems for Java Programs

Paper Title

On the Efficiency of Test Suite based Program Repair: A Systematic Assessment of 16 Automated Repair Systems for Java Programs

Authors

Kui Liu, Shangwen Wang, Anil Koyuncu, Kisub Kim, Tegawendé F. Bissyandé, Dongsun Kim, Peng Wu, Jacques Klein, Xiaoguang Mao, Yves Le Traon

Abstract

Test-based automated program repair has been a prolific field of research in software engineering over the last decade. Many approaches have been proposed that leverage test suites as a weak, but affordable, approximation of program specifications. Although the literature regularly sets new records for the number of benchmark bugs that can be fixed, several studies increasingly raise concerns about the limitations and biases of state-of-the-art approaches. For example, the correctness of generated patches has been questioned in a number of studies, while other researchers have pointed out that evaluation schemes may be misleading with respect to the processing of fault localization results. Nevertheless, little work addresses the efficiency of patch generation with regard to the practicality of program repair. In this paper, we fill this gap in the literature by providing an extensive review of the efficiency of test suite based program repair. Our objective is to assess the number of generated patch candidates, since this information is correlated with (1) the strategy to traverse the search space efficiently in order to select sensible repair attempts, (2) the strategy to minimize the test effort for identifying a plausible patch, and (3) the strategy to prioritize the generation of a correct patch. To that end, we perform a large-scale empirical study of efficiency, measured as the number of generated patch candidates, for 16 open-source repair tools for Java programs. The experiments are carefully conducted under the same fault localization configurations to limit biases.
