Paper Title

Leveraging the Defects Life Cycle to Label Affected Versions and Defective Classes

Authors

Bailey Vandehei, Daniel Alencar da Costa, Davide Falessi

Abstract

Two recent studies explicitly recommend labeling defective classes in releases using the affected versions (AVs) available in issue trackers, referred to here as the realistic method. The aim of our study is threefold: 1) to measure the proportion of defects for which the realistic method is usable, 2) to propose a method for retrieving the AVs of a defect, thus making the realistic method usable when AVs are unavailable, and 3) to compare the accuracy of the proposed method against three SZZ implementations. The proposed method assumes that defects have a stable life cycle, measured as the proportion of the number of versions affected by a defect before it is discovered and fixed. Results from 212 open-source projects in the Apache ecosystem, featuring a total of about 125,000 defects, reveal that the realistic method cannot be used for the majority (51%) of defects. Therefore, it is important to develop automated methods to retrieve AVs. Results from 76 open-source projects in the Apache ecosystem, featuring a total of about 6,250,000 classes affected by 60,000 defects and spread over 4,000 versions and 760,000 commits, reveal that the proportion of the number of versions between defect discovery and fix is fairly stable (standard deviation < 2) across the defects of the same project. Moreover, the proposed method proved significantly more accurate than all three SZZ implementations in (i) retrieving AVs, (ii) labeling classes as defective, and (iii) developing defect repositories to perform feature selection. Thus, when the realistic method is unusable, the proposed method is a valid automated alternative to SZZ for retrieving the origin of a defect. Finally, given the low accuracy of SZZ, researchers should consider re-executing the studies that have used SZZ as an oracle and, in general, should prefer selecting projects with a high proportion of available and consistent AVs.
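The stable-life-cycle assumption can be illustrated with a small sketch. The abstract does not state a formula, so everything below is an assumption made for illustration: versions are indexed as integers in release order, the proportion is taken as (FV - IV) / (FV - OV) for the injected, opening, and fix versions, and the names `proportion`, `estimate_injected_version`, and `affected_versions` are hypothetical. The sample data is made up.

```python
from statistics import mean


def proportion(iv, ov, fv):
    """Versions from injection to fix, relative to versions from report to fix.

    Assumed definition (not given in the abstract): (FV - IV) / (FV - OV).
    """
    # Guard against FV == OV (defect fixed in the version where it was reported).
    return (fv - iv) / max(fv - ov, 1)


def estimate_injected_version(ov, fv, p):
    """Estimate the injected version of a defect that has no AVs recorded."""
    iv = fv - max(fv - ov, 1) * p
    # A defect cannot be injected after it is reported, nor before version 1.
    return max(1, min(ov, round(iv)))


def affected_versions(iv, fv):
    """Assumed convention: every version from IV up to, but excluding, FV."""
    return list(range(iv, fv))


# Hypothetical defects with known AVs, as (IV, OV, FV) version indices.
known = [(1, 3, 4), (2, 4, 5), (3, 4, 5)]
p = mean(proportion(*d) for d in known)

# A defect reported in version 6 and fixed in version 8, with no AVs available.
iv = estimate_injected_version(ov=6, fv=8, p=p)
print(iv, affected_versions(iv, 8))
```

Averaging the proportion over defects of the same project mirrors the claim that the life cycle is stable within a project; how the average is actually computed in the study (for example, over which subset of defects) is not specified here.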
