Paper title
Revisiting reopened bugs in open source software systems
Paper authors
Abstract
Reopened bugs can degrade the overall quality of a software system because they require unnecessary rework by developers. They also erode end users' trust in the quality of the software. Predicting which bugs might be reopened could therefore help developers avoid rework. Prior studies on reopened bug prediction draw their insights from only three open source projects (i.e., Apache, Eclipse, and OpenOffice). We observe that one of the three projects (i.e., Apache) has a data leak issue: the reopened bug status was included in the training data used to predict reopened bugs. In addition, prior studies used an outdated prediction model pipeline (i.e., one built with older techniques for constructing a prediction model). Therefore, we revisit the study of reopened bugs on a large-scale dataset of 47 projects tracked by JIRA, using modern techniques such as SMOTE and permutation importance together with 7 different machine learning models. We study reopened bugs using a mixed-methods approach (i.e., both a quantitative and a qualitative study). We find that: 1) With an updated reopened bug prediction model pipeline, only 34% of the projects achieve acceptable performance (AUC >= 0.7). 2) There are four major reasons for a bug being reopened: technical (i.e., patch/integration issues), documentation, human (i.e., incorrect bug assessment), and reasons not shown in the bug reports. 3) In projects with an acceptable AUC, 94% of the reopened bugs are due to patch issues (i.e., the usage of an incorrect patch) identified before bug reopening. Our study revisits reopened bugs and provides new insights into developers' bug reopening activities.
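As a rough illustration of two ideas named in the abstract, the sketch below shows SMOTE-style oversampling of the minority class (reopened bugs) and the AUC >= 0.7 acceptance check. It is a simplified, standard-library-only sketch and not the study's implementation. The function names (smote_like, auc, is_acceptable), the toy feature rows, and the choice of k are all assumptions made for illustration.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows, SMOTE-style.

    Each synthetic row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        # Rank the remaining rows by squared Euclidean distance to base.
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        mate = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def is_acceptable(auc_value, threshold=0.7):
    """Apply the abstract's acceptance criterion (AUC >= 0.7)."""
    return auc_value >= threshold


# Toy data (hypothetical): four reopened-bug feature rows, oversampled by six.
reopened_rows = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 1.8]]
extra_rows = smote_like(reopened_rows, n_new=6)

# Toy predictions (hypothetical): 1 = reopened, 0 = not reopened.
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
print(len(extra_rows), toy_auc, is_acceptable(toy_auc))
```

In practice, oversampling should be applied to the training split only, so that synthetic rows never reach the evaluation data. Features recorded after a bug is reopened should likewise be kept out of the training set to avoid the kind of data leak the abstract describes.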