Paper Title
Leveraging Structural Properties of Source Code Graphs for Just-In-Time Bug Prediction
Paper Authors
Abstract
The most common use of data visualization is to reduce complexity so that data can be properly understood. A graph is one of the most commonly used representations for understanding relational data: it gives a simplified view of data that would be hard to comprehend in textual form. In this study, we propose a methodology that exploits the relational properties of source code, represented as a graph, for Just-in-Time (JIT) bug prediction in software systems across the revisions made during software evolution and maintenance. We present a method that converts the source code of commit patches into equivalent graph representations, which we call Source Code Graphs (SCGs). To understand and compare multiple SCGs, we extract several structural properties of these graphs, such as density and the numbers of cycles, nodes, and edges. We then use the attribute values of these SCGs to visualize and detect buggy software commits. In this investigation, we process more than 246K commits from 12 open-source subject systems written in C++ and Java. Our results show that combining SCG features with the conventional features used in similar studies improves the performance of Machine Learning (ML)-based buggy commit detection models. Using the Wilcoxon Signed Rank Test, we also find that the increase in F1 scores for predicting buggy and non-buggy commits is statistically significant. Since SCG-based feature values capture the style or structural properties of source code updates and changes in a software system, this suggests that carefully maintaining source code style and structure is important for keeping a software system bug-free.
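The abstract does not say how a commit patch is turned into an SCG or exactly how each property is defined. As a minimal sketch, assuming the graph has already been extracted as an undirected edge list (the node labels and the `scg_features` / `GraphFeatures` names are hypothetical), the structural properties it names could be computed as below. "Number of cycles" is approximated here by the circuit rank E - N + C (the size of a cycle basis); that choice is our assumption, not necessarily the paper's definition.

```python
from dataclasses import dataclass


@dataclass
class GraphFeatures:
    """Structural properties of one source code graph (illustrative field set)."""
    nodes: int
    edges: int
    density: float
    components: int
    cycle_rank: int  # assumed cycle measure: E - N + C, the size of a cycle basis


def scg_features(edge_list: list[tuple[str, str]]) -> GraphFeatures:
    """Compute structural properties of an undirected graph given as an edge list.

    Self-loops and duplicate edges are dropped so the graph is treated as simple;
    isolated nodes cannot be expressed in an edge list and are not counted.
    """
    nodes = {v for edge in edge_list for v in edge}
    # Store each edge once as a sorted pair, ignoring direction and self-loops.
    edges = {tuple(sorted(edge)) for edge in edge_list if edge[0] != edge[1]}

    # Union-find to count connected components.
    parent = {v: v for v in nodes}

    def find(v: str) -> str:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    n, m = len(nodes), len(edges)
    c = len({find(v) for v in nodes})
    # Undirected simple-graph density 2m / (n(n - 1)); defined as 0 below 2 nodes.
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return GraphFeatures(nodes=n, edges=m, density=density, components=c, cycle_rank=m - n + c)
```

For a triangle a-b-c plus a separate edge d-e, `scg_features` reports 5 nodes, 4 edges, density 0.4, 2 components and a cycle rank of 1. Values like these could then be appended to whatever conventional commit-level features a model already uses, giving one combined feature vector per commit.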
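The abstract reports that the F1 improvement is significant under the Wilcoxon Signed Rank Test. To illustrate how that test works on paired scores (for example, F1 per project without and with SCG features; these inputs are hypothetical), here is a generic standard-library sketch, not the authors' code. It drops zero differences, gives tied magnitudes their average rank, and computes an exact two-sided p-value by enumerating every sign assignment of the ranks, which is only practical for small samples.

```python
from itertools import product


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(before: list[float], after: list[float]) -> tuple[float, float]:
    """Paired Wilcoxon signed-rank test.

    Returns (W, p) where W = min(W+, W-) and p is the exact two-sided p-value
    obtained by enumerating all 2**n sign assignments of the ranks (use small n).
    Zero differences are discarded before ranking.
    """
    if len(before) != len(after):
        raise ValueError("samples must be paired and of equal length")
    diffs = [b - a for a, b in zip(before, after)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # no nonzero differences, so no evidence of a shift
    ranks = average_ranks([abs(d) for d in diffs])
    total = sum(ranks)  # always n(n + 1) / 2
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_obs = min(w_plus, total - w_plus)
    # Under H0 each rank is equally likely to carry a positive or negative sign.
    hits = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        # Small tolerance guards against float rounding in sums of half-ranks.
        if min(w, total - w) <= w_obs + 1e-9:
            hits += 1
    return w_obs, hits / 2**n
```

If all six paired differences are positive and distinct, W is 0 and the exact two-sided p-value is 2/64 = 0.03125. For real analyses, `scipy.stats.wilcoxon` implements the same test with more options; the enumeration here is meant only to make the mechanics visible for samples as small as a dozen paired scores.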