Predicting the Objective and Priority of Issue Reports in Software Repositories

Paper Title

Predicting the Objective and Priority of Issue Reports in Software Repositories

Authors

Maliheh Izadi, Kiana Akbari, Abbas Heydarnoori

Abstract

Developers collaboratively discuss, implement, use, and share software entities hosted on software repositories. Proper documentation plays an important role in successful software management and maintenance. Users rely on Issue Tracking Systems, a facility of software repositories, to keep track of issue reports, to manage workload and processes, and finally to document the highlights of their team's effort. An issue report is a rich source of collaboratively curated software knowledge, and can contain a reported problem, a request for new features, or merely a question about the software product. As the number of these issues increases, it becomes harder to manage them manually. GitHub provides labels for tagging issues as a means of issue management. However, about half of the issues in GitHub's top 1000 repositories do not have any labels. We aim to automate the process of managing issue reports for software teams. We propose a two-stage approach that predicts both the objective behind opening an issue and its priority level, using feature engineering methods and state-of-the-art text classifiers. To the best of our knowledge, we are the first to fine-tune a Transformer for issue classification. We train and evaluate our models in both project-based and cross-project settings. The latter provides a generic prediction model applicable to any unseen software project or to projects with little historical data. Our proposed approach predicts the objective and priority level of issue reports with 82% and 75% accuracy, respectively. Moreover, we conducted human labeling and evaluation on unlabeled issues from six unseen GitHub projects to assess the performance of the cross-project model on new data. The model achieves 90% accuracy. We obtain 85% average Percent Agreement and 71% Randolph's free-marginal Kappa, which translates to substantial agreement among labelers.
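
The two agreement measures named at the end of the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. It assumes each issue was labeled by the same number of labelers (at least two) and uses the standard definitions: Percent Agreement is the share of agreeing labeler pairs averaged over items, and Randolph's free-marginal Kappa rescales it against a chance level of 1/k for k categories. The function names and the example labels are hypothetical.

```python
from collections import Counter


def percent_agreement(ratings):
    """Average pairwise agreement over items.

    ratings: list of items; each item is a list with one label per labeler
    (every item needs at least two labels).
    """
    total = 0.0
    for item in ratings:
        n = len(item)
        counts = Counter(item)
        # Ordered pairs of labelers that chose the same label for this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        total += agreeing_pairs / (n * (n - 1))
    return total / len(ratings)


def randolph_kappa(ratings, num_categories):
    """Randolph's free-marginal multirater kappa.

    Chance agreement is fixed at 1 / num_categories because labelers are
    not constrained in how many items they assign to each category.
    """
    p_observed = percent_agreement(ratings)
    p_expected = 1.0 / num_categories
    return (p_observed - p_expected) / (1.0 - p_expected)


# Illustrative data only: four issues, two labelers, three possible labels.
example = [
    ["bug", "bug"],
    ["bug", "enhancement"],
    ["support", "support"],
    ["bug", "bug"],
]
print(percent_agreement(example))  # 0.75
print(randolph_kappa(example, 3))  # approximately 0.625
```

Because the chance term depends only on the number of categories, the same Percent Agreement yields a different Kappa for a different category count.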
