Agile Effort Estimation: Have We Solved the Problem Yet? Insights From A Replication Study

Paper Title

Agile Effort Estimation: Have We Solved the Problem Yet? Insights From A Replication Study

Authors

Vali Tawosi, Rebecca Moussa, Federica Sarro

Abstract

In the last decade, several studies have explored automated techniques to estimate the effort of agile software development. We perform a close replication and extension of a seminal work proposing the use of Deep Learning for Agile Effort Estimation (namely Deep-SE), which has set the state of the art since. Specifically, we replicate three of the original research questions, which investigate the effectiveness of Deep-SE for both within-project and cross-project effort estimation. We benchmark Deep-SE against three baselines (i.e., Random, Mean and Median effort estimators) and a previously proposed method to estimate agile software project development effort (dubbed TF/IDF-SVM), as done in the original study. To this end, we use the data from the original study and an additional dataset of 31,960 issues mined from TAWOS, as using more data allows us to strengthen the confidence in the results and to further mitigate external validity threats. The results of our replication show that Deep-SE outperforms the Median baseline estimator and TF/IDF-SVM with statistical significance in only very few cases (8/42 and 9/32 cases, respectively), thus confounding previous findings on the efficacy of Deep-SE. The two additional RQs revealed that neither augmenting the training set nor pre-training Deep-SE leads to an improvement in its accuracy or convergence speed. These results suggest that using semantic similarity is not enough to differentiate user stories with respect to their story points; thus, future work has yet to explore and find new techniques and features that obtain accurate agile software development estimates.
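To make the Mean and Median baselines mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes each baseline predicts a single constant (the mean or median story points of the training issues) for every test issue, and it assumes mean absolute error as the accuracy measure, which the abstract does not name. The story-point values and the helper names `constant_baseline` and `mean_absolute_error` are hypothetical.

```python
from statistics import mean, median


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted story points."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def constant_baseline(train_points, n_test, reducer):
    """Predict the same value (reducer applied to training data) for every test issue."""
    value = reducer(train_points)
    return [value] * n_test


# Hypothetical story points for illustration only (not data from the study).
train = [1, 2, 3, 5, 8, 13]
test = [2, 3, 5, 8]

mean_pred = constant_baseline(train, len(test), mean)
median_pred = constant_baseline(train, len(test), median)

mae_mean = mean_absolute_error(test, mean_pred)
mae_median = mean_absolute_error(test, median_pred)

print(f"Mean baseline MAE:   {mae_mean:.3f}")
print(f"Median baseline MAE: {mae_median:.3f}")
```

A learned estimator is only informative if its error is lower than that of such constant predictors, which is why the comparison against the Median baseline carries so much weight in the reported results.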
