Paper Title

Big Data = Big Insights? Operationalising Brooks' Law in a Massive GitHub Data Set

Authors

Christoph Gote, Pavlin Mavrodiev, Frank Schweitzer, Ingo Scholtes

Abstract

Massive data from software repositories and collaboration tools are widely used to study social aspects in software development. One question that several recent works have addressed is how a software project's size and structure influence team productivity, a question famously considered in Brooks' law. Recent studies using massive repository data suggest that developers in larger teams tend to be less productive than developers in smaller teams. Despite using similar methods and data, other studies argue for a positive linear or even super-linear relationship between team size and productivity, thus contesting the view of software economics that software projects are diseconomies of scale. In our work, we study challenges that can explain the disagreement between recent studies of developer productivity in massive repository data. We further provide, to the best of our knowledge, the largest, curated corpus of GitHub projects tailored to investigate the influence of team size and collaboration patterns on individual and collective productivity. Our work contributes to the ongoing discussion on the choice of productivity metrics in the operationalisation of hypotheses about determinants of successful software projects. It further highlights general pitfalls in big data analysis and shows that the use of bigger data sets does not automatically lead to more reliable insights.
