Paper Title
Evaluation of Software Product Quality Metrics
Authors
Abstract
Computing devices and their associated software govern everyday life and form the backbone of safety-critical systems in banking, healthcare, automotive and other fields. Increasing system complexity, quickly evolving technologies and paradigm shifts have kept software quality research at the forefront. Standards such as ISO/IEC 25010 express software quality in terms of sub-characteristics such as maintainability, reliability and security. A significant body of literature attempts to link these sub-characteristics with software metric values, with the end goal of creating a metric-based model of software product quality. However, research also identifies important barriers to this goal. Among them we mention the diversity of software application types, development platforms and languages. Additionally, unified definitions that would make software metrics truly language-agnostic do not exist, and would be difficult to implement given the diversity of programming languages. This is compounded by the fact that many existing studies do not detail their methodology and tooling, which precludes researchers from carrying out surveys and data analyses on a larger scale. In our paper, we propose a comprehensive study of metric values in the context of three complex, open-source applications. We align our methodology and tooling with those of existing research, and present them in detail in order to facilitate comparative evaluation. We study metric values across the entire 18-year development history of our target applications, in order to capture the longitudinal view that we found lacking in existing literature. We identify metric dependencies and check their consistency across applications and their versions. At each step, we carry out a comparative evaluation against existing research and present our results.
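The abstract's idea of identifying metric dependencies and checking their consistency across versions can be sketched with a rank-correlation test. This is a minimal illustration only: the metric names (LOC, WMC) and all values below are hypothetical, not data from the study, and the paper does not specify which correlation measure it uses.

```python
# Hypothetical sketch: does a dependency between two class-level metrics
# (LOC vs. WMC) hold consistently across two releases of an application?
# All metric values are invented for demonstration purposes.

def rank(values):
    """Assign 1-based average ranks to values (ties share the mean rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation between two metric series."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Per-class metric values in two hypothetical releases.
loc_v1 = [120, 450, 80, 300, 610]   # lines of code
wmc_v1 = [14, 52, 9, 35, 70]        # weighted methods per class
loc_v2 = [130, 470, 85, 290, 650]
wmc_v2 = [15, 55, 10, 33, 74]

rho_v1 = spearman(loc_v1, wmc_v1)
rho_v2 = spearman(loc_v2, wmc_v2)
print(f"rho(v1)={rho_v1:.2f}, rho(v2)={rho_v2:.2f}")
# Similar rho across versions suggests the dependency is stable over time.
```

A longitudinal study would repeat such a check for every release in the 18-year history and for each pair of metrics, flagging dependencies whose strength drifts between versions or differs between applications.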