Abstract
This paper presents a simple benchmarking procedure for companies wishing to develop measures for quality attributes of software artefacts. The procedure does not require that a proposed measure be a consistent measure of a quality attribute; it requires only that the measure shows agreement most of the time. The procedure provides summary statistics for measures of quality attributes of a software artefact. These statistics can be used to benchmark subjective direct measurement of a quality attribute by a company’s software developers. Each proposed measure is expressed as a set of error rates for measurement on an ordinal scale, and these error rates enable simple benchmarking statistics to be derived. The statistics can also be derived for any proposed objective indirect measure or prediction system for the quality attribute. For an objective measure or prediction system to be of value to the company, it must be ‘better’, or ‘more objective’, than the organisation’s current measurement or prediction capability; confidence that the benchmark’s objectivity has been surpassed must therefore be demonstrated. Using Bayesian statistical inference, the paper shows how to decide whether a new measure should be considered ‘more objective’, or whether a prediction system’s predictive capability can be considered ‘better’, than the current benchmark. Furthermore, the Bayesian inferential approach is easy to use and provides clear advantages for quantifying and inferring differences in objectivity.
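The Bayesian comparison the abstract describes can be illustrated with a minimal sketch: treat each measure’s rate of agreement on an ordinal scale as a Beta-distributed parameter, and estimate the posterior probability that a proposed measure’s agreement rate exceeds the benchmark’s. This is a simplification for illustration only — the paper’s actual analysis uses Gibbs sampling via BUGS, and the function names, priors, and counts below are assumptions, not the author’s model.

```python
import random


def agreement_rate(ratings_a, ratings_b):
    """Fraction of artefacts on which two ordinal ratings agree exactly."""
    matches = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return matches / len(ratings_a)


def posterior_prob_better(k_new, n_new, k_bench, n_bench,
                          draws=100_000, seed=1):
    """Monte Carlo estimate of P(theta_new > theta_bench), where each
    agreement rate theta has an independent Beta(1, 1) (uniform) prior
    and k agreements were observed in n artefacts."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        theta_new = rng.betavariate(1 + k_new, 1 + n_new - k_new)
        theta_bench = rng.betavariate(1 + k_bench, 1 + n_bench - k_bench)
        if theta_new > theta_bench:
            wins += 1
    return wins / draws


# Hypothetical counts: a proposed measure agrees with the reference
# classification on 46 of 50 artefacts, the current benchmark on 38 of 50.
p = posterior_prob_better(46, 50, 38, 50)
```

A company would accept the new measure as ‘more objective’ only if this posterior probability exceeded some agreed threshold (say 0.95); with the hypothetical counts above the evidence strongly favours the new measure.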
Acknowledgements
The author wishes to thank Professor Martin Shepperd of the University of Brunel for access to the maintainability classification data. In addition, the author acknowledges the ESRC and MRC-Cambridge for the use of the BUGS simulation program.
Moses, J. Benchmarking quality measurement. Software Qual J 15, 449–462 (2007). https://doi.org/10.1007/s11219-007-9025-4