Tests for consistent measurement of external subjective software quality attributes

Moses, John; Farrow, Malcolm

doi:10.1007/s10664-007-9058-0

Published: 30 January 2008

Volume 13, pages 261–287, (2008)

Empirical Software Engineering

John Moses¹ &
Malcolm Farrow²

Abstract

One reason that researchers may wish to demonstrate that an external software quality attribute can be measured consistently is so that they can validate a prediction system for the attribute. However, attempts at validating prediction systems for external subjective quality attributes have tended to rely on experts indicating that the values provided by the prediction systems informally agree with the experts’ intuition about the attribute. These attempts are undertaken without a pre-defined scale on which it is known that the attribute can be measured consistently. Consequently, a valid unbiased estimate of the predictive capability of the prediction system cannot be given because the experts’ measurement process is not independent of the prediction system’s values. Usually, no justification is given for not checking to see if the experts can measure the attribute consistently. It seems to be assumed that subjective measurement isn’t proper measurement, that subjective measurement cannot be quantified, or that no one knows the true values of the attributes anyway and that they cannot be estimated. However, even though the classification of software systems’ or software artefacts’ quality attributes is subjective, it is possible to quantify experts’ measurements in terms of conditional probabilities. It is then possible, using a statistical approach, to assess formally whether the experts’ measurements can be considered consistent. If the measurements are consistent, it is also possible to identify estimates of the true values, which are independent of the prediction system. These values can then be used to assess the predictive capability of the prediction system. In this paper we use Bayesian inference, Markov chain Monte Carlo simulation and missing data imputation to develop statistical tests for consistent measurement of subjective ordinal scale attributes.
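
As a rough illustration of what quantifying an expert's measurements "in terms of conditional probabilities" can mean, the sketch below estimates P(expert assigns category j | reference category k) on a three-point ordinal scale. It is not the authors' method: it assumes the reference categories are known, whereas the paper treats the true values as unknown and infers them by simulation. The function name, the symmetric Dirichlet prior and the example data are all hypothetical.

```python
def posterior_conditional_probs(reference, ratings, n_categories, prior=1.0):
    """Posterior mean of P(rating = j | reference = k).

    Assumes a symmetric Dirichlet(prior) distribution on each row and
    categories coded 0 .. n_categories - 1 (an illustrative convention).
    """
    # counts[k][j] = number of items with reference class k that were rated j
    counts = [[0] * n_categories for _ in range(n_categories)]
    for k, j in zip(reference, ratings):
        counts[k][j] += 1

    probs = []
    for row in counts:
        total = sum(row) + prior * n_categories
        probs.append([(c + prior) / total for c in row])
    return probs


# Hypothetical data: 8 modules, ordinal scale 0 (low) to 2 (high).
reference = [0, 0, 0, 1, 1, 2, 2, 2]   # assumed known here, unlike in the paper
ratings = [0, 0, 1, 1, 1, 2, 1, 2]     # one expert's classifications

probs = posterior_conditional_probs(reference, ratings, n_categories=3)
for k, row in enumerate(probs):
    print(k, [round(p, 3) for p in row])
```

A row whose probability mass concentrates on the diagonal suggests the expert tends to reproduce the reference category; mass spread across a row suggests inconsistent classification for that category.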

Acknowledgements

The authors wish to thank Professor Martin Shepperd at the University of Brunel for access to the maintainability classification data. In addition, the authors acknowledge the ESRC and MRC-Cambridge for the use of the BUGS simulation program. The authors also wish to thank the reviewers who have helped improve the clarity of the original manuscript.

Author information

Authors and Affiliations

School of Computing and Technology, University of Sunderland, Sunderland, SR6 0DD, UK
John Moses
Department of Mathematics and Statistics, University of Newcastle-Upon-Tyne, Newcastle-Upon-Tyne, NE1 7RU, UK
Malcolm Farrow

Corresponding author

Correspondence to John Moses.

About this article

Cite this article

Moses, J., Farrow, M. Tests for consistent measurement of external subjective software quality attributes. Empir Software Eng 13, 261–287 (2008). https://doi.org/10.1007/s10664-007-9058-0

Received: 29 August 2006
Accepted: 22 November 2007
Published: 30 January 2008
Issue Date: June 2008
DOI: https://doi.org/10.1007/s10664-007-9058-0
