A statistical study of the relevance of lines of code measures in software projects

Barb, Adrian S.; Neill, Colin J.; Sangwan, Raghvinder S.; Piovoso, Michael J.

doi:10.1007/s11334-014-0231-5

Original Paper
Published: 07 May 2014

Innovations in Systems and Software Engineering, volume 10, pages 243–260 (2014)

Abstract

Lines of code metrics are routinely used as measures of software system complexity, programmer productivity, and defect density, and are used to predict both effort and cost. The guidelines for using a direct metric, such as lines of code, as a proxy for a quality factor such as complexity or defect density, or in derived metrics such as cost and effort, are clear. Amongst other criteria, the direct metric must be linearly related to, and accurately predict, the quality factor, and these properties must be validated through statistical analysis following a rigorous validation methodology. In this paper, we conduct such an analysis to determine the validity and utility of lines of code as a measure using the ISBSG-10 data set. We find that it fails to meet the specified validity tests and, therefore, has limited utility in derived measures.
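
To make the two criteria named in the abstract concrete, the following is a minimal sketch of how a linear relationship and predictive accuracy might be checked for a direct metric. It is not the authors' procedure: the project data are invented for illustration, the helper names (`pearson_r`, `fit_line`, `mmre`) are hypothetical, and the choice of the squared Pearson correlation and the mean magnitude of relative error (MMRE) as indicators is an assumption. Any pass/fail threshold would have to be chosen by the analyst.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def mmre(actual, predicted):
    """Mean magnitude of relative error; actual values must be non-zero."""
    return mean(abs(a - p) / a for a, p in zip(actual, predicted))


# Invented example data (NOT taken from the ISBSG data set):
# lines of code and effort in person-hours for six hypothetical projects.
loc = [1200, 3400, 5600, 8000, 15000, 22000]
effort = [300, 1500, 900, 4200, 2500, 9000]

r = pearson_r(loc, effort)
slope, intercept = fit_line(loc, effort)
predicted = [slope * x + intercept for x in loc]

print(f"r^2  = {r ** 2:.3f}")   # share of effort variance explained by a linear fit
print(f"MMRE = {mmre(effort, predicted):.3f}")  # average relative prediction error
```

A low squared correlation or a high MMRE on real project data would suggest that the metric does not track the quality factor closely enough to be used in derived measures. Note that this sketch evaluates the fit on the same data used to estimate it; a rigorous validation would use held-out projects.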

Author information

Authors and Affiliations

Penn State University, 30 E Swedesford Rd, Malvern, PA 19355, USA
Adrian S. Barb, Colin J. Neill, Raghvinder S. Sangwan & Michael J. Piovoso

Corresponding author

Correspondence to Adrian S. Barb.

Additional information

Data acquisition for this project was supported by a Research and Development Grant from the School of Graduate Professional Studies, Penn State University.

About this article

Cite this article

Barb, A.S., Neill, C.J., Sangwan, R.S. et al. A statistical study of the relevance of lines of code measures in software projects. Innovations Syst Softw Eng 10, 243–260 (2014). https://doi.org/10.1007/s11334-014-0231-5

Received: 13 May 2013
Accepted: 04 April 2014
Published: 07 May 2014
Issue Date: December 2014
DOI: https://doi.org/10.1007/s11334-014-0231-5
