DOI: 10.1145/3463274.3463315
Research Article

Human-level Ordinal Maintainability Prediction Based on Static Code Metrics

Published: 21 June 2021

Abstract

One of the greatest challenges in software quality control is the efficient and effective measurement of maintainability. Thorough expert assessments are precise yet slow and expensive, whereas automated static analysis yields imprecise yet rapid feedback. Several machine learning approaches aim to integrate the advantages of both concepts.
However, most prior studies did not rely on expert judgment but instead predicted the number of changed lines as a proxy for maintainability, or they were biased towards a small group of experts. In contrast, the present study builds on a manually labeled and validated dataset. Prediction is based on static code metrics; we found simple structural metrics, such as the size of a class and of its methods, to yield the highest predictive power for maintainability. Using just a small set of these metrics, our models distinguish easy-to-maintain from hard-to-maintain code with an F-score of 91.3% and an AUC of 82.3%.
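To make the binary setting concrete, the following minimal sketch (not the authors' implementation) trains a classifier on a handful of structural metrics and reports F-score and AUC. The metric names, the random-forest learner, and the synthetic data and labels are illustrative assumptions only.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 400
    # Hypothetical structural metrics per class: total size, average method size, method count.
    X = np.column_stack([
        rng.integers(20, 2000, n),   # class size in lines of code
        rng.integers(3, 120, n),     # average method size in lines of code
        rng.integers(1, 60, n),      # number of methods
    ])
    # Synthetic stand-in for the expert label: 1 = hard to maintain, 0 = easy to maintain.
    y = (X[:, 0] + 10 * X[:, 1] > 1500).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0, stratify=y)
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train, y_train)

    print("F1 :", f1_score(y_test, model.predict(X_test)))
    print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

On real data the features would be extracted by a static analysis tool and the labels would come from the expert-annotated dataset; everything above is synthetic.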
In addition, we perform a more detailed ordinal classification and compare its quality with the performance of human experts. For this comparison, we use the deviations between the individual experts' ratings and the eventually determined consensus of all experts.
In sum, our models achieve the same level of performance as an average human expert; in terms of accuracy and mean squared error, they even outperform the human baseline. We hence argue that our models provide an automated and trustworthy prediction of software maintainability.
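The human-baseline comparison can be illustrated with a short sketch: both a single expert's ratings and a model's ordinal predictions are scored against the expert consensus using accuracy and mean squared error. All labels and values below are invented for illustration and do not come from the paper's dataset.

    import numpy as np
    from sklearn.metrics import accuracy_score, mean_squared_error

    # Ordinal maintainability labels, e.g. 0 = very easy ... 3 = very hard to maintain.
    consensus     = np.array([0, 1, 1, 2, 3, 2, 0, 1, 3, 2])   # consensus of all experts
    single_expert = np.array([0, 1, 2, 2, 3, 1, 0, 1, 2, 2])   # one expert's individual rating
    model_pred    = np.array([0, 1, 1, 2, 2, 2, 0, 2, 3, 2])   # the model's ordinal prediction

    for name, rating in [("expert", single_expert), ("model", model_pred)]:
        print(f"{name:>6}: accuracy = {accuracy_score(consensus, rating):.2f}, "
              f"MSE = {mean_squared_error(consensus, rating):.2f}")

A model whose accuracy is at least as high, and whose mean squared error is at least as low, as that of the individual experts performs at human level with respect to the consensus.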




Information

Published In

EASE '21: Proceedings of the 25th International Conference on Evaluation and Assessment in Software Engineering
June 2021
417 pages
ISBN:9781450390538
DOI:10.1145/3463274
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 June 2021


Author Tags

  1. Expert Judgment
  2. Machine Learning
  3. Maintainability Prediction
  4. Ordinal Classification
  5. Software Maintainability

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EASE 2021

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%


Cited By

  • (2024) Ghost Echoes Revealed: Benchmarking Maintainability Metrics and Machine Learning Predictions Against Human Assessments. 2024 IEEE International Conference on Software Maintenance and Evolution (ICSME), 678–688. https://doi.org/10.1109/ICSME58944.2024.00072. Online publication date: 6-Oct-2024.
  • (2024) Actionable code smell identification with fusion learning of metrics and semantics. Science of Computer Programming 236(C), 103110. https://doi.org/10.1016/j.scico.2024.103110. Online publication date: 1-Sep-2024.
  • (2023) Revisiting Inter-Class Maintainability Indicators. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 805–814. https://doi.org/10.1109/SANER56733.2023.00093. Online publication date: Mar-2023.
  • (2022) A Preliminary Study on Using Text- and Image-Based Machine Learning to Predict Software Maintainability. Software Quality: The Next Big Thing in Software Engineering and Quality, 41–60. https://doi.org/10.1007/978-3-031-04115-0_4. Online publication date: 12-Apr-2022.
  • (2021) Analyzing Static Analysis Metric Trends towards Early Identification of Non-Maintainable Software Components. Sustainability 13(22), 12848. https://doi.org/10.3390/su132212848. Online publication date: 20-Nov-2021.
