ABSTRACT
Previous research shows that developers spend most of their time understanding code. Despite the importance of code understandability for maintenance-related activities, an objective measure of it remains an elusive goal. Recently, Scalabrino et al. reported on an experiment with 46 Java developers designed to evaluate metrics for code understandability. The authors collected and analyzed data on more than a hundred features describing the code snippets, the developers' experience, and the developers' performance on a quiz designed to assess understanding. They concluded that none of the metrics considered can individually capture understandability. Expecting that understandability is better captured by a combination of multiple features, we present a reanalysis of the data from the Scalabrino et al. study, in which we use different statistical modeling techniques. Our models suggest that some computed features of code, such as those arising from syntactic structure and documentation, have a small but significant correlation with understandability. Further, we construct a binary classifier of understandability based on various interpretable code features, which has a small amount of discriminating power. Our encouraging results, based on a small data set, suggest that a useful metric of understandability could feasibly be created, but more data is needed.
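The kind of classifier the abstract describes can be sketched as a logistic regression over interpretable code features. The sketch below is purely illustrative and is not the authors' pipeline: the two features (a syntactic-structure score and a documentation-density score) and the toy data are invented assumptions, and the paper's actual feature set is far larger.

```python
import math

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))          # predicted P(understood)
            err = p - yi                             # gradient of the loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, xi):
    """Binary prediction: 1 = snippet classified as understandable."""
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0

# Hypothetical feature vectors: [syntactic_structure_score, doc_density].
# Labels: 1 = participants understood the snippet, 0 = they did not.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(X, y)
accuracy = sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)
```

Because the coefficients of a logistic model are directly inspectable, this style of model keeps the features interpretable, which matches the abstract's emphasis on interpretable code features over black-box predictors.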
REFERENCES
- Krishan K. Aggarwal, Yogesh Singh, and Jitender Kumar Chhabra. 2002. An integrated measure of software maintainability. In Proc. Annual Reliability and Maintainability Symposium. IEEE, 235--241.
- Paul D. Allison. 1999. Multiple Regression: A Primer. Pine Forge Press.
- Marc Bartsch and Rachel Harrison. 2008. An exploratory study of the effect of aspect-oriented programming on maintainability. Software Quality Journal 16, 1 (2008), 23--44.
- Douglas M. Bates. 2010. lme4: Mixed-effects modeling with R. http://lme4.r-forge.r-project.org/book/
- Richard Berk, Lawrence Brown, and Linda Zhao. 2010. Statistical inference after model selection. Journal of Quantitative Criminology 26, 2 (2010), 217--236.
- Raymond P. L. Buse and Westley R. Weimer. 2010. Learning a metric for code readability. IEEE Transactions on Software Engineering 36, 4 (2010), 546--558.
- Andrea Capiluppi, Maurizio Morisio, and Patricia Lago. 2004. Evolution of understandability in OSS projects. In Proc. European Conference on Software Maintenance and Reengineering (CSMR). IEEE, 58--66.
- Ermira Daka, José Campos, Gordon Fraser, Jonathan Dorn, and Westley Weimer. 2015. Modeling readability to improve unit tests. In Proc. Joint Meeting on Foundations of Software Engineering (ESEC/FSE). ACM, 107--118.
- Jonathan Dorn. 2012. A General Software Readability Model. MCS thesis. http://www.cs.virginia.edu/~weimer/students/dorn-mcs-paper.pdf
- Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2017. An Introduction to Statistical Learning with Applications in R. Springer.
- Natalia Juristo and Omar S. Gómez. 2012. Replication of software engineering experiments. In Empirical Software Engineering and Verification. Springer, 60--88.
- Jin-Cherng Lin and Kuo-Chiang Wu. 2006. A model for measuring software understandability. In Proc. International Conference on Computer and Information Technology (CIT). IEEE, 192.
- Jin-Cherng Lin and Kuo-Chiang Wu. 2008. Evaluation of software understandability based on fuzzy matrix. In Proc. International Conference on Fuzzy Systems. IEEE, 887--892.
- Roberto Minelli, Andrea Mocci, and Michele Lanza. 2015. I know what you did last summer: An investigation of how developers spend their time. In Proc. International Conference on Program Comprehension (ICPC). IEEE, 25--35.
- Shinichi Nakagawa and Holger Schielzeth. 2013. A general and simple method for obtaining R² from generalized linear mixed-effects models. Methods in Ecology and Evolution 4, 2 (2013), 133--142.
- Daryl Posnett, Abram Hindle, and Premkumar Devanbu. 2011. A simpler model of software readability. In Proc. International Conference on Mining Software Repositories (MSR). ACM, 73--82.
- Simone Scalabrino, Gabriele Bavota, Christopher Vendome, Mario Linares-Vásquez, Denys Poshyvanyk, and Rocco Oliveto. 2017. Automatically assessing code understandability: How far are we? In Proc. International Conference on Automated Software Engineering (ASE). IEEE.
- Simone Scalabrino, Mario Linares-Vásquez, Denys Poshyvanyk, and Rocco Oliveto. 2016. Improving code readability models with textual features. In Proc. International Conference on Program Comprehension (ICPC). IEEE, 1--10.
- D. Srinivasulu, Adepu Sridhar, and Durga Prasad Mohapatra. 2014. Evaluation of software understandability using rough sets. In Intelligent Computing, Networking, and Informatics. Springer, 939--946.
- M.-A. Storey. 2005. Theories, methods and tools in program comprehension: Past, present and future. In Proc. International Conference on Program Comprehension (ICPC). IEEE, 181--191.
- M.-A. D. Storey, Kenny Wong, and Hausi A. Müller. 2000. How do program understanding tools affect how programmers understand programs? Science of Computer Programming 36, 2-3 (2000), 183--207.
- Asher Trockman, Shurui Zhou, Christian Kästner, and Bogdan Vasilescu. 2018. Adding sparkle to social coding: An empirical study of repository badges in the npm ecosystem. In Proc. International Conference on Software Engineering (ICSE). ACM.