Abstract
Maintaining a program is a time-consuming and expensive task in software engineering. Consequently, several approaches have been proposed to improve the comprehensibility of source code. One such approach is to comment the code, which enables developers to explain the program in their own words or with predefined tags. Some empirical studies indicate benefits of comments in certain situations, while others find no benefits at all. Thus, the real effect of comments on software development remains uncertain. In this article, we describe an experiment in which 277 participants, mainly professional software developers, performed small programming tasks on differently commented code. Based on quantitative and qualitative feedback, we i) partly replicate previous studies, ii) investigate the performance of participants with different levels of experience when confronted with varying types of comments, and iii) discuss the opinions of developers on comments. Our results indicate that previous studies and our participants consider comments to be more important than they actually are for small programming tasks. While our participants consider other mechanisms, such as proper identifiers, more helpful, they also emphasize the necessity of comments in certain situations.
Notes
We did not count lines such as or ⋆⋆/ that do not contain any natural words.
We applied this test because we wanted to compare two distributions from the same population but could not assume a normal distribution.
Acknowledgments
This research is supported by DFG grant LE 3382/2-1.
Additional information
Communicated by: Christoph Treude
Appendix
In the following, we present our 9 tasks and their solutions. To indicate the comment type, we always use single-line marks for implementation comments and multi-line marks for documentation comments. Versions with no comments contained none of the lines marked this way. The solutions were used as exemplary sketches, but we checked each solution individually.
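The comment-marking convention described above can be illustrated with a short sketch. The class and method below are hypothetical and are not one of the study's tasks; they only show how a documentation comment (multi-line marks) and an implementation comment (single-line marks) would look on the same method.

```java
public class CommentStyles {

    /**
     * Documentation comment (multi-line marks): describes the method
     * from the caller's point of view.
     *
     * @param values the numbers to add up; must not be null
     * @return the sum of all elements in values
     */
    public static int sum(int[] values) {
        // Implementation comment (single-line marks): explains a step
        // inside the method body.
        int total = 0;
        for (int value : values) {
            total += value; // accumulate the running sum
        }
        return total;
    }
}
```

Under this convention, a version with no comments would contain only the method signature and the uncommented statements.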
A.1 Apply Code (Tasks 1–3)
Task 1
Call method foo() in such a way that it returns 7.
Task 2
Call method foo() in such a way that it returns .
Task 3
Change (only) the list objectList in the method bar so that the call of this method returns .
A.2 Bug Fixing (Tasks 4–6)
Task 4
The foo() method throws a runtime exception for the following input. Fix the error so that the expected result [3, 8] is returned.
foo(new int[]{1, 3, 4, 5, 8, 11, 13}, new int[]{2, 3, 5, 7, 8, 9}); >> [3,8]
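The original listing for this task is not reproduced here. One reading that is consistent with the example is that foo() collects the values that appear at the same index in both arrays: 3 at index 1 and 8 at index 4. The value 5 occurs in both arrays, but at different indices, and is not part of the expected result. Under that assumption, a runtime exception would plausibly come from iterating over the longer array's length. The following is a minimal sketch of a bounds-safe version; the class name, method name, and return type are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SameIndexMatches {

    // Hypothetical stand-in for foo(): returns the values that are equal
    // at the same position in both arrays.
    public static List<Integer> sameIndexMatches(int[] first, int[] second) {
        List<Integer> matches = new ArrayList<>();
        // Stop at the shorter array to avoid an ArrayIndexOutOfBoundsException.
        int limit = Math.min(first.length, second.length);
        for (int i = 0; i < limit; i++) {
            if (first[i] == second[i]) {
                matches.add(first[i]);
            }
        }
        return matches;
    }
}
```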
Task 5
The foo() method contains an error. Fix it so that the expected results are returned:
foo("abcd", "acbd") >> 1
foo("abcd", "badc") >> 2
foo("abcdef", "defabc") >> 3
It can be assumed that both strings have the same length and are not .
After completing the study, we noticed that our solution only handles strings with an even number of characters. For example, for "abc" and "cab", the code would calculate only one transposition due to integer division. However, no participant seemed to be aware of this problem, as none of the submitted answers handles this scenario. Thus, we accepted all answers that solve the task for strings with an even number of characters.
Please note that this task does not use the Hamming distance, as a transposition is defined as a single swap of two characters. Thus, we considered "abcd" and "efgh" invalid input, because no characters within the string are swapped.
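The arithmetic behind the expected results can be sketched as follows. This is an assumption about the shape of the solution, not the study's original code: it counts the positions at which the two strings differ and halves that count with integer division, which matches the three examples and the "abc"/"cab" case described above. The class and method names are hypothetical.

```java
public class TranspositionCount {

    // Hypothetical stand-in for foo(): assumes both strings are non-null
    // and have the same length, as stated in the task.
    public static int countTranspositions(String first, String second) {
        int mismatches = 0;
        for (int i = 0; i < first.length(); i++) {
            if (first.charAt(i) != second.charAt(i)) {
                mismatches++;
            }
        }
        // One swap of two characters produces two mismatched positions.
        // Integer division truncates, so an odd mismatch count is rounded down.
        return mismatches / 2;
    }
}
```

For "abcd" and "acbd", two positions differ, which gives one transposition. For "abc" and "cab", three positions differ, and the integer division yields one.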
Task 6
Fix the compile-time error in foo().
A.3 Extend Code (Tasks 7–9)
Task 7
Extend the method foo with an parameter, which is returned if number equals 0. Example:
foo(new int[]{5,13,31}, 7); >> 7
Task 8
Extend method foo() to ignore null Strings in the output.
String[] input = {"Tic", null, "Tac", "Toe"};
join(input, ",");
>> Tic,Tac,Toe
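The example suggests that null entries are the ones to skip. A minimal sketch of such an extension, assuming a join method with the signature used in the example, could look like this; the class name and the implementation details are hypothetical and not taken from the study's material.

```java
public class NullSafeJoin {

    // Hypothetical join(): concatenates the non-null entries of parts,
    // placing separator between consecutive entries.
    public static String join(String[] parts, String separator) {
        StringBuilder result = new StringBuilder();
        boolean first = true;
        for (String part : parts) {
            if (part == null) {
                continue; // skip null entries entirely
            }
            if (!first) {
                result.append(separator);
            }
            result.append(part);
            first = false;
        }
        return result.toString();
    }
}
```

Tracking the first appended entry with a flag, rather than checking the builder's length, keeps the separator placement correct even when an entry is an empty string.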
Task 9
Extend Class2 with a method bar() of the return type Integer that reverses the operation of foo().
Cite this article
Nielebock, S., Krolikowski, D., Krüger, J. et al. Commenting source code: is it worth it for small programming tasks?. Empir Software Eng 24, 1418–1457 (2019). https://doi.org/10.1007/s10664-018-9664-z
DOI: https://doi.org/10.1007/s10664-018-9664-z