DOI: 10.1145/3338906.3342494
extended-abstract

Understanding source code comments at large-scale

Published: 12 August 2019

Abstract

Source code comments are important for any software, but the basic patterns of writing comments across domains and programming languages remain unclear. In this paper, we take a first step toward understanding differences in commenting practices by analyzing the comment density of 150 projects in 5 different programming languages. We have found that there are noticeable differences in comment density, which may be related to the programming language used in the project and the purpose of the project.
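
The abstract does not define how comment density is computed. As a minimal sketch, assuming comment density means the share of non-blank lines that are full-line comments, the measurement could look like the following. The function name `comment_density`, the `line_marker` parameter, and the two toy snippets are hypothetical and are not taken from the paper; block comments, trailing comments, and docstrings are deliberately ignored here.

```python
def comment_density(source: str, line_marker: str = "#") -> float:
    """Return the fraction of non-blank lines that are full-line comments.

    Assumption (not from the paper): a comment line is a non-blank line
    whose first non-whitespace characters are `line_marker`.
    """
    stripped = [line.strip() for line in source.splitlines()]
    non_blank = [line for line in stripped if line]
    if not non_blank:
        # An empty file has no lines to measure; report zero density.
        return 0.0
    comment_lines = sum(1 for line in non_blank if line.startswith(line_marker))
    return comment_lines / len(non_blank)


# Hypothetical toy inputs standing in for files in two languages.
python_snippet = """
# add two numbers
def add(a, b):
    return a + b
"""

c_snippet = """
// entry point
// returns 0 on success
int main(void) {
    return 0;
}
"""

densities = {
    "python": comment_density(python_snippet, "#"),
    "c": comment_density(c_snippet, "//"),
}
```

Comparing values like these across many projects, grouped by language or by project purpose, is the kind of analysis the abstract describes; a real study would need a language-aware parser rather than a single line marker.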




    Published In

    ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering
    August 2019
    1264 pages
    ISBN: 9781450355728
    DOI: 10.1145/3338906
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 August 2019


    Author Tags

    1. Comment Density
    2. Empirical Study
    3. Source Code Comments

    Qualifiers

    • Extended-abstract

    Conference

    ESEC/FSE '19

    Acceptance Rates

    Overall Acceptance Rate 112 of 543 submissions, 21%


    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 25
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 05 Mar 2025


    Cited By

    • (2024) Software sustainability of global impact models. Geoscientific Model Development 17:23 (8593-8611). DOI: 10.5194/gmd-17-8593-2024. Online publication date: 5-Dec-2024
    • (2024) Purpose enhanced reasoning through iterative prompting. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (6513-6521). DOI: 10.24963/ijcai.2024/720. Online publication date: 3-Aug-2024
    • (2024) Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions? Proceedings of the ACM on Software Engineering 1:FSE (1889-1912). DOI: 10.1145/3660791. Online publication date: 12-Jul-2024
    • (2024) Do Code Summarization Models Process Too Much Information? Function Signature May Be All That Is Needed. ACM Transactions on Software Engineering and Methodology 33:6 (1-35). DOI: 10.1145/3652156. Online publication date: 27-Jun-2024
    • (2024) Beyond code: Is there a difference between comments in visual and textual languages? Journal of Systems and Software 215 (112087). DOI: 10.1016/j.jss.2024.112087. Online publication date: Sep-2024
    • (2024) Bash comment generation via data augmentation and semantic-aware CodeBERT. Automated Software Engineering 31:1. DOI: 10.1007/s10515-024-00431-2. Online publication date: 26-Mar-2024
    • (2023) PENTACET data - 23 Million Contextual Code Comments and 250,000 SATD comments. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) (412-416). DOI: 10.1109/MSR59073.2023.00063. Online publication date: May-2023
    • (2023) ALSI-Transformer: Transformer-Based Code Comment Generation With Aligned Lexical and Syntactic Information. IEEE Access 11 (39037-39047). DOI: 10.1109/ACCESS.2023.3268638. Online publication date: 2023
    • (2023) An Approach of Code Summary Generation Using Multi-Feature Fusion Based on Transformer. Web Information Systems and Applications (271-283). DOI: 10.1007/978-981-99-6222-8_23. Online publication date: 9-Sep-2023
    • (2022) Suboptimal Comments in Java Projects: From Independent Comment Changes to Commenting Practices. ACM Transactions on Software Engineering and Methodology 32:2 (1-33). DOI: 10.1145/3546949. Online publication date: 8-Jul-2022
