DOI: 10.1145/2597008.2597793
Article

Improving topic model source code summarization

Published: 02 June 2014

Abstract

In this paper, we present an emerging source code summarization technique that uses topic modeling to select keywords and topics as summaries of source code. Our approach organizes the topics in the source code into a hierarchy, with more general topics near the top. In this way, we present the software's highest-level functionality first, before lower-level details. This is an advantage over previous topic-model-based approaches, which present only groups of related keywords without a hierarchy. In a preliminary user study, participants judged the keywords and topics selected by our approach to be accurate in a majority of cases.
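The abstract does not say how the hierarchy is constructed, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes a topic model (for example LDA) has already produced per-topic word weights and per-document topic proportions. It then applies a simple hypothetical rule: a topic's generality is the number of documents (methods or files) in which its proportion reaches a `threshold`, and each topic is attached under the strictly broader topic whose document set overlaps it most. The names `TopicNode`, `top_keywords`, `build_hierarchy`, `render`, and `threshold`, as well as the toy data, are all invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    topic_id: int
    keywords: list[str]
    coverage: frozenset[int]  # documents in which this topic is prominent
    children: list["TopicNode"] = field(default_factory=list)


def top_keywords(word_weights: dict[str, float], k: int) -> list[str]:
    """Return a topic's k highest-weighted words (ties broken alphabetically)."""
    ranked = sorted(word_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def _broadest_first(node: TopicNode) -> tuple[int, int]:
    # More documents covered = more general; topic id breaks ties.
    return (-len(node.coverage), node.topic_id)


def build_hierarchy(topic_words: list[dict[str, float]],
                    doc_topics: list[list[float]],
                    k: int = 3,
                    threshold: float = 0.2) -> list[TopicNode]:
    """Attach each topic under the broader topic it overlaps most (hypothetical rule)."""
    nodes = []
    for t, words in enumerate(topic_words):
        docs = frozenset(d for d, props in enumerate(doc_topics) if props[t] >= threshold)
        nodes.append(TopicNode(t, top_keywords(words, k), docs))

    roots = []
    for node in nodes:
        # Only strictly broader topics qualify as parents, so no cycles can form.
        candidates = [o for o in nodes
                      if len(o.coverage) > len(node.coverage) and o.coverage & node.coverage]
        if candidates:
            parent = max(candidates, key=lambda o: (len(o.coverage & node.coverage),
                                                    len(o.coverage), -o.topic_id))
            parent.children.append(node)
        else:
            roots.append(node)

    # Present the broadest topics first at every level of the hierarchy.
    roots.sort(key=_broadest_first)
    for node in nodes:
        node.children.sort(key=_broadest_first)
    return roots


def render(nodes: list[TopicNode], depth: int = 0) -> list[str]:
    """Depth-first listing: each general topic precedes its more specific subtopics."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + ", ".join(node.keywords))
        lines.extend(render(node.children, depth + 1))
    return lines


# Toy input, invented for illustration (not data from the paper).
EXAMPLE_TOPIC_WORDS = [
    {"file": 0.4, "read": 0.3, "write": 0.2, "stream": 0.1},
    {"parse": 0.5, "xml": 0.3, "node": 0.2},
    {"cache": 0.6, "evict": 0.3, "size": 0.1},
]
EXAMPLE_DOC_TOPICS = [  # one row per method/file, one column per topic
    [0.7, 0.3, 0.0],
    [0.5, 0.5, 0.0],
    [0.6, 0.0, 0.4],
    [0.9, 0.1, 0.0],
]

for line in render(build_hierarchy(EXAMPLE_TOPIC_WORDS, EXAMPLE_DOC_TOPICS)):
    print(line)
```

With the toy input, the file-I/O topic covers all four documents and becomes the root. The XML-parsing and caching topics are printed indented beneath it, so the most general functionality is listed first. Using document coverage as the measure of generality is just one simple choice; a dedicated hierarchical topic model could replace it.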

Information

Published In

ICPC 2014: Proceedings of the 22nd International Conference on Program Comprehension
June 2014
325 pages
ISBN: 9781450328791
DOI: 10.1145/2597008

In-Cooperation

  • TCSE: IEEE Computer Society's Technical Council on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Source code summarization
  2. software topic models

Qualifiers

  • Article

Conference

ICSE '14

Article Metrics

  • Downloads (last 12 months): 18
  • Downloads (last 6 weeks): 2
Reflects downloads up to 20 Jan 2025.

Cited By

  • (2024) Deep Is Better? An Empirical Comparison of Information Retrieval and Deep Learning Approaches to Code Summarization. ACM Transactions on Software Engineering and Methodology 33:3 (1-37). DOI: 10.1145/3631975. Online publication date: 15-Mar-2024.
  • (2024) Enhancing code summarization with action word prediction. Neurocomputing 563 (126777). DOI: 10.1016/j.neucom.2023.126777. Online publication date: Jan-2024.
  • (2024) A review of automatic source code summarization. Empirical Software Engineering 29:6. DOI: 10.1007/s10664-024-10553-6. Online publication date: 7-Oct-2024.
  • (2023) COME: Commit Message Generation with Modification Embedding. Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (792-803). DOI: 10.1145/3597926.3598096. Online publication date: 12-Jul-2023.
  • (2023) An interview study about the use of logs in embedded software engineering. Empirical Software Engineering 28:2. DOI: 10.1007/s10664-022-10258-8. Online publication date: 11-Feb-2023.
  • (2022) A Review on Source Code Documentation. ACM Transactions on Intelligent Systems and Technology 13:5 (1-44). DOI: 10.1145/3519312. Online publication date: 21-Jun-2022.
  • (2022) ATOM: Commit Message Generation Based on Abstract Syntax Tree and Hybrid Ranking. IEEE Transactions on Software Engineering 48:5 (1800-1817). DOI: 10.1109/TSE.2020.3038681. Online publication date: 1-May-2022.
  • (2021) Why My Code Summarization Model Does Not Work. ACM Transactions on Software Engineering and Methodology 30:2 (1-29). DOI: 10.1145/3434280. Online publication date: 10-Feb-2021.
  • (2021) Emoji-powered Sentiment and Emotion Detection from Software Developers’ Communication Data. ACM Transactions on Software Engineering and Methodology 30:2 (1-48). DOI: 10.1145/3424308. Online publication date: 27-Jan-2021.
  • (2021) Adversarial Specification Mining. ACM Transactions on Software Engineering and Methodology 30:2 (1-40). DOI: 10.1145/3424307. Online publication date: 3-Jan-2021.