ABSTRACT
Source code retrieval techniques are effective at automating software understanding activities, but the literature provides no guidance on how comments affect the performance of these techniques. In this paper, we present an initial investigation of the effects of using comments in the source code retrieval process. We address our research question using a case study of six open-source Java projects. The results indicate that including comments significantly affects the average keyword density for a project. Future work includes analyzing the extent to which comments affect the average keyword density of domain terms and non-domain terms.
Index Terms
- Toward an understanding of the relationship between the identifier and comment lexicons