DOI: 10.1145/1985441.1985466
research-article

Automated topic naming to support cross-project analysis of software maintenance activities

Published: 21 May 2011

ABSTRACT

Researchers have employed a variety of techniques to extract underlying topics that relate to software development artifacts. Typically, these techniques use unsupervised machine-learning algorithms to suggest candidate word-lists. However, word-lists are difficult to interpret in the absence of meaningful summary labels. Current topic modeling techniques assume manual labelling and do not use domain-specific knowledge to improve, contextualize, or describe results for developers. We propose a solution: automated labelled topic extraction. Topics are extracted using Latent Dirichlet Allocation (LDA) from commit-log comments recovered from source control systems such as CVS and BitKeeper. These topics are then given labels from a generalizable cross-project taxonomy of non-functional requirements. We evaluated our approach with experiments and case studies on two large-scale RDBMS projects, MySQL and MaxDB. The case studies show that labelled topic extraction can produce appropriate, context-sensitive labels relevant to these projects, providing fresh insight into their evolving software development activities.
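The labelling step described above can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes LDA has already produced each topic's top words, and it uses hypothetical word-lists keyed by the six ISO/IEC 9126 quality characteristics as the non-functional-requirement taxonomy. A topic receives every label whose word-list overlaps its words, so one topic can carry several labels, or none.

```python
# Hypothetical NFR word-lists keyed by the six ISO/IEC 9126 characteristics.
# These are illustrative assumptions, not the word-lists used in the paper.
NFR_WORDLISTS = {
    "efficiency": {"performance", "speed", "fast", "slow", "memory", "optimize", "cache"},
    "portability": {"port", "platform", "windows", "linux", "compile", "build"},
    "reliability": {"crash", "bug", "fix", "error", "fail", "recover"},
    "maintainability": {"refactor", "cleanup", "comment", "rename", "document"},
    "usability": {"interface", "user", "message", "help", "gui"},
    "functionality": {"feature", "support", "add", "implement", "security"},
}


def label_topic(topic_words, wordlists=NFR_WORDLISTS, min_overlap=1):
    """Return every NFR label whose word-list shares at least
    `min_overlap` words with the topic (multi-label assignment)."""
    words = {w.lower() for w in topic_words}
    scored = []
    for label, vocab in wordlists.items():
        hits = len(words & vocab)  # number of shared words
        if hits >= min_overlap:
            scored.append((label, hits))
    # Strongest matches first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [label for label, _ in scored]


if __name__ == "__main__":
    # A hypothetical LDA topic drawn from commit-log comments.
    topic = ["crash", "fix", "memory", "leak", "slow"]
    print(label_topic(topic))  # ['efficiency', 'reliability']
```

Raising `min_overlap` trades recall for precision: fewer topics get labels, but the labels they get rest on more shared evidence.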

Published in

ACM Conferences
MSR '11: Proceedings of the 8th Working Conference on Mining Software Repositories
May 2011
260 pages
ISBN: 9781450305747
DOI: 10.1145/1985441

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States
