DOI: 10.1145/1985441.1985451
research-article

Retrieval from software libraries for bug localization: a comparative study of generic and composite text models

Published: 21 May 2011

ABSTRACT

From the standpoint of retrieval from large software libraries for the purpose of bug localization, we compare five generic text models and certain composite variations thereof. The generic models are: the Unigram Model (UM), the Vector Space Model (VSM), the Latent Semantic Analysis Model (LSA), the Latent Dirichlet Allocation Model (LDA), and the Cluster Based Document Model (CBDM). The task is to locate the files that are relevant to a bug reported in the form of a textual description by a software developer. Our study uses iBUGS, a benchmarked bug localization dataset with 75 KLOC and a large number of bugs (291). A major conclusion of our comparative study is that simple text models such as UM and VSM are more effective at retrieving the relevant files from a library than more sophisticated models such as LDA. Retrieval effectiveness was measured using two kinds of metrics: (1) Mean Average Precision and (2) rank-based metrics. Using the SCORE metric, we also compare the retrieval effectiveness of the models in our study with that of some other bug localization tools.
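As an illustration of the first metric, the sketch below computes Mean Average Precision over a set of bug reports using the standard information-retrieval definition. It is a minimal example under stated assumptions, not the authors' evaluation code: the function names, the bug identifiers, and the file names are all hypothetical, and the abstract does not say how ties or unretrieved relevant files were handled.

```python
from typing import Dict, List, Set


def average_precision(ranked_files: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked file list for one bug report.

    Precision is sampled at each rank where a relevant file appears and
    the samples are averaged over the number of relevant files.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, name in enumerate(ranked_files, start=1):
        if name in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], ground_truth: Dict[str, Set[str]]
) -> float:
    """Mean of the per-bug average precision values."""
    scores = [
        average_precision(rankings.get(bug, []), files)
        for bug, files in ground_truth.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical data: two bug reports, each with a ranked list of files
# returned by some retrieval model and the set of files actually fixed.
rankings = {
    "bug-1": ["A.java", "B.java", "C.java"],
    "bug-2": ["B.java", "C.java", "A.java"],
}
ground_truth = {
    "bug-1": {"A.java", "C.java"},
    "bug-2": {"A.java"},
}

map_score = mean_average_precision(rankings, ground_truth)
print(f"MAP = {map_score:.4f}")
```

For the hypothetical data above, the first bug contributes (1/1 + 2/3) / 2 and the second contributes 1/3, so the mean is 7/12.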


Published in

MSR '11: Proceedings of the 8th Working Conference on Mining Software Repositories
May 2011
260 pages
ISBN: 9781450305747
DOI: 10.1145/1985441

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States
