research-article
DOI: 10.1145/1882291.1882316

Leveraging usage similarity for effective retrieval of examples in code repositories

Published: 07 November 2010

Abstract

Developers often learn to use APIs (Application Programming Interfaces) by looking at existing examples of API usage. Code repositories contain many instances of such usage. However, conventional information retrieval techniques perform poorly when retrieving API usage examples from code repositories. This paper presents Structural Semantic Indexing (SSI), a technique that associates words with source code entities based on similarities of API usage. The heuristic behind this technique is that entities (classes, methods, etc.) that show similar uses of APIs are semantically related because they do similar things. We evaluate the effectiveness of SSI in code retrieval by comparing three SSI-based retrieval schemes with two conventional baseline schemes, running a set of 20 candidate queries against a repository containing 222,397 source code entities from 346 jars belonging to the Eclipse framework. The results show that SSI is effective in improving the retrieval of examples in code repositories.
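The heuristic stated in the abstract can be illustrated with a small Python sketch. It is not the paper's implementation: the Jaccard similarity measure, the neighbour count `k`, the threshold `min_sim`, the function names, and the toy entities are all assumptions chosen only to show how words might be shared between entities that use the same APIs.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two sets of API names (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def borrow_terms(entities, k=1, min_sim=0.5):
    """For each entity, return its own terms plus the terms of its k most
    API-similar neighbours (hypothetical scheme; k and min_sim are assumed)."""
    augmented = {}
    for name, (apis, terms) in entities.items():
        # Rank every other entity by how similar its API usage is.
        scored = sorted(
            ((jaccard(apis, other_apis), other)
             for other, (other_apis, _) in entities.items() if other != name),
            reverse=True,
        )
        bag = Counter(terms)
        for sim, other in scored[:k]:
            if sim >= min_sim:
                bag.update(entities[other][1])
        augmented[name] = bag
    return augmented


# Hypothetical toy repository: entity -> (APIs it uses, words found in the entity)
entities = {
    "ZipExporter.write": (
        {"ZipOutputStream.putNextEntry", "ZipOutputStream.write"},
        ["export", "archive"],
    ),
    "Packer.run": (
        {"ZipOutputStream.putNextEntry", "ZipOutputStream.write"},
        ["pack"],
    ),
    "Logger.log": ({"PrintStream.println"}, ["log", "message"]),
}

index = borrow_terms(entities)
# A query for "archive" now also reaches Packer.run, which never mentions the word.
hits = sorted(name for name, bag in index.items() if "archive" in bag)
print(hits)
```

In this toy setup, `Packer.run` becomes retrievable for "archive" only because its API usage matches that of `ZipExporter.write`, while `Logger.log` borrows nothing because it shares no APIs with the other entities.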


      Published In

      FSE '10: Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
      November 2010
      302 pages
      ISBN:9781605587912
      DOI:10.1145/1882291

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. api usage
      2. code search
      3. software information retrieval
      4. ssi
      5. structural semantic indexing

      Qualifiers

      • Research-article

      Conference

      SIGSOFT/FSE'10

      Acceptance Rates

      Overall Acceptance Rate 17 of 128 submissions, 13%
