research-article

Leveraging usage similarity for effective retrieval of examples in code repositories

Authors:

Sushil K. Bajracharya,

Cristina V. Lopes

FSE '10: Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering

Pages 157 - 166

https://doi.org/10.1145/1882291.1882316

Published: 07 November 2010

Abstract

Developers often learn to use APIs (Application Programming Interfaces) by looking at existing examples of API usage. Code repositories contain many instances of such usage of APIs. However, conventional information retrieval techniques fail to perform well in retrieving API usage examples from code repositories. This paper presents Structural Semantic Indexing (SSI), a technique to associate words with source code entities based on similarities of API usage. The heuristic behind this technique is that entities (classes, methods, etc.) that show similar uses of APIs are semantically related because they do similar things. We evaluate the effectiveness of SSI in code retrieval by comparing three SSI-based retrieval schemes with two conventional baseline schemes. We evaluate the performance of the retrieval schemes by running a set of 20 candidate queries against a repository containing 222,397 source code entities from 346 jars belonging to the Eclipse framework. The results of the evaluation show that SSI is effective in improving the retrieval of examples in code repositories.
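The heuristic in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not say how similarity is measured or how words are attached, so the Jaccard coefficient, the `k` and `min_sim` parameters, the function names, and the toy entities below are all assumptions made for illustration. Each entity borrows index terms from the entities whose API usage is most similar to its own, and a plain keyword lookup then runs over the enriched terms.

```python
from collections import Counter

# Toy data (hypothetical): entity name -> (set of API labels it uses, its own terms).
ENTITIES = {
    "openEditor": ({"page.openEditor", "ide.openEditor", "file.exists"}, {"open", "editor"}),
    "showFile": ({"page.openEditor", "ide.openEditor"}, {"show", "file"}),
    "drawChart": ({"gc.drawLine", "canvas.redraw"}, {"draw", "chart"}),
}


def jaccard(a, b):
    """Jaccard coefficient of two sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def enrich_terms(entities, k=2, min_sim=0.3):
    """Give each entity the terms of its k most similar entities by API usage.

    k and min_sim are illustrative knobs, not values from the paper.
    """
    enriched = {}
    for name, (apis, terms) in entities.items():
        # Rank all other entities by similarity of their API usage sets.
        scored = sorted(
            (
                (jaccard(apis, other_apis), other)
                for other, (other_apis, _) in entities.items()
                if other != name
            ),
            reverse=True,
        )
        bag = Counter(terms)
        for sim, other in scored[:k]:
            if sim >= min_sim:
                bag.update(entities[other][1])  # borrow the neighbour's terms
        enriched[name] = bag
    return enriched


def search(index, query):
    """Rank entities by how often the query words occur in their term bags."""
    words = query.lower().split()
    scores = {name: sum(bag[w] for w in words) for name, bag in index.items()}
    return sorted((n for n, s in scores.items() if s > 0), key=lambda n: (-scores[n], n))


baseline = {name: Counter(terms) for name, (_, terms) in ENTITIES.items()}
enriched = enrich_terms(ENTITIES)
print(search(baseline, "editor"))
print(search(enriched, "editor"))
```

In this toy run the baseline lookup for "editor" finds only `openEditor`, while the enriched index also returns `showFile`, which never mentions the word but uses the same APIs.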

Index Terms

Leveraging usage similarity for effective retrieval of examples in code repositories
1. Information systems
  1. Information retrieval
2. Software and its engineering
  1. Software notations and tools
    1. Development frameworks and environments

Published In

FSE '10: Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering

November 2010

302 pages

ISBN:9781605587912

DOI:10.1145/1882291

General Chair: Gruia-Catalin Roman, Washington University in St. Louis, USA

Program Chair: André van der Hoek, University of California, Irvine, USA

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGSOFT/FSE'10

Sponsor:

SIGSOFT

SIGSOFT/FSE'10: 18th ACM SIGSOFT Symposium on the Foundations of Software Engineering

November 7 - 11, 2010

Santa Fe, New Mexico, USA

Acceptance Rates

Overall Acceptance Rate 17 of 128 submissions, 13%

Bibliometrics

Total Citations: 83
Total Downloads: 1,012
Downloads (Last 12 months): 14
Downloads (Last 6 weeks): 1

Reflects downloads up to 28 Feb 2025

Citations

Cited By

Matute G, Ni W, Barik T, Cheung A, Chasins S (2024). Syntactic Code Search with Sequence-to-Tree Matching: Supporting Syntactic Search with Incomplete Code Fragments. Proceedings of the ACM on Programming Languages, 8:PLDI, pages 2051-2072. Online publication date: 20-Jun-2024.
https://dl.acm.org/doi/10.1145/3656460
Malviya-Thakur A, Mockus A (2024). The Role of Data Filtering in Open Source Software Ranking and Selection. Proceedings of the 1st IEEE/ACM International Workshop on Methodological Issues with Empirical Studies in Software Engineering, pages 7-12. Online publication date: 16-Apr-2024.
https://dl.acm.org/doi/10.1145/3643664.3648210
Zhang Y, Liu Y, Fan X, Lu Y (2024). CPLCS: Contrastive Prompt Learning-based Code Search with Cross-modal Interaction Mechanism. 2024 International Joint Conference on Neural Networks (IJCNN), pages 1-10. Online publication date: 30-Jun-2024.
https://doi.org/10.1109/IJCNN60899.2024.10650201
Di Grazia L, Pradel M (2023). Code Search: A Survey of Techniques for Finding Code. ACM Computing Surveys, 55:11, pages 1-31. Online publication date: 9-Feb-2023.
https://dl.acm.org/doi/10.1145/3565971
Cai B, Yu Y, Hu Y (2023). CSSAM: Code Search via Attention Matching of Code Semantics and Structures. 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 402-413. Online publication date: Mar-2023.
https://doi.org/10.1109/SANER56733.2023.00045
Zhou Y, Chen C, Wang Y, Han T, Chen T (2023). Context-aware API recommendation using tensor factorization. Science China Information Sciences, 66:2. Online publication date: 12-Jan-2023.
https://doi.org/10.1007/s11432-021-3529-9
Hu X, Guo Y, Lu J, Zhu Z, Li C, Ge J, Huang L, Luo B, Roychoudhury A, Cadar C, Kim M (2022). Lighting up supervised learning in user review-based code localization: dataset and benchmark. Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 533-545. Online publication date: 7-Nov-2022.
https://dl.acm.org/doi/10.1145/3540250.3549141
Luong K, Hadi M, Thung F, Fard F, Lo D, Rastogi A, Tufano R, Bavota G, Arnaoudova V, Haiduc S (2022). ARSeek. Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, pages 331-342. Online publication date: 16-May-2022.
https://dl.acm.org/doi/10.1145/3524610.3527918
Sondhi D, Jobanputra M, Rani D, Purandare S, Sharma S, Purandare R (2022). Mining Similar Methods for Test Adaptation. IEEE Transactions on Software Engineering, 48:7, pages 2262-2276. Online publication date: 1-Jul-2022.
https://doi.org/10.1109/TSE.2021.3057163
Mukherjee R, Chaudhuri S, Jermaine C (2021). Searching a database of source codes using contextualized code search. Proceedings of the VLDB Endowment, 13:10, pages 1765-1778. Online publication date: 10-Mar-2021.
https://dl.acm.org/doi/10.14778/3401960.3401972
