DOI: 10.1145/2678015.2682537
short-paper

Structurally Heterogeneous Source Code Examples from Unstructured Knowledge Sources

Published: 13 January 2015

Abstract

Software developers rarely write code from scratch. With the existence of Wikipedia, discussion forums, books and blogs, it is hard to imagine a software developer not looking up these sources for sample code while building any non-trivial software system. While researchers have proposed approaches to retrieve relevant posts and code snippets, the need for finding variant implementations of functionally similar code snippets has been ignored. In this work, we propose an approach to automatically create a repository of structurally heterogeneous but functionally similar source code examples from unstructured sources. We evaluate the approach on Stack Overflow, a discussion forum that has approximately 19 million posts. The results of our evaluation indicate that the approach extracts structurally different snippets with a precision of 83%. A repository of such heterogeneous source code examples will be useful to programmers in learning different implementation strategies and to researchers working on problems such as program comprehension, semantic clones and code search.
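
To make the phrase "structurally heterogeneous but functionally similar" concrete, the sketch below pairs two factorial implementations, one iterative and one recursive. It illustrates the concept only and is not the extraction approach from the paper: the snippets, the function names, and the `node_type_counts` fingerprint (a count of Python AST node types) are all assumptions introduced here.

```python
import ast
from collections import Counter

# Two hypothetical snippets that compute the same function in different ways.
ITERATIVE_SRC = '''
def factorial_iterative(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
'''

RECURSIVE_SRC = '''
def factorial_recursive(n):
    if n <= 1:
        return 1
    return n * factorial_recursive(n - 1)
'''


def node_type_counts(source):
    """Count AST node types: a crude, illustrative structural fingerprint."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


# Load both snippets into one namespace so they can be called.
namespace = {}
exec(ITERATIVE_SRC, namespace)
exec(RECURSIVE_SRC, namespace)
factorial_iterative = namespace["factorial_iterative"]
factorial_recursive = namespace["factorial_recursive"]

# Functionally similar: identical outputs on a small range of inputs.
same_behavior = all(
    factorial_iterative(n) == factorial_recursive(n) for n in range(10)
)

# Structurally heterogeneous: the AST fingerprints differ (loop vs. recursion).
different_structure = node_type_counts(ITERATIVE_SRC) != node_type_counts(RECURSIVE_SRC)
```

Agreement on a handful of inputs only suggests functional similarity, and a node-type count is a deliberately coarse notion of structure; both are stand-ins chosen to keep the example short.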

Published In

PEPM '15: Proceedings of the 2015 Workshop on Partial Evaluation and Program Manipulation
January 2015
152 pages
ISBN:9781450332972
DOI:10.1145/2678015
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tag

  1. example retrieval

Qualifiers

  • Short-paper

Conference

POPL '15

Acceptance Rates

PEPM '15 Paper Acceptance Rate 14 of 27 submissions, 52%;
Overall Acceptance Rate 66 of 120 submissions, 55%

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 3
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 20 Jan 2025

Cited By

  • (2023) Graph-based code semantics learning for efficient semantic code clone detection. Information and Software Technology 156:C. DOI: 10.1016/j.infsof.2022.107130. Online publication date: 1-Apr-2023
  • (2018) A survey on mining stack overflow: question and answering (Q&A) community. Data Technologies and Applications 52:2 (190-247). DOI: 10.1108/DTA-07-2017-0054. Online publication date: 3-Apr-2018
  • (2017) Modeling Source Code to Support Retrieval-Based Applications. Proceedings of the Tenth ACM International Conference on Web Search and Data Mining (833-833). DOI: 10.1145/3018661.3022749. Online publication date: 2-Feb-2017
  • (2016) Leveraging a corpus of natural language descriptions for program similarity. Proceedings of the 2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (197-211). DOI: 10.1145/2986012.2986013. Online publication date: 20-Oct-2016
  • (2015) Spotting familiar code snippet structures for program comprehension. Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (1054-1056). DOI: 10.1145/2786805.2807560. Online publication date: 30-Aug-2015
