research-article

Codecatch: extracting source code snippets from online sources

Authors:
Themistoklis Diamantopoulos, Aristotle University of Thessaloniki, Thessaloniki, Greece
Georgios Karagiannopoulos, Aristotle University of Thessaloniki, Thessaloniki, Greece
Andreas L. Symeonidis, Aristotle University of Thessaloniki, Thessaloniki, Greece

RAISE '18: Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, May 2018, Pages 21–27, https://doi.org/10.1145/3194104.3194107

Published: 28 May 2018

ABSTRACT

Nowadays, developers rely on online sources to find example snippets that address the programming problems they are trying to solve. However, contemporary API usage mining methods are not suitable for locating easily reusable snippets, as they provide usage examples for specific APIs, thus requiring the developer to know beforehand which library to use. On the other hand, approaches that retrieve snippets from online sources usually output a list of examples, without helping the developer distinguish among different implementations and without offering any insight into the quality and reusability of the proposed snippets. In this work, we present CodeCatch, a system that receives queries in natural language and extracts snippets from multiple online sources. The snippets are assessed both for their quality and for their usefulness/preference by developers, and they are also clustered according to their API calls, allowing the developer to select among the different implementations. A preliminary evaluation of CodeCatch on a set of indicative programming problems indicates that it can be a useful tool for the developer.
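The abstract states that CodeCatch clusters snippets according to their API calls but does not say how. The Python sketch below illustrates one plausible way to do this; it is not the authors' method. It assumes that API calls can be approximated by a regular expression over call sites (a real implementation would parse the code), measures the similarity of two snippets as the Jaccard index of their API-call sets, and links any two snippets whose similarity reaches a hypothetical `threshold` of 0.5 (single-link clustering via union-find). The names `api_calls`, `jaccard`, `cluster_by_api_calls`, `CALL_PATTERN`, and `KEYWORDS` are illustrative only.

```python
import re
from itertools import combinations

# Hypothetical call-site pattern: an identifier immediately followed by "(".
# This is a crude approximation; a real tool would parse the snippet.
CALL_PATTERN = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
# Control-flow keywords that the pattern would otherwise mistake for calls.
KEYWORDS = {"if", "for", "while", "switch", "catch", "return", "synchronized"}


def api_calls(snippet: str) -> frozenset[str]:
    """Approximate the set of API calls (method/constructor names) in a snippet."""
    return frozenset(
        name for name in CALL_PATTERN.findall(snippet) if name not in KEYWORDS
    )


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard index |a & b| / |a | b|; snippets without any calls never match."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_api_calls(snippets: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Single-link clustering: link snippets whose API-call sets are similar enough."""
    calls = [api_calls(s) for s in snippets]
    parent = list(range(len(snippets)))  # union-find forest over snippet indices

    def find(i: int) -> int:
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(snippets)), 2):
        if jaccard(calls[i], calls[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect snippet indices by root; indices within a group stay ascending.
    groups: dict[int, list[int]] = {}
    for i in range(len(snippets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


snippets = [
    "BufferedReader br = new BufferedReader(new FileReader(path)); String line = br.readLine();",
    "BufferedReader reader = new BufferedReader(new FileReader(f)); while ((l = reader.readLine()) != null) {}",
    "List<String> lines = Files.readAllLines(Paths.get(path));",
]
print(cluster_by_api_calls(snippets))  # [[0, 1], [2]]
```

Here the two `BufferedReader`-based snippets share all their calls and form one implementation group, while the `Files.readAllLines` snippet forms another. Because single-link grouping is transitive, two snippets can also end up together through an intermediate snippet even if they share few calls themselves; a stricter scheme such as average-link clustering would avoid that.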

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Recommender systems
  2. Information systems applications
    1. Data mining
      1. Clustering
2. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Reusability

Published in
RAISE '18: Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering
May 2018
67 pages
ISBN: 9781450357234
DOI: 10.1145/3194104
Conference Chairs: Walter F. Tichy (Karlsruhe Institute of Technology, Germany), Leandro Minku (University of Leicester, UK)
Copyright © 2018 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
API usage mining
code reuse
snippet mining
Article Metrics
- Total Citations: 1
- Total Downloads: 163
- Downloads (Last 12 months): 23
- Downloads (Last 6 weeks): 3