ABSTRACT
Code snippets are prevalent on websites such as Stack Overflow and are effective in demonstrating API usage concisely. However, they are usually difficult to use directly because most snippets are not only syntactically incomplete but also lack dependency information, and therefore do not compile. For example, Java snippets usually lack import statements and the names of required libraries; only 6.88% of Java snippets on Stack Overflow include the import statements necessary for compilation.
This paper proposes SnR, a precise, efficient, constraint-based technique that automatically infers the exact types used in a code snippet and the libraries containing those types, so that the snippet can be compiled and thus reused. First, SnR builds a knowledge base of APIs, i.e., facts about the available APIs, from a corpus of Java libraries. Then, given a code snippet with missing import statements, SnR automatically extracts typing constraints from the snippet, solves the constraints against the knowledge base, and returns a set of APIs that satisfies the constraints and can be imported into the snippet.
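The extract-and-solve idea can be illustrated with a deliberately simplified sketch. It is not SnR's actual algorithm or data model: the knowledge base here is assumed to be a plain map from fully qualified class names to method names, a constraint is assumed to be "simple type name T must offer methods m1, m2, ...", and the class `ImportInferenceSketch`, its methods, and the entry `com.example.util.DateTime` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ImportInferenceSketch {

    /** Toy knowledge base (assumed shape): fully qualified class name -> method names. */
    static Map<String, Set<String>> knowledgeBase() {
        Map<String, Set<String>> kb = new LinkedHashMap<>();
        kb.put("org.joda.time.DateTime", Set.of("getMillis", "withZone"));
        kb.put("org.joda.time.DateTimeZone", Set.of("forID"));
        // Hypothetical competing class that shares the simple name "DateTime".
        kb.put("com.example.util.DateTime", Set.of("format"));
        return kb;
    }

    /** Returns the part of a fully qualified name after the last dot. */
    static String simpleName(String fqn) {
        return fqn.substring(fqn.lastIndexOf('.') + 1);
    }

    /**
     * For each constraint (simple type name -> methods the snippet calls on it),
     * collects every knowledge-base class whose simple name matches and which
     * offers all of the required methods.
     */
    static Map<String, List<String>> inferImports(
            Map<String, Set<String>> constraints,
            Map<String, Set<String>> kb) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> c : constraints.entrySet()) {
            List<String> candidates = new ArrayList<>();
            for (Map.Entry<String, Set<String>> fact : kb.entrySet()) {
                boolean nameMatches = simpleName(fact.getKey()).equals(c.getKey());
                if (nameMatches && fact.getValue().containsAll(c.getValue())) {
                    candidates.add(fact.getKey());
                }
            }
            result.put(c.getKey(), candidates);
        }
        return result;
    }

    public static void main(String[] args) {
        // Constraints one might extract by hand from a snippet such as:
        //   DateTime dt = ...; dt.withZone(DateTimeZone.forID("...")).getMillis();
        Map<String, Set<String>> constraints = new LinkedHashMap<>();
        constraints.put("DateTime", Set.of("withZone", "getMillis"));
        constraints.put("DateTimeZone", Set.of("forID"));

        inferImports(constraints, knowledgeBase()).forEach((name, fqns) ->
                fqns.forEach(fqn -> System.out.println("import " + fqn + ";")));
    }
}
```

In this toy setting the method-usage constraints rule out the hypothetical `com.example.util.DateTime`, leaving a single candidate per simple name. A real solver would also have to handle argument and return types, subtyping, and ambiguous results, none of which this sketch attempts.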
We have evaluated SnR on a benchmark of 267 code snippets from Stack Overflow. SnR significantly outperforms the state-of-the-art tool Coster. SnR correctly infers 91.0% of the import statements, which makes 73.8% of the snippets compile, compared to 36.0% of the import statements and 9.0% of the snippets by Coster.