SnR: constraint-based type inference for incomplete Java code snippets

Authors:
Yiwen Dong (University of Waterloo, Waterloo, Ontario, Canada),
Tianxiao Gu (Alibaba Group, China),
Yongqiang Tian (University of Waterloo, Waterloo, Ontario, Canada),
Chengnian Sun (University of Waterloo, Waterloo, Ontario, Canada)

ICSE '22: Proceedings of the 44th International Conference on Software Engineering, May 2022, Pages 1982–1993
https://doi.org/10.1145/3510003.3510061

Published: 05 July 2022

ABSTRACT

Code snippets are prevalent on websites such as Stack Overflow and are effective in demonstrating API usage concisely. However, they are usually difficult to use directly because most code snippets are not only syntactically incomplete but also lack dependency information, and thus do not compile. For example, Java snippets usually do not have import statements or required library names; only 6.88% of Java snippets on Stack Overflow include the import statements necessary for compilation.

This paper proposes SnR, a precise, efficient, constraint-based technique that automatically infers the exact types used in code snippets and the libraries containing the inferred types, so that the snippets can be compiled and therefore reused. Initially, SnR builds a knowledge base of APIs, i.e., various facts about the available APIs, from a corpus of Java libraries. Given a code snippet with missing import statements, SnR automatically extracts typing constraints from the snippet, solves the constraints against the knowledge base, and returns a set of APIs that satisfies the constraints to be imported into the snippet.
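
As a rough illustration of the idea of solving typing constraints against a knowledge base, here is a minimal Java sketch. It is not SnR's implementation: the class name ImportInferenceSketch, the three-entry hand-written knowledge base, and the single constraint form (a simple type name plus the methods invoked on it) are all assumptions made for this example. The author tags list Datalog; this sketch swaps in plain collection filtering to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class ImportInferenceSketch {

    // Hypothetical toy knowledge base: fully qualified class name -> some of its method names.
    // A real knowledge base would be built from a corpus of libraries, not written by hand.
    static final Map<String, Set<String>> KNOWLEDGE_BASE = Map.of(
        "java.util.Date", Set.of("getTime", "before", "after"),
        "java.sql.Date", Set.of("getTime", "valueOf", "toLocalDate"),
        "org.joda.time.DateTime", Set.of("getMillis", "withZone", "plusDays"));

    // Strips the package prefix, e.g. "java.util.Date" -> "Date".
    static String simpleNameOf(String fullyQualifiedName) {
        return fullyQualifiedName.substring(fullyQualifiedName.lastIndexOf('.') + 1);
    }

    // Returns, in alphabetical order, every known class whose simple name matches
    // and which declares all methods the snippet calls on that type.
    static List<String> candidates(String simpleName, Set<String> calledMethods) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : new TreeMap<>(KNOWLEDGE_BASE).entrySet()) {
            boolean nameMatches = simpleNameOf(entry.getKey()).equals(simpleName);
            boolean methodsMatch = entry.getValue().containsAll(calledMethods);
            if (nameMatches && methodsMatch) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed snippet fact: a variable of type "Date" has toLocalDate() called on it.
        for (String fqn : candidates("Date", Set.of("toLocalDate"))) {
            System.out.println("import " + fqn + ";");
        }
    }
}
```

In this toy setting, the simple name Date alone is ambiguous between two classes, and the extra constraint from the method call narrows the candidates to one. More constraints from the same snippet would be intersected in the same way.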

We have evaluated SnR on a benchmark of 267 code snippets from Stack Overflow. SnR significantly outperforms the state-of-the-art tool Coster. SnR correctly infers 91.0% of the import statements, which makes 73.8% of the snippets compile, whereas Coster correctly infers 36.0% of the import statements and makes 9.0% of the snippets compile.

Index Terms

1. Software and its engineering
  1. Software notations and tools
  2. Software organization and properties
    1. Software functional properties
      1. Formal methods
        Automated static analysis
2. Theory of computation
  1. Semantics and reasoning
    1. Program constructs
      1. Type structures

Published in
ICSE '22: Proceedings of the 44th International Conference on Software Engineering
May 2022
2508 pages
ISBN: 9781450392211
DOI: 10.1145/3510003
General Chair: Matthew B Dwyer (University of Virginia)
Program Chairs: Daniela Damian (University of Victoria, Canada), Andreas Zeller (CISPA, Germany)
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Badges
- Artifacts Evaluated & Functional / v1.1
Author Tags
automated repair
constraint satisfaction
datalog
type inference
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 276 of 1,856 submissions, 15%
