DOI: 10.1145/3611643.3616283

Statistical Type Inference for Incomplete Programs

Published: 30 November 2023

Abstract

We propose Stir, a novel two-stage approach for inferring types in incomplete programs, which may be ill-formed and on which whole-program syntactic analysis often fails. In the first stage, Stir predicts a type tag for each token using neural networks and thereby infers all the simple types in the program. In the second stage, Stir refines the complex types for the tokens whose predicted tags are complex type tags. Unlike existing machine-learning-based approaches, which treat type inference as a classification problem, Stir reduces it to a sequence-to-graph parsing problem. In our experiments, Stir achieves an accuracy of 97.37% for simple types. By representing complex types as directed graphs (type graphs), Stir achieves type similarity scores of 77.36% and 59.61% for complex types and zero-shot complex types, respectively.

Supplementary Material

Video (fse23main-p337-p-video.mp4)
"Given a target program state (or statement) $s$, what is the probability that an input reaches $s$? This is the quantitative reachability analysis problem. For instance, Quantitative reachability analysis can be used to approximate the reliability of a program (where $s$ is a bad state). Traditionally, quantitative reachability analysis is solved as a model counting problem for a formal constraint that represents the (approximate) reachability of $s$ along paths in the program, i.e., probabilistic reachability analysis. However, in preliminary experiments, we failed to run state-of-the-art probabilistic reachability analysis on reasonably large programs. In this paper, we explore statistical methods to estimate reachability probability. An advantage of statistical reasoning is that the size and composition of the program are insubstantial as long as the program can be executed. We are particularly interested in the error compared to the state-of-the-art probabilistic reachability analysis. We realize that existing estimators do not exploit the inherent structure of the program and develop structure-aware estimators to further reduce the estimation error given the same number of samples. Our empirical evaluation on previous and new benchmark programs shows that (i) our statistical reachability analysis outperforms state-of-the-art probabilistic reachability analysis tools in terms of accuracy, efficiency, and scalability, and (ii) our structure-aware estimators further outperform (blackbox) estimators that do not exploit the inherent program structure. We also identify multiple program properties that limit the applicability of the existing probabilistic analysis techniques."

Published In

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2023
2215 pages
ISBN: 9798400703270
DOI: 10.1145/3611643

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Type inference
  2. deep learning
  3. graph generation
  4. structured learning

Qualifiers

  • Research-article

Conference

ESEC/FSE '23

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%
