research-article

Generating Python Type Annotations from Type Inference: How Far Are We?

Authors:

Baowen Xu

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 5

Article No.: 123, Pages 1 - 38

https://doi.org/10.1145/3652153

Published: 03 June 2024

Abstract

In recent years, dynamic languages such as Python have become popular because of their flexibility and productivity. The lack of static typing, however, makes it harder to fix type errors, detect bugs early, and understand code. To alleviate these issues, PEP 484 introduced optional type annotations for Python in 2014, but a large number of programs remain unannotated. Annotation generation tools can use type inference techniques to fill this gap. However, existing work overlooks several important aspects of type annotation generation, such as in-depth effectiveness analysis, exploration of potential improvements, and practicality evaluation. As a result, it remains unclear how far we have come and how far we can go.
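As a minimal illustration of the optional annotations that PEP 484 describes, the sketch below shows the same function without and with type hints. The function names and the example itself are hypothetical and are not taken from the paper.

```python
from typing import List, Optional


def find_index(items, target):
    """Unannotated: a reader must inspect the body to learn the types."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return None


def find_index_annotated(items: List[str], target: str) -> Optional[int]:
    """Same logic with PEP 484 annotations on parameters and return value."""
    for i, item in enumerate(items):
        if item == target:
            return i
    return None


# Annotations are stored on the function object and are not enforced at
# run time; a separate static checker is what reads and verifies them.
print(find_index_annotated.__annotations__)
```

An annotation generation tool would aim to produce the second signature from the first automatically.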

In this paper, we set out to comprehensively investigate the effectiveness of type inference tools for generating type annotations, applying three categories of state-of-the-art tools to a carefully cleaned dataset. First, we use a comprehensive set of metrics and categories, finding that existing tools differ in effectiveness and cannot achieve both high accuracy and high coverage. Then, we summarize six patterns that capture the limitations in type annotation generation. Next, we implement a simple but effective tool to demonstrate that existing tools can be improved in practice. Finally, we conduct a controlled experiment showing that existing tools can reduce the time spent annotating types and help determine more precise types, but cannot reduce subjective difficulty. Our findings point out the limitations and improvement directions in type annotation generation, which can inspire future work.
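The tension between accuracy and coverage can be made concrete with a small sketch. The abstract does not give the paper's metric definitions, so the ones used here are assumptions: coverage is the share of annotation slots for which a tool reports any type, and accuracy is the share of reported types that exactly match the ground truth. The slot names and the helper `coverage_and_accuracy` are hypothetical.

```python
from typing import Dict, Optional, Tuple


def coverage_and_accuracy(
    predicted: Dict[str, Optional[str]],
    ground_truth: Dict[str, str],
) -> Tuple[float, float]:
    """Return (coverage, accuracy) over the slots listed in ground_truth.

    Assumed definitions (not taken from the paper):
      coverage = slots with a reported type / all slots
      accuracy = exactly matching reported types / slots with a reported type
    """
    # Keep only slots where the tool actually reported a type.
    answered = {
        slot: inferred
        for slot, inferred in predicted.items()
        if slot in ground_truth and inferred is not None
    }
    correct = sum(
        1 for slot, inferred in answered.items() if inferred == ground_truth[slot]
    )
    coverage = len(answered) / len(ground_truth) if ground_truth else 0.0
    accuracy = correct / len(answered) if answered else 0.0
    return coverage, accuracy


# Hypothetical data: four annotation slots, the tool answers two, one correctly.
ground_truth = {
    "f.x": "int",
    "f.return": "str",
    "g.y": "List[int]",
    "g.return": "bool",
}
predicted = {"f.x": "int", "f.return": "bytes", "g.y": None}

coverage, accuracy = coverage_and_accuracy(predicted, ground_truth)
print(coverage, accuracy)
```

Under these assumed definitions, a tool that answers only when it is certain would score high accuracy and low coverage, while a tool that always guesses would show the opposite profile.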

Cited By

Wu J, Lemieux C (2024) QuAC: Quick Attribute-Centric Type Inference for Python. Proceedings of the ACM on Programming Languages 8:OOPSLA2 (2040-2069). DOI: 10.1145/3689783. Online publication date: 8-Oct-2024
https://dl.acm.org/doi/10.1145/3689783
Xu S, Shen J, Li Y, Yao Y, Yu P, Xu F, Ma X (2024) On the Heterophily of Program Graphs: A Case Study of Graph-based Type Inference. Proceedings of the 15th Asia-Pacific Symposium on Internetware (1-10). DOI: 10.1145/3671016.3671389. Online publication date: 24-Jul-2024
https://dl.acm.org/doi/10.1145/3671016.3671389
Venkatesh A, Sabu S, Chekkapalli M, Wang J, Li L, Bodden E (2024) Static analysis driven enhancements for comprehension in machine learning notebooks. Empirical Software Engineering 29:5. DOI: 10.1007/s10664-024-10525-w. Online publication date: 12-Aug-2024
https://dl.acm.org/doi/10.1007/s10664-024-10525-w

Index Terms

Generating Python Type Annotations from Type Inference: How Far Are We?
1. General and reference
  1. Cross-computing tools and techniques
    1. Empirical studies
2. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Automatic programming
  2. Software notations and tools
    1. General programming languages
      1. Language features

Published In

ACM Transactions on Software Engineering and Methodology Volume 33, Issue 5

June 2024

952 pages

EISSN: 1557-7392

DOI: 10.1145/3618079

Editor:
Mauro Pezzè
USI Università della Svizzera italiana and SIT Schaffhausen Institute of Technology, Switzerland

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 June 2024

Online AM: 11 March 2024

Accepted: 22 February 2024

Revised: 05 December 2023

Received: 13 February 2023

Published in TOSEM Volume 33, Issue 5

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Cooperation Fund of Nanjing University-Huawei Novel Software Technology Lab
