Generating Python Type Annotations from Type Inference: How Far Are We?

Published: 03 June 2024

Abstract

In recent years, dynamic languages such as Python have become popular due to their flexibility and productivity. However, the lack of static typing makes it harder to fix type errors, detect bugs early, and understand code. To alleviate these issues, PEP 484 introduced optional type annotations for Python in 2014, but a large number of programs are still not annotated by developers. Annotation generation tools can use type inference techniques to produce the missing annotations. However, existing work overlooks several important aspects of type annotation generation, such as in-depth effectiveness analysis, exploration of potential improvements, and practicality evaluation. It therefore remains unclear how far we have come and how far we can go.
In this paper, we set out to comprehensively investigate the effectiveness of type inference tools for generating type annotations, applying three categories of state-of-the-art tools to a carefully cleaned dataset. First, using a comprehensive set of metrics and categories, we find that existing tools differ in effectiveness and that none achieves both high accuracy and high coverage. Then, we summarize six patterns that capture the limitations of type annotation generation. Next, we implement a simple but effective tool to demonstrate that existing tools can be improved in practice. Finally, we conduct a controlled experiment showing that existing tools can reduce the time spent annotating types and help determine more precise types, but cannot reduce subjective difficulty. Our findings point out the limitations of, and improvement directions for, type annotation generation, which can inspire future work.
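To make the tension between accuracy and coverage concrete, the following is a minimal sketch, not the paper's evaluation code. It assumes a simplified exact-match notion of accuracy (correct answers among the slots a tool answered) and treats coverage as the share of annotation slots for which a tool emits any type. The slot names, the two hypothetical tool outputs, and the helper `score` are illustrative assumptions, not taken from the study.

```python
from typing import Dict, Optional, Tuple

# Ground-truth PEP 484 annotations for a few annotation slots.
# Slot names and types are illustrative assumptions, not the paper's dataset.
ground_truth: Dict[str, str] = {
    "parse.text": "str",
    "parse.return": "List[int]",
    "scale.factor": "float",
    "scale.return": "float",
}

# Hypothetical outputs of two tools; None means the tool inferred nothing.
cautious_tool: Dict[str, Optional[str]] = {
    "parse.text": "str",
    "parse.return": None,
    "scale.factor": None,
    "scale.return": "float",
}
eager_tool: Dict[str, Optional[str]] = {
    "parse.text": "str",
    "parse.return": "List[str]",
    "scale.factor": "int",
    "scale.return": "float",
}


def score(predicted: Dict[str, Optional[str]],
          truth: Dict[str, str]) -> Tuple[float, float]:
    """Return (accuracy, coverage) under the simplified definitions above."""
    # Keep only the slots for which the tool produced some type.
    answered = {slot: t for slot, t in predicted.items() if t is not None}
    coverage = len(answered) / len(truth)
    # Exact string match against the ground truth (an assumed, strict criterion).
    correct = sum(1 for slot, t in answered.items() if truth.get(slot) == t)
    accuracy = correct / len(answered) if answered else 0.0
    return accuracy, coverage


print(score(cautious_tool, ground_truth))  # (1.0, 0.5)
print(score(eager_tool, ground_truth))     # (0.5, 1.0)
```

In this toy setting the cautious tool is always right but answers only half of the slots, while the eager tool answers every slot but is right only half of the time, which mirrors the kind of trade-off the abstract describes.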

Cited By

  • (2024) QuAC: Quick Attribute-Centric Type Inference for Python. Proceedings of the ACM on Programming Languages 8:OOPSLA2 (2040-2069). DOI: 10.1145/3689783. Online publication date: 8-Oct-2024
  • (2024) On the Heterophily of Program Graphs: A Case Study of Graph-based Type Inference. Proceedings of the 15th Asia-Pacific Symposium on Internetware (1-10). DOI: 10.1145/3671016.3671389. Online publication date: 24-Jul-2024
  • (2024) Static analysis driven enhancements for comprehension in machine learning notebooks. Empirical Software Engineering 29:5. DOI: 10.1007/s10664-024-10525-w. Online publication date: 12-Aug-2024

Published In

ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 5
June 2024
952 pages
EISSN: 1557-7392
DOI: 10.1145/3618079
Editor: Mauro Pezzè

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 June 2024
Online AM: 11 March 2024
Accepted: 22 February 2024
Revised: 05 December 2023
Received: 13 February 2023
Published in TOSEM Volume 33, Issue 5

Author Tags

  1. Type annotations
  2. type inference
  3. Python
  4. empirical study

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China
  • Cooperation Fund of Nanjing University-Huawei Novel Software Technology Lab
