Research article
DOI: 10.1145/3551349.3556912

Prompt-tuned Code Language Model as a Neural Knowledge Base for Type Inference in Statically-Typed Partial Code

Published: 05 January 2023

ABSTRACT

Partial code usually involves non-fully-qualified type names (non-FQNs) and undeclared receiving objects. Resolving the FQNs of these non-FQN types and undeclared receiving objects (referred to as type inference) is a prerequisite to effective search and reuse of partial code. Existing dictionary-lookup based methods build a symbolic knowledge base of API names and code contexts; they incur significant compilation overhead and are sensitive to unseen API names and variations in code context. In this paper, we formulate type inference as a cloze-style fill-in-blank language task. Building on the naturalness of source code, our approach fine-tunes a code masked language model (MLM) on raw source code as a neural knowledge base of code elements, following a novel “pre-train, prompt and predict” paradigm. Our approach is lightweight and places minimal requirements on code compilation. Unlike existing symbolic name and context matching for type inference, our prompt-tuned code MLM packs FQN syntax and usage into its parameters and supports fuzzy neural type inference. We systematically evaluate our approach on a large amount of source code from GitHub and Stack Overflow. The results confirm the effectiveness of our approach design and its practicality for partial-code type inference. As the first of its kind, our neural type inference method opens the door to many innovative ways of using partial code.
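To make the cloze-style formulation concrete, the sketch below is illustrative only and is not the paper's implementation: it queries an off-the-shelf code MLM (the public microsoft/codebert-base checkpoint) with a hypothetical fill-in-blank prompt asking where a simple type name comes from. The prompt wording and the single-mask decoding are assumptions for illustration; the paper prompt-tunes the MLM on raw source code and resolves full multi-token FQNs.

```python
# Illustrative sketch, not the authors' method: cloze-style ("fill-in-blank")
# type inference with a code masked language model.
from transformers import pipeline

# Off-the-shelf code MLM; the paper prompt-tunes such a model on raw source code.
fill_mask = pipeline("fill-mask", model="microsoft/codebert-base")

# Partial code with a non-fully-qualified type name: StringUtils.
snippet = "boolean ok = StringUtils.isBlank(name);"

# Hypothetical cloze prompt (an assumption for illustration). A single <mask>
# yields only one sub-token, so a real system must decode multi-token FQNs
# (e.g., org.apache.commons.lang3.StringUtils) step by step.
prompt = snippet + " StringUtils is a class of the <mask> library ."

for candidate in fill_mask(prompt, top_k=5):
    print(candidate["token_str"], candidate["score"])
```

A prompt-tuned model would rank package prefixes it has seen during training much higher for this context; the off-the-shelf checkpoint merely illustrates how the fill-in-blank query is posed.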


Published in

ASE '22: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering
October 2022, 2006 pages
ISBN: 978-1-4503-9475-8
DOI: 10.1145/3551349
Copyright © 2022 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 5 January 2023


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

Overall Acceptance Rate: 82 of 337 submissions, 24%
