Abstract
Partial code usually involves non-fully-qualified type names (non-FQNs) and undeclared receiving objects. Resolving the FQNs of these non-FQN types and undeclared receiving objects (referred to as type inference) is a prerequisite to effective search and reuse of partial code. Existing dictionary-lookup-based methods build a symbolic knowledge base of API names and code contexts; constructing such a knowledge base involves significant compilation overhead, and lookup is sensitive to unseen API names and code-context variations. In this article, we propose using a prompt-tuned language model of code as a neural knowledge base for type inference.
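To make the task concrete, here is a minimal Java illustration (our own sketch, not an example taken from the article; the class name `FqnInferenceExample` and the sample input are ours). In the partial snippet quoted in the comments, `Pattern` and `Matcher` are non-FQN type names and `sb` is an undeclared receiving object; once type inference resolves them to `java.util.regex.Pattern`, `java.util.regex.Matcher`, and `java.lang.StringBuilder`, the snippet can be completed into compilable code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Partial snippet as it might appear on Stack Overflow:
//
//     Matcher m = Pattern.compile("[0-9]+").matcher(text);
//     if (m.find()) { sb.append(m.group()); }
//
// "Pattern" and "Matcher" are non-FQNs, and "sb" is an undeclared
// receiving object. Resolving Pattern -> java.util.regex.Pattern,
// Matcher -> java.util.regex.Matcher, and sb -> java.lang.StringBuilder
// yields the compilable completion below.
public class FqnInferenceExample {
    public static void main(String[] args) {
        String text = "order 42";
        StringBuilder sb = new StringBuilder(); // inferred receiver type
        Matcher m = Pattern.compile("[0-9]+").matcher(text);
        if (m.find()) {
            sb.append(m.group());
        }
        System.out.println(sb); // prints: 42
    }
}
```

A neural approach in the spirit of the title would obtain such resolutions by prompting a code language model to fill in the missing FQNs, rather than by compiling code against a symbolic dictionary.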
Index Terms
- FQN Inference in Partial Code by Prompt-tuned Language Model of Code