FQN Inference in Partial Code by Prompt-tuned Language Model of Code

Published: 21 December 2023

Abstract

Partial code usually involves non-fully-qualified type names (non-FQNs) and undeclared receiving objects. Resolving the FQNs of these non-FQN types and undeclared receiving objects (referred to as type inference) is a prerequisite for effective search and reuse of partial code. Existing dictionary-lookup-based methods build a symbolic knowledge base of API names and code contexts; they incur significant compilation overhead and are sensitive to unseen API names and variations in code context. In this article, we propose using a prompt-tuned code masked language model (MLM) as a neural knowledge base for type inference, called POME, which is lightweight and has minimal requirements on code compilation. Unlike existing symbol-name and context matching for type inference, POME infers FQNs from the syntax and usage knowledge encapsulated in the prompt-tuned code MLM through a cloze-style fill-in-blank strategy. POME is integrated as a plug-in into web and integrated development environments (IDEs) to assist developers in inferring FQNs in the real world. We systematically evaluate POME on a large amount of source code from GitHub and Stack Overflow, and explore its generalization and hybrid capability. The results validate the effectiveness of the POME design and its applicability to partial-code type inference, and show that POME can be easily extended to different programming languages (PLs). POME can also be used to generate a PL-hybrid type inference model, providing a one-for-all solution. As the first of its kind, our neural type inference method opens the door to many innovative ways of using partial code.
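To make the cloze-style fill-in-blank idea concrete, the sketch below queries a masked code language model for a missing package segment of a non-FQN type in a partial Java snippet. It is a minimal illustration, not the POME implementation: it uses the off-the-shelf microsoft/codebert-base-mlm checkpoint without prompt tuning, fills a single mask token (whereas a full FQN usually spans several tokens), and the example snippet and prompt wording are hypothetical.

```python
# Minimal sketch of cloze-style FQN inference with a masked code LM.
# Assumptions: an off-the-shelf CodeBERT MLM (no prompt tuning), a single
# <mask> slot, and a hypothetical partial Java snippet; POME itself
# prompt-tunes the model and recovers complete multi-token FQNs.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="microsoft/codebert-base-mlm")

# Partial Java code in which "StringUtils" is a non-fully-qualified type.
# The cloze-style prompt asks the model to fill in one missing segment of
# the package prefix (e.g., "lang3").
prompt = (
    "import org.apache.commons.<mask>.StringUtils; "
    "String trimmed = StringUtils.trim(userInput);"
)

for candidate in fill_mask(prompt, top_k=5):
    # Each candidate is a dict holding the predicted token and its probability.
    print(f"{candidate['token_str']!r:15} score={candidate['score']:.3f}")
```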



• Published in

  ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 2
  February 2024
  947 pages
  ISSN: 1049-331X
  EISSN: 1557-7392
  DOI: 10.1145/3618077
  • Editor: Mauro Pezzè


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 21 December 2023
      • Online AM: 24 August 2023
      • Accepted: 24 July 2023
      • Revised: 15 July 2023
      • Received: 13 December 2022
Published in TOSEM, Volume 33, Issue 2
