Abstract
Code completion is an important feature of integrated development environments that can accelerate the coding process. With the development of deep learning technologies and readily available open-source codebases, many deep-learning-based code completion models (DL models) have been proposed. These models are trained on generic source-code datasets, resulting in poor domain adaptability. That is, these models suffer a performance loss when helping programmers code in a specific domain, e.g., when deciding which domain-specific API to call. To solve this problem, we propose AdaComplete, a simple and effective framework that uses a local code completion model to compensate for DL models' weak domain adaptability. The local code completion model is trained on the source code of the target domain. At completion time, given the context, AdaComplete adaptively chooses the recommendation of either the DL model or the local code completion model based on our hand-crafted features. Experimental results show that AdaComplete outperforms state-of-the-art DL-based code completion methods on specific domains, improving accuracy by 7% on average.
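The adaptive selection step described above can be sketched as follows. This is a minimal, hypothetical illustration only: the function and model names are assumptions, and the paper's actual hand-crafted features and trained classifier are richer than the simple confidence comparison used here.

```python
# Hypothetical sketch of AdaComplete-style adaptive selection.
# All names are illustrative, not taken from the paper's implementation.

def select_completion(context, dl_model, local_model, selector):
    """Return the suggestion from whichever model the selector trusts more."""
    dl_suggestion, dl_conf = dl_model(context)
    local_suggestion, local_conf = local_model(context)
    # Hand-crafted features: here just the two confidences, as a stand-in
    # for the paper's feature set.
    features = [dl_conf, local_conf]
    use_local = selector(features)
    return local_suggestion if use_local else dl_suggestion


# Toy stand-ins so the sketch runs end to end.
dl = lambda ctx: ("os.path.join", 0.4)        # generic model, unsure on a domain API
local = lambda ctx: ("domain_api.call", 0.9)  # domain-trained model, confident
selector = lambda f: f[1] > f[0]              # trivial rule in place of a trained classifier

print(select_completion("ctx", dl, local, selector))  # -> domain_api.call
```

In the framework itself, the selector would be a classifier trained on features of the completion context, so the choice between the two models varies token by token rather than following a fixed rule.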
Author information
Contributions
All authors contributed to the proposal of the idea. ZW conducted the experiments. ZW and FL wrote the manuscript. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
About this article
Cite this article
Wang, Z., Liu, F., Hao, Y. et al. AdaComplete: improve DL-based code completion method’s domain adaptability. Autom Softw Eng 30, 11 (2023). https://doi.org/10.1007/s10515-023-00376-y