
AdaComplete: improve DL-based code completion method’s domain adaptability

Automated Software Engineering

Abstract

Code completion is an important feature of integrated development environments that can accelerate the coding process. With the development of deep learning technologies and the availability of large open-source codebases, many deep-learning-based code completion models (DL models) have been proposed. These models are trained on generic source-code datasets, which results in poor domain adaptability: they suffer a performance loss when helping programmers code in a specific domain, e.g., when deciding which domain-specific API to call. To address this problem, we propose AdaComplete, a simple and effective framework that uses a local code completion model to compensate for DL models' weak domain adaptability. The local code completion model is trained on source code from the target domain. At completion time, given the context, AdaComplete adaptively chooses the recommendation of either the DL model or the local code completion model based on our hand-crafted features. Experimental results show that AdaComplete outperforms state-of-the-art DL-based code completion methods on specific domains, improving accuracy by 7% on average.
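To make the selection step concrete, the sketch below illustrates how such an adaptive chooser could be wired together. It is a minimal sketch under assumed interfaces, not the authors' implementation: the names dl_model, local_model, extract_features, and selector are hypothetical, and the features shown are plausible examples rather than the paper's actual hand-crafted feature set.

# A minimal sketch of AdaComplete-style adaptive model selection
# (hypothetical interfaces; not the authors' actual implementation).
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Candidate:
    token: str    # recommended next token
    score: float  # model-assigned probability of that token

def extract_features(context: List[str],
                     dl_top: Candidate,
                     local_top: Candidate) -> List[float]:
    # Illustrative hand-crafted features: each model's confidence,
    # their confidence margin, and how much context is available.
    return [
        dl_top.score,
        local_top.score,
        dl_top.score - local_top.score,
        float(len(context)),
    ]

def ada_complete(context: List[str],
                 dl_model: Callable[[List[str]], Candidate],
                 local_model: Callable[[List[str]], Candidate],
                 selector) -> str:
    # Query both the generic DL model and the domain-specific local
    # model, then let a trained classifier decide whom to trust.
    dl_top = dl_model(context)
    local_top = local_model(context)
    features = extract_features(context, dl_top, local_top)
    use_local = selector.predict([features])[0] == 1  # 1 => local model wins
    return local_top.token if use_local else dl_top.token

Here the selector could be, for example, a scikit-learn RandomForestClassifier fit offline on completion contexts labeled by which model produced the correct token; whether the paper uses exactly this classifier and feature set is not established by the abstract, so treat both as assumptions.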

Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the proposal of the idea. ZW conducted the experiments. ZW and FL wrote the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Zhi Jin.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, Z., Liu, F., Hao, Y. et al. AdaComplete: improve DL-based code completion method’s domain adaptability. Autom Softw Eng 30, 11 (2023). https://doi.org/10.1007/s10515-023-00376-y
