Abstract
In software development, code autocomplete is an essential tool for accelerating coding. However, many of the autocomplete tools built into IDEs are limited to suggesting methods or arguments, often presenting the user with long lists of irrelevant items. Since transformer-based models achieved state-of-the-art performance on natural language processing (NLP) tasks, their application to code intelligence tasks such as code completion has become a frequent object of study. In this paper, we present a transformer-based model trained on 1.2 million Java files gathered from top-starred GitHub repositories. Our evaluation measures the model's ability to predict the completion of a line, and we propose a new metric for the applicability of suggestions that we consider better suited to the practical reality of code completion. Using a recently developed Java web project as the test set, our experiments show that the model produced at least one applicable suggestion in 55.9% of the test cases, compared to 26.5% for the best baseline.
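The abstract does not spell out the applicability metric, but its description (a test case counts as a success if at least one of the model's line-completion suggestions is applicable) can be illustrated with a minimal sketch. The sketch below is an assumption-laden reconstruction, not the authors' implementation: it assumes the Hugging Face transformers library, the microsoft/CodeGPT-small-java checkpoint (a baseline the paper compares against), beam search for the top-k suggestions, and whitespace-normalized exact match against the rest of the line as the applicability test.

```python
# Minimal sketch of a top-k "applicability" check for line completion.
# Assumptions (not from the paper): CodeGPT-small-java as the model,
# beam search for k suggestions, and whitespace-normalized exact match
# against the ground-truth remainder of the line.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "microsoft/CodeGPT-small-java"  # assumed; one of the paper's baselines
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def applicable_at_k(prefix: str, rest_of_line: str, k: int = 5) -> bool:
    """True if any of k suggested completions reproduces the rest of the line."""
    inputs = tokenizer(prefix, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    outputs = model.generate(
        **inputs,
        max_new_tokens=32,
        num_beams=k,
        num_return_sequences=k,
        pad_token_id=tokenizer.eos_token_id,
    )
    target = " ".join(rest_of_line.split())  # normalize whitespace
    for seq in outputs:
        completion = tokenizer.decode(seq[prompt_len:], skip_special_tokens=True)
        lines = completion.splitlines()
        first_line = lines[0] if lines else ""
        if " ".join(first_line.split()) == target:
            return True
    return False
```

Under this reading, the paper's headline comparison (55.9% vs. 26.5%) corresponds to the fraction of test locations for which such a check succeeds.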
About this paper
Cite this paper
Meyrer, G.T., Araújo, D.A., Rigo, S.J. (2021). Code Autocomplete Using Transformers. In: Britto, A., Valdivia Delgado, K. (eds.) Intelligent Systems. BRACIS 2021. Lecture Notes in Computer Science, vol. 13074. Springer, Cham. https://doi.org/10.1007/978-3-030-91699-2_15
Print ISBN: 978-3-030-91698-5
Online ISBN: 978-3-030-91699-2