Statistical Approach to Increase Source Code Completion Accuracy

Savchenko, Valeriy; Volkov, Alexander

doi:10.1007/978-3-319-74313-4_25


Conference paper
First Online: 18 January 2018

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10742)

Abstract

Code completion is an essential feature in every IDE's toolbox. It boosts developer productivity and significantly reduces the time spent on code exploration. In this paper, we introduce an extension of a typical code completion system. At each completion point, we construct a list of all possible functions and sort it according to our probabilistic model. We draw our inspiration from natural language processing (NLP): as the foundation, we select the N-gram model, which operates on top of abstract syntax tree (AST) nodes. Our approach does not depend on any other analyses, so the model is language-agnostic and can be applied to any programming language. Experiments on several well-known open-source projects show that the described method is sound. Its execution time is comparable to that of naïve approaches, and its results are much more accurate.
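The abstract does not state the N-gram order, the smoothing method, or exactly which AST nodes form the token stream. The following is therefore only a minimal illustrative sketch, not the authors' implementation. It makes these assumptions:

- Tokens are the names of called functions, extracted with Python's standard `ast` module. Python is used only because its standard library ships a parser; the paper's approach is language-agnostic.
- The model is a trigram model with simple add-k smoothing.
- Candidates are sorted by their conditional probability given the preceding calls, with ties falling back to the naïve alphabetical order.

The names `call_sequence`, `NGramRanker`, and `START` are hypothetical.

```python
import ast
from collections import Counter, defaultdict
from typing import DefaultDict, List, Sequence, Set, Tuple

START = "<s>"  # hypothetical padding token marking the start of a sequence


def call_sequence(source: str) -> List[str]:
    """Return called function/method names in source order.

    Assumption: only ast.Call nodes are used as tokens (a stand-in for "AST nodes").
    """
    calls = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Call)]
    calls.sort(key=lambda n: (n.lineno, n.col_offset))  # ast.walk is not source order
    names: List[str] = []
    for call in calls:
        if isinstance(call.func, ast.Name):          # f(x)
            names.append(call.func.id)
        elif isinstance(call.func, ast.Attribute):   # obj.f(x)
            names.append(call.func.attr)
    return names


class NGramRanker:
    """Illustrative N-gram model with add-k smoothing over call-name tokens."""

    def __init__(self, n: int = 3, k: float = 1.0) -> None:
        if n < 2:
            raise ValueError("n must be at least 2")
        self.n, self.k = n, k
        self.counts: DefaultDict[Tuple[str, ...], Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def _context(self, history: Sequence[str]) -> Tuple[str, ...]:
        # Keep the last n-1 tokens, left-padded with START when history is short.
        padded = [START] * (self.n - 1) + list(history)
        return tuple(padded[-(self.n - 1):])

    def train(self, tokens: Sequence[str]) -> None:
        history: List[str] = []
        for tok in tokens:
            self.counts[self._context(history)][tok] += 1
            self.vocab.add(tok)
            history.append(tok)

    def prob(self, history: Sequence[str], candidate: str) -> float:
        # P(candidate | last n-1 tokens) with add-k smoothing.
        seen = self.counts.get(self._context(history), Counter())
        total = sum(seen.values())
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen names
        return (seen[candidate] + self.k) / (total + self.k * vocab_size)

    def rank(self, history: Sequence[str], candidates: Sequence[str]) -> List[str]:
        # Most probable first; ties broken alphabetically (the naive order).
        return sorted(candidates, key=lambda c: (-self.prob(history, c), c))


CORPUS = """
f = open(path)
data = f.read()
f.close()
g = open(other)
text = g.read()
g.close()
"""

model = NGramRanker(n=3)
model.train(call_sequence(CORPUS))
print(model.rank(["close", "open"], ["close", "open", "read", "write"]))
# ['read', 'close', 'open', 'write']
```

A plain alphabetical sort would put `close` first. The model instead promotes `read`, because in the training sequence `read` followed the context `close, open`. In practice, a smarter smoothing scheme (for example, interpolation or back-off) would likely replace add-k.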

Author information

Authors and Affiliations

Institute for System Programming of the Russian Academy of Sciences, 25, Alexander Solzhenitsyn street, Moscow, 109004, Russian Federation
Valeriy Savchenko
Lomonosov Moscow State University, GSP-1, Leninskie Gory, Moscow, 119991, Russian Federation
Alexander Volkov

Corresponding authors

Correspondence to Valeriy Savchenko or Alexander Volkov.

Editor information

Editors and Affiliations

Ivannikov Institute for System Programming of RAS, Moscow, Russia
Alexander K. Petrenko
The University of Manchester, Manchester, United Kingdom
Andrei Voronkov

About this paper

Cite this paper

Savchenko, V., Volkov, A. (2018). Statistical Approach to Increase Source Code Completion Accuracy. In: Petrenko, A., Voronkov, A. (eds) Perspectives of System Informatics. PSI 2017. Lecture Notes in Computer Science(), vol 10742. Springer, Cham. https://doi.org/10.1007/978-3-319-74313-4_25

DOI: https://doi.org/10.1007/978-3-319-74313-4_25
Published: 18 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74312-7
Online ISBN: 978-3-319-74313-4
eBook Packages: Computer Science, Computer Science (R0)