Abstract
Code completion is an essential feature of every IDE: it boosts developer productivity and significantly reduces the time spent on code exploration. In this paper, we introduce an extension of a typical code completion system. At each completion point, we construct a list of all possible functions and sort it according to our probabilistic model. We draw our inspiration from natural language processing (NLP): as the foundation, we select the N-gram model, applied on top of abstract syntax tree (AST) nodes. Since our approach does not depend on any other analyses, the model is language-agnostic and can thus be applied to any programming language. Experiments on several well-known open source projects show that the described method is sound: its execution time is comparable to that of naïve approaches, and its results are much more accurate.
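The ranking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a bigram model with add-one smoothing, and it uses flat sequences of function-call labels as a simplified stand-in for AST nodes. The class `NGramRanker`, its methods, and the toy corpus are all hypothetical names introduced for illustration only.

```python
from collections import Counter, defaultdict


class NGramRanker:
    """Toy N-gram model over label sequences (hypothetical, for illustration)."""

    def __init__(self, n=2):
        if n < 2:
            raise ValueError("this sketch expects n >= 2")
        self.n = n
        self.context_counts = Counter()            # how often each context was seen
        self.continuations = defaultdict(Counter)  # context -> counts of the next label
        self.vocabulary = set()

    def train(self, sequences):
        """Count every n-gram in the given label sequences."""
        for seq in sequences:
            self.vocabulary.update(seq)
            for i in range(len(seq) - self.n + 1):
                context = tuple(seq[i:i + self.n - 1])
                nxt = seq[i + self.n - 1]
                self.context_counts[context] += 1
                self.continuations[context][nxt] += 1

    def probability(self, history, candidate):
        """Add-one smoothed estimate of P(candidate | last n-1 labels of history)."""
        context = tuple(history[-(self.n - 1):])
        seen = self.continuations[context][candidate] if context in self.continuations else 0
        vocab_size = max(len(self.vocabulary), 1)
        return (seen + 1) / (self.context_counts[context] + vocab_size)

    def rank(self, history, candidates):
        """Sort candidates by descending probability; ties are broken by name."""
        return sorted(candidates, key=lambda c: (-self.probability(history, c), c))


# Assumed toy corpus: each list stands for the labels met along one code path.
corpus = [
    ["fopen", "fread", "fclose"],
    ["fopen", "fread", "fread", "fclose"],
    ["fopen", "fwrite", "fclose"],
    ["malloc", "memcpy", "free"],
]

model = NGramRanker(n=2)
model.train(corpus)
ranked = model.rank(["fopen"], ["free", "fclose", "fread", "fwrite"])
print(ranked)
```

In this sketch the candidate list plays the role of "all possible functions" at the completion point, and the model only reorders it. Add-one smoothing is chosen here purely for brevity; it keeps unseen candidates in the list with a small non-zero score instead of discarding them.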
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Savchenko, V., Volkov, A. (2018). Statistical Approach to Increase Source Code Completion Accuracy. In: Petrenko, A., Voronkov, A. (eds) Perspectives of System Informatics. PSI 2017. Lecture Notes in Computer Science, vol 10742. Springer, Cham. https://doi.org/10.1007/978-3-319-74313-4_25
DOI: https://doi.org/10.1007/978-3-319-74313-4_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-74312-7
Online ISBN: 978-3-319-74313-4
eBook Packages: Computer Science (R0)