Statistical Approach to Increase Source Code Completion Accuracy

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10742)

Abstract

Code completion is an essential feature in every IDE’s toolbox, boosting a developer’s productivity and significantly reducing time spent on code exploration. In this paper, we introduce an extension of a typical code completion system. At each completion point, we construct a list of all possible functions, which is then sorted according to our probabilistic model. We draw our inspiration from natural language processing (NLP). As the foundation, we select the N-gram model, which works on top of abstract syntax tree (AST) nodes. Since our approach is not bound to any other analyses, our model is language-agnostic and can thus be applied to any programming language. Experiments on several well-known open source projects show that the described method is sound. It has an execution time comparable to naïve approaches and achieves much more accurate results.
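The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the AST has already been flattened into sequences of node labels (the labels such as `call:open` are hypothetical), it uses a plain trigram model, and it swaps in add-one smoothing as a simple stand-in for whatever smoothing the paper actually applies. The class name `NGramRanker` and its methods are likewise invented for illustration.

```python
from collections import Counter, defaultdict


class NGramRanker:
    """Toy N-gram model over sequences of AST node labels (illustrative only)."""

    def __init__(self, n=3):
        if n < 2:
            raise ValueError("n must be at least 2")
        self.n = n
        self.context_counts = Counter()            # how often each (n-1)-label context occurs
        self.continuations = defaultdict(Counter)  # context -> counts of the label that follows
        self.vocabulary = set()                    # every label seen during training

    def train(self, sequences):
        """Count all n-grams in the given label sequences."""
        for seq in sequences:
            self.vocabulary.update(seq)
            for i in range(len(seq) - self.n + 1):
                context = tuple(seq[i:i + self.n - 1])
                following = seq[i + self.n - 1]
                self.context_counts[context] += 1
                self.continuations[context][following] += 1

    def probability(self, context, candidate):
        """Add-one smoothed estimate of P(candidate | last n-1 labels of context)."""
        key = tuple(context[-(self.n - 1):])
        seen = self.continuations.get(key, Counter())[candidate]
        total = self.context_counts[key]
        return (seen + 1) / (total + len(self.vocabulary))

    def rank(self, context, candidates):
        """Sort candidates from most to least probable; ties keep their input order."""
        return sorted(candidates, key=lambda c: -self.probability(context, c))


# Hypothetical training data: flattened AST label sequences from existing code.
training = [
    ["DeclStmt", "call:open", "call:read", "call:close"],
    ["DeclStmt", "call:open", "call:read", "call:close"],
    ["DeclStmt", "call:open", "call:write", "call:close"],
]

ranker = NGramRanker(n=3)
ranker.train(training)

# All functions a conventional completion engine would offer at this point.
candidates = ["call:close", "call:write", "call:read"]
ranked = ranker.rank(["DeclStmt", "call:open"], candidates)
print(ranked)  # ['call:read', 'call:write', 'call:close']
```

The candidate list itself is unchanged; only its order differs, which matches the abstract's description of sorting the full list of possible functions rather than filtering it.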

Notes

  1. https://clang.llvm.org/docs/LibTooling.html

  2. https://clang.llvm.org/docs/JSONCompilationDatabase.html

  3. http://www.llvm.org/

  4. https://www.mysql.com/

  5. http://www.opencv.org/

  6. http://www.caffe.berkeleyvision.org/

Author information

Corresponding authors

Correspondence to Valeriy Savchenko or Alexander Volkov.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Savchenko, V., Volkov, A. (2018). Statistical Approach to Increase Source Code Completion Accuracy. In: Petrenko, A., Voronkov, A. (eds) Perspectives of System Informatics. PSI 2017. Lecture Notes in Computer Science, vol 10742. Springer, Cham. https://doi.org/10.1007/978-3-319-74313-4_25

  • DOI: https://doi.org/10.1007/978-3-319-74313-4_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-74312-7

  • Online ISBN: 978-3-319-74313-4

  • eBook Packages: Computer Science, Computer Science (R0)
