Abstract
Automatic code completion is one of the most useful features provided by advanced IDEs. Argument recommendation, a special kind of code completion, is widely used as well. While existing approaches focus on argument recommendation for popular APIs, a large number of non-API invocations also call for accurate argument recommendation. To this end, we propose an LSTM-based approach that recommends non-API arguments instantly as method calls are typed in. With data collected from a large corpus of open-source applications, we train an LSTM neural network to recommend actual arguments based on the identifiers of the invoked method, the corresponding formal parameter, and a list of syntactically correct candidate arguments. To feed these identifiers into the LSTM neural network, we convert them into fixed-length vectors with Paragraph Vector, an unsupervised neural-network-based learning algorithm. With the resulting LSTM neural network trained on sample applications, we can predict, for a given call site, which of the candidate arguments is most likely to be the correct one. We evaluate the proposed approach with tenfold validation on 85 open-source C applications. Results suggest that the proposed approach outperforms state-of-the-art approaches in recommending non-API arguments, improving the precision significantly from 71.46% to 83.37%.
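To make the call-site setting concrete, the following is a minimal Python sketch of the inputs described above: the invoked method's identifier, the formal parameter's identifier, and a list of candidate arguments that are then ranked. It is not the authors' implementation. The helper names (`split_identifier`, `lexical_score`, `rank_candidates`) and the sample identifiers are hypothetical, splitting identifiers into terms is an assumed preprocessing step, and the Jaccard-overlap scorer is only a stand-in for the trained LSTM, which would instead consume Paragraph Vector embeddings of the identifiers.

```python
import re


def split_identifier(name):
    """Split an identifier into lowercase terms at underscores and camelCase
    boundaries (an assumed preprocessing step, not taken from the paper)."""
    terms = []
    for part in re.split(r"[_\W]+", name):
        # Acronym runs, capitalised or lowercase words, and digit runs.
        terms.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in terms]


def lexical_score(method, parameter, candidate):
    """Stand-in scorer: Jaccard overlap between the terms of the candidate
    and of the formal parameter. A trained model would replace this function;
    `method` is unused here but kept so the signature matches the three inputs."""
    param_terms = set(split_identifier(parameter))
    cand_terms = set(split_identifier(candidate))
    if not param_terms or not cand_terms:
        return 0.0
    return len(param_terms & cand_terms) / len(param_terms | cand_terms)


def rank_candidates(method, parameter, candidates, score=lexical_score):
    """Order candidate arguments from most to least likely under `score`.
    Ties keep their original order because Python's sort is stable."""
    return sorted(candidates,
                  key=lambda c: score(method, parameter, c),
                  reverse=True)


# Hypothetical call site: copy_buffer(..., buf_len) with three in-scope variables.
ranking = rank_candidates("copy_buffer", "buf_len", ["src", "len", "dst_buf"])
```

Passing a different `score` function to `rank_candidates` is where a learned model would plug in; the ranking logic itself does not depend on how the scores are produced.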
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61772071, 61690205, 61832009) and the National Key R&D Program (Grant No. 2018YFB1003904).
About this article
Cite this article
Li, G., Liu, H., Li, G. et al. LSTM-based argument recommendation for non-API methods. Sci. China Inf. Sci. 63, 190101 (2020). https://doi.org/10.1007/s11432-019-2830-8
DOI: https://doi.org/10.1007/s11432-019-2830-8