Abstract
We present an algorithm that finds nonidiomatic snippets in Python source code and replaces them with cleaner and more performant alternatives. The algorithm is divided into three subtasks: (i) localizing the snippets, then, for each snippet, (ii) determining the type of the nonidiomatic pattern and (iii) extracting its key variables. We solve pattern localization and variable extraction as sequence tagging tasks with LSTM networks. Determining the pattern type is a classification problem, which we solve by training a feedforward neural network. We evaluate the process on a dataset of more than 13 000 programs written by students.
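To make the three subtasks concrete, the following minimal sketch shows how a program could be represented for them. It is an illustration under stated assumptions, not the authors' implementation: the example program, the BIO-style tag set, the `index_loop` label, and the `INDEX`/`SEQUENCE` roles are all hypothetical, and the tags are assigned by hand here where the paper trains LSTM and feedforward networks to predict them.

```python
import io
import tokenize

# Hypothetical nonidiomatic program (not taken from the paper's dataset):
# lines 2-3 iterate over indices instead of over the elements themselves.
SOURCE = (
    "total = 0\n"
    "for i in range(len(xs)):\n"
    "    total += xs[i]\n"
    "print(total)\n"
)


def lex(source):
    """Return (line_number, text) pairs for the meaningful tokens."""
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(tok.start[0], tok.string) for tok in tokens if tok.type not in skip]


def localization_tags(tokens, first_line, last_line):
    """Subtask (i): one B/I/O tag per token, marking the snippet's span."""
    tags, inside = [], False
    for line, _ in tokens:
        if first_line <= line <= last_line:
            tags.append("I" if inside else "B")
            inside = True
        else:
            tags.append("O")
            inside = False
    return tags


def variable_tags(tokens, roles):
    """Subtask (iii): one role tag per snippet token, "O" if it has no role."""
    return [roles.get(text, "O") for _, text in tokens]


tokens = lex(SOURCE)
# Hand-written labels standing in for the model predictions (assumptions).
loc = localization_tags(tokens, first_line=2, last_line=3)
snippet = [tok for tok, tag in zip(tokens, loc) if tag != "O"]
pattern_type = "index_loop"  # subtask (ii): a single class label per snippet
var = variable_tags(snippet, {"i": "INDEX", "xs": "SEQUENCE"})

for (_, text), tag in zip(snippet, var):
    print(f"{text:6} {tag}")
```

Subtasks (i) and (iii) each yield one tag per token, which is what makes them sequence tagging problems, while subtask (ii) yields one label for the whole snippet. With the pattern type and the tagged variables in hand, a rewrite step could fill in a template such as a loop directly over the elements of `xs`.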
Acknowledgments
EFOP-3.6.3-VEKOP-16-2017-00001: Talent Management in Autonomous Vehicle Control Technologies – The Project is supported by the Hungarian Government and co-financed by the European Social Fund.
We would like to express our great appreciation to László Zsakó and Gyula Horváth for providing an enormous amount of Python code on which to test our algorithm. The data comes from Eötvös Loránd University's programming exercise bank and submission website.
A Appendix
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Szalontai, B., Vadász, A., Borsi, Z.R., Várkonyi, T.A., Pintér, B., Gregorics, T. (2022). Detecting and Fixing Nonidiomatic Snippets in Python Source Code with Deep Learning. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-030-82193-7_9
DOI: https://doi.org/10.1007/978-3-030-82193-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82192-0
Online ISBN: 978-3-030-82193-7
eBook Packages: Intelligent Technologies and Robotics (R0)