Detecting and Fixing Nonidiomatic Snippets in Python Source Code with Deep Learning

  • Conference paper
Intelligent Systems and Applications (IntelliSys 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 294)

Abstract

We present an algorithm to find nonidiomatic snippets in Python source code and replace them with cleaner and more performant alternatives. The algorithm is divided into three subtasks: (i) the snippets are localized, then for each snippet (ii) the type of the nonidiomatic pattern, and (iii) the key variables are determined. The subtasks of localizing patterns and extracting variables are solved as Sequence Tagging tasks with LSTM networks. Determining the nonidiomatic pattern type is a classification problem, which we solve by training a feedforward neural network. We evaluate the process on a dataset containing more than 13 000 programs coded by students.
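To make the end of the pipeline concrete, here is a minimal sketch of what could happen once a snippet has been localized and its pattern type and key variables are known. It is an illustration only: the template table, the `render_fix` helper, and the variable names are assumptions made for this example, not the authors' implementation, and the neural models that produce the pattern type and variable names are not shown.

```python
# Hypothetical illustration: templates, helper names, and variable names
# are assumptions for this sketch, not taken from the paper.

# A nonidiomatic summation loop, as a student might write it.
NONIDIOMATIC = """\
total = 0
for x in numbers:
    total = total + x
"""

# One replacement template per pattern type (only two patterns shown).
TEMPLATES = {
    "sum": "{result} = sum({lst})",
    "max": "{result} = max({lst})",
}


def render_fix(pattern_type: str, lst: str, result: str) -> str:
    """Fill the template for a pattern type with the extracted key variables."""
    return TEMPLATES[pattern_type].format(lst=lst, result=result)


def run(code: str, numbers: list) -> int:
    """Execute a snippet with `numbers` bound and return the value of `total`."""
    scope = {"numbers": list(numbers)}
    exec(code, scope)
    return scope["total"]


# Pretend the classifier said "sum" and the tagger extracted these names.
fixed = render_fix("sum", lst="numbers", result="total")
print(fixed)

# The replacement should behave like the original loop.
assert run(NONIDIOMATIC, [3, 1, 4]) == run(fixed, [3, 1, 4])
```

Keeping the replacement as a template filled with two names mirrors the split described in the abstract: the pattern type selects the template, and the key variables fill its slots.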

Notes

  1. https://www.tiobe.com/tiobe-index/.

Acknowledgments

EFOP-3.6.3-VEKOP-16-2017-00001: Talent Management in Autonomous Vehicle Control Technologies – The Project is supported by the Hungarian Government and co-financed by the European Social Fund.

We would like to express our great appreciation to László Zsakó and Gyula Horváth for providing a large amount of Python code on which to test our algorithm. The data come from Eötvös Loránd University's programming exercise bank and submission website.

Author information

Corresponding author

Correspondence to Balázs Szalontai.

A Appendix

Table 5. Pattern of maximum search with the name of the list and the name of the maximum value.
Table 6. Pattern of linear search with the name of the list and the name of the Boolean return value.
Table 7. Pattern of summation with the name of the list and the name of the sum value.
Table 8. Pattern of all with the name of the list and the name of the Boolean return value.
Table 9. Pattern of any with the name of the list and the name of the Boolean return value.
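
The table bodies are not reproduced on this page. As a rough illustration of the kind of loop these captions describe, the sketch below writes four of the patterns (maximum search, linear search, all, any) as explicit loops and checks each against the Python built-in that would replace it. The function names, the predicates, and the sample data are assumptions chosen for the example and may differ from the patterns in the paper's tables.

```python
# Hypothetical loop patterns; names and predicates are assumptions.

def max_loop(values):
    """Maximum search; assumes a non-empty list."""
    largest = values[0]
    for v in values[1:]:
        if v > largest:
            largest = v
    return largest


def contains_loop(values, target):
    """Linear search with a Boolean result."""
    found = False
    for v in values:
        if v == target:
            found = True
    return found


def all_positive_loop(values):
    """'All' pattern: every element satisfies a predicate."""
    ok = True
    for v in values:
        if not v > 0:
            ok = False
    return ok


def any_negative_loop(values):
    """'Any' pattern: at least one element satisfies a predicate."""
    hit = False
    for v in values:
        if v < 0:
            hit = True
    return hit


data = [3, -1, 4, 1, 5]

# Each loop agrees with its idiomatic counterpart on the sample data.
assert max_loop(data) == max(data)
assert contains_loop(data, 4) == (4 in data)
assert all_positive_loop(data) == all(v > 0 for v in data)
assert any_negative_loop(data) == any(v < 0 for v in data)
```

In each case the list and the result variable are the only names a replacement needs, which matches the two key variables named in the captions.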
Fig. 6. An example exercise.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Szalontai, B., Vadász, A., Borsi, Z.R., Várkonyi, T.A., Pintér, B., Gregorics, T. (2022). Detecting and Fixing Nonidiomatic Snippets in Python Source Code with Deep Learning. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-030-82193-7_9
