Detecting and Fixing Nonidiomatic Snippets in Python Source Code with Deep Learning

Szalontai, Balázs; Vadász, András; Borsi, Zsolt Richárd; Várkonyi, Teréz A.; Pintér, Balázs; Gregorics, Tibor

doi:10.1007/978-3-030-82193-7_9

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 294)

Included in the following conference series:

Proceedings of SAI Intelligent Systems Conference

Abstract

We present an algorithm to find nonidiomatic snippets in Python source code and replace them with cleaner and more performant alternatives. The algorithm is divided into three subtasks: (i) the snippets are localized, then for each snippet (ii) the type of the nonidiomatic pattern, and (iii) the key variables are determined. The subtasks of localizing patterns and extracting variables are solved as Sequence Tagging tasks with LSTM networks. Determining the nonidiomatic pattern type is a classification problem, which we solve by training a feedforward neural network. We evaluate the process on a dataset containing more than 13 000 programs coded by students.
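
The three subtasks can be pictured with a small sketch. It does not reproduce the LSTM or feedforward networks; their outputs are written by hand here. The `Finding` record, the `TEMPLATES` table, and the `rewrite` helper are hypothetical names introduced only for illustration, and the template-based replacement is an assumption about how a localized snippet might be turned into its idiomatic form.

```python
from dataclasses import dataclass

# Hypothetical replacement templates, one per pattern type (subtask ii).
TEMPLATES = {
    "summation": "{result} = sum({items})",
    "maximum": "{result} = max({items})",
}


@dataclass
class Finding:
    start: int       # first line of the snippet, 0-based (subtask i)
    end: int         # last line of the snippet, inclusive (subtask i)
    pattern: str     # nonidiomatic pattern type (subtask ii)
    variables: dict  # key variables of the snippet (subtask iii)


def rewrite(lines, finding):
    """Replace the localized snippet with its filled-in template."""
    first = lines[finding.start]
    indent = first[: len(first) - len(first.lstrip())]  # keep indentation
    new_line = indent + TEMPLATES[finding.pattern].format(**finding.variables)
    return lines[: finding.start] + [new_line] + lines[finding.end + 1:]


source = [
    "numbers = [3, 1, 4]",
    "total = 0",
    "for x in numbers:",
    "    total = total + x",
    "print(total)",
]

# Hand-written stand-in for what the three models would predict.
finding = Finding(
    start=1,
    end=3,
    pattern="summation",
    variables={"result": "total", "items": "numbers"},
)

fixed = rewrite(source, finding)
print("\n".join(fixed))
```

In this sketch the three-line accumulation loop collapses to a single `total = sum(numbers)` line, while the lines before and after the snippet are left untouched.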

Notes

1. https://www.tiobe.com/tiobe-index/

Acknowledgments

EFOP-3.6.3-VEKOP-16-2017-00001: Talent Management in Autonomous Vehicle Control Technologies – The Project is supported by the Hungarian Government and co-financed by the European Social Fund.

We would like to express our great appreciation to László Zsakó and Gyula Horváth for providing a large amount of Python code to test our algorithm on. The data comes from Eötvös Loránd University's programming exercise bank and submission website.

Author information

Authors and Affiliations

Faculty of Informatics, Department of Software Technology and Methodology, Eötvös Loránd University, Egyetem tér 1-3, Budapest, 1053, Hungary
Balázs Szalontai, András Vadász, Zsolt Richárd Borsi, Teréz A. Várkonyi, Balázs Pintér & Tibor Gregorics

Corresponding author

Correspondence to Balázs Szalontai.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai

A Appendix

Table 5. Pattern of maximum search with the name of the list, and the name of the maximum value.

Table 6. Pattern of linear search with the name of the list, and the name of the Boolean return value.

Table 7. Pattern of summation with the name of the list, and the name of the sum value.

Table 8. Pattern of all with the name of the list, and the name of the Boolean return value.

Table 9. Pattern of any with the name of the list, and the name of the Boolean return value.
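
The table bodies are not available in this text, so the following is only a guess at what loops of these five kinds might look like, each paired with a Python built-in that computes the same result. The function names, the `x > 0` condition, and the choice of `target in items` as the idiomatic form of linear search are assumptions made for illustration, not the patterns from the tables.

```python
def max_loop(items):
    # Maximum search over a nonempty list.
    largest = items[0]
    for x in items:
        if x > largest:
            largest = x
    return largest  # same result as max(items)


def search_loop(items, target):
    # Linear search with a Boolean result.
    found = False
    for x in items:
        if x == target:
            found = True
    return found  # same result as target in items


def sum_loop(items):
    # Summation with an accumulator variable.
    total = 0
    for x in items:
        total += x
    return total  # same result as sum(items)


def all_loop(items):
    # "All": does every element satisfy the condition?
    result = True
    for x in items:
        if not x > 0:
            result = False
    return result  # same result as all(x > 0 for x in items)


def any_loop(items):
    # "Any": does at least one element satisfy the condition?
    result = False
    for x in items:
        if x > 0:
            result = True
    return result  # same result as any(x > 0 for x in items)


data = [3, -1, 4]
print(max_loop(data), search_loop(data, 4), sum_loop(data),
      all_loop(data), any_loop(data))
```

In each case the caption's "name of the list" corresponds to `items`, and the second named variable corresponds to the accumulator (`largest`, `found`, `total`, or `result`).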

Cite this paper

Szalontai, B., Vadász, A., Borsi, Z.R., Várkonyi, T.A., Pintér, B., Gregorics, T. (2022). Detecting and Fixing Nonidiomatic Snippets in Python Source Code with Deep Learning. In: Arai, K. (eds) Intelligent Systems and Applications. IntelliSys 2021. Lecture Notes in Networks and Systems, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-030-82193-7_9

DOI: https://doi.org/10.1007/978-3-030-82193-7_9
Published: 04 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82192-0
Online ISBN: 978-3-030-82193-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
