DOI: 10.1145/3340482.3342746
research-article

A machine learning based automatic folding of dynamically typed languages

Published: 27 August 2019

Abstract

The popularity of dynamically typed languages has been growing strongly in recent years. The elegant syntax of languages such as JavaScript, Python, PHP, and Ruby has its price when it comes to finding bugs in large codebases: analysis is hindered by capabilities specific to dynamically typed languages, such as defining methods dynamically and evaluating string expressions. To help developers find bugs or investigate unfamiliar classes and libraries, modern IDEs and text editors implement features for folding unimportant code blocks. In this work, data on user foldings were collected from real projects and used to train two classifiers. The input to each classifier is a set of parameters describing the structure and syntax of a code block. The trained classifiers were then used to identify unimportant code fragments. The approach was tested on JavaScript and Python programs and compared with the best existing algorithm for automatic code folding.
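
The abstract does not list the block parameters or name the two classifiers, so the following is only a minimal sketch of the general pipeline (parse, describe each foldable block by structural features, train a classifier on folding labels, predict "unimportant" blocks). It assumes Python's standard `ast` module as the parser, a hypothetical five-feature set (`n_lines`, `n_statements`, `n_calls`, `is_dunder`, `has_docstring`), and plain logistic regression as a stand-in for the paper's classifiers. The helper names `block_features`, `extract_blocks`, and `FoldClassifier` are hypothetical, and the demo labels are invented rather than collected user foldings.

```python
import ast
import math

# Hypothetical feature set for illustration; the abstract does not list the paper's parameters.
FEATURE_NAMES = ["n_lines", "n_statements", "n_calls", "is_dunder", "has_docstring"]
FOLDABLE = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def block_features(node):
    """Describe one foldable AST node by simple structural and syntactic counts."""
    inner = [n for n in ast.walk(node) if n is not node]
    return [
        float(node.end_lineno - node.lineno + 1),            # lines spanned
        float(sum(isinstance(n, ast.stmt) for n in inner)),  # nested statements
        float(sum(isinstance(n, ast.Call) for n in inner)),  # call expressions
        float(node.name.startswith("__") and node.name.endswith("__")),
        float(ast.get_docstring(node) is not None),
    ]


def extract_blocks(source):
    """Return (name, features) for every function and class, in breadth-first order."""
    tree = ast.parse(source)
    return [(n.name, block_features(n)) for n in ast.walk(tree) if isinstance(n, FOLDABLE)]


class FoldClassifier:
    """Logistic regression trained by batch gradient descent (a stand-in, not the paper's model)."""

    def __init__(self, lr=0.5, epochs=500):
        self.lr, self.epochs = lr, epochs

    def _scale(self, row):
        return [(v - m) / s for v, m, s in zip(row, self.mean, self.std)]

    def _prob(self, z):
        score = self.b + sum(w * x for w, x in zip(self.w, z))
        score = max(-30.0, min(30.0, score))  # keep math.exp in range
        return 1.0 / (1.0 + math.exp(-score))

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        # Standardize features; constant columns get std 1.0 to avoid dividing by zero.
        self.mean = [sum(r[j] for r in X) / n for j in range(d)]
        self.std = [math.sqrt(sum((r[j] - self.mean[j]) ** 2 for r in X) / n) or 1.0
                    for j in range(d)]
        Z = [self._scale(r) for r in X]
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for z, target in zip(Z, y):
                err = self._prob(z) - target  # gradient of log-loss w.r.t. the score
                grad_w = [g + err * x for g, x in zip(grad_w, z)]
                grad_b += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, row):
        """Estimated probability that the block is unimportant, i.e. should be folded."""
        return self._prob(self._scale(row))

    def predict(self, row):
        return int(self.predict_proba(row) >= 0.5)


SAMPLE = '''
class Config:
    """Settings holder."""

    def __init__(self, path):
        self.path = path

    def __repr__(self):
        return f"Config({self.path!r})"

    def load(self):
        with open(self.path) as fh:
            for line in fh:
                key, value = line.split("=", 1)
                setattr(self, key.strip(), value.strip())
'''

if __name__ == "__main__":
    blocks = extract_blocks(SAMPLE)
    # Toy labels (1 = "a user folded this block"), invented purely for the demo.
    labels = {"Config": 0, "__init__": 1, "__repr__": 1, "load": 0}
    X = [feats for _, feats in blocks]
    y = [labels[name] for name, _ in blocks]
    model = FoldClassifier().fit(X, y)
    for name, feats in blocks:
        print(f"{name:10s} fold={model.predict(feats)} p={model.predict_proba(feats):.2f}")
```

Standardizing the features lets a single learning rate serve counts of very different scale (a class can span many lines but has at most one docstring). Because `ast.walk` is breadth-first, a class is emitted before its methods, so outer and nested blocks receive independent fold decisions.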

Cited By

  • (2024) A survey on machine learning techniques applied to source code. Journal of Systems and Software, 209:C. DOI: 10.1016/j.jss.2023.111934. Online publication date: 14-Mar-2024.

Published In

MaLTeSQuE 2019: Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation
August 2019
42 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Abstract Syntax tree
  2. Automatic Folding
  3. Dynamically typed languages
  4. JavaScript
  5. Python
  6. Source code analysis

Conference

ESEC/FSE '19

Article Metrics

  • Downloads (last 12 months): 6
  • Downloads (last 6 weeks): 1
Reflects downloads up to 20 Jan 2025
