research-article

A machine learning based automatic folding of dynamically typed languages

Authors:

Nickolay Viuginov,

Andrey Filchenkov

MaLTeSQuE 2019: Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation

Pages 31 - 36

https://doi.org/10.1145/3340482.3342746

Published: 27 August 2019

Abstract

Dynamically typed languages have grown markedly in popularity in recent years. The elegant syntax of languages such as JavaScript, Python, PHP, and Ruby comes at a price when it is time to find bugs in large codebases. Analysis is hindered by features specific to dynamically typed languages, such as defining methods dynamically and evaluating string expressions. To help developers find bugs or explore unfamiliar classes and libraries, modern IDEs and text editors implement features for folding unimportant code blocks. In this work, data on user foldings from real projects were collected and two classifiers were trained on them. The input to each classifier is a set of parameters describing the structure and syntax of a code block. The classifiers were then used to identify unimportant code fragments. The implemented approach was tested on JavaScript and Python programs and compared with the best existing algorithm for automatic code folding.
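To make "a set of parameters describing the structure and syntax of the code block" more concrete, the following is a minimal sketch of how such parameters might be extracted from Python source with the standard `ast` module. The feature names (`num_lines`, `num_ast_nodes`, `num_calls`, `has_docstring`), the `FOLDABLE` node list, and the helper functions are illustrative assumptions, not the feature set or tooling used in the paper, and no classifier is trained here.

```python
import ast

# Hypothetical choice of node types treated as candidate foldable blocks.
FOLDABLE = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef,
            ast.If, ast.For, ast.While, ast.With, ast.Try)

# Node types for which ast.get_docstring is defined.
DOC_OWNERS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def block_features(node):
    """Return a few illustrative structural features for one block (assumed set)."""
    descendants = list(ast.walk(node))  # includes the node itself
    return {
        "node_type": type(node).__name__,
        "num_lines": node.end_lineno - node.lineno + 1,  # needs Python 3.8+
        "num_ast_nodes": len(descendants),
        "num_calls": sum(isinstance(n, ast.Call) for n in descendants),
        "has_docstring": (isinstance(node, DOC_OWNERS)
                          and ast.get_docstring(node) is not None),
    }


def candidate_blocks(source):
    """Parse the source and describe every candidate block as a feature dict."""
    tree = ast.parse(source)
    return [block_features(n) for n in ast.walk(tree) if isinstance(n, FOLDABLE)]


SAMPLE = '''\
def greet(name):
    """Say hello."""
    if not name:
        name = "world"
    print("Hello, " + name)
'''

if __name__ == "__main__":
    for features in candidate_blocks(SAMPLE):
        print(features)
```

Each feature dictionary could then be turned into a numeric vector and passed to a binary classifier that predicts whether a block should be folded; that step is omitted because the abstract does not specify the models or their inputs in detail.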

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
2. Software and its engineering
  1. Software notations and tools
    1. Development frameworks and environments
      1. Integrated and visual development environments
    2. Formal language definitions
      1. Syntax

Published In

MaLTeSQuE 2019: Proceedings of the 3rd ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation

August 2019

42 pages

ISBN: 9781450368551

DOI: 10.1145/3340482

Copyright © 2019 ACM.

Sponsors

SIGSOFT: ACM Special Interest Group on Software Engineering

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ESEC/FSE '19

Sponsor:

SIGSOFT

ESEC/FSE '19: 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering

August 27, 2019

Tallinn, Estonia
