ABSTRACT
Because human beings act with goals in mind, a person's next action can often be predicted from their previous actions. Inspired by this observation, we collected and analyzed more than 13,000 repositories containing 441,290 Python source code files from the Internet, and we find that the actions expressed in code correspond to developers' high-level programming language statements.
Previous code comprehension and code completion research has paid little attention to code-editing context, such as code file names and repository names, when representing code for machine learning models. After modeling code as action sequences and treating method names, file names, and repository names as code-editing context, we apply modern natural language processing techniques to the vast open-source resources available on the Internet and train a code completion model that takes the action sequences in code as input to complete code for developers.
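The abstract does not spell out how an action sequence with editing context is constructed; as a minimal sketch, one could extract the chronological sequence of method and function calls from a Python file with the standard `ast` module and prepend repository-name and file-name tokens as context. The tag format (`<repo:…>`, `<file:…>`) below is a hypothetical choice for illustration, not the paper's actual encoding:

```python
import ast

def action_sequence(source: str, repo: str, filename: str) -> list:
    """Extract a method-call "action sequence" from Python source,
    prefixed with repository and file names as editing context.
    (Hypothetical sketch; the paper's exact encoding is not shown here.)
    """
    calls = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):   # obj.method(...)
                calls.append(func.attr)
            elif isinstance(func, ast.Name):      # plain function call
                calls.append(func.id)
    # Context tokens first, then the call sequence in traversal order.
    return ["<repo:%s>" % repo, "<file:%s>" % filename] + calls

seq = action_sequence(
    "import os\npath = os.path.join('a', 'b')\nprint(len(path))",
    "example-repo", "util.py",
)
```

A sequence like this can then be tokenized (e.g. with byte-pair encoding) and fed to a language model as ordinary text.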
In our evaluation, the GPT-2 model trained with our action-sequence code representation achieves 81.92% top-5 accuracy on next-method-call token prediction, compared to 61.89% for the same GPT-2 model trained on the same dataset without it. We also find that the code-editing context we propose is important for machines to comprehend code better. Given a pre-trained natural language model, training our model on 1,000,000 lines of code takes less than 16.7 minutes. All of the above contribute to code comprehension and enhance code completion using the virtually unlimited resources of the Internet.
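The reported top-5 accuracy metric counts a prediction as correct when the ground-truth next method-call token appears anywhere in the model's five highest-ranked candidates. A minimal sketch of this computation (the function name and data layout are illustrative, not from the paper):

```python
def top_k_accuracy(predictions, targets, k=5):
    """Fraction of targets found in the model's top-k ranked candidates.

    predictions[i] is the ranked candidate list for example i;
    targets[i] is the ground-truth next method-call token.
    """
    hits = sum(t in p[:k] for p, t in zip(predictions, targets))
    return hits / len(targets)

# Toy example: two of three targets fall inside the top-5 lists.
acc = top_k_accuracy(
    [["append", "extend", "pop", "insert", "clear"],
     ["join", "split", "strip", "lower", "upper"],
     ["read", "write", "close", "seek", "tell"]],
    ["pop", "format", "close"],
)
```

In practice the ranked candidates would come from sorting the language model's output distribution over the vocabulary at each prediction step.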
Index Terms
- Represent Code as Action Sequence for Predicting Next Method Call