DOI: 10.1145/3545258.3545263 · Internetware Conference Proceedings · Research article

Represent Code as Action Sequence for Predicting Next Method Call

Published: 15 September 2022

ABSTRACT

Since human beings act with goals in mind, we can often predict a person's next action from their previous actions. Inspired by this, after collecting and analyzing more than 13,000 repositories containing 441,290 Python source code files from the Internet, we find that the actions expressed in code appear in developers' high-level programming language statements.
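As an illustrative sketch (not the paper's exact extraction pipeline), an action sequence of method and function calls can be pulled out of Python source with the standard `ast` module; the function name and the sample snippet below are our own assumptions:

```python
import ast

def extract_action_sequence(source: str) -> list[str]:
    """Walk a parsed Python module and collect called method/function names."""
    actions = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):   # obj.method(...) -> keep "method"
                actions.append(func.attr)
            elif isinstance(func, ast.Name):      # plain function call, e.g. open(...)
                actions.append(func.id)
    return actions

code = (
    "import os\n"
    "path = os.path.join('a', 'b')\n"
    "with open(path) as f:\n"
    "    data = f.read()\n"
)
print(extract_action_sequence(code))  # → ['join', 'open', 'read']
```

Note that `ast.walk` visits nodes breadth-first, so for more faithful source-order sequences a custom `ast.NodeVisitor` would be a natural refinement.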

Previous code comprehension and code completion research paid little attention to code-editing context, such as code file names and repository names, when representing code for machine learning models. After modeling code as action sequences, and modeling method names, file names, and repository names as code-editing context, we use modern natural language processing techniques to exploit the vast open-source resources on the Internet and train a code completion model that takes the action sequences in code as input to complete code for developers.
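A minimal sketch of how such editing context might be prepended to an action sequence to form one training sample: identifiers are split into words (repository, file, and method names are rarely single dictionary words) and concatenated before the calls. The splitting heuristic and the sample names below are our own illustrative assumptions, not the paper's exact preprocessing:

```python
import re

def split_identifier(name: str) -> list[str]:
    """Split snake_case, kebab-case, and CamelCase identifiers into lowercase words."""
    words = []
    for part in re.split(r'[_\W]+', name):
        words.extend(re.findall(r'[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+', part))
    return [w.lower() for w in words if w]

def build_sample(repo: str, filename: str, method: str, actions: list[str]) -> str:
    """Prepend editing context (repo, file, and method name words) to the action sequence."""
    context = split_identifier(repo) + split_identifier(filename) + split_identifier(method)
    return ' '.join(context + actions)

sample = build_sample('flask-restful', 'resource_loader.py', 'loadConfig',
                      ['open', 'read', 'loads'])
print(sample)  # → flask restful resource loader py load config open read loads
```

The resulting flat token sequence is the kind of input a subword-tokenized language model such as GPT-2 can consume directly.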

In the evaluation, our experiments show that the GPT-2 model trained with our action-sequence code representation achieves 81.92% top-5 accuracy on next-method-call token prediction, compared to 61.89% for the same GPT-2 model trained on the same dataset without our representation. We also find that the code context we propose is important for machines to comprehend code better. Given the pre-trained natural language model, training our model on 1,000,000 lines of code takes less than 16.7 minutes. All of the above contribute to code comprehension and enhance code completion using the unlimited resources of the Internet.
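For reference, the top-5 accuracy metric used above counts a prediction as correct when the ground-truth token appears among the model's five highest-ranked candidates. A minimal sketch with made-up candidate lists:

```python
def top_k_accuracy(predictions: list[list[str]], targets: list[str], k: int = 5) -> float:
    """Fraction of targets appearing among the model's k highest-ranked candidates."""
    hits = sum(1 for ranked, target in zip(predictions, targets) if target in ranked[:k])
    return hits / len(targets)

# Hypothetical ranked candidate lists for three next-method-call predictions.
preds = [['append', 'extend', 'insert', 'pop', 'sort'],
         ['read', 'write', 'close', 'seek', 'flush'],
         ['join', 'split', 'strip', 'lower', 'upper']]
gold = ['append', 'close', 'format']

print(round(top_k_accuracy(preds, gold), 4))  # → 0.6667 (2 of 3 hits)
```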


Published in

Internetware '22: Proceedings of the 13th Asia-Pacific Symposium on Internetware
June 2022, 291 pages
ISBN: 9781450397803
DOI: 10.1145/3545258

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 55 of 111 submissions, 50%