DOI: 10.1145/3611643.3613895

AdaptivePaste: Intelligent Copy-Paste in IDE

Published: 30 November 2023

ABSTRACT

In software development, it is common for programmers to copy-paste or port code snippets and then adapt them to their use case. This scenario motivates the code adaptation task – a variant of program repair that aims to adapt the variable identifiers in a pasted snippet of code to the surrounding, preexisting context. However, no existing approach has been shown to address this task effectively. In this paper, we introduce AdaptivePaste, a learning-based approach to source code adaptation based on transformers and a dedicated dataflow-aware deobfuscation pre-training task that learns meaningful representations of variable usage patterns. We demonstrate that AdaptivePaste can adapt Python source code snippets with 67.8% exact-match accuracy. We study the impact of confidence thresholds on the model's predictions, showing that precision can be further improved to 85.9% with our parallel-decoder transformer model in a selective code adaptation setting. To assess the practical use of AdaptivePaste, we perform a user study among Python software developers on real-world copy-paste instances. The results show that AdaptivePaste reduces dwell time to nearly half the time it takes to port code manually and helps to avoid bugs. In addition, we use the participant feedback to identify potential avenues for improvement.
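To make the task concrete, the sketch below is a minimal, hypothetical illustration of the selective code adaptation setting described in the abstract, not the authors' implementation. It assumes a model has proposed a context-appropriate name and a confidence score for each variable in a pasted snippet, and only renames whose confidence exceeds a threshold are applied; all identifiers, predicted names, and the threshold value are invented for illustration.

    # Hypothetical illustration of selective code adaptation (not the paper's method).
    # A pasted snippet uses "foreign" identifiers; mocked model predictions propose
    # context-appropriate renames, and only high-confidence renames are applied.
    import re

    # Snippet copied from elsewhere; its names do not match the target file,
    # which (hypothetically) already defines `user_records`.
    pasted_snippet = """
    for item in data:
        handle.write(item.name + "\\n")
    """

    # Mocked model output: variable -> (proposed name, confidence).
    predictions = {
        "data": ("user_records", 0.97),
        "handle": ("out_file", 0.55),   # below threshold: left unchanged
        "item": ("record", 0.91),
    }

    CONFIDENCE_THRESHOLD = 0.859  # illustrative threshold for selective adaptation

    def adapt(snippet: str, preds: dict, threshold: float) -> str:
        """Apply only the renames whose model confidence exceeds the threshold."""
        adapted = snippet
        for old_name, (new_name, confidence) in preds.items():
            if confidence >= threshold:
                # Whole-word replacement so substrings of other identifiers are untouched.
                adapted = re.sub(rf"\b{re.escape(old_name)}\b", new_name, adapted)
        return adapted

    print(adapt(pasted_snippet, predictions, CONFIDENCE_THRESHOLD))

A real adaptation model would condition on dataflow and the surrounding file rather than rename tokens independently; the threshold here only illustrates the precision/coverage trade-off that the selective setting trades on.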


Published in

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2023, 2215 pages
ISBN: 9798400703270
DOI: 10.1145/3611643

          Copyright © 2023 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 30 November 2023


          Qualifiers

          • research-article

          Acceptance Rates

Overall Acceptance Rate: 112 of 543 submissions, 21%
