AdaptivePaste: Intelligent Copy-Paste in IDE

Authors:
Xiaoyu Liu (Microsoft, Redmond, United States)
Jinu Jang (Microsoft, Redmond, United States)
Neel Sundaresan (Microsoft, Redmond, United States)
Miltiadis Allamanis (Google Research, Cambridge, UK)
Alexey Svyatkovskiy (Microsoft, Redmond, United States)

ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, November 2023, Pages 1844–1854. https://doi.org/10.1145/3611643.3613895

Published: 30 November 2023

ABSTRACT

In software development, it is common for programmers to copy-paste or port code snippets and then adapt them to their use case. This scenario motivates the code adaptation task – a variant of program repair that aims to adapt variable identifiers in a pasted snippet of code to the surrounding, preexisting context. However, no existing approach has been shown to effectively address this task. In this paper, we introduce AdaptivePaste, a learning-based approach to source code adaptation, based on transformers and a dedicated dataflow-aware deobfuscation pre-training task to learn meaningful representations of variable usage patterns. We demonstrate that AdaptivePaste can learn to adapt Python source code snippets with 67.8% exact-match accuracy. We study the impact of confidence thresholds on the model predictions, showing that the model precision can be further improved to 85.9% with our parallel-decoder transformer model in a selective code adaptation setting. To assess the practical use of AdaptivePaste, we perform a user study among Python software developers on real-world copy-paste instances. The results show that AdaptivePaste reduces dwell time to nearly half the time it takes to port code manually and helps to avoid bugs. In addition, we use the participant feedback to identify potential avenues for improvement.
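
To make the code adaptation task and the selective setting concrete, the following is a minimal sketch and not the AdaptivePaste model. It assumes a hypothetical `predictions` mapping from each identifier in the pasted snippet to a (context name, confidence) pair, standing in for model output, and an arbitrary illustrative threshold of 0.8. Only identifiers whose confidence meets the threshold are renamed; the rest are left for the developer to review. The function name `adapt_snippet` and the example identifiers are assumptions made for illustration. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast


def adapt_snippet(snippet, predictions, threshold=0.8):
    """Rename variables in a pasted snippet using (name, confidence) predictions.

    `predictions` is a hypothetical stand-in for model output: it maps an
    identifier in the pasted code to a tuple (name_in_context, confidence).
    Identifiers with confidence below `threshold` are left unchanged.
    """

    class Renamer(ast.NodeTransformer):
        def visit_Name(self, node):
            pred = predictions.get(node.id)
            if pred is not None and pred[1] >= threshold:
                node.id = pred[0]  # adapt the identifier to the surrounding context
            return node

    tree = Renamer().visit(ast.parse(snippet))
    return ast.unparse(tree)


# Illustrative pasted snippet and made-up predictions (not results from the paper).
pasted = "total = 0\nfor item in items:\n    total += item.price\n"
predictions = {
    "items": ("cart_entries", 0.95),  # confident: renamed
    "total": ("order_sum", 0.55),     # below threshold: kept as pasted
}

print(adapt_snippet(pasted, predictions))
```

Raising the threshold trades coverage for precision: fewer identifiers are adapted automatically, but those that are adapted are more likely to be correct, which is the trade-off the abstract describes for the selective code adaptation setting.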

Index Terms

AdaptivePaste: Intelligent Copy-Paste in IDE
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Natural language generation
2. Software and its engineering
  1. Software creation and management
    1. Software development techniques
      1. Error handling and recovery
  2. Software notations and tools
    1. Formal language definitions
      1. Syntax

Published in
ESEC/FSE 2023: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering
November 2023
2215 pages
ISBN: 9798400703270
DOI: 10.1145/3611643
General Chair: Satish Chandra (Google, USA)
Program Chairs: Kelly Blincoe (University of Auckland, New Zealand), Paolo Tonella (USI Lugano, Switzerland)
Copyright © 2023 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Code adaptation
Machine learning
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 112 of 543 submissions, 21%