XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training

Authors:

Zehao Lin
College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China
ORCID: 0000-0002-4726-7867

Guodun Li
College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China

Jingfeng Zhang
College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China

Yue Deng
College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China

Xiangji Zeng
College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China

Yin Zhang
College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang, China

Yao Wan
School of Computer Sci. and Tech., Huazhong University of Science and Technology, Wuhan, Hubei, China

ACM Transactions on Software Engineering and Methodology, Volume 31, Issue 3, Article No. 52, pp. 1–44. https://doi.org/10.1145/3506696

Published: 09 April 2022

Abstract

Source code representation learning underpins the application of artificial intelligence to many software engineering tasks, such as code clone detection, algorithm classification, and code summarization. Recently, many works have tried to improve source code representations from various perspectives, e.g., by introducing the structural information of programs into the latent representation. However, two issues remain when dealing with the rapidly growing, unlabeled, cross-language source code datasets available on the Internet. First, deep learning models for many code-specific tasks still suffer from a lack of high-quality labels. Second, the structural differences among programming languages make it more difficult to process multiple languages in a single neural architecture.

To address these issues, in this article we propose XCode, a novel method for Cross-language Code representation with large-scale pre-training. Concretely, we use several abstract syntax trees and ELMo-enhanced variational autoencoders to obtain multiple pre-trained source code language models, trained on about 1.5 million code snippets. To fully utilize the knowledge shared across programming languages, we further propose a Shared Encoder-Decoder (SED) architecture, which uses the multi-teacher single-student method to transfer knowledge from these pre-trained models to the distilled SED. The pre-trained models and the SED cooperate to better represent source code. For evaluation, we examine our approach on three typical downstream cross-language tasks, i.e., source code translation, code clone detection, and code-to-code search, on a real-world dataset composed of programming exercises with multiple solutions. Experimental results demonstrate the effectiveness of our approach for cross-language code representation. Moreover, our approach performs significantly better than several code representation baselines on different downstream tasks in terms of multiple automatic evaluation metrics.
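
To make the multi-teacher single-student idea concrete, the following is a minimal sketch of a generic distillation loss in which one student is trained to match the temperature-softened output distributions of several teachers. It is an illustration only and is not taken from the article: the function names, the temperature value, and the choice to average the per-teacher KL divergences are all assumptions, and the actual XCode objective may differ.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution, softened by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_teacher_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Average KL divergence from each teacher's softened distribution to the student's.

    Assumption: teachers are weighted equally; this is a generic formulation,
    not the loss defined in the article.
    """
    student_probs = softmax(student_logits, temperature)
    losses = [
        kl_divergence(softmax(logits, temperature), student_probs)
        for logits in teacher_logits
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical logits from two language-specific teachers and one shared student.
    teachers = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]
    student = [1.0, 1.0, 0.0]
    print(multi_teacher_distillation_loss(teachers, student))
```

In this sketch, the loss falls to zero only when the student reproduces every teacher's distribution, so with teachers that disagree the student settles on a compromise between them.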

[80] Szegedy Christian, Ioffe Sergey, Vanhoucke Vincent, and Alemi Alexander A.. 2017. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, Singh Satinder P. and Markovitch Shaul (Eds.), AAAI Press, 4278–4284. Retrieved from http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14806.Google ScholarCross Ref
[81] Tan Mingxing and Le Quoc V.. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California(Proceedings of Machine Learning Research, Vol. 97), Chaudhuri Kamalika and Salakhutdinov Ruslan (Eds.), PMLR, 6105–6114. Retrieved from http://proceedings.mlr.press/v97/tan19a.html.Google Scholar
[82] Tan Xu, Ren Yi, He Di, Qin Tao, Zhao Zhou, and Liu Tie-Yan. 2019. Multilingual neural machine translation with knowledge distillation. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, May 6-9, 2019. OpenReview.net. Retrieved from https://openreview.net/forum?id=S1gUsoR9YX.Google Scholar
[83] Tian Gang, Wang Qibo, Zhao Yi, Guo Lantian, Sun Zhonglin, and Lv Liangyu. 2020. Smart contract classification with a Bi-LSTM based approach. IEEE Access 8 (2020), 43806–43816.Google ScholarCross Ref
[84] Tölke Jonas. 2010. Implementation of a lattice boltzmann kernel using the compute unified device architecture developed by nVIDIA. Computing and Visualization in Science 13, 1 (2010), 29.Google ScholarCross Ref
[85] Tong Haonan, Liu Bin, and Wang Shihai. 2018. Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning. Information and Software Technology 96 (2018), 94–111.Google ScholarDigital Library
[86] Tran Chau, Tang Yuqing, Li Xian, and Gu Jiatao. 2020. Cross-lingual retrieval for iterative self-supervised training. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, Larochelle Hugo, Ranzato Marc’Aurelio, Hadsell Raia, Balcan Maria-Florina, and Lin Hsuan-Tien (Eds.). Retrieved from https://proceedings.neurips.cc/paper/2020/hash/1763ea5a7e72dd7ee64073c2dda7a7a8-Abstract.html.Google Scholar
[87] Tufano Michele, Watson Cody, Bavota Gabriele, Penta Massimiliano Di, White Martin, and Poshyvanyk Denys. 2019. An empirical study on learning bug-fixing patches in the wild via neural machine translation. ACM Transactions on Software Engineering and Methodology 28, 4 (2019), 19:1–19:29. DOI:Google ScholarDigital Library
[88] Vaswani Ashish, Shazeer Noam, Parmar Niki, Uszkoreit Jakob, Jones Llion, Gomez Aidan N., Kaiser Lukasz, and Polosukhin Illia. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA,Guyon Isabelle, Luxburg Ulrike von, Bengio Samy, Wallach Hanna M., Fergus Rob, Vishwanathan S. V. N., and Garnett Roman (Eds.), 5998–6008. Retrieved from http://papers.nips.cc/paper/7181-attention-is-all-you-need.Google Scholar
[89] Vedantam Ramakrishna, Zitnick C. Lawrence, and Parikh Devi. 2015. CIDEr: Consensus-based image description evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, June 7-12, 2015. IEEE Computer Society, 4566–4575.Google ScholarCross Ref
[90] VenkataKeerthy S., Aggarwal R., Jain S., Desarkar Maunendra Sankar, Upadrasta Ramakrishna, and Srikant Y. N.. 2019. IR2Vec: A flow analysis based scalable infrastructure for program encodings. CoRR abs/1909.06228.Google Scholar
[91] Wan Yao, Shu Jingdong, Sui Yulei, Xu Guandong, Zhao Zhou, Wu Jian, and Yu Philip S.. 2019. Multi-modal attention network learning for semantic source code retrieval. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019, San Diego, CA, November 11-15, 2019. IEEE, 13–25.Google ScholarDigital Library
[92] Wan Yao, Zhao Zhou, Yang Min, Xu Guandong, Ying Haochao, Wu Jian, and Yu Philip S. 2018. Improving automatic source code summarization via deep reinforcement learning. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 397–407.Google ScholarDigital Library
[93] Wang Senzhang, Cao Jiannong, and Yu Philip. 2020. Deep learning for spatio-temporal data mining: A survey. IEEE Transactions on Knowledge and Data Engineering (2020). https://ieeexplore.ieee.org/document/9204396/citations#citations.Google ScholarCross Ref
[94] Wang Wenhan, Li Ge, Shen Sijie, Xia Xin, and Jin Zhi. 2020. Modular tree network for source code representation learning. ACM Transactions on Software Engineering and Methodology 29, 4 (2020), 1–23.Google ScholarDigital Library
[95] Wang Wenhan, Zhang Kechi, Li Ge, and Jin Zhi. 2020. Learning to represent programs with heterogeneous graphs. CoRR.Google Scholar
[96] Wang Xin, Huang Qiuyuan, Celikyilmaz Asli, Gao Jianfeng, Shen Dinghan, Wang Yuan-Fang, Wang William Yang, and Zhang Lei. 2019. Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 6629–6638.Google ScholarCross Ref
[97] Wang Xin, Wang Yasheng, Mi Fei, Zhou Pingyi, Wan Yao, Liu Xiao, Li Li, Wu Hao, Liu Jin, and Jiang Xin. 2021. SynCoBERT: Syntax-guided multi-modal contrastive pre-training for code representation. arXiv:2108.04556. Retrieved from https://arxiv.org/abs/2108.04556.Google Scholar
[98] White Martin, Vendome Christopher, Linares-Vásquez Mario, and Poshyvanyk Denys. 2015. Toward deep learning software repositories. In Proceedings of the 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories. IEEE, 334–345.Google ScholarCross Ref
[99] Xiao Yan, Keung Jacky, Bennin Kwabena E, and Mi Qing. 2019. Improving bug localization with word embedding and enhanced convolutional neural networks. Information and Software Technology 105 (2019), 17–29.Google ScholarCross Ref
[100] Xu A., Dai T., Chen Huajun, Ming Zhe, and Li W.. 2018. Vulnerability detection for source code using contextual LSTM. In Proceedings of the 2018 5th International Conference on Systems and Informatics. 1225–1230.Google Scholar
[101] Xu Bowen, Ye Deheng, Xing Zhenchang, Xia Xin, Chen Guibin, and Li Shanping. 2016. Predicting semantically linkable knowledge in developer online forums via convolutional neural network. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016, Singapore, September 3-7, 2016, Lo David, Apel Sven, and Khurshid Sarfraz (Eds.), ACM, 51–62.Google ScholarDigital Library
[102] Xu Jiacheng and Durrett Greg. 2018. Spherical latent spaces for stable variational autoencoders. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 4503–4513.Google ScholarCross Ref
[103] Yang Yanming, Xia Xin, Lo David, and Grundy John C.. 2021. A survey on deep learning for software engineering. ACM Comput. Surv. (2021).Google Scholar
[104] Yang Zhilin, Dai Zihang, Yang Yiming, Carbonell Jaime, Salakhutdinov Russ R, and Le Quoc V.. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. In Proceedings of the Advances in Neural Information Processing Systems. 5753–5763.Google Scholar
[105] Yin Pengcheng, Zhou Chunting, He Junxian, and Neubig Graham. 2018. StructVAE: Tree-structured latent variable models for semi-supervised semantic parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, Gurevych Iryna and Miyao Yusuke (Eds.), Association for Computational Linguistics, 754–765. Retrieved from https://www.aclweb.org/anthology/P18-1070/.Google ScholarCross Ref
[106] Yuan Cangzhou, Wei Shenhong, Wang Yutong, You Yue, and ZiLiang ShangGuan. 2016. Android applications categorization using bayesian classification. In Proceedings of the International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2016, Chengdu, China, October 13-15, 2016, Xie Bin and Xu Xiaolong (Eds.), IEEE, 173–176.Google ScholarCross Ref
[107] Zhang Jingfeng, Hong Haiwen, Zhang Yin, Wan Yao, Liu Ye, and Sui Yulei. 2021. Disentangled code representation learning for multiple programming languages. In Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021(Findings of ACL, Vol. ACL/IJCNLP 2021), Zong Chengqing, Xia Fei, Li Wenjie, and Navigli Roberto (Eds.), Association for Computational Linguistics, 4454–4466.Google ScholarCross Ref
[108] Zhao Dehai, Xing Zhenchang, Chen Chunyang, Xia Xin, and Li Guoqiang. 2019. ActionNet: Vision-based workflow action recognition from programming screencasts. In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, Atlee Joanne M., Bultan Tevfik, and Whittle Jon (Eds.), IEEE / ACM, 350–361.Google ScholarDigital Library
[109] Zhao Gang and Huang Jeff. 2018. DeepSim: Deep learning code functional similarity. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Lake Buena Vista, FL) . Association for Computing Machinery, New York, NY, 141–151.Google ScholarDigital Library
[110] Zhou Yaqin, Liu Shangqing, Siow Jingkai, Du Xiaoning, and Liu Yang. 2019. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In Proceedings of the Advances in Neural Information Processing Systems. 10197–10207.Google Scholar

Index Terms

XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training
1. Computing methodologies
  1. Artificial intelligence
2. Software and its engineering
  1. Software creation and management

Published in
ACM Transactions on Software Engineering and Methodology Volume 31, Issue 3
July 2022
912 pages
ISSN:1049-331X
EISSN:1557-7392
DOI:10.1145/3514181
Editor:
Mauro Pezzè
USI Università della Svizzera italiana and SIT Schaffhausen Institute of Technology, Switzerland
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 April 2022
- Accepted: 1 December 2021
- Revised: 1 November 2021
- Received: 1 December 2020

Author Tags
Deep learning
neural networks
code representation
cross-language
pre-training
Qualifiers
- research-article
- Refereed