
XCode: Towards Cross-Language Code Representation with Large-Scale Pre-Training

Published: 09 April 2022

Abstract

Source code representation learning underpins the application of artificial intelligence to many software engineering tasks, such as code clone detection, algorithm classification, and code summarization. Recently, many works have tried to improve source code representations from various perspectives, e.g., by introducing the structural information of programs into the latent representation. However, when dealing with the rapidly expanding unlabeled cross-language source code datasets on the Internet, two issues remain. First, deep learning models for many code-specific tasks still suffer from a lack of high-quality labels. Second, the structural differences among programming languages make it difficult to process multiple languages in a single neural architecture.

To address these issues, in this article we propose XCode, a novel method for cross-language code representation with large-scale pre-training. Concretely, we use abstract syntax trees and ELMo-enhanced variational autoencoders to obtain multiple pre-trained source code language models, trained on about 1.5 million code snippets. To fully exploit knowledge across programming languages, we further propose a Shared Encoder-Decoder (SED) architecture that uses a multi-teacher single-student method to transfer knowledge from the aforementioned pre-trained models to the distilled SED. The pre-trained models and the SED then cooperate to better represent the source code. For evaluation, we examine our approach on three typical downstream cross-language tasks, i.e., source code translation, code clone detection, and code-to-code search, on a real-world dataset composed of programming exercises with multiple solutions. Experimental results demonstrate the effectiveness of our approach on cross-language code representation. Moreover, our approach performs significantly better than several code representation baselines on different downstream tasks in terms of multiple automatic evaluation metrics.
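The multi-teacher single-student transfer described above can be sketched as a soft-label distillation objective: the student's output distribution is pulled toward each per-language teacher's distribution, and the per-teacher losses are averaged. The sketch below is illustrative only; the function names, temperature scaling, and uniform averaging over teachers are our assumptions, not the paper's exact formulation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, numerically stabilized."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_teacher_distill_loss(student_logits, teacher_logits_list, temperature=2.0):
    """Average KL(teacher || student) over all teachers.

    Each teacher contributes a soft target distribution; the student is
    trained to match all of them (Hinton-style distillation, with the
    usual T^2 scaling so gradients are comparable across temperatures).
    """
    p_student = softmax(student_logits, temperature)
    per_teacher = []
    for t_logits in teacher_logits_list:
        p_teacher = softmax(t_logits, temperature)
        kl = np.sum(
            p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)),
            axis=-1,
        )
        per_teacher.append(kl.mean())
    return float(np.mean(per_teacher)) * temperature * temperature
```

In a full system this loss term would be added to the task loss and backpropagated through the student (the SED) only, with the teachers' parameters frozen.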

  79. [79] Shido Yusuke, Kobayashi Yasuaki, Yamamoto Akihiro, Miyamoto Atsushi, and Matsumura Tadayuki. 2019. Automatic source code summarization with extended tree-LSTM. In Proceedings of the International Joint Conference on Neural Networks, IJCNN 2019 Budapest, Hungary, July 14-19, 2019. IEEE, 18.Google ScholarGoogle ScholarCross RefCross Ref
  80. [80] Szegedy Christian, Ioffe Sergey, Vanhoucke Vincent, and Alemi Alexander A.. 2017. Inception-v4, inception-resnet and the impact of residual connections on learning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, Singh Satinder P. and Markovitch Shaul (Eds.), AAAI Press, 42784284. Retrieved from http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14806.Google ScholarGoogle ScholarCross RefCross Ref
  81. [81] Tan Mingxing and Le Quoc V.. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California(Proceedings of Machine Learning Research, Vol. 97), Chaudhuri Kamalika and Salakhutdinov Ruslan (Eds.), PMLR, 61056114. Retrieved from http://proceedings.mlr.press/v97/tan19a.html.Google ScholarGoogle Scholar
  82. [82] Tan Xu, Ren Yi, He Di, Qin Tao, Zhao Zhou, and Liu Tie-Yan. 2019. Multilingual neural machine translation with knowledge distillation. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, May 6-9, 2019. OpenReview.net. Retrieved from https://openreview.net/forum?id=S1gUsoR9YX.Google ScholarGoogle Scholar
  83. [83] Tian Gang, Wang Qibo, Zhao Yi, Guo Lantian, Sun Zhonglin, and Lv Liangyu. 2020. Smart contract classification with a Bi-LSTM based approach. IEEE Access 8 (2020), 4380643816.Google ScholarGoogle ScholarCross RefCross Ref
  84. [84] Tölke Jonas. 2010. Implementation of a lattice boltzmann kernel using the compute unified device architecture developed by nVIDIA. Computing and Visualization in Science 13, 1 (2010), 29.Google ScholarGoogle ScholarCross RefCross Ref
  85. [85] Tong Haonan, Liu Bin, and Wang Shihai. 2018. Software defect prediction using stacked denoising autoencoders and two-stage ensemble learning. Information and Software Technology 96 (2018), 94111.Google ScholarGoogle ScholarDigital LibraryDigital Library
  86. [86] Tran Chau, Tang Yuqing, Li Xian, and Gu Jiatao. 2020. Cross-lingual retrieval for iterative self-supervised training. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, Larochelle Hugo, Ranzato Marc’Aurelio, Hadsell Raia, Balcan Maria-Florina, and Lin Hsuan-Tien (Eds.). Retrieved from https://proceedings.neurips.cc/paper/2020/hash/1763ea5a7e72dd7ee64073c2dda7a7a8-Abstract.html.Google ScholarGoogle Scholar
  87. [87] Tufano Michele, Watson Cody, Bavota Gabriele, Penta Massimiliano Di, White Martin, and Poshyvanyk Denys. 2019. An empirical study on learning bug-fixing patches in the wild via neural machine translation. ACM Transactions on Software Engineering and Methodology 28, 4 (2019), 19:1–19:29. DOI:Google ScholarGoogle ScholarDigital LibraryDigital Library
  88. [88] Vaswani Ashish, Shazeer Noam, Parmar Niki, Uszkoreit Jakob, Jones Llion, Gomez Aidan N., Kaiser Lukasz, and Polosukhin Illia. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA,Guyon Isabelle, Luxburg Ulrike von, Bengio Samy, Wallach Hanna M., Fergus Rob, Vishwanathan S. V. N., and Garnett Roman (Eds.), 59986008. Retrieved from http://papers.nips.cc/paper/7181-attention-is-all-you-need.Google ScholarGoogle Scholar
  89. [89] Vedantam Ramakrishna, Zitnick C. Lawrence, and Parikh Devi. 2015. CIDEr: Consensus-based image description evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, June 7-12, 2015. IEEE Computer Society, 45664575.Google ScholarGoogle ScholarCross RefCross Ref
  90. [90] VenkataKeerthy S., Aggarwal R., Jain S., Desarkar Maunendra Sankar, Upadrasta Ramakrishna, and Srikant Y. N.. 2019. IR2Vec: A flow analysis based scalable infrastructure for program encodings. CoRR abs/1909.06228.Google ScholarGoogle Scholar
  91. [91] Wan Yao, Shu Jingdong, Sui Yulei, Xu Guandong, Zhao Zhou, Wu Jian, and Yu Philip S.. 2019. Multi-modal attention network learning for semantic source code retrieval. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering, ASE 2019, San Diego, CA, November 11-15, 2019. IEEE, 1325.Google ScholarGoogle ScholarDigital LibraryDigital Library
  92. [92] Wan Yao, Zhao Zhou, Yang Min, Xu Guandong, Ying Haochao, Wu Jian, and Yu Philip S. 2018. Improving automatic source code summarization via deep reinforcement learning. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 397407.Google ScholarGoogle ScholarDigital LibraryDigital Library
  93. [93] Wang Senzhang, Cao Jiannong, and Yu Philip. 2020. Deep learning for spatio-temporal data mining: A survey. IEEE Transactions on Knowledge and Data Engineering (2020). https://ieeexplore.ieee.org/document/9204396/citations#citations.Google ScholarGoogle ScholarCross RefCross Ref
  94. [94] Wang Wenhan, Li Ge, Shen Sijie, Xia Xin, and Jin Zhi. 2020. Modular tree network for source code representation learning. ACM Transactions on Software Engineering and Methodology 29, 4 (2020), 123.Google ScholarGoogle ScholarDigital LibraryDigital Library
  95. [95] Wang Wenhan, Zhang Kechi, Li Ge, and Jin Zhi. 2020. Learning to represent programs with heterogeneous graphs. CoRR.Google ScholarGoogle Scholar
  96. [96] Wang Xin, Huang Qiuyuan, Celikyilmaz Asli, Gao Jianfeng, Shen Dinghan, Wang Yuan-Fang, Wang William Yang, and Zhang Lei. 2019. Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 66296638.Google ScholarGoogle ScholarCross RefCross Ref
  97. [97] Wang Xin, Wang Yasheng, Mi Fei, Zhou Pingyi, Wan Yao, Liu Xiao, Li Li, Wu Hao, Liu Jin, and Jiang Xin. 2021. SynCoBERT: Syntax-guided multi-modal contrastive pre-training for code representation. arXiv:2108.04556. Retrieved from https://arxiv.org/abs/2108.04556.Google ScholarGoogle Scholar
  98. [98] White Martin, Vendome Christopher, Linares-Vásquez Mario, and Poshyvanyk Denys. 2015. Toward deep learning software repositories. In Proceedings of the 2015 IEEE/ACM 12th Working Conference on Mining Software Repositories. IEEE, 334345.Google ScholarGoogle ScholarCross RefCross Ref
  99. [99] Xiao Yan, Keung Jacky, Bennin Kwabena E, and Mi Qing. 2019. Improving bug localization with word embedding and enhanced convolutional neural networks. Information and Software Technology 105 (2019), 1729.Google ScholarGoogle ScholarCross RefCross Ref
  100. [100] Xu A., Dai T., Chen Huajun, Ming Zhe, and Li W.. 2018. Vulnerability detection for source code using contextual LSTM. In Proceedings of the 2018 5th International Conference on Systems and Informatics. 12251230.Google ScholarGoogle Scholar
  101. [101] Xu Bowen, Ye Deheng, Xing Zhenchang, Xia Xin, Chen Guibin, and Li Shanping. 2016. Predicting semantically linkable knowledge in developer online forums via convolutional neural network. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, ASE 2016, Singapore, September 3-7, 2016, Lo David, Apel Sven, and Khurshid Sarfraz (Eds.), ACM, 5162.Google ScholarGoogle ScholarDigital LibraryDigital Library
  102. [102] Xu Jiacheng and Durrett Greg. 2018. Spherical latent spaces for stable variational autoencoders. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 45034513.Google ScholarGoogle ScholarCross RefCross Ref
  103. [103] Yang Yanming, Xia Xin, Lo David, and Grundy John C.. 2021. A survey on deep learning for software engineering. ACM Comput. Surv. (2021).Google ScholarGoogle Scholar
  104. [104] Yang Zhilin, Dai Zihang, Yang Yiming, Carbonell Jaime, Salakhutdinov Russ R, and Le Quoc V.. 2019. Xlnet: Generalized autoregressive pretraining for language understanding. In Proceedings of the Advances in Neural Information Processing Systems. 57535763.Google ScholarGoogle Scholar
  105. [105] Yin Pengcheng, Zhou Chunting, He Junxian, and Neubig Graham. 2018. StructVAE: Tree-structured latent variable models for semi-supervised semantic parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, July 15-20, 2018, Volume 1: Long Papers, Gurevych Iryna and Miyao Yusuke (Eds.), Association for Computational Linguistics, 754765. Retrieved from https://www.aclweb.org/anthology/P18-1070/.Google ScholarGoogle ScholarCross RefCross Ref
  106. [106] Yuan Cangzhou, Wei Shenhong, Wang Yutong, You Yue, and ZiLiang ShangGuan. 2016. Android applications categorization using bayesian classification. In Proceedings of the International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery, CyberC 2016, Chengdu, China, October 13-15, 2016, Xie Bin and Xu Xiaolong (Eds.), IEEE, 173176.Google ScholarGoogle ScholarCross RefCross Ref
  107. [107] Zhang Jingfeng, Hong Haiwen, Zhang Yin, Wan Yao, Liu Ye, and Sui Yulei. 2021. Disentangled code representation learning for multiple programming languages. In Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021(Findings of ACL, Vol. ACL/IJCNLP 2021), Zong Chengqing, Xia Fei, Li Wenjie, and Navigli Roberto (Eds.), Association for Computational Linguistics, 44544466.Google ScholarGoogle ScholarCross RefCross Ref
  108. [108] Zhao Dehai, Xing Zhenchang, Chen Chunyang, Xia Xin, and Li Guoqiang. 2019. ActionNet: Vision-based workflow action recognition from programming screencasts. In Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, Atlee Joanne M., Bultan Tevfik, and Whittle Jon (Eds.), IEEE / ACM, 350361.Google ScholarGoogle ScholarDigital LibraryDigital Library
  109. [109] Zhao Gang and Huang Jeff. 2018. DeepSim: Deep learning code functional similarity. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (Lake Buena Vista, FL) . Association for Computing Machinery, New York, NY, 141151.Google ScholarGoogle ScholarDigital LibraryDigital Library
  110. [110] Zhou Yaqin, Liu Shangqing, Siow Jingkai, Du Xiaoning, and Liu Yang. 2019. Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In Proceedings of the Advances in Neural Information Processing Systems. 1019710207.Google ScholarGoogle Scholar

Published in

  ACM Transactions on Software Engineering and Methodology, Volume 31, Issue 3 (July 2022), 912 pages.
  ISSN: 1049-331X • EISSN: 1557-7392 • DOI: 10.1145/3514181
  Editor: Mauro Pezzè
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

  Association for Computing Machinery, New York, NY, United States

  Publication History

  • Published: 9 April 2022
  • Accepted: 1 December 2021
  • Revised: 1 November 2021
  • Received: 1 December 2020

  Published in TOSEM Volume 31, Issue 3

        Qualifiers

        • research-article
        • Refereed
