Abstract
Code summarization plays a pivotal role in software engineering by giving developers a concise natural language description of source code semantics. As software complexity continues to escalate, code summarization faces several challenges, including discrepancies between source code and its summaries, missing or outdated information, and the inefficiency and resource demands of manual summarization. To address these challenges, Automatic Source Code Summarization (ASCS) has garnered widespread attention. This paper presents a comprehensive review and synthesis of ASCS research. It aims to provide an in-depth understanding of the core issues and challenges inherent in each phase of ASCS, illustrated with specific examples and application scenarios. Organized around the core phases of ASCS (data collection, source code modeling, code summary generation, and summary quality assessment), the paper compiles and assesses existing datasets, categorizes and examines prevalent source code modeling techniques, and reviews the methods for generating code summaries and evaluating their quality. It concludes by exploring future research avenues and emerging trends, and serves as a guide to the cutting-edge developments in this field, enriched by an analysis of pivotal research contributions.











Data Availability Statement
Data availability is not applicable to this article as no new data were created or analyzed in this study.
References
Abid NJ, Dragan N, Collard ML, Maletic JI (2015) Using stereotypes in the automatic generation of natural language summaries for C++ methods. In: Koschke R, Krinke J, Robillard MP (eds) 2015 IEEE International Conference on Software Maintenance and Evolution, ICSME 2015, Bremen, Germany, September 29 - October 1, 2015, IEEE Computer Society, pp 561–565, https://doi.org/10.1109/ICSM.2015.7332514
Ahmad WU, Chakraborty S, Ray B, Chang K (2020) A transformer-based approach for source code summarization. In: Jurafsky D, Chai J, Schluter N, Tetreault JR (eds) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, Association for Computational Linguistics, pp 4998–5007, https://doi.org/10.18653/V1/2020.ACL-MAIN.449
Ahmed T, Devanbu PT (2022a) Few-shot training llms for project-specific code-summarization. In: 37th IEEE/ACM International Conference on Automated Software Engineering, ASE 2022, Rochester, MI, USA, October 10-14, 2022, ACM, pp 177:1–177:5, https://doi.org/10.1145/3551349.3559555
Ahmed T, Devanbu PT (2022b) Learning code summarization from a small and local dataset. https://doi.org/10.48550/ARXIV.2206.00804, arXiv:2206.00804
Ahmed T, Pai KS, Devanbu PT, Barr ET (2024) Automatic semantic augmentation of language model prompts (for code summarization). In: Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024, ACM, pp 220:1–220:13, https://doi.org/10.1145/3597503.3639183
Al-Msie’deen R, Blasi AH (2019) Supporting software documentation with source code summarization. arXiv:1901.01186
Allamanis M, Peng H, Sutton C (2016) A convolutional attention network for extreme summarization of source code. In: Balcan M, Weinberger KQ (eds) Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, JMLR.org, JMLR Workshop and Conference Proceedings, vol 48, pp 2091–2100, http://proceedings.mlr.press/v48/allamanis16.html
Allamanis M, Brockschmidt M, Khademi M (2018) Learning to represent programs with graphs. In: 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings, OpenReview.net, https://openreview.net/forum?id=BJOFETxR-
Alon U, Brody S, Levy O, Yahav E (2019a) code2seq: Generating sequences from structured representations of code. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, OpenReview.net, https://openreview.net/forum?id=H1gKYo09tX
Alon U, Zilberstein M, Levy O, Yahav E (2019b) code2vec: learning distributed representations of code. Proc ACM Program Lang 3(POPL):40:1–40:29, https://doi.org/10.1145/3290353
Bai Y, Zhang L, Zhao F (2019) A survey on research of code comment. In: Proceedings of the 2019 3rd International Conference on Management Engineering, Software Engineering and Service Sciences, ICMSS 2019, Wuhan, China, January 12-14, 2019, ACM, pp 45–51, https://doi.org/10.1145/3312662.3312710
Banerjee S, Lavie A (2005) METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Goldstein J, Lavie A, Lin C, Voss CR (eds) Proceedings of the Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization@ACL 2005, Ann Arbor, Michigan, USA, June 29, 2005, Association for Computational Linguistics, pp 65–72, https://aclanthology.org/W05-0909/
Barone AVM, Sennrich R (2017) A parallel corpus of python functions and documentation strings for automated code documentation and code generation. In: Kondrak G, Watanabe T (eds) Proceedings of the Eighth International Joint Conference on Natural Language Processing, IJCNLP 2017, Taipei, Taiwan, November 27 - December 1, 2017, Volume 2: Short Papers, Asian Federation of Natural Language Processing, pp 314–319, https://aclanthology.org/I17-2053/
Bui NDQ, Yu Y, Jiang L (2021) Self-supervised contrastive learning for code retrieval and summarization via semantic-preserving transformations. In: Diaz F, Shah C, Suel T, Castells P, Jones R, Sakai T (eds) SIGIR ’21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, Canada, July 11-15, 2021, ACM, pp 511–521, https://doi.org/10.1145/3404835.3462840
Chen F, Kim M, Choo J (2021a) Novel natural language summarization of program code via leveraging multiple input representations. In: Moens M, Huang X, Specia L, Yih SW (eds) Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November, 2021, Association for Computational Linguistics, pp 2510–2520, https://doi.org/10.18653/v1/2021.findings-emnlp.214
Chen M, Tworek J, Jun H, Yuan Q, de Oliveira Pinto HP, Kaplan J, Edwards H, Burda Y, Joseph N, Brockman G, Ray A, Puri R, Krueger G, Petrov M, Khlaaf H, Sastry G, Mishkin P, Chan B, Gray S, Ryder N, Pavlov M, Power A, Kaiser L, Bavarian M, Winter C, Tillet P, Such FP, Cummings D, Plappert M, Chantzis F, Barnes E, Herbert-Voss A, Guss WH, Nichol A, Paino A, Tezak N, Tang J, Babuschkin I, Balaji S, Jain S, Saunders W, Hesse C, Carr AN, Leike J, Achiam J, Misra V, Morikawa E, Radford A, Knight M, Brundage M, Murati M, Mayer K, Welinder P, McGrew B, Amodei D, McCandlish S, Sutskever I, Zaremba W (2021b) Evaluating large language models trained on code. arXiv:2107.03374
Chen Q, Zhou M (2018) A neural framework for retrieval and summarization of source code. In: Huchard M, Kästner C, Fraser G (eds) Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018, ACM, pp 826–831, https://doi.org/10.1145/3238147.3240471
Chen Q, Hu H, Liu Z (2019) Code summarization with abstract syntax tree. In: Gedeon T, Wong KW, Lee M (eds) Neural Information Processing - 26th International Conference, ICONIP 2019, Sydney, NSW, Australia, December 12-15, 2019, Proceedings, Part V, Springer, Communications in Computer and Information Science, vol 1143, pp 652–660, https://doi.org/10.1007/978-3-030-36802-9_69
Chen Q, Xia X, Hu H, Lo D, Li S (2021c) Why my code summarization model does not work: Code comment improvement with category prediction. ACM Trans Softw Eng Methodol 30(2):25:1–25:29, https://doi.org/10.1145/3434280
Chen Z, Monperrus M (2019) A literature study of embeddings on source code. arXiv:1904.03061
Cheng J, Fostiropoulos I, Boehm BW (2021) Gn-transformer: Fusing sequence and graph representation for improved code summarization. arXiv:2111.08874
Cheng W, Hu P, Wei S, Mo R (2022) Keyword-guided abstractive code summarization via incorporating structural and contextual information. Inf Softw Technol 150:106987. https://doi.org/10.1016/J.INFSOF.2022.106987
Choi Y, Kim S, Lee J (2020) Source code summarization using attention-based keyword memory networks. In: Lee W, Chen L, Moon Y, Bourgeois J, Bennis M, Li Y, Ha Y, Kwon H, Cuzzocrea A (eds) 2020 IEEE International Conference on Big Data and Smart Computing, BigComp 2020, Busan, Korea (South), February 19-22, 2020, IEEE, pp 564–570. https://doi.org/10.1109/BigComp48618.2020.00011
Choi Y, Bak J, Na C, Lee J (2021) Learning sequential and structural information for source code summarization. In: Zong C, Xia F, Li W, Navigli R (eds) Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, Association for Computational Linguistics, Findings of ACL, vol ACL/IJCNLP 2021, pp 2842–2851, https://doi.org/10.18653/v1/2021.findings-acl.251
Choi Y, Na C, Kim H, Lee J (2023) READSUM: retrieval-augmented adaptive transformer for source code summarization. IEEE Access 11:51155–51165. https://doi.org/10.1109/ACCESS.2023.3271992
Cortes-Coy LF, Vásquez ML, Aponte J, Poshyvanyk D (2014) On automatically generating commit messages via summarization of source code changes. In: 14th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2014, Victoria, BC, Canada, September 28-29, 2014, IEEE Computer Society, pp 275–284, https://doi.org/10.1109/SCAM.2014.14
Eberhart Z, LeClair A, McMillan C (2020) Automatically extracting subroutine summary descriptions from unstructured comments. In: Kontogiannis K, Khomh F, Chatzigeorgiou A, Fokaefs M, Zhou M (eds) 27th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2020, London, ON, Canada, February 18-21, 2020, IEEE, pp 35–46, https://doi.org/10.1109/SANER48275.2020.9054789
Eddy BP, Robinson JA, Kraft NA, Carver JC (2013) Evaluating source code summarization techniques: Replication and expansion. In: IEEE 21st International Conference on Program Comprehension, ICPC 2013, San Francisco, CA, USA, 20-21 May, 2013, IEEE Computer Society, pp 13–22, https://doi.org/10.1109/ICPC.2013.6613829,
Feng Z, Guo D, Tang D, Duan N, Feng X, Gong M, Shou L, Qin B, Liu T, Jiang D, Zhou M (2020) Codebert: A pre-trained model for programming and natural languages. In: Cohn T, He Y, Liu Y (eds) Findings of the Association for Computational Linguistics: EMNLP 2020, Online Event, 16-20 November 2020, Association for Computational Linguistics, Findings of ACL, vol EMNLP 2020, pp 1536–1547, https://doi.org/10.18653/v1/2020.findings-emnlp.139
Fernandes P, Allamanis M, Brockschmidt M (2019) Structured neural summarization. In: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, OpenReview.net, https://openreview.net/forum?id=H1ersoRqtm
Ferretti C, Saletta M (2023) Naturalness in source code summarization. how significant is it? In: 31st IEEE/ACM International Conference on Program Comprehension, ICPC 2023, Melbourne, Australia, May 15-16, 2023, IEEE, pp 125–134, https://doi.org/10.1109/ICPC58990.2023.00027
Fluri B, Würsch M, Gall HC (2007) Do code and comments co-evolve? on the relation between source code and comment changes. In: 14th Working Conference on Reverse Engineering (WCRE 2007), 28-31 October 2007, Vancouver, BC, Canada, IEEE Computer Society, pp 70–79, https://doi.org/10.1109/WCRE.2007.21
Fowkes JM, Chanthirasegaran P, Ranca R, Allamanis M, Lapata M, Sutton C (2016) TASSAL: autofolding for source code summarization. In: Dillon LK, Visser W, Williams LA (eds) Proceedings of the 38th International Conference on Software Engineering, ICSE 2016, Austin, TX, USA, May 14-22, 2016 - Companion Volume, ACM, pp 649–652, https://doi.org/10.1145/2889160.2889171
Fowkes JM, Chanthirasegaran P, Ranca R, Allamanis M, Lapata M, Sutton C (2017) Autofolding for source code summarization. IEEE Trans Software Eng 43(12):1095–1109. https://doi.org/10.1109/TSE.2017.2664836
Gao S, Gao C, He Y, Zeng J, Nie L, Xia X, Lyu MR (2023a) Code structure-guided transformer for source code summarization. ACM Trans Softw Eng Methodol 32(1):23:1–23:32, https://doi.org/10.1145/3522674
Gao X, Jiang X, Wu Q, Wang X, Lyu C, Lyu L (2022) Gt-simnet: Improving code automatic summarization via multi-modal similarity networks. J Syst Softw 194:111495. https://doi.org/10.1016/j.jss.2022.111495
Gao Y, Zhang H, Lyu C (2023) Encosum: enhanced semantic features for multi-scale multi-modal source code summarization. Empir Softw Eng 28(5):126. https://doi.org/10.1007/s10664-023-10384-x
Geng M, Wang S, Dong D, Wang H, Cao S, Zhang K, Jin Z (2023) Interpretation-based code summarization. In: 31st IEEE/ACM International Conference on Program Comprehension, ICPC 2023, Melbourne, Australia, May 15-16, 2023, IEEE, pp 113–124, https://doi.org/10.1109/ICPC58990.2023.00026
Gros D, Sezhiyan H, Devanbu P, Yu Z (2020) Code to comment ”translation”: Data, metrics, baselining & evaluation. In: 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020, Melbourne, Australia, September 21-25, 2020, IEEE, pp 746–757, https://doi.org/10.1145/3324884.3416546
Guo D, Ren S, Lu S, Feng Z, Tang D, Liu S, Zhou L, Duan N, Svyatkovskiy A, Fu S, Tufano M, Deng SK, Clement CB, Drain D, Sundaresan N, Yin J, Jiang D, Zhou M (2021) Graphcodebert: Pre-training code representations with data flow. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021, OpenReview.net, https://openreview.net/forum?id=jLoC4ez43PZ
Guo J, Liu J, Wan Y, Li L, Zhou P (2022) Modeling hierarchical syntax structure with triplet position for source code summarization. In: Muresan S, Nakov P, Villavicencio A (eds) Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, Association for Computational Linguistics, pp 486–500, https://doi.org/10.18653/v1/2022.acl-long.37
Guo Y, Chai Y, Zhang L, Li H, Luo M, Guo S (2024) Context-based transfer learning for low resource code summarization. Softw Pract Exp 54(3):465–482. https://doi.org/10.1002/spe.3288
Haiduc S, Aponte J, Marcus A (2010a) Supporting program comprehension with source code summarization. In: Kramer J, Bishop J, Devanbu PT, Uchitel S (eds) Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 2, ICSE 2010, Cape Town, South Africa, 1-8 May 2010, ACM, pp 223–226, https://doi.org/10.1145/1810295.1810335
Haiduc S, Aponte J, Moreno L, Marcus A (2010b) On the use of automated text summarization techniques for summarizing source code. In: Antoniol G, Pinzger M, Chikofsky EJ (eds) 17th Working Conference on Reverse Engineering, WCRE 2010, 13-16 October 2010, Beverly, MA, USA, IEEE Computer Society, pp 35–44, https://doi.org/10.1109/WCRE.2010.13
Haque S, Bansal A, Wu L, McMillan C (2021) Action word prediction for neural source code summarization. In: 28th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2021, Honolulu, HI, USA, March 9-12, 2021, IEEE, pp 330–341, https://doi.org/10.1109/SANER50967.2021.00038
Haque S, Eberhart Z, Bansal A, McMillan C (2022) Semantic similarity metrics for evaluating source code summarization. In: Rastogi A, Tufano R, Bavota G, Arnaoudova V, Haiduc S (eds) Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, ICPC 2022, Virtual Event, May 16-17, 2022, ACM, pp 36–47, https://doi.org/10.1145/3524610.3527909
Hill E, Pollock LL, Vijay-Shanker K (2009) Automatically capturing source code context of nl-queries for software maintenance and reuse. In: 31st International Conference on Software Engineering, ICSE 2009, May 16-24, 2009, Vancouver, Canada, Proceedings, IEEE, pp 232–242, https://doi.org/10.1109/ICSE.2009.5070524
Hu X, Li G, Xia X, Lo D, Jin Z (2018a) Deep code comment generation. In: Khomh F, Roy CK, Siegmund J (eds) Proceedings of the 26th Conference on Program Comprehension, ICPC 2018, Gothenburg, Sweden, May 27-28, 2018, ACM, pp 200–210, https://doi.org/10.1145/3196321.3196334
Hu X, Li G, Xia X, Lo D, Lu S, Jin Z (2018b) Summarizing source code with transferred API knowledge. In: Lang J (ed) Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, ijcai.org, pp 2269–2275, https://doi.org/10.24963/ijcai.2018/314
Hu X, Li G, Xia X, Lo D, Jin Z (2020) Deep code comment generation with hybrid lexical and syntactical information. Empir Softw Eng 25(3):2179–2217. https://doi.org/10.1007/s10664-019-09730-9
Hu X, Zhang X, Lin Z, Zhou D (2024) Reduce redundancy then rerank: Enhancing code summarization with a novel pipeline framework. In: Calzolari N, Kan M, Hoste V, Lenci A, Sakti S, Xue N (eds) Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC/COLING 2024, 20-25 May, 2024, Torino, Italy, ELRA and ICCL, pp 13722–13733, https://aclanthology.org/2024.lrec-main.1198
Hu Y, Yan M, Liu Z, Chen Q, Wang B (2021) Improving code summarization through automated quality assurance. In: Jin Z, Li X, Xiang J, Mariani L, Liu T, Yu X, Ivaki N (eds) 32nd IEEE International Symposium on Software Reliability Engineering, ISSRE 2021, Wuhan, China, October 25-28, 2021, IEEE, pp 486–497, https://doi.org/10.1109/ISSRE52982.2021.00057
Huang Y, Zheng Q, Chen X, Xiong Y, Liu Z, Luo X (2017) Mining version control system for automatically generating commit comment. In: Bener A, Turhan B, Biffl S (eds) 2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2017, Toronto, ON, Canada, November 9-10, 2017, IEEE Computer Society, pp 414–423, https://doi.org/10.1109/ESEM.2017.56
Huang Y, Huang S, Chen H, Chen X, Zheng Z, Luo X, Jia N, Hu X, Zhou X (2020) Towards automatically generating block comments for code snippets. Inf Softw Technol 127:106373. https://doi.org/10.1016/j.infsof.2020.106373
Husain H, Wu H, Gazit T, Allamanis M, Brockschmidt M (2019) Codesearchnet challenge: Evaluating the state of semantic code search. arXiv:1909.09436
Hussain Y, Huang Z, Zhou Y, Wang S (2020) Codegru: Context-aware deep learning with gated recurrent unit for source code modeling. Inf Softw Technol 125:106309. https://doi.org/10.1016/j.infsof.2020.106309
Iyer S, Konstas I, Cheung A, Zettlemoyer L (2016) Summarizing source code using a neural attention model. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers, The Association for Computer Linguistics, https://doi.org/10.18653/v1/p16-1195
Iyer S, Konstas I, Cheung A, Zettlemoyer L (2018) Mapping language to code in programmatic context. In: Riloff E, Chiang D, Hockenmaier J, Tsujii J (eds) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, Association for Computational Linguistics, pp 1643–1652, https://doi.org/10.18653/v1/d18-1192
Ji R, Tong Z, Luo T, Liu J, Zhang L (2023) A semantic and structural transformer for code summarization generation. In: International Joint Conference on Neural Networks, IJCNN 2023, Gold Coast, Australia, June 18-23, 2023, IEEE, pp 1–9, https://doi.org/10.1109/IJCNN54540.2023.10191735
Jiang S, Armaly A, McMillan C (2017) Automatically generating commit messages from diffs using neural machine translation. In: Rosu G, Penta MD, Nguyen TN (eds) Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, Urbana, IL, USA, October 30 - November 03, 2017, IEEE Computer Society, pp 135–146, https://doi.org/10.1109/ASE.2017.8115626
Jiang S, Shen J, Wu S, Cai Y, Yu Y, Zhou Y (2023) Towards usable neural comment generation via code-comment linkage interpretation: Method and empirical study. IEEE Trans Software Eng 49(4):2239–2254. https://doi.org/10.1109/TSE.2022.3214859
Kipf TN, Welling M (2017) Semi-supervised classification with graph convolutional networks. In: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, OpenReview.net, https://openreview.net/forum?id=SJU4ayYgl
Kumar J, Chimalakonda S (2024) Code summarization without direct access to code - towards exploring federated llms for software engineering. In: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering, EASE 2024, Salerno, Italy, June 18-21, 2024, ACM, pp 100–109, https://doi.org/10.1145/3661167.3661210
LeClair A, McMillan C (2019) Recommendations for datasets for source code summarization. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), Association for Computational Linguistics, pp 3931–3937, https://doi.org/10.18653/v1/n19-1394
LeClair A, Jiang S, McMillan C (2019) A neural model for generating natural language summaries of program subroutines. In: Atlee JM, Bultan T, Whittle J (eds) Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, IEEE / ACM, pp 795–806, https://doi.org/10.1109/ICSE.2019.00087
LeClair A, Haque S, Wu L, McMillan C (2020) Improved code summarization via a graph neural network. In: ICPC ’20: 28th International Conference on Program Comprehension, Seoul, Republic of Korea, July 13-15, 2020, ACM, pp 184–195, https://doi.org/10.1145/3387904.3389268
LeClair A, Bansal A, McMillan C (2021) Ensemble models for neural source code summarization of subroutines. In: IEEE International Conference on Software Maintenance and Evolution, ICSME 2021, Luxembourg, September 27 - October 1, 2021, IEEE, pp 286–297, https://doi.org/10.1109/ICSME52107.2021.00032
Li J, Li Y, Li G, Hu X, Xia X, Jin Z (2021) Editsum: A retrieve-and-edit framework for source code summarization. In: 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021, Melbourne, Australia, November 15-19, 2021, IEEE, pp 155–166, https://doi.org/10.1109/ASE51524.2021.9678724
Li J, Li L, Zhu H, Zhang X (2023a) Graphplbart: Code summarization based on graph embedding and pre-trained model. In: Chang S (ed) The 35th International Conference on Software Engineering and Knowledge Engineering, SEKE 2023, KSIR Virtual Conference Center, USA, July 1-10, 2023, KSI Research Inc., pp 304–309, https://doi.org/10.18293/SEKE2023-192
Li J, Zhang Y, Karas Z, McMillan C, Leach K, Huang Y (2024a) Do machines and humans focus on similar code? exploring explainability of large language models in code summarization. In: Steinmacher I, Linares-Vásquez M, Moran KP, Baysal O (eds) Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension, ICPC 2024, Lisbon, Portugal, April 15-16, 2024, ACM, pp 47–51, https://doi.org/10.1145/3643916.3644434
Li L, Li J, Xu Y, Zhu H, Zhang X (2023) Enhancing code summarization with graph embedding and pre-trained model. Int J Softw Eng Knowl Eng 33(11 &12):1765–1786. https://doi.org/10.1142/S0218194023410024
Li M, Yu H, Fan G, Zhou Z, Huang J (2023) Classsum: a deep learning model for class-level code summarization. Neural Comput Appl 35(4):3373–3393. https://doi.org/10.1007/S00521-022-07877-Z
Li M, Yu H, Fan G, Zhou Z, Huang Z (2024) Enhancing code summarization with action word prediction. Neurocomputing 563:126777. https://doi.org/10.1016/j.neucom.2023.126777
Liang H, Huang C (2024) Integrating non-fourier and ast-structural relative position representations into transformer-based model for source code summarization. IEEE Access 12:9871–9889. https://doi.org/10.1109/ACCESS.2024.3354390
Liang Y, Zhu KQ (2018) Automatic generation of text descriptive comments for code blocks. In: McIlraith SA, Weinberger KQ (eds) Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018, AAAI Press, pp 5229–5236, https://doi.org/10.1609/aaai.v32i1.11963
Lin CY (2004) Rouge: A package for automatic evaluation of summaries. In: Text summarization branches out, pp 74–81
Lin L, Huang Z, Yu Y, Liu Y (2022) Multi-modal code summarization with retrieved summary. In: 22nd IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2021, Limassol, Cyprus, October 3, 2022, IEEE, pp 132–142, https://doi.org/10.1109/SCAM55253.2022.00020
Liu B, Wang T, Zhang X, Fan Q, Yin G, Deng J (2019) A neural-network based code summarization approach by using source code and its call dependencies. In: Internetware ’19: The 11th Asia-Pacific Symposium on Internetware, Fukuoka, Japan, October 28-29, 2019, ACM, pp 12:1–12:10, https://doi.org/10.1145/3361242.3362774
Liu S, Chen Y, Xie X, Siow JK, Liu Y (2021a) Retrieval-augmented generation for code summarization via hybrid GNN. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021, OpenReview.net, https://openreview.net/forum?id=zv-typ1gPxA
Liu S, Chen Y, Xie X, Siow JK, Liu Y (2021b) Retrieval-augmented generation for code summarization via hybrid GNN. In: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021, OpenReview.net, https://openreview.net/forum?id=zv-typ1gPxA
Liu Z, Xia X, Hassan AE, Lo D, Xing Z, Wang X (2018) Neural-machine-translation-based commit message generation: how far are we? In: Huchard M, Kästner C, Fraser G (eds) Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018, ACM, pp 373–384, https://doi.org/10.1145/3238147.3238190
Lu S, Guo D, Ren S, Huang J, Svyatkovskiy A, Blanco A, Clement CB, Drain D, Jiang D, Tang D, Li G, Zhou L, Shou L, Zhou L, Tufano M, Gong M, Zhou M, Duan N, Sundaresan N, Deng SK, Fu S, Liu S (2021) Codexglue: A machine learning benchmark dataset for code understanding and generation. In: Vanschoren J, Yeung S (eds) Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021, virtual, https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/hash/c16a5320fa475530d9583c34fd356ef5-Abstract-round1.html
Lu X, Niu J (2023) Enhancing source code summarization from structure and semantics. In: International Joint Conference on Neural Networks, IJCNN 2023, Gold Coast, Australia, June 18-23, 2023, IEEE, pp 1–7, https://doi.org/10.1109/IJCNN54540.2023.10191872
Lyu C, Wang R, Zhang H, Zhang H, Hu S (2021) Embedding API dependency graph for neural code generation. Empir Softw Eng 26(4):61. https://doi.org/10.1007/S10664-021-09968-2
Ma Z, Gao Y, Lyu L, Lyu C (2022) MMF3: neural code summarization based on multi-modal fine-grained feature fusion. In: Madeiral F, Lassenius C, Conte T, Männistö T (eds) ESEM ’22: ACM / IEEE International Symposium on Empirical Software Engineering and Measurement, Helsinki Finland, September 19 - 23, 2022, ACM, pp 171–182, https://doi.org/10.1145/3544902.3546251
Malhotra M, Chhabra JK (2018) Micro level source code summarization of optimal set of object oriented classes. Webology 15(2), http://www.webology.org/2018/v15n2/a175.pdf
Mayer R, Moser M, Geist V (2023a) Leveraging and evaluating automatic code summarization for JPA program comprehension. In: Zhang T, Xia X, Novielli N (eds) IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2023, Taipa, Macao, March 21-24, 2023, IEEE, pp 768–772, https://doi.org/10.1109/SANER56733.2023.00088
Mayer R, Moser M, Geist V (2023b) Leveraging and evaluating automatic code summarization for JPA program comprehension. In: Zhang T, Xia X, Novielli N (eds) IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2023, Taipa, Macao, March 21-24, 2023, IEEE, pp 768–772, https://doi.org/10.1109/SANER56733.2023.00088
McBurney PW, McMillan C (2014) Automatic documentation generation via source code summarization of method context. In: Roy CK, Begel A, Moonen L (eds) 22nd International Conference on Program Comprehension, ICPC 2014, Hyderabad, India, June 2-3, 2014, ACM, pp 279–290, https://doi.org/10.1145/2597008.2597149
McBurney PW, McMillan C (2016) Automatic source code summarization of context for java methods. IEEE Trans Software Eng 42(2):103–119. https://doi.org/10.1109/TSE.2015.2465386
McBurney PW, Liu C, McMillan C, Weninger T (2014) Improving topic model source code summarization. In: Roy CK, Begel A, Moonen L (eds) 22nd International Conference on Program Comprehension, ICPC 2014, Hyderabad, India, June 2-3, 2014, ACM, pp 291–294, https://doi.org/10.1145/2597008.2597793
Moore J, Gelman B, Slater D (2019) A convolutional neural network for language-agnostic source code summarization. In: Damiani E, Spanoudakis G, Maciaszek LA (eds) Proceedings of the 14th International Conference on Evaluation of Novel Approaches to Software Engineering, ENASE 2019, Heraklion, Crete, Greece, May 4-5, 2019, SciTePress, pp 15–26, https://doi.org/10.5220/0007678100150026
Moreno L, Aponte J, Sridhara G, Marcus A, Pollock LL, Vijay-Shanker K (2013) Automatic generation of natural language summaries for java classes. In: IEEE 21st International Conference on Program Comprehension, ICPC 2013, San Francisco, CA, USA, 20-21 May, 2013, IEEE Computer Society, pp 23–32, https://doi.org/10.1109/ICPC.2013.6613830
Movshovitz-Attias D, Cohen WW (2013) Natural language models for predicting programming comments. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, 4-9 August 2013, Sofia, Bulgaria, Volume 2: Short Papers, The Association for Computer Linguistics, pp 35–40, https://aclanthology.org/P13-2007/
Nazar N, Hu Y, Jiang H (2016) Summarizing software artifacts: A literature review. J Comput Sci Technol 31(5):883–909. https://doi.org/10.1007/s11390-016-1671-1
Nazar N, Jiang H, Gao G, Zhang T, Li X, Ren Z (2016) Source code fragment summarization with small-scale crowdsourcing based features. Frontiers Comput Sci 10(3):504–517. https://doi.org/10.1007/s11704-015-4409-2
Nie P, Zhang J, Li JJ, Mooney RJ, Gligoric M (2022) Impact of evaluation methodologies on code summarization. In: Muresan S, Nakov P, Villavicencio A (eds) Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, Association for Computational Linguistics, pp 4936–4960, https://doi.org/10.18653/v1/2022.acl-long.339
Niu C, Li C, Ng V, Ge J, Huang L, Luo B (2024) Passsum: Leveraging paths of abstract syntax trees and self-supervision for code summarization. J Softw Evol Process 36(6), https://doi.org/10.1002/smr.2620
Panichella S (2018) Summarization techniques for code, change, testing, and user feedback (invited paper). In: Artho C, Ramler R (eds) 2018 IEEE Workshop on Validation, Analysis and Evolution of Software Tests, VST@SANER 2018, Campobasso, Italy, March 20, 2018, IEEE, pp 1–5, https://doi.org/10.1109/VST.2018.8327148
Panichella S, Aponte J, Penta MD, Marcus A, Canfora G (2012) Mining source code descriptions from developer communications. In: Beyer D, van Deursen A, Godfrey MW (eds) IEEE 20th International Conference on Program Comprehension, ICPC 2012, Passau, Germany, June 11-13, 2012, IEEE Computer Society, pp 63–72, https://doi.org/10.1109/ICPC.2012.6240510
Papineni K, Roukos S, Ward T, Zhu W (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, July 6-12, 2002, Philadelphia, PA, USA, ACL, pp 311–318, https://doi.org/10.3115/1073083.1073135, https://aclanthology.org/P02-1040/
Parvez MR, Ahmad WU, Chakraborty S, Ray B, Chang K (2021) Retrieval augmented code generation and summarization. In: Moens M, Huang X, Specia L, Yih SW (eds) Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 16-20 November, 2021, Association for Computational Linguistics, pp 2719–2734, https://doi.org/10.18653/v1/2021.findings-emnlp.232
Rahman MM, Roy CK, Keivanloo I (2015) Recommending insightful comments for source code using crowdsourced knowledge. In: Godfrey MW, Lo D, Khomh F (eds) 15th IEEE International Working Conference on Source Code Analysis and Manipulation, SCAM 2015, Bremen, Germany, September 27-28, 2015, IEEE Computer Society, pp 81–90, https://doi.org/10.1109/SCAM.2015.7335404
Rai S, Gaikwad T, Jain S, Gupta A (2017) Method level text summarization for java code using nano-patterns. In: Lv J, Zhang HJ, Hinchey M, Liu X (eds) 24th Asia-Pacific Software Engineering Conference, APSEC 2017, Nanjing, China, December 4-8, 2017, IEEE Computer Society, pp 199–208, https://doi.org/10.1109/APSEC.2017.26
Rani P, Blasi A, Stulova N, Panichella S, Gorla A, Nierstrasz O (2023) A decade of code comment quality assessment: A systematic literature review. J Syst Softw 195:111515. https://doi.org/10.1016/j.jss.2022.111515
Ren S, Guo D, Lu S, Zhou L, Liu S, Tang D, Sundaresan N, Zhou M, Blanco A, Ma S (2020) Codebleu: a method for automatic evaluation of code synthesis. arXiv:2009.10297
Rodeghero P, McMillan C, McBurney PW, Bosch N, D’Mello SK (2014) Improving automated source code summarization via an eye-tracking study of programmers. In: Jalote P, Briand LC, van der Hoek A (eds) 36th International Conference on Software Engineering, ICSE ’14, Hyderabad, India - May 31 - June 07, 2014, ACM, pp 390–401, https://doi.org/10.1145/2568225.2568247
Rodeghero P, Liu C, McBurney PW, McMillan C (2015) An eye-tracking study of java programmers and application to source code summarization. IEEE Trans Software Eng 41(11):1038–1054. https://doi.org/10.1109/TSE.2015.2442238
Roy D, Fakhoury S, Arnaoudova V (2021) Reassessing automatic evaluation metrics for code summarization tasks. In: Spinellis D, Gousios G, Chechik M, Penta MD (eds) ESEC/FSE ’21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, August 23-28, 2021, ACM, pp 1105–1116, https://doi.org/10.1145/3468264.3468588
Shahbazi R, Fard FH (2023) Apicontext2com: Code comment generation by incorporating pre-defined API documentation. In: 31st IEEE/ACM International Conference on Program Comprehension, ICPC 2023, Melbourne, Australia, May 15-16, 2023, IEEE, pp 13–24, https://doi.org/10.1109/ICPC58990.2023.00012
Shen J, Sun X, Li B, Yang H, Hu J (2016) On automatic summarization of what and why information in source code changes. In: 40th IEEE Annual Computer Software and Applications Conference, COMPSAC 2016, Atlanta, GA, USA, June 10-14, 2016, IEEE Computer Society, pp 103–112, https://doi.org/10.1109/COMPSAC.2016.162
Shen J, Zhou Y, Wang Y, Chen X, Han T, Chen T (2021) Evaluating code summarization with improved correlation with human assessment. In: 21st IEEE International Conference on Software Quality, Reliability and Security, QRS 2021, Hainan, China, December 6-10, 2021, IEEE, pp 990–1001, https://doi.org/10.1109/QRS54544.2021.00108
Shi C, Xiang Y, Yu J, Gao L (2022a) Towards accurate knowledge transfer between transformer-based models for code summarization. In: Peng R, Pantoja CE, Kamthan P (eds) The 34th International Conference on Software Engineering and Knowledge Engineering, SEKE 2022, KSIR Virtual Conference Center, USA, July 1 - July 10, 2022, KSI Research Inc., pp 91–94, https://doi.org/10.18293/SEKE2022-111
Shi C, Cai B, Zhao Y, Gao L, Sood K, Xiang Y (2023) Coss: Leveraging statement semantics for code summarization. IEEE Trans Software Eng 49(6):3472–3486. https://doi.org/10.1109/TSE.2023.3256362
Shi E, Wang Y, Du L, Chen J, Han S, Zhang H, Zhang D, Sun H (2022b) On the evaluation of neural code summarization. In: 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022, ACM, pp 1597–1608, https://doi.org/10.1145/3510003.3510060
Shi L, Mu F, Chen X, Wang S, Wang J, Yang Y, Li G, Xia X, Wang Q (2022c) Are we building on the rock? on the importance of data preprocessing for code summarization. In: Roychoudhury A, Cadar C, Kim M (eds) Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, Singapore, Singapore, November 14-18, 2022, ACM, pp 107–119, https://doi.org/10.1145/3540250.3549145
Shido Y, Kobayashi Y, Yamamoto A, Miyamoto A, Matsumura T (2019) Automatic source code summarization with extended tree-lstm. In: International Joint Conference on Neural Networks, IJCNN 2019 Budapest, Hungary, July 14-19, 2019, IEEE, pp 1–8, https://doi.org/10.1109/IJCNN.2019.8851751
Son J, Hahn J, Seo H, Han Y (2022) Boosting code summarization by embedding code structures. In: Calzolari N, Huang C, Kim H, Pustejovsky J, Wanner L, Choi K, Ryu P, Chen H, Donatelli L, Ji H, Kurohashi S, Paggio P, Xue N, Kim S, Hahm Y, He Z, Lee TK, Santus E, Bond F, Na S (eds) Proceedings of the 29th International Conference on Computational Linguistics, COLING 2022, Gyeongju, Republic of Korea, October 12-17, 2022, International Committee on Computational Linguistics, pp 5966–5977, https://aclanthology.org/2022.coling-1.521
Song X, Sun H, Wang X, Yan J (2019) A survey of automatic generation of source code comments: Algorithms and techniques. IEEE Access 7:111411–111428. https://doi.org/10.1109/ACCESS.2019.2931579
Song Z, Shang X, Li M, Chen R, Li H, Guo S (2022) Do not have enough data? an easy data augmentation for code summarization. In: 13th IEEE International Symposium on Parallel Architectures, Algorithms and Programming, PAAP 2022, Beijing, China, November 25-27, 2022, IEEE, pp 1–6, https://doi.org/10.1109/PAAP56126.2022.10010698
Song Z, Zeng H, Shang X, Li G, Li H, Guo S (2023) An data augmentation method for source code summarization. Neurocomputing 549:126385. https://doi.org/10.1016/j.neucom.2023.126385
Sridhara G, Hill E, Muppaneni D, Pollock LL, Vijay-Shanker K (2010) Towards automatically generating summary comments for java methods. In: Pecheur C, Andrews J, Nitto ED (eds) ASE 2010, 25th IEEE/ACM International Conference on Automated Software Engineering, Antwerp, Belgium, September 20-24, 2010, ACM, pp 43–52, https://doi.org/10.1145/1858996.1859006
Sridhara G, Pollock LL, Vijay-Shanker K (2011a) Automatically detecting and describing high level actions within methods. In: Taylor RN, Gall HC, Medvidovic N (eds) Proceedings of the 33rd International Conference on Software Engineering, ICSE 2011, Waikiki, Honolulu , HI, USA, May 21-28, 2011, ACM, pp 101–110, https://doi.org/10.1145/1985793.1985808
Sridhara G, Pollock LL, Vijay-Shanker K (2011b) Generating parameter comments and integrating with method summaries. In: The 19th IEEE International Conference on Program Comprehension, ICPC 2011, Kingston, ON, Canada, June 22-24, 2011, IEEE Computer Society, pp 71–80, https://doi.org/10.1109/ICPC.2011.28
Stapleton S, Gambhir Y, LeClair A, Eberhart Z, Weimer W, Leach K, Huang Y (2020) A human study of comprehension and code summarization. In: ICPC ’20: 28th International Conference on Program Comprehension, Seoul, Republic of Korea, July 13-15, 2020, ACM, pp 2–13, https://doi.org/10.1145/3387904.3389258
Su C, McMillan C (2024) Distilled GPT for source code summarization. Autom Softw Eng 31(1):22. https://doi.org/10.1007/s10515-024-00421-4
Sun W, Fang C, You Y, Miao Y, Liu Y, Li Y, Deng G, Huang S, Chen Y, Zhang Q, Qian H, Liu Y, Chen Z (2023) Automatic code summarization via chatgpt: How far are we? CoRR abs/2305.12865, https://doi.org/10.48550/ARXIV.2305.12865, arXiv:2305.12865
Sun W, Fang C, Chen Y, Zhang Q, Tao G, You Y, Han T, Ge Y, Hu Y, Luo B, Chen Z (2024) An extractive-and-abstractive framework for source code summarization. ACM Trans Softw Eng Methodol 33(3):75:1–75:39, https://doi.org/10.1145/3632742
Tang Z, Shen X, Li C, Ge J, Huang L, Zhu Z, Luo B (2022) Ast-trans: Code summarization with efficient tree-structured attention. In: 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022, ACM, pp 150–162, https://doi.org/10.1145/3510003.3510224
Tufano M, Watson C, Bavota G, Penta MD, White M, Poshyvanyk D (2018) Deep learning similarities from different representations of source code. In: Zaidman A, Kamei Y, Hill E (eds) Proceedings of the 15th International Conference on Mining Software Repositories, MSR 2018, Gothenburg, Sweden, May 28-29, 2018, ACM, pp 542–553, https://doi.org/10.1145/3196398.3196431
Vassallo C, Panichella S, Penta MD, Canfora G (2014) CODES: mining source code descriptions from developers discussions. In: Roy CK, Begel A, Moonen L (eds) 22nd International Conference on Program Comprehension, ICPC 2014, Hyderabad, India, June 2-3, 2014, ACM, pp 106–109, https://doi.org/10.1145/2597008.2597799
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, Garnett R (eds) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, pp 5998–6008, https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html
Vedantam R, Zitnick CL, Parikh D (2015) Cider: Consensus-based image description evaluation. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, June 7-12, 2015, IEEE Computer Society, pp 4566–4575, https://doi.org/10.1109/CVPR.2015.7299087
Wan Y, Zhao Z, Yang M, Xu G, Ying H, Wu J, Yu PS (2018) Improving automatic source code summarization via deep reinforcement learning. In: Huchard M, Kästner C, Fraser G (eds) Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, Montpellier, France, September 3-7, 2018, ACM, pp 397–407, https://doi.org/10.1145/3238147.3238206
WANG J, XUE X, WENG W (2015) Source code summarization technology based on syntactic analysis. J Comput Appl 35(7):1999
Wang R, Zhang H, Lu G, Lyu L, Lyu C (2020) Fret: Functional reinforced transformer with BERT for code summarization. IEEE Access 8:135591–135604. https://doi.org/10.1109/ACCESS.2020.3011744
Wang W, Zhang Y, Zeng Z, Xu G (2020b) Trans \(^{\wedge } \) 3: A transformer-based framework for unifying code summarization and code search. arXiv:2003.03238
Wang W, Zhang Y, Sui Y, Wan Y, Zhao Z, Wu J, Yu PS, Xu G (2022) Reinforcement-learning-guided source code summarization using hierarchical attention. IEEE Trans Software Eng 48(2):102–119. https://doi.org/10.1109/TSE.2020.2979701
Wang X, Pollock LL, Vijay-Shanker K (2017) Automatically generating natural language descriptions for object-related statement sequences. In: Pinzger M, Bavota G, Marcus A (eds) IEEE 24th International Conference on Software Analysis, Evolution and Reengineering, SANER 2017, Klagenfurt, Austria, February 20-24, 2017, IEEE Computer Society, pp 205–216, https://doi.org/10.1109/SANER.2017.7884622
Wang Y, Wang W, Joty SR, Hoi SCH (2021) Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In: Moens M, Huang X, Specia L, Yih SW (eds) Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, Association for Computational Linguistics, pp 8696–8708, https://doi.org/10.18653/v1/2021.emnlp-main.685
Wang Y, Dong Y, Lu X, Zhou A (2022b) Gypsum: learning hybrid representations for code summarization. In: Rastogi A, Tufano R, Bavota G, Arnaoudova V, Haiduc S (eds) Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, ICPC 2022, Virtual Event, May 16-17, 2022, ACM, pp 12–23, https://doi.org/10.1145/3524610.3527903
Wang Y, Le H, Gotmare A, Bui NDQ, Li J, Hoi SCH (2023) Codet5+: Open code large language models for code understanding and generation. In: Bouamor H, Pino J, Bali K (eds) Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, December 6-10, 2023, Association for Computational Linguistics, pp 1069–1088, https://doi.org/10.18653/v1/2023.emnlp-main.68
Wei B, Li G, Xia X, Fu Z, Jin Z (2019) Code generation as a dual task of code summarization. In: Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R (eds) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp 6559–6569, https://proceedings.neurips.cc/paper/2019/hash/e52ad5c9f751f599492b4f087ed7ecfc-Abstract.html
Wong E, Yang J, Tan L (2013) Autocomment: Mining question and answer sites for automatic comment generation. In: Denney E, Bultan T, Zeller A (eds) 2013 28th IEEE/ACM International Conference on Automated Software Engineering, ASE 2013, Silicon Valley, CA, USA, November 11-15, 2013, IEEE, pp 562–567, https://doi.org/10.1109/ASE.2013.6693113
Wong E, Liu T, Tan L (2015) Clocom: Mining existing source code for automatic comment generation. In: Guéhéneuc Y, Adams B, Serebrenik A (eds) 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2015, Montreal, QC, Canada, March 2-6, 2015, IEEE Computer Society, pp 380–389, https://doi.org/10.1109/SANER.2015.7081848
Wu H, Zhao H, Zhang M (2021) Code summarization with structure-induced transformer. In: Zong C, Xia F, Li W, Navigli R (eds) Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, Online Event, August 1-6, 2021, Association for Computational Linguistics, Findings of ACL, vol ACL/IJCNLP 2021, pp 1078–1090, https://doi.org/10.18653/v1/2021.findings-acl.93
Xia X, Bao L, Lo D, Xing Z, Hassan AE, Li S (2018) Measuring program comprehension: A large-scale field study with professionals. IEEE Trans Software Eng 44(10):951–976. https://doi.org/10.1109/TSE.2017.2734091
Yang K, Mao X, Wang S, Qin Y, Zhang T, Lu Y, Al-Sabahi K (2023a) An extensive study of the structure features in transformer-based code semantic summarization. In: 31st IEEE/ACM International Conference on Program Comprehension, ICPC 2023, Melbourne, Australia, May 15-16, 2023, IEEE, pp 89–100, https://doi.org/10.1109/ICPC58990.2023.00024
Yang K, Wang J, Song Z (2023) Learning a holistic and comprehensive code representation for code summarization. J Syst Softw 203:111746. https://doi.org/10.1016/j.jss.2023.111746
Ye W, Xie R, Zhang J, Hu T, Wang X, Zhang S (2020) Leveraging code generation to improve code retrieval and summarization via dual learning. In: Huang Y, King I, Liu T, van Steen M (eds) WWW ’20: The Web Conference 2020, Taipei, Taiwan, April 20-24, 2020, ACM / IW3C2, pp 2309–2319, https://doi.org/10.1145/3366423.3380295
Zeng J, Zhang T, Xu Z (2021) Dg-trans: Automatic code summarization via dynamic graph attention-based transformer. In: 21st IEEE International Conference on Software Quality, Reliability and Security, QRS 2021, Hainan, China, December 6-10, 2021, IEEE, pp 786–795, https://doi.org/10.1109/QRS54544.2021.00088
Zeng J, He Y, Zhang T, Xu Z, Han Q (2023) Clg-trans: Contrastive learning for code summarization via graph attention-based transformer. Sci Comput Program 226:102925. https://doi.org/10.1016/j.scico.2023.102925
Zeng J, Qu Z, Cai B (2023) Structure and sequence aligned code summarization with prefix and suffix balanced strategy. Entropy 25(4):570. https://doi.org/10.3390/e25040570
Zeng L, Zhang X, Wang T, Li X, Yu J, Wang H (2018) Improving code summarization by combining deep learning and empirical knowledge (S). In: Pereira ÓM (ed) The 30th International Conference on Software Engineering and Knowledge Engineering, Hotel Pullman, Redwood City, California, USA, July 1-3, 2018, KSI Research Inc. and Knowledge Systems Institute Graduate School, pp 566–565, https://doi.org/10.18293/SEKE2018-191
Zhang C, Wang J, Zhou Q, Xu T, Tang K, Gui H, Liu F (2022) A survey of automatic source code summarization. Symmetry 14(3):471. https://doi.org/10.3390/sym14030471
Zhang C, Zhou Q, Qiao M, Tang K, Xu L, Liu F (2022) Re_trans: Combined retrieval and transformer model for source code summarization. Entropy 24(10):1372
Zhang J, Wang X, Zhang H, Sun H, Wang K, Liu X (2019) A novel neural source code representation based on abstract syntax tree. In: Atlee JM, Bultan T, Whittle J (eds) Proceedings of the 41st International Conference on Software Engineering, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, IEEE / ACM, pp 783–794, https://doi.org/10.1109/ICSE.2019.00086
Zhang J, Wang X, Zhang H, Sun H, Liu X (2020) Retrieval-based neural source code summarization. In: Rothermel G, Bae D (eds) ICSE ’20: 42nd International Conference on Software Engineering, Seoul, South Korea, 27 June - 19 July, 2020, ACM, pp 1385–1397, https://doi.org/10.1145/3377811.3380383
Zhang M, Zhou G, Yu W, Huang N, Liu W (2023a) GA-SCS: graph-augmented source code summarization. ACM Trans Asian Low Resour Lang Inf Process 22(2):53:1–53:19, https://doi.org/10.1145/3554820
Zhang X, Yang S, Duan L, Lang Z, Shi Z, Sun L (2021) Transformer-xl with graph neural network for source code summarization. In: 2021 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2021, Melbourne, Australia, October 17-20, 2021, IEEE, pp 3436–3441, https://doi.org/10.1109/SMC52423.2021.9658619
Zhang X, Chen L, Zou W, Cao Y, Ren H, Wang Z, Li Y, Zhou Y (2024) ICG: A machine learning benchmark dataset and baselines for inline code comments generation task. Int J Softw Eng Knowl Eng 34(2):331–356. https://doi.org/10.1142/S0218194023500547
Zhang Z, Chen C, Liu B, Liao C, Gong Z, Yu H, Li J, Wang R (2023b) A survey on language models for code. arXiv:2311.07989
Zhang Z, Chen S, Fan G, Yang G, Feng Z (2023c) CCGRA: smart contract code comment generation with retrieval-enhanced approach. In: Chang S (ed) The 35th International Conference on Software Engineering and Knowledge Engineering, SEKE 2023, KSIR Virtual Conference Center, USA, July 1-10, 2023, KSI Research Inc., pp 212–217, https://doi.org/10.18293/SEKE2023-090
Zheng W, Zhou H, Li M, Wu J (2017) Code attention: Translating code to comments by exploiting domain features. arXiv:1709.07642
Zheng W, Zhou H, Li M, Wu J (2019) Codeattention: translating source code to comments by exploiting the code constructs. Front Comput Sci 13(3):565–578. https://doi.org/10.1007/s11704-018-7457-6
Zhou Y, Liu S, Siow JK, Du X, Liu Y (2019) Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. In: Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R (eds) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp 10197–10207, https://proceedings.neurips.cc/paper/2019/hash/49265d2447bc3bbfe9e76306ce40a31f-Abstract.html
Zhou Y, Shen J, Zhang X, Yang W, Han T, Chen T (2022) Automatic source code summarization with graph attention networks. J Syst Softw 188:111257. https://doi.org/10.1016/j.jss.2022.111257
Zhou Z, Yu H, Fan G (2020) Effective approaches to combining lexical and syntactical information for code summarization. Softw Pract Exp 50(12):2313–2336. https://doi.org/10.1002/spe.2893
Zhou Z, Yu H, Fan G, Huang Z, Yang X (2022) Summarizing source code with hierarchical code representation. Inf Softw Technol 143:106761. https://doi.org/10.1016/j.infsof.2021.106761
Zhou Z, Yu H, Fan G, Huang Z, Yang K (2023) Towards retrieval-based neural code summarization: A meta-learning approach. IEEE Trans Software Eng 49(4):3008–3031. https://doi.org/10.1109/TSE.2023.3238161
Zhu T, Li Z, Pan M, Shi C, Zhang T, Pei Y, Li X (2023) Revisiting information retrieval and deep learning approaches for code summarization. In: 45th IEEE/ACM International Conference on Software Engineering: ICSE 2023 Companion Proceedings, Melbourne, Australia, May 14-20, 2023, IEEE, pp 328–329, https://doi.org/10.1109/ICSE-Companion58688.2023.00091
Zhuang Y, Liu Z, Qian P, Liu Q, Wang X, He Q (2020) Smart contract vulnerability detection using graph neural network. In: Bessiere C (ed) Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, ijcai.org, pp 3283–3290, https://doi.org/10.24963/ijcai.2020/454
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 62102036) and the Beijing Municipal Natural Science Foundation for Youths (No. 4224090).
Ethics declarations
Conflicts of Interests/Competing Interests
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Communicated by: Xia Hou.
Appendix
About this article
Cite this article
Zhang, X., Hou, X., Qiao, X. et al. A review of automatic source code summarization. Empir Software Eng 29, 162 (2024). https://doi.org/10.1007/s10664-024-10553-6
DOI: https://doi.org/10.1007/s10664-024-10553-6