research-article

ESGen: Commit Message Generation Based on Edit Sequence of Code Change

Authors:

Xiangping Chen,

Zibin Zheng

ICPC '24: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension

Pages 112 - 124

https://doi.org/10.1145/3643916.3644414

Published: 13 June 2024

Abstract

Commit messages provide important information for comprehending code changes, and a number of researchers have tried to generate them automatically. Existing research on commit message generation has benefited from code tokens or code structures such as the AST. Since the edit sequence of a code change is also important for capturing the intent of the change, we propose a new commit message generation method called ESGen, which extracts the AST edit sequence of a code change as model input. Specifically, we employ an O(ND) difference algorithm to extract the edit sequence by comparing the ASTs before and after the code change is applied. Then, we construct a Bi-Encoder that encodes both the textual information and the AST edit sequence information of the code change. The experimental results show that ESGen outperforms the baseline models, achieving a BLEU-4 score of 15.14. In addition, applying the edit sequence to 7 baseline models improves their BLEU-4 scores by an average of 8.5%. A human evaluation further confirmed the effectiveness of ESGen in generating commit messages.
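The abstract does not show how an edit sequence is obtained from two ASTs, so the following is a minimal sketch of the general idea under stated assumptions, not ESGen's implementation. It uses Python's standard `ast` module in place of whatever parser ESGen uses, flattens each AST into a pre-order sequence of node labels (the labelling scheme, node type plus identifier or literal, is invented here for illustration), and then applies Myers' greedy O(ND) difference algorithm to produce a shortest edit script of keep (`=`), delete (`-`), and insert (`+`) operations. The names `linearize`, `myers_diff`, and `_backtrack` are hypothetical.

```python
import ast
from typing import List, Tuple

# Hypothetical edit representation: ("=", tok) keep, ("-", tok) delete, ("+", tok) insert
Edit = Tuple[str, str]


def linearize(source: str) -> List[str]:
    """Flatten a Python AST into a pre-order list of node labels.

    Assumption: Python's ast stands in for the AST used by ESGen, and the
    label format (type name plus identifier/literal) is illustrative only.
    """
    tokens: List[str] = []

    def visit(node: ast.AST) -> None:
        label = type(node).__name__
        if isinstance(node, ast.Name):
            label += f":{node.id}"
        elif isinstance(node, ast.Constant):
            label += f":{node.value!r}"
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            label += f":{node.name}"
        elif isinstance(node, ast.arg):
            label += f":{node.arg}"
        tokens.append(label)
        for child in ast.iter_child_nodes(node):
            if not isinstance(child, ast.expr_context):  # skip Load/Store/Del noise
                visit(child)

    visit(ast.parse(source))
    return tokens


def myers_diff(a: List[str], b: List[str]) -> List[Edit]:
    """Shortest edit script turning a into b (Myers' greedy O(ND) algorithm)."""
    n, m = len(a), len(b)
    max_d = n + m
    offset = max_d                 # diagonal k is stored at index k + offset
    v = [0] * (2 * max_d + 2)      # v[k + offset] = furthest x reached on diagonal k
    trace: List[List[int]] = []    # snapshot of v taken before each round d
    for d in range(max_d + 1):
        trace.append(v.copy())
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1 + offset] < v[k + 1 + offset]):
                x = v[k + 1 + offset]        # step down: an insertion from b
            else:
                x = v[k - 1 + offset] + 1    # step right: a deletion from a
            y = x - k
            while x < n and y < m and a[x] == b[y]:  # follow matching "snake"
                x, y = x + 1, y + 1
            v[k + offset] = x
            if x >= n and y >= m:
                return _backtrack(trace, a, b, offset)
    raise AssertionError("unreachable: the edit distance never exceeds n + m")


def _backtrack(trace: List[List[int]], a: List[str], b: List[str], offset: int) -> List[Edit]:
    """Walk the saved rounds backwards from (n, m) to (0, 0) to recover the edits."""
    x, y = len(a), len(b)
    edits: List[Edit] = []
    for d in range(len(trace) - 1, -1, -1):
        v = trace[d]
        k = x - y
        if k == -d or (k != d and v[k - 1 + offset] < v[k + 1 + offset]):
            prev_k = k + 1
        else:
            prev_k = k - 1
        prev_x = v[prev_k + offset]
        prev_y = prev_x - prev_k
        while x > prev_x and y > prev_y:     # diagonal moves are matches
            edits.append(("=", a[x - 1]))
            x, y = x - 1, y - 1
        if d > 0:
            if x == prev_x:
                edits.append(("+", b[y - 1]))
            else:
                edits.append(("-", a[x - 1]))
        x, y = prev_x, prev_y
    edits.reverse()
    return edits


if __name__ == "__main__":
    before = "def f(x):\n    return x + 1\n"
    after = "def f(x):\n    return x * 2\n"
    for op, tok in myers_diff(linearize(before), linearize(after)):
        if op != "=":
            print(op, tok)
```

For this toy change the script reports four edits: `Add` and `Constant:1` are deleted, and `Mult` and `Constant:2` are inserted. Keeping the matched (`=`) entries alongside the edits is a choice made here for illustration; whether ESGen keeps unchanged context nodes, and how it serializes the sequence for its Bi-Encoder, is not stated in the abstract.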


Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Documentation



Published In


ICPC '24: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension

April 2024

487 pages

ISBN: 9798400705861

DOI: 10.1145/3643916

Chair: Igor Steinmacher
Co-chair: Mario Linares-Vasquez
Program Chair: Kevin Patrick Moran
Program Co-chair: Olga Baysal

Copyright © 2024 held by the owner/author(s). Publication rights licensed to ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States



Qualifiers

Research-article

Funding Sources

National Key R&D Program of China
Natural Science Foundation of Guangdong Province
National Natural Science Foundation of China

Conference

ICPC '24: 32nd IEEE/ACM International Conference on Program Comprehension

April 15-16, 2024

Lisbon, Portugal

Sponsor: SIGSOFT



Article Metrics

Total Citations: 0
Total Downloads: 98
Downloads (Last 12 months): 98
Downloads (Last 6 weeks): 4

Reflects downloads up to 03 Mar 2025
