DOI: 10.1145/3643916.3644414
research-article

ESGen: Commit Message Generation Based on Edit Sequence of Code Change

Published: 13 June 2024

Abstract

Commit messages provide important information for comprehending code changes, and a number of researchers have tried to generate them automatically. Existing research on commit message generation has relied on code tokens or on code structures such as the abstract syntax tree (AST). Since the edit sequence of a code change is also important for capturing the intent of the change, we propose a new commit message generation method called ESGen, which extracts the AST edit sequences of code changes as model input. Specifically, we employ an O(ND) difference algorithm to extract the edit sequence by comparing the ASTs before and after the code change is applied. Then, we construct a Bi-Encoder, which encodes both the textual information and the AST edit sequence information of the code change. The experimental results show that ESGen outperforms the baseline models, improving the BLEU-4 score to 15.14. Moreover, applying the edit sequence to 7 baseline models improves their BLEU-4 scores by an average of 8.5%. Additionally, a human evaluation confirmed the effectiveness of ESGen in generating commit messages.
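The edit sequence extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the AST is flattened into a pre-order sequence of node type names, it uses Python's standard `ast` module purely for illustration, and the helper names `node_types`, `myers_edit_sequence`, and `_backtrack` are hypothetical. The diff itself follows the greedy O(ND) shortest-edit-script search.

```python
import ast

def node_types(source):
    """Flatten a program's AST into a pre-order list of node type names.
    (Assumption: the abstract does not specify how ASTs are linearized.)"""
    def walk(node):
        yield type(node).__name__
        for child in ast.iter_child_nodes(node):
            yield from walk(child)
    return list(walk(ast.parse(source)))

def myers_edit_sequence(a, b):
    """Shortest edit script turning sequence a into b (greedy O(ND) search)."""
    n, m = len(a), len(b)
    v = {1: 0}      # furthest x reached so far on each diagonal k = x - y
    trace = []      # snapshot of v taken before each round d
    for d in range(n + m + 1):
        trace.append(dict(v))
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # move down: insert an element of b
            else:
                x = v[k - 1] + 1    # move right: delete an element of a
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1  # follow the diagonal of matches
            v[k] = x
            if x >= n and y >= m:
                return _backtrack(trace, a, b)

def _backtrack(trace, a, b):
    """Replay the recorded rounds backwards to recover the edit operations."""
    x, y = len(a), len(b)
    ops = []
    for d in range(len(trace) - 1, -1, -1):
        v, k = trace[d], x - y
        if k == -d or (k != d and v[k - 1] < v[k + 1]):
            prev_k = k + 1
        else:
            prev_k = k - 1
        prev_x = v[prev_k]
        prev_y = prev_x - prev_k
        while x > prev_x and y > prev_y:
            x, y = x - 1, y - 1
            ops.append(("keep", a[x]))
        if d > 0:
            if x == prev_x:
                ops.append(("insert", b[prev_y]))
            else:
                ops.append(("delete", a[prev_x]))
        x, y = prev_x, prev_y
    return ops[::-1]

# Hypothetical before/after versions of a changed function.
before = "def f(x):\n    return x + 1\n"
after = "def f(x):\n    if x:\n        return x + 1\n    return 0\n"
for op, node in myers_edit_sequence(node_types(before), node_types(after)):
    print(op, node)
```

The resulting list of keep, insert, and delete operations over AST node types is the kind of sequence that could be fed to a second encoder alongside the textual diff; how ESGen actually tokenizes and encodes it is described in the paper itself, not here.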

    Published In

    ICPC '24: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension
    April 2024
    487 pages
    ISBN:9798400705861
    DOI:10.1145/3643916
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. commit message generation
    2. code change
    3. edit sequence
    4. biencoder
    5. abstract syntax tree
