DOI: 10.1145/3643916.3644405

ASKDetector: An AST-Semantic and Key Features Fusion based Code Comment Mismatch Detector

Published: 13 June 2024

Abstract

Code comments are essential for program comprehension. Nevertheless, developers often neglect to update comments after modifying the source code. Incorrect comments can introduce bugs during maintenance and thus undermine software reliability, so timely detection of code-comment mismatches is crucial for software development and maintenance. Existing approaches, however, suffer from two limitations: 1) they make little use of the structural and sequential information in code, and 2) they ignore the associations that exist between code and comments. In this paper, we propose a new model called ASKDetector (AST-Semantic and Key features fusion based mismatch Detector). To address the first limitation, we encode code as an attention-based preorder-traversal abstract syntax tree sequence to capture both ordering and structural information, and we further employ CodeBERT to capture contextual semantic features. To address the second, we encode association information extracted between code snippets and their comments to narrow the semantic gap. The correlations among the encoders are learned through a fusion layer and a multi-layer perceptron. Experimental results show that our detector outperforms the state-of-the-art model, with F1 and accuracy higher by an average of 3.4%.
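To make the described pipeline concrete, the sketch below is a minimal, hypothetical illustration of two of its ingredients: flattening a code snippet into a preorder-traversal AST node sequence, and fusing a code vector, a comment vector, and an association vector through a fusion layer and a multi-layer perceptron. This is not the authors' implementation; the use of Python's ast module in place of the paper's parser, the 768/256 dimensions, and the random vectors standing in for CodeBERT and association-feature embeddings are all assumptions made for illustration.

```python
# Hypothetical sketch of ASKDetector-style ingredients (not the authors' code).
# Assumptions: Python's ast module stands in for the paper's parser, and random
# vectors stand in for CodeBERT / association-feature embeddings.
import ast

import torch
import torch.nn as nn


def preorder_ast_tokens(source: str) -> list[str]:
    """Flatten a code snippet's AST into a preorder node-type sequence."""
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)        # record the node first ...
        for child in ast.iter_child_nodes(node):  # ... then recurse into children
            visit(child)

    visit(ast.parse(source))
    return tokens


class FusionMLP(nn.Module):
    """Fuse code, comment, and association vectors; predict a mismatch score."""

    def __init__(self, dim: int = 768, hidden: int = 256):
        super().__init__()
        self.fuse = nn.Linear(3 * dim, hidden)    # fusion layer over the concatenation
        self.mlp = nn.Sequential(
            nn.ReLU(), nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, code_vec, comment_vec, assoc_vec):
        fused = self.fuse(torch.cat([code_vec, comment_vec, assoc_vec], dim=-1))
        return torch.sigmoid(self.mlp(fused))     # probability of a mismatch


if __name__ == "__main__":
    print(preorder_ast_tokens("def add(a, b):\n    return a + b"))
    # Placeholder vectors; in the paper these would come from the AST encoder,
    # CodeBERT, and the extracted code-comment association features.
    code, comment, assoc = (torch.randn(1, 768) for _ in range(3))
    print(FusionMLP()(code, comment, assoc).item())
```

Running the snippet prints the preorder node-type sequence for the toy function and a single mismatch probability from randomly initialized weights, which is only meant to show how the three encodings would flow into the fusion layer and MLP.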



Published In

ICPC '24: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension
April 2024
487 pages
ISBN:9798400705861
DOI:10.1145/3643916


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. reliability of open source software
  2. software maintenance
  3. code comment mismatch
  4. program comprehension
  5. deep learning

Qualifiers

  • Research-article

Funding Sources

  • CCF-Tencent Rhino-Bird Open Research Fund
  • High Performance Computing Center of Central South University, PR China

Conference

ICPC '24

