DOI: 10.1145/3643916.3644405

ASKDetector: An AST-Semantic and Key Features Fusion based Code Comment Mismatch Detector

Published: 13 June 2024

Abstract

Code comments are essential for program comprehension. Nevertheless, developers often neglect to update comments after modifying the source code. Incorrect comments can introduce bugs during maintenance and thus undermine software reliability, so timely detection of code-comment mismatches is crucial for software development and maintenance. Existing approaches, however, suffer from two limitations: 1) they make little use of the structural and sequential information in code, and 2) they ignore the associations that exist between code and comments. In this paper, we propose a new model called ASKDetector (AST-Semantic and Key features fusion based mismatch Detector). To address the first limitation, we encode code as an attention-based preorder-traversal abstract syntax tree sequence to capture both ordering and structural information, and we further employ CodeBERT to capture contextual semantic features. To address the second, we encode association information extracted between code snippets and their comments to narrow the semantic gap. The correlations among the encoders are learned through a fusion layer and a multi-layer perceptron. Experimental results show that our detector outperforms the state-of-the-art model, with F1 and accuracy higher by an average of 3.4%.
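To make the described pipeline concrete, the sketch below is a minimal, hypothetical illustration of two of its ingredients: flattening a code snippet into a preorder-traversal AST node sequence, and fusing a code vector, a comment vector, and an association vector through a fusion layer and a multi-layer perceptron. This is not the authors' implementation; the use of Python's ast module in place of the paper's parser, the 768/256 dimensions, and the random vectors standing in for CodeBERT and association-feature embeddings are all assumptions made for illustration.

```python
# Hypothetical sketch of ASKDetector-style ingredients (not the authors' code).
# Assumptions: Python's ast module stands in for the paper's parser, and random
# vectors stand in for CodeBERT / association-feature embeddings.
import ast

import torch
import torch.nn as nn


def preorder_ast_tokens(source: str) -> list[str]:
    """Flatten a code snippet's AST into a preorder node-type sequence."""
    tokens: list[str] = []

    def visit(node: ast.AST) -> None:
        tokens.append(type(node).__name__)        # record the node first ...
        for child in ast.iter_child_nodes(node):  # ... then recurse into children
            visit(child)

    visit(ast.parse(source))
    return tokens


class FusionMLP(nn.Module):
    """Fuse code, comment, and association vectors; predict a mismatch score."""

    def __init__(self, dim: int = 768, hidden: int = 256):
        super().__init__()
        self.fuse = nn.Linear(3 * dim, hidden)    # fusion layer over the concatenation
        self.mlp = nn.Sequential(
            nn.ReLU(), nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, code_vec, comment_vec, assoc_vec):
        fused = self.fuse(torch.cat([code_vec, comment_vec, assoc_vec], dim=-1))
        return torch.sigmoid(self.mlp(fused))     # probability of a mismatch


if __name__ == "__main__":
    print(preorder_ast_tokens("def add(a, b):\n    return a + b"))
    # Placeholder vectors; in the paper these would come from the AST encoder,
    # CodeBERT, and the extracted code-comment association features.
    code, comment, assoc = (torch.randn(1, 768) for _ in range(3))
    print(FusionMLP()(code, comment, assoc).item())
```

Running the snippet prints the preorder node-type sequence for the toy function and a single mismatch probability from randomly initialized weights, which is only meant to show how the three encodings would flow into the fusion layer and MLP.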



Published In

ICPC '24: Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension
April 2024
487 pages
ISBN:9798400705861
DOI:10.1145/3643916


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. reliability of open source software
  2. software maintenance
  3. code comment mismatch
  4. program comprehension
  5. deep learning

Qualifiers

  • Research-article

Funding Sources

  • CCF-Tencent Rhino-Bird Open Research Fund
  • High Performance Computing Center of Central South University, PR China

Conference

ICPC '24

