Toward an accurate method renaming approach via structural and lexical analyses

Luo, Junpeng; Zhang, Jingxuan; Huang, Zhiqiu; Xu, Yong; Sun, Chenxing

doi:10.1631/FITEE.2100470

Research Article
Published: 25 May 2022

Volume 23, pages 732–748 (2022)

Frontiers of Information Technology & Electronic Engineering

Junpeng Luo (骆君鹏)¹,
Jingxuan Zhang (张静宣)¹,² (ORCID: orcid.org/0000-0002-8437-6640),
Zhiqiu Huang (黄志球)¹,
Yong Xu (徐勇)³ &
Chenxing Sun (孙辰星)³

Abstract

Methods in programs must be accurately named to facilitate source code analysis and comprehension. With the evolution of software, method names may be inconsistent with their implemented method bodies, leading to inaccurate or buggy method names. Debugging method names remains an important topic in the literature. Although researchers have proposed several approaches to suggest accurate method names once the method bodies have been modified, two main drawbacks remain to be solved: there is no analysis of method name structure, and the programming context information is not captured efficiently. To resolve these drawbacks and suggest more accurate method names, we propose a novel automated approach based on the analysis of the method name structure and lexical analysis with the programming context information. Our approach first leverages deep feature representation to embed method names and method bodies in vectors. Then, it obtains useful verb-tokens from a large method corpus through structural analysis and noun-tokens from method bodies through lexical analysis. Finally, our approach dynamically combines these tokens to form and recommend high-quality and project-specific method names. Experimental results over 2111 Java testing methods show that the proposed approach can achieve a Hit Ratio, or Hit@5, of 33.62% and outperform the state-of-the-art approach by 14.12% in suggesting accurate method names. We also demonstrate the effectiveness of structural and lexical analyses in our approach.
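The token-combination step and the Hit@5 metric mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the helper names `split_identifier`, `compose_names`, and `hit_at_k` are hypothetical, the verb and noun tokens are passed in by hand rather than mined from a corpus or a method body, and a hit is assumed to mean a case-insensitive exact match between the true name and one of the top-k suggestions.

```python
import re
from typing import List, Sequence


def split_identifier(name: str) -> List[str]:
    """Split a camelCase or snake_case identifier into lowercase tokens."""
    # Acronym runs (e.g. "JSON"), capitalized or lowercase words, and digit runs.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [part.lower() for part in parts]


def compose_names(verbs: Sequence[str], nouns: Sequence[str]) -> List[str]:
    """Pair every verb token with every noun token as a camelCase candidate."""
    # Assumption: a candidate is simply verb + capitalized noun phrase.
    return [
        verb.lower() + noun[:1].upper() + noun[1:]
        for verb in verbs
        for noun in nouns
    ]


def hit_at_k(ranked: Sequence[Sequence[str]], truths: Sequence[str], k: int = 5) -> float:
    """Fraction of methods whose true name appears in the top-k suggestions."""
    if len(ranked) != len(truths):
        raise ValueError("ranked and truths must have the same length")
    if not truths:
        return 0.0
    hits = 0
    for candidates, truth in zip(ranked, truths):
        # Assumption: a hit is a case-insensitive exact match.
        top_k = {candidate.lower() for candidate in candidates[:k]}
        if truth.lower() in top_k:
            hits += 1
    return hits / len(truths)
```

Under these assumptions, `hit_at_k` would be called with one ranked suggestion list per method and the list of names the developers actually chose; the exact matching criterion used in the paper's evaluation may differ.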

Author information

Authors and Affiliations

College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
Junpeng Luo (骆君鹏), Jingxuan Zhang (张静宣) & Zhiqiu Huang (黄志球)
Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, 211106, China
Jingxuan Zhang (张静宣)
Tencent Technology (Shenzhen) Company Limited, Shenzhen, 518054, China
Yong Xu (徐勇) & Chenxing Sun (孙辰星)

Contributions

Junpeng LUO and Jingxuan ZHANG designed the research. Junpeng LUO processed the data. Junpeng LUO and Jingxuan ZHANG drafted the paper. Yong XU and Chenxing SUN helped organize the paper. Jingxuan ZHANG and Zhiqiu HUANG revised and finalized the paper.

Corresponding author

Correspondence to Jingxuan Zhang (张静宣).

Additional information

Compliance with ethics guidelines

Junpeng LUO, Jingxuan ZHANG, Zhiqiu HUANG, Yong XU, and Chenxing SUN declare that they have no conflict of interest.

Project supported by the National Natural Science Foundation of China (Nos. 61902181 and 62002161), the China Postdoctoral Science Foundation (No. 2020M671489), the CCF-Tencent Open Research Fund (No. RAGR20200106), and the Nanjing University of Aeronautics and Astronautics Postgraduate Research and Practice Innovation Program (No. xcxjh20211612).

About this article

Cite this article

Luo, J., Zhang, J., Huang, Z. et al. Toward an accurate method renaming approach via structural and lexical analyses. Front Inform Technol Electron Eng 23, 732–748 (2022). https://doi.org/10.1631/FITEE.2100470

Received: 30 September 2021
Accepted: 28 February 2022
Published: 25 May 2022
Issue Date: May 2022
DOI: https://doi.org/10.1631/FITEE.2100470

CLC number

TP311
