Toward an accurate method renaming approach via structural and lexical analyses

  • Research Article
  • Frontiers of Information Technology & Electronic Engineering

Abstract

Methods in programs must be accurately named to facilitate source code analysis and comprehension. With the evolution of software, method names may be inconsistent with their implemented method bodies, leading to inaccurate or buggy method names. Debugging method names remains an important topic in the literature. Although researchers have proposed several approaches to suggest accurate method names once the method bodies have been modified, two main drawbacks remain to be solved: there is no analysis of method name structure, and the programming context information is not captured efficiently. To resolve these drawbacks and suggest more accurate method names, we propose a novel automated approach based on the analysis of the method name structure and lexical analysis with the programming context information. Our approach first leverages deep feature representation to embed method names and method bodies in vectors. Then, it obtains useful verb-tokens from a large method corpus through structural analysis and noun-tokens from method bodies through lexical analysis. Finally, our approach dynamically combines these tokens to form and recommend high-quality and project-specific method names. Experimental results over 2111 Java testing methods show that the proposed approach can achieve a Hit Ratio, or Hit@5, of 33.62% and outperform the state-of-the-art approach by 14.12% in suggesting accurate method names. We also demonstrate the effectiveness of structural and lexical analyses in our approach.
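The final two steps described in the abstract (combining verb-tokens with noun-tokens, then scoring suggestions with Hit@5) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the helper names `split_identifier`, `combine`, and `hit_at_k` are hypothetical, candidates are formed by a plain verb-then-noun concatenation, and Hit@k is taken to mean the fraction of methods whose true name appears, by exact match, among the top k suggestions. The paper may use a different matching criterion.

```python
import re


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase sub-tokens."""
    # Acronym runs (e.g. "XML"), capitalized or lowercase words, and digit runs.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def combine(verbs, nouns):
    """Pair each candidate verb with each candidate noun (assumed verb-first order)."""
    return [v + n[:1].upper() + n[1:] for v in verbs for n in nouns]


def hit_at_k(ranked_suggestions, true_names, k=5):
    """Fraction of methods whose true name is among the top-k suggestions.

    Assumes exact string match; this is an illustrative definition only.
    """
    hits = sum(
        1 for suggestions, truth in zip(ranked_suggestions, true_names)
        if truth in suggestions[:k]
    )
    return hits / len(true_names)


if __name__ == "__main__":
    # Hypothetical inputs: verbs as if mined from a corpus, nouns as if taken from a method body.
    print(split_identifier("parseXMLFile"))
    print(combine(["get", "find"], ["user"]))
    print(hit_at_k([["getUser", "findUser"], ["loadFile"]], ["findUser", "readFile"]))
```

In this sketch the ranking of candidates is left to the caller; in the approach the abstract describes, that ordering would come from the learned embeddings of method names and bodies.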

Author information

Contributions

Junpeng LUO and Jingxuan ZHANG designed the research. Junpeng LUO processed the data. Junpeng LUO and Jingxuan ZHANG drafted the paper. Yong XU and Chenxing SUN helped organize the paper. Jingxuan ZHANG and Zhiqiu HUANG revised and finalized the paper.

Corresponding author

Correspondence to Jingxuan Zhang (张静宣).

Additional information

Compliance with ethics guidelines

Junpeng LUO, Jingxuan ZHANG, Zhiqiu HUANG, Yong XU, and Chenxing SUN declare that they have no conflict of interest.

Project supported by the National Natural Science Foundation of China (Nos. 61902181 and 62002161), the China Postdoctoral Science Foundation (No. 2020M671489), the CCF-Tencent Open Research Fund (No. RAGR20200106), and the Nanjing University of Aeronautics and Astronautics Postgraduate Research and Practice Innovation Program (No. xcxjh20211612).

About this article

Cite this article

Luo, J., Zhang, J., Huang, Z. et al. Toward an accurate method renaming approach via structural and lexical analyses. Front Inform Technol Electron Eng 23, 732–748 (2022). https://doi.org/10.1631/FITEE.2100470

  • DOI: https://doi.org/10.1631/FITEE.2100470
