Abstract
Methods in programs must be accurately named to facilitate source code analysis and comprehension. As software evolves, method names may become inconsistent with their method bodies, leaving the names inaccurate or buggy. Debugging method names remains an important topic in the literature. Although researchers have proposed several approaches that suggest accurate method names once the method bodies have been modified, two main drawbacks remain: the structure of method names is not analyzed, and programming context information is not captured efficiently. To resolve these drawbacks and suggest more accurate method names, we propose a novel automated approach based on analysis of method name structure and lexical analysis of programming context information. Our approach first leverages deep feature representation to embed method names and method bodies in vectors. Then, it obtains useful verb-tokens from a large method corpus through structural analysis and noun-tokens from method bodies through lexical analysis. Finally, it dynamically combines these tokens to form and recommend high-quality, project-specific method names. Experimental results over 2111 Java testing methods show that the proposed approach achieves a Hit Ratio (Hit@5) of 33.62% and outperforms the state-of-the-art approach by 14.12% in suggesting accurate method names. We also demonstrate the effectiveness of the structural and lexical analyses in our approach.
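To make the Hit@5 figure concrete, the following is a minimal sketch of how a Hit@k score could be computed. It assumes the common definition (the fraction of methods whose expected name appears among the top k suggestions) and exact string matching; the abstract does not state the matching rule, so both are assumptions. The class name `HitAtK` and its parameters are hypothetical and are not taken from the paper.

```java
import java.util.List;

/** Illustrative Hit@k computation; names and matching rule are assumptions. */
public final class HitAtK {
    private HitAtK() {}

    /**
     * Returns the fraction of methods whose expected name appears among
     * the first k entries of that method's ranked suggestion list.
     */
    public static double hitAtK(List<List<String>> rankedSuggestions,
                                List<String> expectedNames,
                                int k) {
        if (rankedSuggestions.size() != expectedNames.size()) {
            throw new IllegalArgumentException("one suggestion list per expected name is required");
        }
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        if (expectedNames.isEmpty()) {
            return 0.0; // no methods to evaluate
        }
        int hits = 0;
        for (int i = 0; i < expectedNames.size(); i++) {
            List<String> suggestions = rankedSuggestions.get(i);
            // Only the top-k suggestions count; shorter lists are used as-is.
            int limit = Math.min(k, suggestions.size());
            if (suggestions.subList(0, limit).contains(expectedNames.get(i))) {
                hits++;
            }
        }
        return (double) hits / expectedNames.size();
    }
}
```

Under these assumptions, calling `HitAtK.hitAtK(suggestions, expected, 5)` over the evaluated methods yields a ratio, which is multiplied by 100 to get a percentage of the kind reported above.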
Author information
Contributions
Junpeng LUO and Jingxuan ZHANG designed the research. Junpeng LUO processed the data. Junpeng LUO and Jingxuan ZHANG drafted the paper. Yong XU and Chenxing SUN helped organize the paper. Jingxuan ZHANG and Zhiqiu HUANG revised and finalized the paper.
Additional information
Compliance with ethics guidelines
Junpeng LUO, Jingxuan ZHANG, Zhiqiu HUANG, Yong XU, and Chenxing SUN declare that they have no conflict of interest.
Project supported by the National Natural Science Foundation of China (Nos. 61902181 and 62002161), the China Postdoctoral Science Foundation (No. 2020M671489), the CCF-Tencent Open Research Fund (No. RAGR20200106), and the Nanjing University of Aeronautics and Astronautics Postgraduate Research and Practice Innovation Program (No. xcxjh20211612).
About this article
Cite this article
Luo, J., Zhang, J., Huang, Z. et al. Toward an accurate method renaming approach via structural and lexical analyses. Front Inform Technol Electron Eng 23, 732–748 (2022). https://doi.org/10.1631/FITEE.2100470
DOI: https://doi.org/10.1631/FITEE.2100470