DOI: 10.1145/3377811.3380926
research-article

Suggesting natural method names to check name consistencies

Published: 01 October 2020

Abstract

Misleading names of methods in a project or of APIs in a software library confuse developers about program functionality and API usage, leading to API misuse and defects. In this paper, we introduce MNire, a machine learning approach to check the consistency between the name of a given method and its implementation. MNire first generates a candidate name and then compares the current name against it. If the two names are sufficiently similar, we consider the method consistent. To generate the method name, we draw our ideas and intuition from an empirical study on the nature of method names in a large dataset. Our key finding is that a high proportion of the tokens of method names can be found in three contexts of a given method: its body, its interface (the method's parameter types and return type), and the enclosing class's name. Even when such tokens are not there, MNire uses the contexts to predict the tokens because of the high likelihood of their co-occurrence. Our unique idea is to treat name generation as an abstractive summarization of the tokens collected from the names of the program entities in the three contexts above.
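The comparison step described above (generate a candidate name, then accept the current name if the two are sufficiently similar) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the helper names `split_name`, `token_similarity`, and `is_consistent`, the token-overlap F1 similarity, and the 0.5 threshold are all hypothetical, and the candidate name is supplied by hand rather than produced by a learned model.

```python
import re
from collections import Counter


def split_name(name: str) -> list[str]:
    """Split a camelCase or snake_case identifier into lowercase tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def token_similarity(current_name: str, candidate_name: str) -> float:
    """Token-overlap F1 between two names (an assumed similarity measure)."""
    current = Counter(split_name(current_name))
    candidate = Counter(split_name(candidate_name))
    overlap = sum((current & candidate).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(candidate.values())
    recall = overlap / sum(current.values())
    return 2 * precision * recall / (precision + recall)


def is_consistent(current_name: str, candidate_name: str,
                  threshold: float = 0.5) -> bool:
    """Treat the method as consistent when the names are similar enough.

    The threshold is a hypothetical value chosen for this sketch.
    """
    return token_similarity(current_name, candidate_name) >= threshold


# The candidate names here stand in for the output of a name generator.
print(is_consistent("getFileName", "getName"))
print(is_consistent("closeConnection", "openSocket"))
```

In a real checker the threshold would need to be tuned on labeled data, and the similarity measure could be replaced by any other name-level distance.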
We conducted several experiments to evaluate MNire in method name consistency checking and in method name recommendation on large datasets with more than 14M methods. In detecting inconsistent method names, MNire improves on the state-of-the-art approach by 10.4% and 11% in relative terms in recall and precision, respectively. In method name recommendation, MNire improves in relative terms over the state-of-the-art technique, code2vec, in both recall (18.2% higher) and precision (11.1% higher). To assess MNire's usefulness, we used it to detect inconsistent methods and suggest new names in several active GitHub projects. We made 50 pull requests (PRs) and received 42 responses. Among them, five PRs were merged into the main branch, and 13 were approved for later merging. In total, in 31 of the 42 cases, the developer teams agreed that our suggested names are more meaningful than the current names, showing MNire's usefulness.
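The abstract reports relative gains in precision and recall without defining the metrics. One common reading in name recommendation work is token-level precision and recall, aggregated over all methods, with a relative improvement computed against the baseline score. The sketch below assumes that reading; the function names, the micro-averaging, and all numbers are hypothetical and are not results from the paper.

```python
from collections import Counter


def micro_precision_recall(pairs):
    """Micro-averaged token precision and recall.

    pairs: iterable of (predicted_tokens, reference_tokens) lists.
    """
    matched = n_predicted = n_reference = 0
    for predicted, reference in pairs:
        # Count tokens shared by the prediction and the reference name.
        matched += sum((Counter(predicted) & Counter(reference)).values())
        n_predicted += len(predicted)
        n_reference += len(reference)
    precision = matched / n_predicted if n_predicted else 0.0
    recall = matched / n_reference if n_reference else 0.0
    return precision, recall


def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Relative gain of new_score over baseline_score, as a fraction."""
    return (new_score - baseline_score) / baseline_score


# Hypothetical predictions and reference names, already split into tokens.
pairs = [
    (["get", "name"], ["get", "file", "name"]),
    (["close", "stream"], ["close", "stream"]),
    (["read"], ["parse", "line"]),
]
precision, recall = micro_precision_recall(pairs)
print(f"precision={precision:.3f} recall={recall:.3f}")

# Hypothetical scores: moving from 0.50 to 0.60 is a 20% relative gain,
# even though the absolute gain is only 10 percentage points.
print(f"{relative_improvement(0.60, 0.50):.1%}")
```

The distinction matters when reading the reported figures: a relative gain of 18.2% in recall does not mean recall rose by 18.2 percentage points.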

    Published In

    ICSE '20: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering
    June 2020
    1640 pages
    ISBN:9781450371216
    DOI:10.1145/3377811
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

    In-Cooperation

    • KIISE: Korean Institute of Information Scientists and Engineers
    • IEEE CS

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. deep learning
    2. naturalness of source code
    3. program entity name suggestion

    Acceptance Rates

    Overall Acceptance Rate 276 of 1,856 submissions, 15%

    Article Metrics

    • Downloads (Last 12 months): 60
    • Downloads (Last 6 weeks): 2
    Reflects downloads up to 20 Jan 2025

    Cited By

    • (2025) Transformer-based code model with compressed hierarchy representation. Empirical Software Engineering 30:2. DOI: 10.1007/s10664-025-10612-6. Online publication date: 23-Jan-2025
    • (2025) Deep learning based identification of inconsistent method names: How far are we? Empirical Software Engineering 30:1. DOI: 10.1007/s10664-024-10592-z. Online publication date: 1-Feb-2025
    • (2024) Intelligent code search aids edge software development. Journal of Cloud Computing: Advances, Systems and Applications 13:1. DOI: 10.1186/s13677-024-00629-5. Online publication date: 1-Apr-2024
    • (2024) Natural Is the Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models. Proceedings of the ACM on Software Engineering 1:FSE (586-608). DOI: 10.1145/3643753. Online publication date: 12-Jul-2024
    • (2024) An intelligent java method name recommendation framework via two-phase neural networks. Empirical Software Engineering 30:1. DOI: 10.1007/s10664-024-10574-1. Online publication date: 8-Nov-2024
    • (2023) Feature Location Using Extraction of Code Documentation. Proceedings of the 8th International Conference on Sustainable Information Engineering and Technology (481-488). DOI: 10.1145/3626641.3627149. Online publication date: 24-Oct-2023
    • (2023) An Accurate Identifier Renaming Prediction and Suggestion Approach. ACM Transactions on Software Engineering and Methodology 32:6 (1-51). DOI: 10.1145/3603109. Online publication date: 29-Sep-2023
    • (2023) Pre-implementation Method Name Prediction for Object-oriented Programming. ACM Transactions on Software Engineering and Methodology 32:6 (1-35). DOI: 10.1145/3597203. Online publication date: 29-Sep-2023
    • (2023) Toward Interpretable Graph Tensor Convolution Neural Network for Code Semantics Embedding. ACM Transactions on Software Engineering and Methodology 32:5 (1-40). DOI: 10.1145/3582574. Online publication date: 21-Jul-2023
    • (2023) Context-Encoded Code Change Representation for Automated Commit Message Generation. International Journal of Software Engineering and Knowledge Engineering 34:01 (185-202). DOI: 10.1142/S0218194023500493. Online publication date: 16-Sep-2023
