Elsevier

Journal of Systems and Software

Volume 142, August 2018, Pages 195-205

MULAPI: Improving API method recommendation with API usage location

https://doi.org/10.1016/j.jss.2018.04.060

Highlights

  • This article proposes an approach that recommends API methods and API usage locations for an incoming feature request.

  • The approach employs the feature location technique to recommend API usage location and API methods.

  • An empirical study on more than 1000 feature requests from eight open source subjects shows the effectiveness of the approach.

Abstract

During the evolution of a software system, a large number of feature requests are continuously proposed by users. To implement these feature requests, developers often utilize existing third-party libraries and make use of Application Programming Interfaces (APIs) to accelerate the feature implementation process. However, it is not always obvious which API methods are suitable and where these API methods can be used in the target program.

In this paper, we propose an approach, MULAPI (Method Usage and Location for API), to recommend API methods and to identify the API usage location where these API methods would be used. MULAPI employs feature location to identify feature-related files as the API usage location. These feature-related files are then taken into account when recommending API methods, together with the source code repository and the API libraries. We evaluate MULAPI on more than 1000 feature requests from eight Java projects (Axis2/Java, CXF, Hadoop Common, Hbase, Struts2, Hadoop HDFS, Hive and Hadoop Map/Reduce), and recommend API methods from ten third-party libraries. The empirical results show that MULAPI can accurately recommend API methods and usage location, and that MULAPI improves the effectiveness of API method recommendation compared with the state-of-the-art approach.

Introduction

During software maintenance and evolution, developers often receive a large number of feature requests arising from various users’ requirements. When implementing feature requests, developers usually rely on third-party libraries or Application Programming Interfaces (APIs) to accelerate the process (Thung et al., 2013a). Thung et al. investigated a number of GitHub projects, and found that 93.3% of them use third-party libraries (Thung et al., 2013a). However, to use third-party libraries or APIs effectively, developers first need to understand the methods and classes in the libraries and determine which API methods to use. For example, there are more than 2.6 million Java third-party libraries in the Maven Central Repository. Given such a scale of libraries, and their associated methods and classes, it is challenging for developers to choose suitable methods for implementing a new feature request. In addition, when maintaining a software system, a lot of effort is spent on figuring out how a feature is implemented in the existing source code, and on identifying where in the source code to implement the new feature request. Both a developer newly assigned to a project and an experienced developer need to find the files related to a maintenance task; for the latter, the relevant portion of the project may have changed so drastically that their previous understanding of the target software no longer applies. Identifying the feature-related files, a tedious and time-consuming step, can effectively help them implement the new feature request.

Considering the above issues, there is a need for an automated approach that recommends API methods and their usage location to help developers implement an incoming feature request. A lot of work has been devoted to API recommendation (Thung, Wang, Lo, Lawall, 2013, Chan, Cheng, Lo, 2012, Yu, Song, Mine, 2016, Robillard, 2005, Long, Wang, Cai, 2009, Sun et al., 2016, Shi et al., 2015), or to the recommendation of similar repositories (Zhang, Lo, Kochhar, Xia, Li, Sun, 2017, Xu, Sun, Xia, Chen, 2017, Sun et al., 2018, Xu et al., 2017). For example, Thung et al. proposed a technique to recommend API methods from a feature request (Thung et al., 2013b). First, the approach searches for similar closed or resolved feature requests in the issue tracking system, and extracts the API methods they used. Second, it computes the similarity between the description of the feature request and the descriptions of API methods to identify the relevant API methods. Finally, the two ranked lists of API methods are integrated into a single list of potential methods recommended to developers. However, their approach takes little of the target source code into consideration, although it is an important data source for examining the potential API methods. In addition, once these API methods are recommended, developers still need to understand them and investigate the target source code to learn where to use them to implement the incoming feature request.

To address the above problems, we extend Thung et al.’s approach, and propose an approach named MULAPI (Method Usage and Location for API), which recommends not only API methods but also API usage location. Thung et al.’s approach recommends API methods based on historical requests and the similarity of descriptions. MULAPI additionally takes the information stored in the codebase into account and adds one more component (the API usage location component) for API recommendation. In other words, MULAPI considers the target source code, the historical feature request repository and the API libraries when recommending API methods and usage location. First, MULAPI employs the feature location technique (Dit et al., 2013) to identify the API usage location, i.e., the feature-related files. Feature location is a technique that maps a description of a feature or concept to the feature-related files (Dit et al., 2013), which are then used to recommend API methods more accurately. Second, the new feature request is processed by three components for API method recommendation: the API usage location based component, the historical requests based component and the description similarity based component. Finally, each of the three components produces its own list of API methods, and we combine these three lists to generate the final ranked list of API methods for developers to use.
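
The final step above, merging three component lists into one ranking, can be illustrated with a small sketch. This section does not state how MULAPI combines the lists, so the weighted sum of min-max normalized scores below is an assumption chosen for illustration; the component names, the weights and the example scores are all hypothetical.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Scale a {method: score} mapping into [0, 1]; a constant list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {method: 1.0 for method in scores}
    return {method: (s - lo) / (hi - lo) for method, s in scores.items()}


def combine_rankings(component_scores, weights):
    """Merge per-component scores into one ranked list of API methods.

    component_scores: {component_name: {api_method: raw_score}}
    weights:          {component_name: weight}  (hypothetical values)
    """
    combined = defaultdict(float)
    for name, scores in component_scores.items():
        for method, score in min_max_normalize(scores).items():
            combined[method] += weights.get(name, 0.0) * score
    # Sort by descending score, then by name so that ties are deterministic.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores from the three components for one feature request.
scores = {
    "usage_location": {"StringUtils.isEmpty": 0.9, "IOUtils.copy": 0.3},
    "historical_requests": {"IOUtils.copy": 0.8, "FileUtils.readLines": 0.4},
    "description_similarity": {"StringUtils.isEmpty": 0.5, "FileUtils.readLines": 0.2},
}
weights = {"usage_location": 0.4, "historical_requests": 0.3, "description_similarity": 0.3}

ranked = combine_rankings(scores, weights)
for method, score in ranked:
    print(f"{method}: {score:.2f}")
```

Normalizing each list before weighting keeps a component with a larger raw score range from dominating the merged ranking; other fusion schemes (for example rank-based ones) would fit the same interface.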

We have evaluated MULAPI on 1048 feature requests stored in the JIRA issue tracking system from eight Java software projects: Axis2/Java, CXF, Hadoop Common, Hbase, Struts2, Hadoop HDFS, Hive and Hadoop Map/Reduce. We recommend API methods from 10 third-party libraries. These libraries are popular libraries used by Java applications developed under the Apache Foundation. We employ Hit@N, MAP and MRR to measure the effectiveness of MULAPI. Our experiments show that MULAPI achieves average Hit@1, Hit@5, Hit@10, MAP and MRR values of 0.483, 0.785, 0.876, 0.530 and 0.619 respectively in recommending API methods, and 0.367, 0.623, 0.707, 0.331 and 0.481 respectively in recommending API usage location. MULAPI improves Hit@5 and Hit@10 by 28.58% and 14.06% respectively, compared with the state-of-the-art approach proposed by Thung et al. (2013b). The main contributions of this paper are as follows:

  • (1) We propose a technique named MULAPI, which recommends API methods and API usage location for an incoming feature request by analyzing the source code repository, the historical feature request repository and the API libraries.

  • (2) MULAPI employs the feature location technique to recommend API usage location, and the recommendation of API methods is improved by considering the probable API usage location.

  • (3) MULAPI is empirically evaluated on 1048 feature requests from eight software systems, and the results show that both API usage location and API methods are effectively recommended. Moreover, the accuracy of API method recommendation is improved over a state-of-the-art technique proposed by Thung et al. (2013b).

The rest of this paper is organized as follows: Section 2 introduces the preliminaries to develop API recommendation. Section 3 presents our approach. Section 4 shows our empirical study and results. Section 5 discusses the related work in API recommendation. Section 6 concludes the paper.

Section snippets

Preliminaries

In this section, we describe the preliminaries used in MULAPI. First, we describe the feature location technique that is used to identify the feature related files. Second, we describe the information retrieval (IR) techniques, such as text pre-processing and vector space model, that are used to analyze different data sources to recommend API methods.
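
As a rough illustration of the IR techniques named above, the following sketch pre-processes text and ranks candidate documents against a query with a TF-IDF vector space model and cosine similarity. It is a minimal sketch under stated assumptions rather than MULAPI's implementation: the stop-word list, the camelCase splitting, the smoothed IDF formula and the example API descriptions are all illustrative choices, and stemming is omitted.

```python
import math
import re
from collections import Counter

# A tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "for", "and", "is", "be"}


def preprocess(text):
    """Split prose and identifiers into lowercase tokens and drop stop words."""
    # Break camelCase boundaries, e.g. "InputStream" -> "Input Stream".
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    tokens = re.findall(r"[A-Za-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_vectors(documents):
    """Return one {term: weight} TF-IDF vector per tokenized document."""
    n = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        # Smoothed IDF (an assumption) so that no term gets a zero weight.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def rank_by_similarity(query, candidates):
    """Rank {name: description} candidates by similarity to the query text."""
    names = list(candidates)
    docs = [preprocess(query)] + [preprocess(candidates[n]) for n in names]
    vecs = tfidf_vectors(docs)
    scored = [(n, cosine(vecs[0], v)) for n, v in zip(names, vecs[1:])]
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))


# Hypothetical feature request and API method descriptions.
request = "Allow users to copy an input stream to a file"
api_docs = {
    "IOUtils.copy": "Copies bytes from an InputStream to an OutputStream",
    "StringUtils.isEmpty": "Checks if a CharSequence is empty or null",
}
ranking = rank_by_similarity(request, api_docs)
```

The same ranking routine could serve feature location as well, with source files in place of API descriptions as the candidate documents.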

Approach

In this section, we show more details about our approach. First, we describe the architecture of MULAPI, and then we present the main components in MULAPI.

Empirical study

In this section, we present our empirical study to evaluate MULAPI. We first pose our research questions, and then describe the dataset, methodology and evaluation metrics. Next, we examine the empirical results. Finally, we discuss threats to validity.
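
For readers unfamiliar with the evaluation metrics used in this study (Hit@N, MAP and MRR), the sketch below computes them with their common textbook definitions. The paper's exact definitions are not reproduced in this excerpt, so details such as dividing average precision by the number of relevant items are assumptions, and the example data is hypothetical.

```python
def hit_at_n(ranked, relevant, n):
    """Return 1 if any relevant item appears in the top n results, else 0."""
    return int(any(item in relevant for item in ranked[:n]))


def reciprocal_rank(ranked, relevant):
    """Return 1/rank of the first relevant item, or 0.0 if none is found."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k at which a relevant item occurs.

    Assumption: the sum is divided by the total number of relevant items.
    """
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def evaluate(results, n_values=(1, 5, 10)):
    """Average the metrics over (ranked_list, relevant_set) pairs, one per request."""
    count = len(results)
    report = {
        f"Hit@{n}": sum(hit_at_n(r, rel, n) for r, rel in results) / count
        for n in n_values
    }
    report["MAP"] = sum(average_precision(r, rel) for r, rel in results) / count
    report["MRR"] = sum(reciprocal_rank(r, rel) for r, rel in results) / count
    return report


# Two hypothetical feature requests with their recommended and actual API methods.
results = [
    (["a", "b", "c"], {"a", "c"}),  # first relevant item at rank 1
    (["x", "y", "z"], {"y"}),       # first relevant item at rank 2
]
report = evaluate(results)
```

Hit@N rewards any correct recommendation in the top N, MRR rewards placing the first correct one early, and MAP takes the positions of all correct recommendations into account.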

API code recommendation

Among the studies in API code recommendation, some focused on recommending API classes (Rahman et al., 2016), while some others focused on recommending methods (Chan et al., 2012).

Rahman et al. proposed an API class recommendation technique, named RACK. RACK recommends a list of relevant API classes for a natural language query by exploiting keyword-API associations from the crowdsourced knowledge of Stack Overflow (Rahman et al., 2016). Thung et al. proposed an automated approach called

Conclusion and future work

A large number of feature requests are continually submitted during the evolution of a software system. A common way to accelerate feature request implementation is to use existing APIs. However, accurately finding suitable API methods and identifying the probable API usage location are usually tedious and expensive for developers. To address this need, we proposed a technique, named MULAPI, which employs feature location to recommend API methods as well as usage location. We

Acknowledgments

This work is supported partially by Natural Science Foundation of China under Grant No. 61472344, No. 61611540347 and No. 61402396, partially by the Open Funds of State Key Laboratory for Novel Software Technology of Nanjing University under Grant No. KFKT2016B21, partially by the Jiangsu Qin Lan Project, partially by the China Postdoctoral Science Foundation under Grant No. 2015M571489, and partially by the Natural Science Foundation of Yangzhou City under Grant No. YZ2017113.

References (44)

  • B. Dit et al. Feature location in source code: a taxonomy and survey. J. Softw. (2013)
  • X. Gu et al. Deep API learning. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016 (2016)
  • J. Hu et al. Modeling the evolution of development topics using dynamic topic models. 22nd IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2015, Montreal, QC, Canada, March 2-6, 2015 (2015)
  • F. Keller et al. A critical evaluation of spectrum-based fault localization techniques on a large-scale software system. 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS) (2017)
  • F. Long et al. API hyperlinking via structural overlap. International Symposium on Foundations of Software Engineering, 2009. Amsterdam, the Netherlands, August (2009)
  • C. Lv et al. Apisynth: a new graph-based API recommender system. Chin. J. Comput. (2015)
  • Manning, C., Raghavan, P., Schutze, H., 2008. Introduction to information retrieval....
  • L. Moreno et al. How can I use this method. 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 1 (2015)
  • E. Moritz et al. Export: detecting and visualizing API usages in large source code repositories. Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering (2013)
  • A.T. Nguyen. Duplicate bug report detection with a combination of information retrieval and topic modeling. Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (2012)
  • T.N. Nguyen et al. Exploring API embedding for API usages and applications. International Conference on Software Engineering (2017)
  • M.M. Rahman et al. RACK: automatic API recommendation using crowdsourced knowledge. International Conference on Software Analysis, Evolution, and Reengineering (2016)

    XU Congying is a student in School of Information Engineering, Yangzhou University. His current research interest is recommendation systems for software maintenance.

    SUN Xiaobing is an associate professor in School of Information Engineering, Yangzhou University. His current research interests include change comprehension, analysis and testing, and software data analytics.

    LI Bin is a professor in School of Information Engineering, Yangzhou University. His current research interests include web service analysis and cloud computing.

    LU Xintong is a student in School of Information Engineering, Yangzhou University. Her current research interests include recommendation systems and knowledge graphs.

    GUO Hongjing is a student in School of Information Engineering, Yangzhou University. Her current research interests include software engineering and software quality engineering.

    1 Xiaobing Sun and Congying Xu contributed equally to this work.
