Describing the APIs comprehensively: Obtaining the holistic representations from multiple modalities data for different tasks

https://doi.org/10.1016/j.infsof.2023.107188

Abstract

Context:

APIs (Application Programming Interfaces) are important objects in software development, and describing them properly is the basis for solving related problems, such as API recommendation. Recently, multimodal data fusion approaches have become a hot research topic in different fields; they can be used to obtain comprehensive representations of things by describing them from different angles. This provides a new and useful way to represent APIs.

Objective:

In this work, we aim to describe APIs comprehensively by fusing information from multimodal data to support different API-related tasks.

Method:

To achieve this goal, we propose a novel approach, BDBM (Bimodal Deep Boltzmann Machine), which obtains holistic representations of APIs by fusing information from the text and code modalities, namely API descriptions and the code of products. The BDBM is then applied to two typical API tasks (API recommendation and similar API mining) to analyze its performance.

Results and Conclusion:

The results show that API recommendation based on BDBM outperforms recommendation based on unimodal API information: our method's precision reaches 0.67, 0.65, and 0.61 at top-3, top-5, and top-10, while MAP and MRR are 0.66 and 0.67. Meanwhile, close representations indicate similar APIs with similar functionalities as well as similar usage in code. Thus, we believe that multimodal data fusion is suitable for describing APIs, and the holistic representations given by BDBM can be used in different API-related tasks.

Introduction

How do people perceive a thing in their daily life? By watching its appearance, hearing its sound, feeling its touch or smell [1]. By receiving such multimodal information, people can grasp the overall “essence” of things [2]. Can a computer do the same? Based on this consideration, many studies on multimodal representation approaches have been conducted in recent years [3], [4]. They try to obtain holistic representations of objects by integrating multimodal data features and then use the results as metadata for solving different problems. Such practice is prevalent in the fields of computer vision and natural language processing [5], [6], [7]. For example, the joint representation obtained by fusing data features from images, texts, and videos can support various tasks (such as image retrieval [8], image labeling [9], etc.) and achieve better results than using unimodal information.

Is the same idea also applicable to solving problems in the process of software development? From the view of a software developer, an object is often described by two main modalities [10], [11], [12]:

  • Code modality: code (source code, pseudo-code, etc.) can be the object itself or code that uses the object, and it contains the program logic for implementing or using the object.

  • Text modality: texts (document descriptions, code comments, etc.) often illustrate the object directly in natural language, so that people can understand it easily.

To explore this issue, we take APIs (Application Programming Interfaces) as our research objects [13], [14] and aim to obtain their representations based on multimodal data (the texts of API descriptions and the code of products). As commonly used objects in software development, APIs are important for improving work efficiency by providing pre-defined functionalities. However, they also bring developers various problems when used, and correspondingly, different API-related tasks arise (e.g., API recommendation, similar API mining). Holistic representations of APIs would help support these different tasks in a unified way.

Let us consider a scenario. John is a developer of an app for handling images. In the development phase, he wants to find an API that gets a drawable object from a given resource ID, so he asks for help from an API recommendation tool, such as BIKER. By issuing a text query, John gets a usable API, “android.content.res.Resources.getDrawable”, for his demand. However, during the maintenance of the product after release, John finds this API is deprecated as the Android OS is upgraded, and he needs to replace it with a new API. Of course, John could again get help from the API recommendation tool, but he hopes to modify the code at less cost. So he needs a specialized tool for finding similar APIs according to the code information of the existing API. In fact, John may face other API-related questions as well, and it would be helpful if a single tool could solve all of them. To achieve this goal, an effective way is to obtain holistic representations of APIs from different modalities, so that the results can support different tasks under various conditions.

However, each modality has its own characteristics and is represented in a different form:

  • 1. API code modality information is embedded in the call relationships between the API and the internal methods in the code; it can be represented as a graph structure, which is high-dimensional spatial information [15], [16].

  • 2. API text modality information is embedded in the semantics of the words in the API description; it is represented as a sequence of words in the text, which is one-dimensional spatial information [17], [18].

This means it is not easy to integrate information from different modalities into holistic representations of APIs.

Traditional API-related research [14], [19] is often based solely on unimodal API information. Such work utilizes either the code or the text modality to address specific API-related tasks, but a single modality is not enough to cover all features of an API, and this may affect problem-solving performance. For example, with respect to API recommendation, some methods only utilize text modality information [20], so they cannot recommend relevant APIs that do not share semantically similar words with developers' tasks, while APIs recommended based on code modality information are often too task-specific to satisfy developers' high-level demands [21]. Recently, some researchers have begun to integrate multimodal API information to obtain more comprehensive representations [15], [22], but they design their approaches around one specific problem, so the results cannot be used to support other API-related jobs.

In this paper, we propose a bimodal deep Boltzmann machine (BDBM) based method that fuses API features in the code and text modalities into a holistic representation and applies it to diverse API-related tasks. The main process of our method is illustrated in Fig. 1.

Firstly, we represent the features of an API in the code and text modalities as real-valued vectors (v_code and v_text). In this process, a graph embedding technique represents the structural information of API usage in APK files (code modality data) from the app store, while a word embedding technique represents API semantic information from the API description (text modality data).
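As a rough illustration of this first step, the sketch below builds a toy text-modality vector by averaging word vectors, with a placeholder code-modality vector standing in for the graph embedding. The tiny vocabulary and all vector values are hypothetical, not taken from the paper's actual pipeline (which uses Node2vec and GloVe on real data).

```python
# Hypothetical pretrained word vectors for a three-word vocabulary.
word_vectors = {
    "drawable": [0.9, 0.1, 0.0],
    "resource": [0.7, 0.2, 0.1],
    "object":   [0.1, 0.8, 0.1],
}

def embed_text(description):
    """Represent the text modality as the mean of the description's word vectors."""
    tokens = [t for t in description.lower().split() if t in word_vectors]
    dim = len(next(iter(word_vectors.values())))
    v = [0.0] * dim
    for t in tokens:
        for i, x in enumerate(word_vectors[t]):
            v[i] += x
    return [x / len(tokens) for x in v] if tokens else v

v_text = embed_text("drawable object resource")
# v_code would come from a graph embedding (e.g. Node2vec) of the API call
# graph extracted from APK files; here it is just a placeholder vector.
v_code = [0.42, -0.17, 0.88]
```

Both vectors then serve as the bimodal input of the fusion model.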

Secondly, we construct the BDBM, an undirected graphical model, to learn a joint probability distribution P(v_code, v_text; a) over the bimodal input data (the representations of the code and text modality API information). It fuses the data features of the different modalities into a holistic representation of the API.

Finally, BDBM is applied to different API-related tasks to help developers address problems in practice. By drawing samples from P(v_code | v_text; a), which is calculated via the Bayesian probability formula, we can recommend proper API code modality information to developers. Besides, the joint representation of an API can also be applied to similar API mining, since such a representation captures the true meaning of the API more comprehensively than unimodal information.
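The similar-API-mining use of the joint representation can be sketched as a nearest-neighbor search in the fused space. The snippet below ranks APIs by cosine similarity of joint vectors; the API names and vector values are entirely hypothetical, illustrating only the retrieval idea.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical joint representations produced by a fusion model.
joint = {
    "Resources.getDrawable":     [0.90, 0.20, 0.10],
    "ContextCompat.getDrawable": [0.85, 0.25, 0.15],
    "Bitmap.createBitmap":       [0.10, 0.90, 0.30],
}

def similar_apis(query_api, k=2):
    """Rank the other APIs by cosine similarity to the query in the joint space."""
    q = joint[query_api]
    others = [(name, cosine(q, v)) for name, v in joint.items() if name != query_api]
    return sorted(others, key=lambda x: -x[1])[:k]

top = similar_apis("Resources.getDrawable")
```

With these made-up vectors, the closest neighbor is the functionally similar replacement API, matching the scenario described above.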

For evaluation, we conducted a series of experiments with apps from Google Play, API descriptions from the Android SDK documentation, and Q&As from Stack Overflow.

  • On the one hand, we evaluate the performance of our method in supporting different API-related tasks. With respect to API recommendation, BDBM's MAP and MRR reach 0.66 and 0.67 respectively: improvements of 53.5% and 55.8% over BIKER, and of 842.9% and 204.5% over GAPI. With respect to similar API mining, the experimental results also show a good improvement over the baselines: compared with API2VEC, the improvements are 22.1%, 10.2%, and 15.3% in precision, recall, and F1 respectively. These comparison experiments show that our holistic API representations obtained from multiple modalities support different tasks better than unimodal API information.

  • On the other hand, we use the experiments to evaluate the rationality of our technique selection and model design. By comparing the performance of different graph embedding and word embedding methods within our approach, we determine suitable algorithms (Node2vec and GloVe) for representing API code and text modality information. Furthermore, by adjusting the number of neurons in the different layers of BDBM, we identify the model structure that achieves better performance in information fusion.
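The metrics reported above (precision at top-k, MAP, MRR) follow their standard information-retrieval definitions, which can be computed as follows. This is a generic sketch on a made-up ranked list, not the paper's evaluation code.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for api in recommended[:k] if api in relevant)
    return hits / k

def reciprocal_rank(recommended, relevant):
    """1/rank of the first relevant recommendation (MRR averages this over queries)."""
    for rank, api in enumerate(recommended, start=1):
        if api in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(recommended, relevant):
    """Precision averaged over relevant hits (MAP averages this over queries)."""
    hits, score = 0, 0.0
    for rank, api in enumerate(recommended, start=1):
        if api in relevant:
            hits += 1
            score += hits / rank
    return score / len(relevant) if relevant else 0.0

# Hypothetical query: two relevant APIs among five recommendations.
recommended = ["A", "B", "C", "D", "E"]
relevant = {"A", "D"}
p3 = precision_at_k(recommended, relevant, 3)  # 1 hit in top-3 -> 1/3
rr = reciprocal_rank(recommended, relevant)    # first hit at rank 1 -> 1.0
ap = average_precision(recommended, relevant)  # (1/1 + 2/4) / 2 = 0.75
```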

This paper is organized as follows. Section 2 reviews the existing studies related to our work. Section 3 gives the background knowledge applied in our approach. Section 4 introduces the process of representing API information in the code and text modalities. Section 5 offers the details of BDBM, including the composition structure and training details of the model. Section 6 introduces the different usage scenarios in which BDBM can be applied. Section 7 presents a series of experiments for evaluating our approach. Section 8 presents the conclusion and future work, and Section 9 the acknowledgments.


Related work

Many approaches have been proposed to represent API information in different modalities to support diverse tasks in development. We summarize these methods according to the data modalities they focus on.

Methods based on the text modality of APIs tend to fully exploit the information in various textual data (e.g., API descriptions, discussions about APIs in developer forums, comments in code). Thung et al. [20] propose an automated approach to

Gaussian-Bernoulli restricted Boltzmann machine

The Gaussian-Bernoulli restricted Boltzmann machine (GRBM) is the basis of our model. Understanding how a GRBM works is helpful for understanding our approach, so we introduce it as background knowledge.

A GRBM [35] is a two-layer stochastic neural network capable of drawing abstract and salient representations from real-valued data [36]. Its structure is shown in Fig. 2. The visible units model the inputs, and through the network processing, the hidden units output the data features of the visible units.
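To make the GRBM's two layers concrete, here is a minimal NumPy sketch of the conditional distributions, assuming unit-variance Gaussian visible units (a common parameterization); the layer sizes and random weights are arbitrary, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy GRBM: 4 real-valued visible units, 3 binary hidden units (hypothetical sizes).
n_vis, n_hid = 4, 3
W = rng.normal(scale=0.1, size=(n_vis, n_hid))  # weights between the layers
a = np.zeros(n_vis)                             # visible biases (Gaussian means)
b = np.zeros(n_hid)                             # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_given_visible(v):
    """p(h_j = 1 | v) for unit-variance Gaussian visible units."""
    return sigmoid(v @ W + b)

def visible_given_hidden(h):
    """Mean of the Gaussian conditional p(v | h)."""
    return a + h @ W.T

v = rng.normal(size=n_vis)           # a real-valued input vector
p_h = hidden_given_visible(v)        # hidden-unit activation probabilities
h = (rng.random(n_hid) < p_h) * 1.0  # one Gibbs sample of the hidden layer
v_recon = visible_given_hidden(h)    # mean reconstruction of the input
```

Training (e.g. by contrastive divergence) alternates such conditional samples to adjust W, a, and b; the hidden activations p_h are the learned features of the input.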

API information representation in two modalities

Our approach takes code and text modality data as input and fuses the API information in them to form a holistic representation of the API (see Fig. 1). However, the two kinds of data convey information in different ways, so we need to analyze them separately with different methods and vectorize the results to support the information fusion process.

Presenting APIs based on two modalities

By fusing the information in data of different modalities, we can obtain holistic representations of APIs. To achieve this goal, we construct the BDBM model, which takes the code and text modality information as input. The architecture and training process of BDBM are given as follows.
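A bimodal deep Boltzmann machine's fusion step can be sketched as a bottom-up pass in which each modality has its own hidden pathway and both pathways feed a shared joint layer. The sketch below assumes hypothetical layer sizes and random weights, and shows only a deterministic mean-field style pass, not the model's actual stochastic training.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical layer sizes: each modality has its own hidden pathway,
# and the two pathways meet in a shared joint layer.
d_code, d_text, d_h, d_joint = 8, 6, 5, 4
W_code = rng.normal(scale=0.1, size=(d_code, d_h))  # code-modality pathway
W_text = rng.normal(scale=0.1, size=(d_text, d_h))  # text-modality pathway
W_jc = rng.normal(scale=0.1, size=(d_h, d_joint))   # code hidden -> joint layer
W_jt = rng.normal(scale=0.1, size=(d_h, d_joint))   # text hidden -> joint layer

def joint_representation(v_code, v_text):
    """Bottom-up pass: fuse the two modality pathways in the shared joint layer."""
    h_code = sigmoid(v_code @ W_code)
    h_text = sigmoid(v_text @ W_text)
    # The joint layer receives input from both modality-specific pathways,
    # so its activations depend on information from both modalities.
    return sigmoid(h_code @ W_jc + h_text @ W_jt)

z = joint_representation(rng.normal(size=d_code), rng.normal(size=d_text))
```

The vector z plays the role of the holistic API representation used by the downstream tasks.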

API information utilization: applying BDBM to different tasks

Our BDBM produces comprehensive representations of APIs based on data features in the code and text modalities, and it can be applied to different API-related tasks. In this section, we take API recommendation and similar API mining as examples to introduce how to use BDBM in these tasks.

Experiments and results

To evaluate the effectiveness of our approach, we conduct a series of experiments to study the following three questions:

  • RQ1: Can BDBM effectively support API recommendation based on developers' queries?

  • RQ2: Can the holistic representation of an API be used to find similar APIs based on the APIs given by developers?

  • RQ3: Is the configuration of BDBM reasonable?

Specifically, RQ1 focuses on evaluating whether the code modality API information (recommended APIs) inferred

Conclusion and future work

In this paper, we propose a bimodal deep Boltzmann machine (BDBM) based approach for modeling the holistic representation of API information. Unlike previous approaches, our approach constructs a generative graph model to learn the joint probability distribution of API information in the code and text modalities. Such a probability not only enables our model to generate a holistic representation fusing the data features of the API across different modalities, but also can be turned into

CRediT authorship contribution statement

Xun Li: Validation, Formal analysis, Writing – original draft. Lei Liu: Conceptualization, Methodology, Writing – review & editing. Yuzhou Liu: Software, Resources, Data curation. Huaxiao Liu: Investigation, Supervision.

Declaration of Competing Interest

No author associated with this paper has disclosed any potential or pertinent conflicts which may be perceived to have impending conflict with this work. For full disclosure statements refer to https://doi.org/10.1016/j.infsof.2023.107188.

Acknowledgments

The work is funded by the National Natural Science Foundation of China (NSFC) No. 62102160, and the Natural Science Research Foundation of Jilin Province of China (Grant 20230101070JC).

References (49)

  • Pan, Weifeng et al.

    Structure-aware mashup service clustering for cloud-based internet of things using genetic algorithm based clustering algorithm

    Future Gener. Comput. Syst.

    (2018)
  • Smith, Linda B. et al.

    The development of embodied cognition: Six lessons from babies

    Artif. Life

    (2005)
  • Sufang Zhang, Jun-Hai Zhai, Bo-Jun Xie, Yan Zhan, Xin Wang, Multimodal Representation Learning: Advances, Trends and...
  • Xinwei Sun, Yilun Xu, Peng Cao, Yuqing Kong, Lingjing Hu, Shan Zhang, Yizhou Wang, TCGM: An Information-Theoretic...
  • Changqing Zhang, Zongbo Han, Yajie Cui, H. Fu, Joey Tianyi Zhou, Qinghua Hu, CPM-Nets: Cross Partial Multi-View...
  • Cho, Jaemin et al.

    X-LXMERT: Paint, caption and answer questions with multi-modal transformers

    (2020)
  • Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee, ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for...
  • Wang, Peiqi et al.

    Using multiple instance learning to build multimodal representations

    (2022)
  • Shih-Cheng Huang, Liyue Shen, Matthew P. Lungren, Serena Yeung, GLoRIA: A Multimodal Global-Local Representation...
  • Soravit Changpinyo, Jordi Pont-Tuset, Vittorio Ferrari, Radu Soricut, Telling the What while Pointing to the Where:...
  • Dabrowski, Jacek et al.

    Analysing app reviews for software engineering: a systematic literature review

    Empir. Softw. Eng.

    (2022)
  • Di Gregorio, Marianna et al.

    The making of accessible android applications: an empirical study on the state of the practice

    Empir. Softw. Eng.

    (2020)
  • Cruz, L. et al.

    To the attention of mobile software developers: guess what, test your app!

    Empir. Softw. Eng.

    (2019)
  • Shi Zhong Yang, Rui Li, Jiongyi Chen, Wenrui Diao, Shanqing Guo, Demystifying Android Non-SDK APIs: Measurement and...
  • Qiao Huang, Xin Xia, Zhenchang Xing, D. Lo, Xinyu Wang, API Method Recommendation without Worrying about the Task-API...
  • Zijie Chen, Tao Zhang, Xiao Peng, A Novel API Recommendation Approach By Using Graph Attention Network, in: 2021 IEEE...
  • Vitalis Salis, Thodoris Sotiropoulos, Panos Louridas, Diomidis D. Spinellis, Dimitris Mitropoulos, PyCG: Practical Call...
  • Devlin, Jacob et al.

    BERT: Pre-training of deep bidirectional transformers for language understanding

    (2019)
  • Yu, Dian et al.

    Dialogue-based relation extraction

    (2020)
  • Xiaodong Gu, Hongyu Zhang, D. Zhang, Sunghun Kim, Deep API learning, in: Proceedings of the 2016 24th ACM SIGSOFT...
  • Ferdian Thung, Shaowei Wang, D. Lo, Julia L. Lawall, Automatic recommendation of API methods from feature requests, in:...
  • Collin McMillan, M. Grechanik, Denys Poshyvanyk, Qing Xie, Chen Fu, Portfolio: finding relevant functions and their...
  • Chen, Chi et al.

    Holistic combination of structural and textual code information for context based API recommendation

    (2020)
  • Mohammad Masudur Rahman, Chanchal Kumar Roy, D. Lo, RACK: Automatic API Recommendation Using Crowdsourced Knowledge,...