Automatic recognizing relevant fragments of APIs using API references

Wu, Di; Feng, Yang; Zhang, Hongyu; Xu, Baowen

doi:10.1007/s10515-023-00401-0

Published: 19 November 2023

Automated Software Engineering, volume 31, article number 3 (2024)

Abstract

API tutorials are crucial resources because they often explain in detail how to use APIs. Typically, an API tutorial is segmented into a number of consecutive fragments. If a fragment explains the usage of an API, we regard it as a relevant fragment of that API. Recognizing relevant fragments can help developers comprehend, learn, and use APIs. Recent studies have presented approaches for recognizing relevant fragments, which mainly use API tutorials or Stack Overflow to train the recognition model. API references are also important API learning resources, as they contain abundant API knowledge. Given the similarity between API tutorials and API references (both provide API knowledge), we believe that API knowledge from API references can help recognize relevant tutorial fragments of APIs effectively. However, leveraging API references to build a supervised learning-based recognition model is non-trivial. Two major problems are the lack of labeled API references and the unavailability of engineered features for API references. To address these problems, we propose a supervised learning-based approach named RRTR (Recognize Relevant Tutorial fragments using API References). To address the lack of labeled API references, RRTR uses heuristic rules to automatically collect relevant and irrelevant API references for APIs. To address the lack of engineered features, RRTR adopts the pre-trained SBERT (Sentence-BERT) model to automatically learn semantic features of API references. More specifically, we first automatically generate labeled $\left\langle API, ARE \right\rangle$ pairs (ARE stands for an API reference) via our heuristic rules. We then use SBERT to learn semantic features for the collected pairs and train a supervised learning-based recognition model. Finally, we recognize the relevant tutorial fragments of APIs with the trained model. To evaluate the effectiveness of RRTR, we collected Java and Android API reference datasets containing a total of 20,680 labeled $\left\langle API, ARE \right\rangle$ pairs. Experimental results demonstrate that RRTR outperforms state-of-the-art approaches in terms of F-Measure on two datasets. In addition, we conducted a user study to investigate the practicality of RRTR, and the results further illustrate its effectiveness in practice. In summary, RRTR can effectively recognize relevant fragments of APIs using API references by solving the problems of missing labels and missing engineered features for API references.
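The three steps described in the abstract (label pairs with heuristic rules, embed them, fit a recognizer, then apply it to tutorial fragments) can be sketched as follows. This is only an illustrative toy under stated assumptions, not the RRTR implementation: the labeling rule in `label_pairs` is hypothetical, a normalized bag-of-words vector stands in for SBERT embeddings, a single similarity threshold tuned for F-Measure stands in for the trained supervised model, and the two reference texts are invented examples.

```python
import math
from collections import Counter


def f_measure(truth, preds):
    """Harmonic mean of precision and recall for binary labels."""
    tp = sum(1 for y, p in zip(truth, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(truth, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(truth, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def embed(text):
    """Stand-in for SBERT (assumption): sparse, L2-normalized bag of words."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def label_pairs(references):
    """Hypothetical heuristic: an API paired with its own reference is
    relevant (1); paired with another API's reference it is irrelevant (0)."""
    return [(api, text, int(api == owner))
            for api in references
            for owner, text in references.items()]


def fit_threshold(pairs):
    """Toy 'training': choose the similarity cut-off with the best F-Measure."""
    scored = [(cosine(embed(api), embed(text)), y) for api, text, y in pairs]
    truth = [y for _, y in scored]
    best_t, best_f = 0.0, -1.0
    for t in sorted({s for s, _ in scored}):
        f = f_measure(truth, [int(s >= t) for s, _ in scored])
        if f > best_f:
            best_t, best_f = t, f
    return best_t


def is_relevant(api, fragment, threshold):
    """Apply the fitted cut-off to an <API, tutorial fragment> pair."""
    return cosine(embed(api), embed(fragment)) >= threshold


# Invented reference texts, used only to exercise the sketch.
references = {
    "arraylist": "arraylist is a resizable array implementation of the list interface",
    "hashmap": "hashmap is a hash table based implementation of the map interface",
}
pairs = label_pairs(references)
threshold = fit_threshold(pairs)
print(is_relevant("ArrayList", "create an arraylist and add items", threshold))
```

In a closer approximation of the described approach, `embed` would be replaced by a pre-trained sentence encoder and `fit_threshold` by a trained classifier over the pair embeddings; the data flow from labeled reference pairs to predictions on tutorial fragments would stay the same.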

Data Availability Statement

We released our tool and experimental data at: https://sites.google.com/view/rrtr2023/.

Acknowledgements

We would like to thank the anonymous reviewers for their insightful and constructive comments. This research was partially funded by the National Natural Science Foundation of China under Grant No. 62172209, and the Science, Technology and Innovation Commission of Shenzhen Municipality (Nos. CJGJZD20200617103001003 and 2021Szvup057).

Author information

Authors and Affiliations

The State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Di Wu, Yang Feng & Baowen Xu
School of Big Data and Software Engineering, Chongqing University, Chongqing, China
Hongyu Zhang

Contributions

Di Wu, Yang Feng, and Hongyu Zhang wrote the main manuscript text. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yang Feng.

Ethics declarations

Competing interests

All the authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, D., Feng, Y., Zhang, H. et al. Automatic recognizing relevant fragments of APIs using API references. Autom Softw Eng 31, 3 (2024). https://doi.org/10.1007/s10515-023-00401-0

Received: 11 May 2023
Accepted: 30 September 2023
Published: 19 November 2023
DOI: https://doi.org/10.1007/s10515-023-00401-0
