Abstract
API tutorials are crucial resources because they often provide detailed explanations of how to use APIs. Typically, an API tutorial is segmented into a number of consecutive fragments. If a fragment explains the usage of an API, we regard it as a relevant fragment of that API. Recognizing relevant fragments can help developers comprehend, learn, and use APIs. Recent studies have presented approaches for recognizing relevant fragments, which mainly train the recognition model on API tutorials or Stack Overflow. API references are also important API learning resources, as they contain abundant API knowledge. Because API tutorials and API references both provide API knowledge, we believe that API knowledge from API references can help recognize relevant tutorial fragments of APIs effectively. However, it is non-trivial to leverage API references to build a supervised learning-based recognition model. Two major problems are the lack of labeled API references and the unavailability of engineered features of API references. To address these problems, we propose a supervised learning-based approach named RRTR (Recognize Relevant Tutorial fragments using API References). To address the lack of labeled API references, RRTR designs heuristic rules to automatically collect relevant and irrelevant API references for APIs. To address the unavailability of engineered features, RRTR adopts the pre-trained SBERT (Sentence-BERT) model to automatically learn semantic features of API references. More specifically, we first automatically generate labeled \(\left\langle API, ARE \right\rangle\) pairs (ARE stands for an API reference) via our heuristic rules. We then use SBERT to automatically learn semantic features for the collected pairs and train a supervised learning-based recognition model. Finally, we recognize the relevant tutorial fragments of APIs with the trained model. To evaluate the effectiveness of RRTR, we collected Java and Android API reference datasets containing a total of 20,680 labeled \(\left\langle API, ARE \right\rangle\) pairs. Experimental results demonstrate that RRTR outperforms state-of-the-art approaches in terms of F-Measure on two datasets. In addition, we conducted a user study to investigate the practicality of RRTR, and the results further illustrate its effectiveness in practice. By solving the problems of lacking labeled API references and lacking engineered features of API references, RRTR can effectively recognize relevant fragments of APIs using API references.
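The three steps named in the abstract (label pairs with heuristic rules, learn features, train a model and apply it to tutorial fragments) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not RRTR itself: the `embed` function is a toy bag-of-words vector standing in for SBERT, the labeling rule in `label_pairs` is hypothetical (the abstract does not state the actual heuristic rules), and the "model" is only a similarity threshold chosen to maximize F-Measure on the labeled pairs. The two reference texts and the tutorial fragment are invented examples.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy bag-of-words vector; an assumed stand-in for an SBERT embedding.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    dot = sum(count * v[token] for token, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def label_pairs(references):
    # Hypothetical labeling rule: a reference entry is relevant to the API it
    # documents and irrelevant to every other API in the collection.
    return [(api, text, api == owner)
            for owner, text in references.items()
            for api in references]


def f_measure(predicted, actual):
    # Harmonic mean of precision and recall over boolean predictions.
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def train_threshold(pairs):
    # "Training" here only picks the similarity cut-off with the best F-Measure.
    scores = [cosine(embed(api), embed(text)) for api, text, _ in pairs]
    labels = [label for _, _, label in pairs]
    return max(sorted(set(scores)),
               key=lambda t: f_measure([s >= t for s in scores], labels))


def is_relevant(api, fragment, threshold):
    return cosine(embed(api), embed(fragment)) >= threshold


# Invented reference entries and tutorial fragment, for illustration only.
references = {
    "Period": "A Period is a date-based amount of time, such as 2 years, 3 months and 4 days.",
    "Duration": "A Duration is a time-based amount of time, such as 34.5 seconds.",
}
threshold = train_threshold(label_pairs(references))
fragment = "Use Period.between(start, end) to compute the Period between two dates."
print(is_relevant("Period", fragment, threshold))    # True
print(is_relevant("Duration", fragment, threshold))  # False
```

The point of the sketch is the data flow: labels come from the references rather than from manual annotation, and the model trained on reference pairs is then applied unchanged to tutorial fragments. A realistic version would replace `embed` with a pre-trained sentence encoder and the threshold with a trained classifier.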
Data Availability Statement
We released our tool and experimental data at: https://sites.google.com/view/rrtr2023/.
Notes
The relevance between fragment (a) and Period has been annotated by Petrosyan et al. (2015).
References
Alon, U., Zilberstein, M., Levy, O., Yahav, E.: code2vec: learning distributed representations of code. Proc. ACM Program. Lang. 3, 1–29 (2019)
Azad, S., Rigby, P.C., Guerrouj, L.: Generating API call rules from version history and Stack Overflow posts. ACM Trans. Softw. Eng. Methodol. 25(4), 1–22 (2017)
Bao, L., Xing, Z., Xia, X., Lo, D., Wu, M., Yang, X.: psc2code: denoising code extraction from programming screencasts. ACM Trans. Softw. Eng. Methodol. 29(3), 1–38 (2020)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Chen, C., Zhang, K.: Who asked what: integrating crowdsourced FAQs into API documentation. In: International Conference on Software Engineering, pp. 456–459 (2014)
Chowdhury, S.A., Hindle, A.: Mining StackOverflow to filter out off-topic IRC discussion. In: Working Conference on Mining Software Repositories, pp. 422–425 (2015)
Ciborowska, A., Damevski, K.: Fast changeset-based bug localization with BERT. In: International Conference on Software Engineering, pp. 946–957 (2022)
Cliff, N.: Ordinal Methods for Behavioral Data Analysis. Psychology Press, London (2014)
Dekel, U., Herbsleb, J.D.: Improving API documentation usability with knowledge pushing. In: International Conference on Software Engineering, pp. 320–330 (2009)
Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., Jiang, D., et al.: CodeBERT: a pre-trained model for programming and natural languages. In: Findings of the Association for Computational Linguistics, pp. 1536–1547 (2020)
Fu, W., Menzies, T.: Revisiting unsupervised learning for defect prediction. In: Joint Meeting on Foundations of Software Engineering, pp. 72–83 (2017)
Gao, Z., Xia, X., Grundy, J., Lo, D., Li, Y.F.: Generating question titles for Stack Overflow from mined code snippets. ACM Trans. Softw. Eng. Methodol. 29(4), 1–37 (2020)
Gu, X., Zhang, H., Zhang, D., Kim, S.: Deep API learning. In: International Symposium on Foundations of Software Engineering, pp. 631–642 (2016)
Hall, M.A., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)
Hoang, T., Kang, H.J., Lo, D., Lawall, J.: CC2Vec: distributed representations of code changes. In: International Conference on Software Engineering, pp. 518–529 (2020)
Huang, Q., Xia, X., Lo, D.: Supervised vs unsupervised models: a holistic look at effort-aware just-in-time defect prediction. In: International Conference on Software Maintenance and Evolution, pp. 159–170 (2017)
Huang, Q., Xia, X., Xing, Z., Lo, D., Wang, X.: API method recommendation without worrying about the task-API knowledge gap. In: International Conference on Automated Software Engineering, pp. 293–304 (2018)
Isotani, H., Washizaki, H., Fukazawa, Y., Nomoto, T., Ouji, S., Saito, S.: Duplicate bug report detection by using sentence embedding and fine-tuning. In: IEEE International Conference on Software Maintenance and Evolution, pp. 535–544 (2021)
Jiang, H., Zhang, J., Li, X., Ren, Z., Lo, D.: A more accurate model for finding tutorial segments explaining APIs. In: International Conference on Software Analysis, Evolution, and Reengineering, pp. 157–167 (2016)
Jiang, H., Zhang, J., Ren, Z., Zhang, T.: An unsupervised approach for discovering relevant tutorial fragments for APIs. In: International Conference on Software Engineering, pp. 38–48 (2017)
Jing, X., Wu, F., Dong, X., Xu, B.: An improved SDA based defect prediction framework for both within-project and cross-project class-imbalance problems. IEEE Trans. Softw. Eng. 43(4), 321–339 (2017)
Karmakar, A., Robbes, R.: What do pre-trained code models know about code? In: International Conference on Automated Software Engineering, pp. 1332–1336 (2021)
Kenton, J.D.M.W.C., Toutanova, L.K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, pp. 4171–4186 (2019)
Landis, J.R., Koch, G.G.: The measurement of observer agreement for categorical data. Biometrics 33, 159–174 (1977)
Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: International Conference on Machine Learning, pp. 1188–1196 (2014)
Li, H., Li, S., Sun, J., Xing, Z., Peng, X., Liu, M., Zhao, X.: Improving API caveats accessibility by mining API caveats knowledge graph. In: International Conference on Software Maintenance and Evolution, pp. 183–193 (2018)
Li, X., Jiang, H., Kamei, Y., Chen, X.: Bridging semantic gaps between natural languages and APIs with word embedding. IEEE Trans. Softw. Eng. 46(10), 1081–1097 (2020)
Lin, J., Liu, Y., Zeng, Q., Jiang, M., Cleland-Huang, J.: Traceability transformed: generating more accurate links with pre-trained BERT models. In: International Conference on Software Engineering, pp. 324–335 (2021)
Lin, B., Wang, S., Wen, M., Mao, X.: Context-aware code change embedding for better patch correctness assessment. ACM Trans. Softw. Eng. Methodol. 31(3), 1–29 (2022)
Luo, X., Xue, Y., Xing, Z., Sun, J.: PRCBERT: prompt learning for requirement classification using BERT-based pretrained language models. In: International Conference on Automated Software Engineering, pp. 1–13 (2022)
Ma, S., Xing, Z., Chen, C., Chen, C., Qu, L., Li, G.: Easy-to-deploy API extraction by multi-level feature embedding and transfer learning. IEEE Trans. Softw. Eng. 47(10), 2296–2311 (2021)
Maalej, W., Robillard, M.P.: Patterns of knowledge in API reference documentation. IEEE Trans. Softw. Eng. 39(9), 1264–1282 (2013)
Meyer, A.N., Fritz, T., Murphy, G.C., Zimmermann, T.: Software developers’ perceptions of productivity. In: Proceedings of the International Symposium on Foundations of Software Engineering, pp. 19–29 (2014)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Annual Conference on Neural Information Processing Systems, pp. 3111–3119 (2013)
Nguyen, T.V., Tran, N.M., Phan, H., Nguyen, T.D., Truong, L.H., Nguyen, A.T., Nguyen, H.A., Nguyen, T.N.: Complementing global and local contexts in representing API descriptions to improve API retrieval tasks. In: Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 551–562 (2018)
Nigam, K., Lafferty, J., McCallum, A.: Using maximum entropy for text classification. In: IJCAI-99 Workshop on Machine Learning for Information Filtering, pp. 61–67 (1999)
Petrosyan, G., Robillard, M.P., De Mori, R.: Discovering information explaining API types using text classification. In: International Conference on Software Engineering, pp. 869–879 (2015)
Ponzanelli, L., Bavota, G., Mocci, A., Oliveto, R., Penta, M.D., Haiduc, S., Russo, B., Lanza, M.: Automatic identification and classification of software development video tutorial fragments. IEEE Trans. Softw. Eng. 45(5), 464–488 (2019)
Rehurek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks (2010)
Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084 (2019)
Robillard, M.P.: What makes APIs hard to learn? Answers from developers. IEEE Softw. 26(6), 27–34 (2009)
Robillard, M.P., Chhetri, Y.B.: Recommending reference API documentation. Empir. Softw. Eng. 20(6), 1558–1586 (2015)
Robillard, M.P., DeLine, R.: A field study of API learning obstacles. Empir. Softw. Eng. 16(6), 703–732 (2011)
Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., Liu, C.: A survey on deep transfer learning. CoRR arXiv:1808.01974 (2018)
TensorFlow framework: https://www.tensorflow.org (2023)
Tian, H., Liu, K., Li, Y., Kaboré, A.K., Koyuncu, A., Habib, A., Li, L., Wen, J., Klein, J., Bissyandé, T.F.: The best of both worlds: combining learned embeddings with engineered features for accurate prediction of correct patches. ACM Trans. Softw. Eng. Methodol. (2022). https://doi.org/10.1145/3576039
Treude, C., Robillard, M.P.: Augmenting API documentation with insights from Stack Overflow. In: International Conference on Software Engineering, pp. 392–403 (2016)
Treude, C., Robillard, M.P., Dagenais, B.: Extracting development tasks to navigate software documentation. IEEE Trans. Softw. Eng. 41(6), 565–581 (2015)
Viggiato, M., Paas, D., Buzon, C., Bezemer, C.P.: Identifying similar test cases that are specified in natural language. IEEE Trans. Softw. Eng. 49(3), 1027–1043 (2022)
Wang, D., Jia, Z., Li, S., Yu, Y., Xiong, Y., Dong, W., Liao, X.: Bridging pre-trained models and downstream tasks for source code understanding. In: International Conference on Software Engineering, pp. 287–298 (2022)
Wei, M., Harzevili, N.S., Huang, Y., Wang, J., Wang, S.: CLEAR: contrastive learning for API recommendation. In: International Conference on Software Engineering, pp. 376–387 (2022)
Wilcoxon, F.: Individual comparisons by ranking methods. Biometrics Bull. 1(6), 80–83 (1945)
Wu, D., Jing, X.Y., Zhang, H., Kong, X., Xie, Y., Huang, Z.: Data-driven approach to application programming interface documentation mining: a review. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 10(5), e1369 (2020)
Wu, D., Jing, X.Y., Zhang, H., Li, B., Xie, Y., Xu, B.: Generating API tags for tutorial fragments from Stack Overflow. Empir. Softw. Eng. 26(4), 66 (2021)
Wu, D., Jing, X.Y., Zhang, H., Feng, Y., Chen, H., Zhou, Y., Xu, B.: Retrieving API knowledge from tutorials and Stack Overflow based on natural language queries. ACM Trans. Softw. Eng. Methodol. 32(5), 1–36 (2023)
Wu, D., Jing, X.Y., Zhang, H., Zhou, Y., Xu, B.: Leveraging Stack Overflow to detect relevant tutorial fragments of APIs. Empir. Softw. Eng. 28(1), 12 (2023)
Xu, B., Xing, Z., Xia, X., Lo, D.: AnswerBot: automated generation of answer summary to developers’ technical questions. In: International Conference on Automated Software Engineering, pp. 706–716 (2017)
Xu, B., Ye, D., Xing, Z., Xia, X., Chen, G., Li, S.: Predicting semantically linkable knowledge in developer online forums via convolutional neural network. In: International Conference on Automated Software Engineering, pp. 51–62 (2016)
Ye, X., Shen, H., Ma, X., Bunescu, R.C., Liu, C.: From word embeddings to document similarities for improved information retrieval in software engineering. In: International Conference on Software Engineering, pp. 404–415 (2016)
Zhang, H., Jain, A., Khandelwal, G., Kaushik, C., Ge, S., Hu, W.: Bing developer assistant: improving developer productivity by recommending sample code. In: International Symposium on Foundations of Software Engineering, pp. 956–961 (2016)
Zhang, J., Liu, S., Gong, L., Zhang, H., Huang, Z., Jiang, H.: BEQAIN: an effective and efficient identifier normalization approach with BERT and the question answering system. IEEE Trans. Softw. Eng. (2022a, in press)
Zhang, F., Niu, H., Keivanloo, I., Zou, Y.: Expanding queries for code search using semantically related API class-names. IEEE Trans. Softw. Eng. 44(11), 1070–1082 (2018)
Zhang, J., Jiang, H., Ren, Z., Zhang, T., Huang, Z.: Enriching API documentation with code samples and usage scenarios from crowd knowledge. IEEE Trans. Softw. Eng. 47(6), 1299–1314 (2021)
Zhong, H., Xie, T., Zhang, L., Pei, J., Mei, H.: MAPO: mining and recommending API usage patterns. In: Object-Oriented Programming, pp. 318–343 (2009a)
Zhang, N., Huang, Q., Xia, X., Zou, Y., Lo, D., Xing, Z.: Chatbot4QR: interactive query refinement for technical question retrieval. IEEE Trans. Softw. Eng. 48(4), 1185–1211 (2022)
Zhong, H., Zhang, L., Xie, T., Mei, H.: Inferring resource specifications from natural language API documentation. In: International Conference on Automated Software Engineering, pp. 307–318 (2009b)
Zhong, H., Mei, H.: An empirical study on API usages. IEEE Trans. Softw. Eng. 45(4), 319–334 (2019)
Zhou, Y., Wang, C., Yan, X., Chen, T., Panichella, S., Gall, H.C.: Automatic detection and repair recommendation of directive defects in Java API documentation. IEEE Trans. Softw. Eng. 46(9), 1004–1023 (2020)
Acknowledgements
We would like to thank the anonymous reviewers for their insightful and constructive comments. This research was partially funded by the National Natural Science Foundation of China under Grant No. 62172209, and the Science, Technology and Innovation Commission of Shenzhen Municipality (No. CJGJZD20200617103001003, 2021Szvup057).
Author information
Contributions
Di Wu, Yang Feng, and Hongyu Zhang wrote the main manuscript text. All authors reviewed the manuscript.
Ethics declarations
Competing interests
All the authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wu, D., Feng, Y., Zhang, H. et al. Automatic recognizing relevant fragments of APIs using API references. Autom Softw Eng 31, 3 (2024). https://doi.org/10.1007/s10515-023-00401-0
DOI: https://doi.org/10.1007/s10515-023-00401-0