Rap4DQ: Learning to recommend relevant API documentation for developer questions

Li, Yi; Wang, Shaohua; Wang, Wenbo; Nguyen, Tien N.; Wang, Yan; Ye, Xinyue

doi:10.1007/s10664-021-10067-5

Published: 29 November 2021

Volume 27, article number 23, (2022)

Empirical Software Engineering

Yi Li¹, Shaohua Wang¹ (ORCID: orcid.org/0000-0001-5777-7759), Wenbo Wang¹, Tien N. Nguyen², Yan Wang³ & Xinyue Ye⁴


Abstract

Developers often have difficulty using API methods during software development, and answering API-related questions on API Q&A forums costs API development teams considerable time. To save these teams time, we propose Rap4DQ, a deep learning-based approach that identifies the web API documentation relevant to developers' API-related questions on API Q&A forums. Rap4DQ learns representation vectors for questions and API documentation separately using Gated Recurrent Units (GRUs), and weights API documents differently during training to reflect their differing importance. Rap4DQ trains on positive and negative samples with a loss function that minimizes the distance between a question and its relevant documentation while maximizing the distance between a question and its irrelevant documentation. Finally, a learning-to-rank layer ranks the API documentation based on the representation vectors learned by the GRUs. We evaluated Rap4DQ in several experiments on three large, popular API Q&A forums: Twitter, eBay, and AdWords. The results show that Rap4DQ outperforms all baselines, with a relative improvement of up to 84.3% in AUC. Rap4DQ achieves AUCs of 0.84, 0.88, and 0.94 in identifying relevant API documentation on Twitter, eBay, and AdWords, respectively.
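
The training objective described in the abstract (pull a question toward its relevant documentation, push it away from irrelevant documentation) can be illustrated with a small sketch. The abstract does not give the exact loss, distance measure, or weighting scheme, so everything below is an assumption made for illustration: a hinge-style margin loss over Euclidean distances, hand-written toy vectors standing in for learned GRU representations, and hypothetical names such as `margin_ranking_loss`, `rank_documents`, and `doc_a`. It is not the authors' implementation.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def margin_ranking_loss(question, relevant_doc, irrelevant_doc, margin=1.0):
    """Hinge-style loss (an assumed form, not taken from the paper).

    The loss is zero once the irrelevant document is at least `margin`
    farther from the question than the relevant document.
    """
    d_pos = euclidean(question, relevant_doc)
    d_neg = euclidean(question, irrelevant_doc)
    return max(0.0, margin + d_pos - d_neg)

def rank_documents(question, docs):
    """Return document names sorted by increasing distance to the question."""
    return sorted(docs, key=lambda name: euclidean(question, docs[name]))

# Toy vectors standing in for learned representations (made up for illustration).
q = [1.0, 0.0]
docs = {"doc_a": [0.9, 0.1], "doc_b": [0.0, 1.0], "doc_c": [0.5, 0.5]}

# Relevant document close to the question: the margin is already satisfied.
loss_good = margin_ranking_loss(q, docs["doc_a"], docs["doc_b"])
# Roles swapped: the "relevant" document is far away, so the loss is positive.
loss_bad = margin_ranking_loss(q, docs["doc_b"], docs["doc_a"])
# Ranking by distance, a simple stand-in for a learned ranking layer.
ranking = rank_documents(q, docs)
```

In this sketch, minimizing the loss over many (question, relevant, irrelevant) triples would shrink the first distance and grow the second, which is the behavior the abstract describes; sorting by distance is only a simple stand-in for the learning-to-rank layer.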

Acknowledgements

We thank the anonymous reviewers and the associate editor for their valuable feedback.

Author information

Authors and Affiliations

New Jersey Institute of Technology, University Heights, Newark, NJ, 07102, USA
Yi Li, Shaohua Wang & Wenbo Wang
The University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX, 75080-3021, USA
Tien N. Nguyen
Central University of Finance and Economics, Changping District, Beijing, 100081, China
Yan Wang
Texas A&M University, 400 Bizzell St, College Station, TX, 77843, USA
Xinyue Ye

Corresponding author

Correspondence to Shaohua Wang.

Additional information

Communicated by: Shaowei Wang, Tse-Hsun (Peter) Chen, Sebastian Baltes, Ivano Malavolta, Christoph Treude and Alexander Serebrenik

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Collective Knowledge in Software Engineering

About this article

Cite this article

Li, Y., Wang, S., Wang, W. et al. Rap4DQ: Learning to recommend relevant API documentation for developer questions. Empir Software Eng 27, 23 (2022). https://doi.org/10.1007/s10664-021-10067-5

Accepted: 18 October 2021