Rap4DQ: Learning to recommend relevant API documentation for developer questions

Empirical Software Engineering

Abstract

Developers often have difficulty using API methods during software development, and answering API-related questions on API Q&A forums costs API development teams considerable time. To help save that time, we propose Rap4DQ, a deep learning-based approach that identifies relevant web API documentation for developers' API-related questions on API Q&A forums. Rap4DQ learns representation vectors for questions and for API documentation separately using Gated Recurrent Units (GRUs), and assigns different weights during training to reflect the varying importance of individual API documents. Rap4DQ is trained on positive and negative samples with a loss function that minimizes the distance between a question and its relevant documentation while maximizing the distance between the question and its irrelevant documentation. Finally, a learning-to-rank layer ranks the API documentation based on the representation vectors learned by the GRUs. We conducted several experiments to evaluate Rap4DQ on three large and popular API Q&A forums: Twitter, eBay, and AdWords. The results show that Rap4DQ outperforms all baselines, with a relative improvement of up to 84.3% in terms of AUC. Rap4DQ achieves AUCs of 0.84, 0.88, and 0.94 in identifying relevant API documentation on Twitter, eBay, and AdWords, respectively.
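As an illustration only, the pipeline described in the abstract (separate GRU encoders, a weighted loss over positive and negative samples, and distance-based ranking) might be sketched as follows. This is not the authors' implementation: the abstract does not give the exact loss, weighting scheme, or ranking layer, so the hinge-style margin loss, the Euclidean distance, the per-document `weight` argument, the dimensions, and all function names below are assumptions chosen for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(input_dim, hidden_dim, rng):
    """Random parameters for one GRU layer (sizes are hypothetical)."""
    def mat(rows, cols):
        return rng.normal(scale=0.1, size=(rows, cols))
    return {
        "Wz": mat(hidden_dim, input_dim), "Uz": mat(hidden_dim, hidden_dim),
        "Wr": mat(hidden_dim, input_dim), "Ur": mat(hidden_dim, hidden_dim),
        "Wh": mat(hidden_dim, input_dim), "Uh": mat(hidden_dim, hidden_dim),
    }

def gru_encode(embeddings, p):
    """Run a GRU over a (seq_len, input_dim) array and return the final hidden state."""
    h = np.zeros(p["Uz"].shape[0])
    for x in embeddings:
        z = sigmoid(p["Wz"] @ x + p["Uz"] @ h)            # update gate
        r = sigmoid(p["Wr"] @ x + p["Ur"] @ h)            # reset gate
        h_new = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h))  # candidate state
        h = (1.0 - z) * h + z * h_new                     # blend old and candidate state
    return h

def distance(a, b):
    """Euclidean distance (an assumed choice of metric)."""
    return float(np.linalg.norm(a - b))

def weighted_margin_loss(q, pos, neg, weight=1.0, margin=1.0):
    """Assumed hinge loss: pull q toward pos and push it away from neg,
    scaled by a per-document importance weight."""
    return weight * max(0.0, margin + distance(q, pos) - distance(q, neg))

def rank_documents(q, doc_vectors):
    """Return document indices ordered from closest to farthest from q."""
    dists = [distance(q, d) for d in doc_vectors]
    return sorted(range(len(doc_vectors)), key=lambda i: dists[i])

# Usage with random stand-ins for word embeddings (no training is performed here).
rng = np.random.default_rng(0)
question_encoder = init_gru(8, 4, rng)   # separate encoder for questions
document_encoder = init_gru(8, 4, rng)   # separate encoder for API documentation
question = rng.normal(size=(5, 8))                   # 5 tokens, 8-dim embeddings
docs = [rng.normal(size=(7, 8)) for _ in range(3)]   # 3 candidate documents
q_vec = gru_encode(question, question_encoder)
doc_vecs = [gru_encode(d, document_encoder) for d in docs]
order = rank_documents(q_vec, doc_vecs)
loss = weighted_margin_loss(q_vec, doc_vecs[0], doc_vecs[1], weight=2.0)
```

The loss is zero once the irrelevant document is at least `margin` farther from the question than the relevant one, so training effort concentrates on pairs that are still confused; the weight scales how much each such violation counts.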

Acknowledgements

We thank the anonymous reviewers and the associate editor for their valuable feedback.

Author information

Corresponding author

Correspondence to Shaohua Wang.

Additional information

Communicated by: Shaowei Wang, Tse-Hsun (Peter) Chen, Sebastian Baltes, Ivano Malavolta, Christoph Treude and Alexander Serebrenik

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Collective Knowledge in Software Engineering

About this article

Cite this article

Li, Y., Wang, S., Wang, W. et al. Rap4DQ: Learning to recommend relevant API documentation for developer questions. Empir Software Eng 27, 23 (2022). https://doi.org/10.1007/s10664-021-10067-5

  • DOI: https://doi.org/10.1007/s10664-021-10067-5
