Leveraging Stack Overflow to detect relevant tutorial fragments of APIs

Empirical Software Engineering

Abstract

Developers often use learning resources such as API tutorials and Stack Overflow (SO) to learn how to use an unfamiliar API. An API tutorial can be divided into tutorial fragments, each a group of consecutive units that describe the same topic. We consider a tutorial fragment that explains how to use an API to be a relevant fragment of that API. Discovering relevant tutorial fragments of APIs can facilitate API understanding, learning, and application. However, existing approaches, whether supervised or unsupervised, suffer from either high manual effort or a failure to consider relevance information. In this paper, we propose a novel approach, called SO2RT, to detect relevant tutorial fragments of APIs based on SO posts. SO2RT first automatically extracts relevant and irrelevant \(\left \langle API, QA \right \rangle \) pairs (QA stands for question and answer) and \(\left \langle API, FRA \right \rangle \) pairs (FRA stands for tutorial fragment). It then trains a semi-supervised, transfer-learning-based detection model, which transfers the API usage knowledge in SO Q&A pairs to tutorial fragments by utilizing the easy-to-extract \(\left \langle API, QA \right \rangle \) pairs. Finally, relevant fragments of APIs are discovered by querying the trained model. In this way, the effort of labeling the relevance between tutorial fragments and APIs is reduced. We evaluate SO2RT on Java and Android datasets containing 21,008 \(\left \langle API, QA \right \rangle \) pairs. Experimental results show that SO2RT outperforms the state-of-the-art approaches in terms of F-Measure on both datasets. Our user study further confirms the effectiveness of SO2RT in practice. We also show a successful application of the relevant fragments to API recommendation.
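The abstract reports results in terms of F-Measure. As a reminder of what that metric measures for a binary relevant/irrelevant decision, here is a minimal Python sketch. The function name `precision_recall_f1` and the toy labels are hypothetical and chosen only for illustration; this is the standard definition of the metric, not code from the SO2RT tool, and the exact evaluation protocol used in the paper is not shown on this page.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[int], predicted: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) for binary labels.

    1 marks a pair judged relevant, 0 a pair judged irrelevant.
    Each metric falls back to 0.0 when its denominator is zero.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == 1 and p == 1)
    false_pos = sum(1 for g, p in pairs if g == 0 and p == 1)
    false_neg = sum(1 for g, p in pairs if g == 1 and p == 0)

    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical relevance labels for six <API, FRA> pairs.
    gold_labels = [1, 1, 0, 0, 1, 0]
    model_labels = [1, 0, 0, 1, 1, 0]
    print(precision_recall_f1(gold_labels, model_labels))
```

Because F-Measure is a harmonic mean, a model cannot score well by labeling every fragment as relevant (high recall, low precision) or by labeling almost none (the reverse).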

Data Availability

Our tool, experimental data, and results are publicly available at: https://sites.google.com/site/stcaso2rt.

Notes

  1. https://www.joda.org/joda-time/userguide.html

  2. https://www.joda.org/joda-time/apidocs/index.html

  3. https://archive.org/download/stackexchange

  4. https://www.joda.org/joda-time/apidocs/index.html

  5. https://www.joda.org/joda-time/apidocs/index.html

  6. http://commons.apache.org/proper/commons-math/javadocs/

  7. https://www.oracle.com/technetwork/java/javase/documentation/index.html

  8. http://download.igniterealtime.org/smack/docs/

  9. https://developer.android.com/reference/packages

  10. https://www.joda.org/joda-time/userguide.html#Intervals

    Table 3 Precision, recall, and F-measure of SO2RT and the baseline approaches on the McGill dataset
    Table 4 Precision, recall, and F-measure of SO2RT and the baseline approaches on the Android dataset
  11. https://stackoverflow.com/questions/15358409

  12. https://www.mathworks.com/matlabcentral/fileexchange/69718-semi-supervised-learning-functions

  13. https://sites.google.com/site/stcaso2rt/user-study

  14. https://www.joda.org/joda-time/userguide.html#Custom_Formatters

  15. https://stackoverflow.com/questions/6252678/

  16. https://www.joda.org/joda-time/userguide.html#Time_fields

Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions. This work was supported by the NSFC Project under Grant Nos. 62176069 and 61933013, the Innovation Group of Guangdong Education Department under Grant No. 2020KCXTD014, the 2019 Key Discipline project of Guangdong Province, and the Jiangsu Funding Program for Excellent Postdoctoral Talent under Grant No. 20220ZB43. Hongyu Zhang is supported by the Australian Research Council (ARC) Discovery Project DP220103044.

Author information

Corresponding author

Correspondence to Xiao-Yuan Jing.

Additional information

Communicated by: Christoph Treude

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, D., Jing, XY., Zhang, H. et al. Leveraging Stack Overflow to detect relevant tutorial fragments of APIs. Empir Software Eng 28, 12 (2023). https://doi.org/10.1007/s10664-022-10235-1

  • DOI: https://doi.org/10.1007/s10664-022-10235-1
