Leveraging Stack Overflow to detect relevant tutorial fragments of APIs

Wu, Di; Jing, Xiao-Yuan; Zhang, Hongyu; Zhou, Yuming; Xu, Baowen

doi:10.1007/s10664-022-10235-1


Published: 25 November 2022

Volume 28, article number 12 (2023)

Empirical Software Engineering

Di Wu¹ (ORCID: orcid.org/0000-0003-1096-7074), Xiao-Yuan Jing¹˒²˒³, Hongyu Zhang⁴, Yuming Zhou¹ & Baowen Xu¹


Abstract

Developers often use learning resources such as API tutorials and Stack Overflow (SO) to learn how to use an unfamiliar API. An API tutorial can be divided into tutorial fragments, each consisting of consecutive units that describe the same topic. We consider a tutorial fragment that explains the usage knowledge of an API to be a relevant fragment of that API. Discovering relevant tutorial fragments of APIs can facilitate API understanding, learning, and application. However, existing supervised and unsupervised approaches suffer from either high manual labeling effort or a lack of consideration of relevance information. In this paper, we propose a novel approach, called SO2RT, to detect relevant tutorial fragments of APIs based on SO posts. SO2RT first automatically extracts relevant and irrelevant \(\langle API, QA \rangle\) pairs (QA stands for question and answer) and \(\langle API, FRA \rangle\) pairs (FRA stands for tutorial fragment). It then trains a semi-supervised, transfer-learning-based detection model that transfers the API usage knowledge in SO Q&A pairs to tutorial fragments by exploiting the easy-to-extract \(\langle API, QA \rangle\) pairs. Finally, relevant fragments of APIs are discovered by consulting the trained model. In this way, the effort required to label the relevance between tutorial fragments and APIs can be reduced. We evaluate SO2RT on Java and Android datasets containing 21,008 \(\langle API, QA \rangle\) pairs. Experimental results show that SO2RT outperforms the state-of-the-art approaches in terms of F-measure on both datasets. Our user study further confirms the effectiveness of SO2RT in practice. We also show a successful application of the relevant fragments to API recommendation.
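The abstract describes the core idea at a high level: learn relevance from labeled \(\langle API, QA \rangle\) pairs, then use unlabeled \(\langle API, FRA \rangle\) pairs to carry that knowledge over to tutorial fragments. The abstract does not give SO2RT's features or learner, so the sketch below is NOT the SO2RT model. It swaps in plain self-training (a simple semi-supervised scheme) around a logistic-regression classifier. The feature function `features`, the confidence `threshold`, and all example pairs are hypothetical and chosen only for illustration.

```python
import math
import re

def tokenize(s):
    """Lower-cased word tokens with camelCase split: 'plusDays(1)' -> ['plus', 'days', '1']."""
    tokens = []
    for chunk in re.findall(r"[A-Za-z0-9]+", s):
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [t.lower() for t in tokens]

def features(api, text):
    """Hypothetical features for an <API, text> pair (not the features used by SO2RT)."""
    api_tokens = tokenize(api)
    words = set(tokenize(text))
    simple_name = api.split(".")[-1].lower()
    mentions_name = 1.0 if simple_name in text.lower() else 0.0
    overlap = sum(t in words for t in api_tokens) / max(len(api_tokens), 1)
    return [1.0, mentions_name, overlap]  # first entry is a bias term

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))

def train_logreg(X, y, epochs=1000, lr=1.0):
    """Batch gradient descent on the mean logistic loss."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xij in enumerate(xi):
                grad[j] += err * xij
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def self_train(source, target, threshold=0.9, rounds=3):
    """source: labeled (api, text, label) triples, e.g. <API, QA> pairs.
    target: unlabeled (api, text) pairs, e.g. <API, FRA> pairs.
    Returns the estimated relevance probability of each target pair."""
    X = [features(api, text) for api, text, _ in source]
    y = [label for _, _, label in source]
    pseudo = {}  # index into target -> pseudo-label
    for _ in range(rounds):
        extra = sorted(pseudo)
        w = train_logreg(X + [features(*target[i]) for i in extra],
                         y + [pseudo[i] for i in extra])
        for i, (api, text) in enumerate(target):
            p = predict_proba(w, features(api, text))
            # Confident predictions become pseudo-labels for the next round.
            if p >= threshold:
                pseudo[i] = 1
            elif p <= 1 - threshold:
                pseudo[i] = 0
    return [predict_proba(w, features(api, text)) for api, text in target]

# Hypothetical toy data in the spirit of the Joda-Time examples.
source = [
    ("DateTime.plusDays", "How do I add one day to a DateTime? Use plusDays(1).", 1),
    ("DateTime.plusDays", "How do I parse a JSON string into a map?", 0),
    ("Interval.overlaps", "Check whether two Interval objects overlap with overlaps().", 1),
    ("Interval.overlaps", "How can I read a file line by line?", 0),
]
target = [
    ("DateTime.plusDays", "To add days to a DateTime, call plusDays with the number of days."),
    ("DateTime.plusDays", "Formatters are thread-safe and immutable."),
]
probs = self_train(source, target)
print([round(p, 3) for p in probs])
```

Self-training is only one simple way to exploit unlabeled target data. Unlike dedicated domain-adaptation methods, it does not explicitly align the feature distributions of Q&A pairs and tutorial fragments.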


Similar content being viewed by others

Automatic recognizing relevant fragments of APIs using API references (Article, 19 November 2023)

Generating API tags for tutorial fragments from Stack Overflow (Article, 08 May 2021)

APIReal: an API recognition and linking approach for online developer forums (Article, 05 March 2018)

Data Availability

Our tool, experimental data, and results are publicly available at: https://sites.google.com/site/stcaso2rt.

Notes

https://www.joda.org/joda-time/userguide.html
https://www.joda.org/joda-time/apidocs/index.html
https://archive.org/download/stackexchange
https://www.joda.org/joda-time/apidocs/index.html
https://www.joda.org/joda-time/apidocs/index.html
http://commons.apache.org/proper/commons-math/javadocs/
https://www.oracle.com/technetwork/java/javase/documentation/index.html
http://download.igniterealtime.org/smack/docs/
https://developer.android.com/reference/packages
https://www.joda.org/joda-time/userguide.html#Intervals
Table 3 Precision, recall and F-measure of SO2RT and the baseline approaches on McGill dataset
Table 4 Precision, recall and F-measure of SO2RT and the baseline approaches on Android dataset
https://stackoverflow.com/questions/15358409
https://www.mathworks.com/matlabcentral/fileexchange/69718-semi-supervised-learning-functions
https://sites.google.com/site/stcaso2rt/user-study
https://www.joda.org/joda-time/userguide.html#Custom_Formatters
https://stackoverflow.com/questions/6252678/
https://www.joda.org/joda-time/userguide.html#Time_fields
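Tables 3 and 4 compare SO2RT with the baselines by precision, recall and F-measure. For reference, the conventional definitions are given below, assuming the balanced F-measure (F1) computed over detected relevant \(\langle API, FRA \rangle\) pairs; the exact counting unit used in the paper is not stated in this excerpt. The worked numbers use hypothetical counts (TP = 8, FP = 2, FN = 4) purely as an illustration.

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F = \frac{2 \cdot P \cdot R}{P + R}
\]
\[
P = \frac{8}{10} = \frac{4}{5}, \qquad
R = \frac{8}{12} = \frac{2}{3}, \qquad
F = \frac{2 \cdot \frac{4}{5} \cdot \frac{2}{3}}{\frac{4}{5} + \frac{2}{3}}
  = \frac{16/15}{22/15} = \frac{8}{11} \approx 0.727
\]
```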


Acknowledgements

The authors would like to thank the anonymous reviewers for their constructive comments and suggestions. This work was supported by the NSFC Project under Grant Nos. 62176069 and 61933013, the Innovation Group of Guangdong Education Department under Grant No. 2020KCXTD014, the 2019 Key Discipline project of Guangdong Province, and the Jiangsu Funding Program for Excellent Postdoctoral Talent No. 20220ZB43. Hongyu Zhang is supported by the Australian Research Council (ARC) Discovery Project DP220103044.

Author information

Authors and Affiliations

The State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Di Wu, Xiao-Yuan Jing, Yuming Zhou & Baowen Xu
School of Computer, Wuhan University, Wuhan, China
Xiao-Yuan Jing
Guangdong University of Petrochemical Technology, Maoming, China
Xiao-Yuan Jing
School of Information and Physical Sciences, The University of Newcastle, Callaghan, NSW, Australia
Hongyu Zhang


Corresponding author

Correspondence to Xiao-Yuan Jing.

Additional information

Communicated by: Christoph Treude

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Wu, D., Jing, XY., Zhang, H. et al. Leveraging Stack Overflow to detect relevant tutorial fragments of APIs. Empir Software Eng 28, 12 (2023). https://doi.org/10.1007/s10664-022-10235-1


Accepted: 14 August 2022
Published: 25 November 2022
DOI: https://doi.org/10.1007/s10664-022-10235-1
