How do developers utilize source code from stack overflow?

Wu, Yuhao; Wang, Shaowei; Bezemer, Cor-Paul; Inoue, Katsuro

doi:10.1007/s10664-018-9634-5

Published: 04 July 2018

Empirical Software Engineering, volume 24, pages 637–673 (2019)

Yuhao Wu¹, Shaowei Wang² (ORCID: orcid.org/0000-0003-3823-1771), Cor-Paul Bezemer² & Katsuro Inoue¹


Abstract

Technical question and answer (Q&A) platforms, such as Stack Overflow, let users ask and answer questions about a wide variety of programming topics. These platforms accumulate a large amount of knowledge, including hundreds of thousands of lines of source code. Developers can benefit from the source code attached to questions and answers on Q&A platforms by copying it or learning from (parts of) it. Understanding how developers utilize source code from Q&A platforms yields insights that researchers can use to improve next-generation Q&A platforms and help developers reuse source code quickly and easily. In this paper, we first conducted an exploratory study of 289 files from 182 open-source projects that contain source code with an explicit reference to a Stack Overflow post. Our goal was to understand how developers utilize code from Q&A platforms and to reveal barriers that may make code reuse more difficult. In 31.5% of the studied files, developers needed to modify source code from Stack Overflow to make it work in their own projects. The degree of required modification varied from simply renaming variables to rewriting the whole algorithm. Developers sometimes chose to implement an algorithm from scratch based on the descriptions in Stack Overflow answers, even if an implementation was readily available in the post. In 35.5% of the studied files, developers used Stack Overflow posts as an information source for later reference. To further understand the barriers to reusing code and to obtain suggestions for improving the code reuse process on Q&A platforms, we conducted a survey of 453 open-source developers who are also on Stack Overflow. We found that the top 3 barriers that make it difficult for developers to reuse code from Stack Overflow are: (1) too much code modification required to fit their projects, (2) incomprehensive code, and (3) low code quality.
We summarized and analyzed all survey responses and found that developers suggest improvements for future Q&A platforms along the following dimensions: code quality, information enhancement & management, data organization, license, and the human factor. For instance, developers suggest improving code quality by adding an integrated validator that can test source code online and a mechanism for detecting outdated code. Our findings can be used as a roadmap for researchers and developers to improve code reuse.
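
To make the notion of an "explicit reference to a Stack Overflow post" concrete, the following is a minimal sketch of how such references might be located in a source file. It is not the authors' method, which the abstract does not describe. The function name `find_so_references`, the pattern `SO_LINK`, and the assumption that references appear as `stackoverflow.com` URLs of the form `/questions/<id>`, `/q/<id>`, or `/a/<id>` are all illustrative.

```python
import re

# Assumed link shapes (illustrative, not taken from the paper):
#   https://stackoverflow.com/questions/<id>/...
#   https://stackoverflow.com/q/<id>
#   https://stackoverflow.com/a/<id>
SO_LINK = re.compile(
    r"https?://(?:www\.)?stackoverflow\.com/(?P<kind>questions|q|a)/(?P<id>\d+)",
    re.IGNORECASE,
)


def find_so_references(source_text):
    """Return a list of (line_number, kind, post_id) tuples, one per link.

    kind is "answer" for /a/<id> links and "question" otherwise; this
    labeling is a simplification based only on the URL path.
    """
    hits = []
    for line_no, line in enumerate(source_text.splitlines(), start=1):
        for match in SO_LINK.finditer(line):
            kind = "answer" if match.group("kind").lower() == "a" else "question"
            hits.append((line_no, kind, int(match.group("id"))))
    return hits


if __name__ == "__main__":
    # Hypothetical file content with two references in comments.
    sample = (
        "// see https://stackoverflow.com/questions/12345/some-title\n"
        "int x = 0;\n"
        "# via http://stackoverflow.com/a/678\n"
    )
    for hit in find_so_references(sample):
        print(hit)
```

A scan like this only finds references that developers wrote down as links. Code that was copied without attribution would not be detected, so any count obtained this way is a lower bound on actual reuse.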

Author information

Authors and Affiliations

Graduate School of Information Science and Technology, Osaka University, Suita, Japan
Yuhao Wu & Katsuro Inoue
SAIL, Queen’s University, Kingston, ON, Canada
Shaowei Wang & Cor-Paul Bezemer

Corresponding author

Correspondence to Shaowei Wang.

Additional information

Communicated by: Emerson Murphy-Hill

Appendix

Below are the questions and options in our online survey. Single-selection options are marked with a circle in front; multi-selection options are marked with a box in front. When participants choose the option “Other”, they can enter free text as an additional answer.

About this article

Cite this article

Wu, Y., Wang, S., Bezemer, C.-P. et al. How do developers utilize source code from stack overflow? Empir Software Eng 24, 637–673 (2019). https://doi.org/10.1007/s10664-018-9634-5

Issue Date: 15 April 2019
DOI: https://doi.org/10.1007/s10664-018-9634-5
