Skip to main content

Advertisement

Log in

How do developers utilize source code from stack overflow?

  • Published:
Empirical Software Engineering Aims and scope Submit manuscript

Abstract

Technical question and answer Q&A platforms, such as Stack Overflow, provide a platform for users to ask and answer questions about a wide variety of programming topics. These platforms accumulate a large amount of knowledge, including hundreds of thousands lines of source code. Developers can benefit from the source code that is attached to the questions and answers on Q&A platforms by copying or learning from (parts of) it. By understanding how developers utilize source code from Q&A platforms, we can provide insights for researchers which can be used to improve next-generation Q&A platforms to help developers reuse source code fast and easily. In this paper, we first conduct an exploratory study on 289 files from 182 open-source projects, which contain source code that has an explicit reference to a Stack Overflow post. Our goal is to understand how developers utilize code from Q&A platforms and to reveal barriers that may make code reuse more difficult. In 31.5% of the studied files, developers needed to modify source code from Stack Overflow to make it work in their own projects. The degree of required modification varied from simply renaming variables to rewriting the whole algorithm. Developers sometimes chose to implement an algorithm from scratch based on the descriptions from Stack Overflow answers, even if there was an implementation readily available in the post. In 35.5% of the studied files, developers used Stack Overflow posts as an information source for later reference. To further understand the barriers of reusing code and to obtain suggestions for improving the code reuse process on Q&A platforms, we conducted a survey with 453 open-source developers who are also on Stack Overflow. We found that the top 3 barriers that make it difficult for developers to reuse code from Stack Overflow are: (1) too much code modification required to fit in their projects, (2) incomprehensive code, and (3) low code quality. We summarized and analyzed all survey responses and we identified that developers suggest improvements for future Q&A platforms along the following dimensions: code quality, information enhancement & management, data organization, license, and the human factor. For instance, developers suggest to improve the code quality by adding an integrated validator that can test source code online, and an outdated code detection mechanism. Our findings can be used as a roadmap for researchers and developers to improve code reuse.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Fig. 1
Listing 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Listing 2
Listing 3
Listing 4
Listing 5
Listing 6
Listing 7
Figure 6
Listing 8
Figure 7
Figure 8
Figure 9
Figure 10

Similar content being viewed by others

Notes

  1. https://goo.gl/X84SFi

  2. https://creativecommons.org/licenses/by-sa/3.0/

  3. https://goo.gl/KKbPWk

  4. https://goo.gl/aC4auZ

  5. https://goo.gl/gzMCgy

  6. https://goo.gl/8foVXq

References

  • Abdalkareem R, Shihab E, Rilling J (2017) What do developers use the crowd for? a study using Stack Overflow. IEEE Soft 34(2):53–60

    Article  Google Scholar 

  • Ahasanuzzaman M, Asaduzzaman M, Roy CK, Schneider KA (2016) Mining duplicate questions in Stack Overflow. In: Proceedings of the 13th international conference on mining software repositories (MSR), pp 402–412

  • Almeida DA, Murphy GC, Wilson G, Hoye M (2017) Do software developers understand open source licenses?. In: Proceedings of the 25th international conference on program comprehension (ICPC), pp 1–11. IEEE

  • Alnusair A, Rawashdeh M, Hossain MA, Alhamid MF (2016) Utilizing semantic techniques for automatic code reuse in software repositories. In: Quality software through reuse and integration, pp 42–62. Springer

  • An L, Mlouki O, Khomh F, Antoniol G (2017) Stack Overflow: A code laundering platform?. In: Proceedings of the 24th IEEE international conference on software analysis, evolution, and reengineering (SANER), pp 283–293. IEEE

  • Anderson A, Huttenlocher D, Kleinberg J, Leskovec J (2013) Steering user behavior with badges. In: Proceedings of the 22nd international conference on World Wide Web (WWW), pp 95–106. ACM

  • Armaly A, McMillan C (2016) Pragmatic source code reuse via execution record and replay. J Soft Evolution Process 28(8):642–664

    Article  Google Scholar 

  • Atwood J (2009) Attribution required – Stack Overflow blog. https://stackoverflow.blog/2009/06/25/attribution-required/. (last visited: Aug 25, 2017)

  • Azad S, Rigby PC, Guerrouj L (2017) Generating API call rules from version history and stack overflow posts. ACM Trans Softw Eng Methodol (TOSEM) 25(4):29

    Article  Google Scholar 

  • Bajracharya S, Ngo T, Linstead E, Dou Y, Rigor P, Baldi P, Lopes C (2006) Sourcerer: A search engine for open source code supporting structure-based search. In: Companion to the 21st ACM SIGPLAN symposium on object-oriented programming systems, languages, and applications (OOPSLA), pp 681–682. ACM

  • Barzilay O (2011) Example embedding. In: Proceedings of the 10th SIGPLAN symposium on new ideas, new paradigms, and reflections on programming and software, Onward!, pp 137-144

  • Bian J, Gao B, Liu T-Y (2014) Knowledge-powered deep learning for word embedding. Springer, Berlin, pp 132–148

    Google Scholar 

  • Cavusoglu H, Li Z, Huang K-W (2015) Can gamification motivate voluntary contributions?: The case of StackOverflow Q&A community. In: Proceedings of the 18th ACM conference companion on computer supported cooperative work & social computing, pp 171–174. ACM

  • Chen C, Gao S, Xing Z (2016) Mining analogical libraries in Q&A discussions - incorporating relational and categorical knowledge into word embedding. In: IEEE 23rd international conference on software analysis, evolution, and reengineering (SANER), pp 338–348. IEEE

  • Chen C, Xing Z, Wang X (2017) Unsupervised software-specific morphological forms inference from informal discussions. In: Proceedings of the 39th international conference on software engineering (ICSE), pp 450–461. IEEE

  • Cottrell R, Walker RJ, Denzinger J (2008) Semi-automating small-scale source code reuse via structural correspondence. In: Proceedings of the 16th ACM SIGSOFT international symposium on foundations of software engineering (SIGSOFT), pp 214–225. ACM

  • Feldthaus A, Møller A (2013) Semi-automatic rename refactoring for javascript. In: Proceedings of the 2013 ACM SIGPLAN international conference on object oriented programming systems languages & applications, vol 48, pp 323–338. ACM

  • Galenson J, Reames P, Bodik R, Hartmann B, Sen K (2014) Codehint: Dynamic and interactive synthesis of code snippets. In: Proceedings of the 36th international conference on software engineering, ICSE, pp 653-663

  • Gamma E, Helm R, Johnson R, Vlissides J (1995) Design patterns: Elements of reusable object-oriented software. Addison-Wesley Longman Publishing Co., Inc., Boston

    MATH  Google Scholar 

  • Ganguly D, Roy D, Mitra M, Jones GJ (2015) Word embedding based generalized language model for information retrieval. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval (SIGIR), pp 795–798

  • Gao Q, Zhang H, Wang J, Xiong Y, Zhang L, Mei H (2015) Fixing recurring crash bugs via analyzing Q&A sites. In: Proceedings of the 30th international conference on automated software engineering (ASE), pp 307–318

  • Gharehyazie M, Ray B, Filkov V (2017) Some from here, some from there: Cross-project code reuse in github. In: Proceedings of the 14th international conference on mining software repositories, MSR ’17, pp 291–301

  • Glaser B (2017) Discovery of grounded theory: Strategies for qualitative research. Routledge

  • Gu X, Zhang H, Zhang D, Kim S (2016) Deep API learning. In: Proceedings of the 24th ACM SIGSOFT international symposium on foundations of software engineering (FSE), pp 631–642. ACM

  • Gwet K et al (2002) Inter-rater reliability: dependency on trait prevalence and marginal homogeneity. Statistical Methods for Inter-Rater Reliability Assessment Series 2:1–9

    Google Scholar 

  • Hua L, Kim M, McKinley KS (2015) Does automated refactoring obviate systematic editing?. In: IEEE/ACM 37th IEEE international conference on software engineering (ICSE), vol 1, pp 392–402. IEEE

  • Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014a) The promises and perils of mining GitHub. In: Proceedings of the 11th working conference on mining software repositories (MSR), pp 92–101. ACM

  • Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014b) The promises and perils of mining GitHub. In: Proceedings of the 11th working conference on mining software repositories (MSR), pp 92–101

  • Krumia (2014) Introduce an “obsolete answer” vote. https://meta.stackoverflow.com/questions/272651/introduce-an-obsolete-answer-vote,. (last visited: Aug 25)

  • Lai S, Xu L, Liu K, Zhao J (2015) Recurrent convolutional neural networks for text classification. In: Proceedings of the 29th AAAI conference on artificial intelligence, pp 2267–2273. AAAI Press

  • Liu P, Joty SR, Meng HM (2015) Fine-grained opinion mining with recurrent neural networks and word embeddings. In: Proceedings of the 2015 conference on empirical methods in natural language processing (EMNLP), pp 1433–1443. The Association for Computational Linguistics

  • Lv F, Zhang H, Lou J-G, Wang S, Zhang D, Zhao J (2015) CodeHow: Effective code search based on API understanding and extended boolean model. In: Proceedings of the 30th IEEE/ACM international conference on automated software engineering (ASE), pp 260–270. IEEE

  • McMillan C, Grechanik M, Poshyvanyk D, Xie Q, Fu C (2011) Portfolio: Finding relevant functions and their usage. In: Proceedings of the 33rd international conference on software engineering (ICSE), pp 111–120

  • Meng N, Kim M, McKinley KS (2011) Systematic editing: Generating program transformations from an example. In: Proceedings of the 32nd ACM SIGPLAN conference on programming language design and implementation (PLDI), pages 329–342

  • Meng N, Kim M, McKinley KS (2013) Lase: locating and applying systematic edits by learning from examples. In: Proceedings of the 2013 international conference on software engineering, pp 502–511. IEEE

  • Nguyen AT, Nguyen TT, Nguyen HA, Tamrawi A, Nguyen HV, Al-Kofahi J, Nguyen TN (2012) Graph-based pattern-oriented, context-sensitive source code completion. In: Proceedings of the 34th international conference on software engineering (ICSE), pp 69–79

  • Ponzanelli L, Bacchelli A, Lanza M (2013) Leveraging crowd knowledge for software comprehension and development. In: Proceedings of the 17th european conference on software maintenance and reengineering (CSMR), pp 57–66. IEEE

  • Ponzanelli L, Bavota G, Di Penta M, Oliveto R, Lanza M (2014a) Mining stackoverflow to turn the IDE into a self-confident programming prompter. In: Proceedings of the 11th working conference on mining software repositories, pp 102–111. ACM

  • Ponzanelli L, Bavota G, Di Penta M, Oliveto R, Lanza M (2014b) Prompter: A self-confident recommender system. In: ICSME, pp 577–580

  • Ponzanelli L, Mocci A, Bacchelli A, Lanza M (2014c) Understanding and classifying the quality of technical forum questions. In: Proceedings of the 14th international conference on quality software (QSIC), pp 343–352

  • Raychev V, Vechev M, Yahav E (2014) Code completion with statistical language models. In: Proceedings of the 35th ACM SIGPLAN conference on programming language design and implementation (PLDI), pp 419–428

  • Reja U, Manfreda KL, Hlebec V, Vehovar V (2003) Open-ended vs. close-ended questions in web questionnaires. Developments in Applied Statistics (Metodološ,ki zvezki) 19:159–77

    Google Scholar 

  • Rigby PC, Robillard MP (2013) Discovering essential code elements in informal documentation. In: Proceedings of the 2013 international conference on software engineering (ICSE), pp 832–841. IEEE

  • Seaman CB (1999) Qualitative methods in empirical studies of software engineering. IEEE Trans Softw Eng (TSE) 25(4):557–572

    Article  Google Scholar 

  • Seaman CB, Shull F, Regardie M, Elbert D, Feldmann RL, Guo Y, Godfrey S (2008) Defect categorization: making use of a decade of widely varying historical data. In: Proceedings of the 2nd ACM-IEEE international symposium on Empirical software engineering and measurement, pp 149–157. ACM

  • Searchcode (2016a) Searchcode - API. https://searchcode.com/api/. (last visited: Aug 25, 2017)

  • Searchcode (2016b) Searchcode - Homepage. https://searchcode.com/. (last visited: Aug 25, 2017)

  • Sillito J, Maurer F, Nasehi SM, Burns C (2012) What makes a good code example?: A study of programming Q&A in StackOverflow. In: Proceedings of the 2012 IEEE international conference on software maintenance (ICSM), pp 25–34

  • Stack Exchange (2015) The MIT license — clarity on using code on stack overflow and stack exchange. https://meta.stackexchange.com/q/271080/337948,. (last visited: Aug 25, 2017)

  • Stack Exchange (2017) All sites - Stack Exchange. https://stackexchange.com/sites,. (last visited: Aug 25, 2017)

  • Stack Overflow (2014) Feedback requested: Runnable code snippets in questions and answers. https://meta.stackoverflow.com/questions/269753/feedback-requested-runnable-code-snippets-in-questions-and-answers. (last visited: Aug 25, 2017)

  • Stack Overflow (2016) Stack Overflow developer survey results 2016. http://stackoverflow.com/research/developer-survey-2016,. (last visited: Aug 25, 2017)

  • Stack Overflow (2017) Stack Overflow - Homepage. https://stackoverflow.com/,. (last visited: Aug 25, 2017)

  • Treude C, Robillard MP (2016) Augmenting API documentation with insights from Stack Overflow. In: Proceedings of the 38th international conference on software engineering (ICSE), pp 392–403. ACM

  • Treude C, Robillard MP (2017) Understanding stack overflow code fragments. In: 2017 IEEE international conference on software maintenance and evolution, ICSME 2017, Shanghai, China, September 17-22, pp 509-513

  • Treude C, Barzilay O, Storey M-A (2011) How do programmers ask and answer questions on the web? (NIER track). In: Proceedings of the 33rd international conference on software engineering (ICSE), pp 804–807

  • Vasilescu B, Filkov V, Serebrenik A (2013) StackOverflow and GitHub: Associations between software development and crowdsourced knowledge. In: Proceedings of 2013 international conference on social computing (SocialCom), pp 188–195. IEEE

  • Wang H, Lu Y, Zhai C (2010) Latent aspect rating analysis on review text data: A rating regression approach. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 783–792

  • Wang S, Lo D, Jiang L (2014a) Active code search: Incorporating user feedback to improve code search relevance. In: Proceedings of the 29th ACM/IEEE international conference on automated software engineering (ASE), pp 677–682

  • Wang S, Lo D, Vasilescu B, Serebrenik A (2014b) EnTagRec: An enhanced tag recommendation system for software information sites. In: Proceedings of the 2014 IEEE international conference on software maintenance and evolution (ICSME), pp 291–300

  • Wang S, Lo D, Jiang L (2016a) Autoquery: automatic construction of dependency queries for code search. Autom Softw Eng 23(3):393–425

    Article  Google Scholar 

  • Wang S, Lo D, Vasilescu B, Serebrenik A (2017a) EnTagRec ++: An enhanced tag recommendation system for software information sites. Empirical Software Engineering

  • Wang S, Chen T.-H., Hassan AE (2017b) Understanding the factors for fast answers in technical Q&A websites, Empirical Software Engineering, pp 1–42

  • Wang X, Pollock LL, Vijay-Shanker K (2014c) Automatic segmentation of method code into meaningful blocks: Design and evaluation. J Soft Evolution Process 26(1):27–49

    Article  Google Scholar 

  • Wang X, Pollock LL, Vijay-Shanker K (2017) Automatically generating natural language descriptions for object-related statement sequences. In: IEEE 24th international conference on software analysis, evolution and reengineering, SANER 2017, Klagenfurt, Austria, February 20-24, pp 205–216

  • Wang Y, Feng Y, Martins R, Kaushik A, Dillig I, Reiss SP (2016b) Hunter: next-generation code reuse for java. In: Proceedings of the 24th ACM SIGSOFT international symposium on foundations of software engineering, pp 1028–1032. ACM

  • Wang Z, Hamza W, Florian R (2017d) Bilateral multi-perspective matching for natural language sentences. arXiv:1702.03814

  • Wong T-L, Lam W, Wong T-S (2008) An unsupervised framework for extracting and normalizing product attributes from multiple web sites. In: Proceedings of the 31st annual international acm sigir conference on research and development in information retrieval (SIGIR), pp 35–42

  • Wu Y, Wang S, Bezemer C-P, Inoue K (2017) Online appendix of manuscript ”How Do Developers Utilize Source Code from Stack Overflow?”. https://zenodo.org/record/1116508

  • Xia X, Bao L, Lo D, Kochhar PS, Hassan AE, Xing Z (2017) What do developers search for on the web? Empirical Software Engineering

  • Xin X, Lingfeng B, David L, Zhenchang X, Ahmed EH, Shanping L (2017) Measuring program comprehension: A large-scale field study with professionals. IEEE Trans Softw Eng (TSE) 99(26):1–1

    Google Scholar 

  • Yellin DM, Strom RE (1997) Protocol specifications and component adaptors. ACM Trans Program Lang Syst (TOPLAS) 19(2):292–333

    Article  Google Scholar 

  • Yin P, Neubig G (2017) A syntactic neural model for general-purpose code generation. arXiv:1704.01696

  • Yu J, Zha Z-J, Wang M, Chua T-S (2011) Aspect ranking: Identifying important product aspects from online consumer reviews. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies - vol 1, pp 1496–1505

  • Zagalsky A, German DM, Storey M-A, Teshima CG, Poo-Caamaño G (2017) How the R community creates and curates knowledge: an extended study of Stack Overflow and mailing lists. Empirical Software Engineering

  • Zhang WE, Sheng QZ, Lau JH, Abebe E (2017) Detecting duplicate posts in programming qa communities via latent semantics and association rules. In: Proceedings of the 26th international conference on World Wide Web (WWW), pp 1221–1229

  • Zhang Y, Lo D, Xia X, Sun J-L (2015) Multi-factor duplicate question detection in Stack Overflow. J Comput Sci Technol 30(5):981–997

    Article  Google Scholar 

  • Zhao L, Li C (2009) Ontology based opinion mining for movie reviews. Springer, Berlin, pp 204–214

    Google Scholar 

  • Zhou P, Liu J, Yang Z, Zhou G (2017) Scalable tag recommendation for software information sites. In: Proceedings of the 24th international conference on software analysis, evolution and reengineering (SANER), pp 272–282. IEEE

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shaowei Wang.

Additional information

Communicated by: Emerson Murphy-Hill

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

Below are the questions and options in our online survey. Single-selection options are marked with circle marks in front; multi-selection options are marked with box marks in front. When participants choose the option “Other”, they are allowed to input a free text as an additional answer.

figure s
figure t

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Wu, Y., Wang, S., Bezemer, CP. et al. How do developers utilize source code from stack overflow?. Empir Software Eng 24, 637–673 (2019). https://doi.org/10.1007/s10664-018-9634-5

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10664-018-9634-5

Keywords

Navigation