Understanding Stack Overflow code quality: A recommendation of caution

https://doi.org/10.1016/j.scico.2020.102516

Abstract

Community Question and Answer (CQA) platforms use the power of online groups to solve problems or gain information. While these websites host useful information, it is critical that the details provided on these platforms are of high quality, and that users can trust the information. This is particularly necessary for software development, given the ubiquitous use of software across all sectors of contemporary society. Stack Overflow is the leading CQA platform for programmers, with a community comprising over 10 million contributors. While research confirms the popularity of Stack Overflow, concerns have been raised about the quality of the answers provided to questions on the platform. Code snippets often contained in these answers have been investigated; however, the quality of these artefacts remains unclear. This could be problematic for the software engineering community, as evidence has shown that Stack Overflow snippets are frequently used in both open-source and commercial software. This research fills this gap by evaluating the quality of code snippets on Stack Overflow. We explored various aspects of code snippet quality, including reliability and conformance to programming rules, readability, performance and security. Outcomes show variation in the quality of Stack Overflow code snippets across the different dimensions; however, overall, quality issues in Stack Overflow snippets were not always severe. Vigilance is encouraged for those reusing Stack Overflow code snippets.

Introduction

Community Question and Answer (CQA) platforms harness the power of the crowd (i.e., online communities) to solve problems [32]. Platforms such as Stack Overflow and Yahoo! Answers provide a service that benefits those who look to the internet to answer questions or find particular information [56]. In particular, such platforms benefit software practitioners seeking information, as it is likely that many other people have faced a similar problem, and so a relevant question may already have been asked and suitably answered. On the other hand, these platforms also allow new questions to be created, to which experts can lend their specific experience, solving a problem and gaining the respect of their peers in the community.

While these websites host useful information, it is critical that the information provided on these platforms is of good quality, and that users can trust it. This justifies a research agenda aimed at providing this form of quality assurance. However, as this research area is still developing, even the terminologies used to identify such platforms are inconsistent. Srba and Bielikova [61] found that multiple terms are used to refer to CQA platforms. For example, while this paper refers to such platforms as CQA, it is common for them to be labelled Q&A, social Q&A and community forums [55], [57]. Beyond the terms used to identify CQA portals, Krüger et al. [41] conducted a secondary study of CQA papers and suggested that the quality of questions and answers is a central challenge of CQA systems. In addition, Srba and Bielikova [61] completed a CQA survey and noted the preservation of long-term sustainability as a key area for future research on these platforms. For CQA platforms to have long-term sustainability, however, establishing the quality of these sites is key to ensuring that users trust the platform and feel that they can continue to participate. Other approaches for encouraging the sustainability of CQA platforms by inspiring user participation include providing tools and employing gamification techniques [33].

Stack Overflow has been noted as a successful platform due to its user participation, suggesting that its employment of gamification techniques indeed contributes to its prominence [16]. However, there are questions surrounding the quality of answers on Stack Overflow [25], [64]. In addition, given that developers use many of the code snippets posted on Stack Overflow during development [62], it is important to evaluate the quality of these artefacts. In particular, while the Stack Overflow community's collective surveillance may help to identify and correct errors in code snippets on the platform, and users may appropriately adapt code snippets to their specific problem or task, this is not always the case. Wu et al. [68] examined how Stack Overflow code snippets were used in open-source projects and found that in only 44% of the files containing Stack Overflow snippets had the snippets been modified prior to reuse. In fact, Bi [11] has shown that even the throwing/catching of generic exceptions is missed by software developers reusing Stack Overflow code snippets. We thus set out to understand the quality of code snippets that are often provided in Stack Overflow posts. The findings of this research could help direct future research on Stack Overflow, by motivating investigations that provide mechanisms to further scrutinise the aspects of quality that Stack Overflow code snippets fail to meet. Additionally, developers selecting code snippets from Stack Overflow during software development will gain a better understanding of the potential limitations of the code they reuse.
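To illustrate the kind of hazard described above, consider the following minimal sketch (our own hypothetical example, not drawn from the study's dataset) of a snippet-style method that catches the generic Exception type, a pattern that static analysis rules such as PMD's AvoidCatchingGenericException are designed to flag:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class ConfigReader {

        // Pattern common in copied snippets: the generic Exception type is
        // caught and the failure is silently swallowed, hiding I/O errors.
        static String readConfigRisky(String path) {
            try {
                return new String(Files.readAllBytes(Paths.get(path)));
            } catch (Exception e) { // overly broad catch, flagged by analysers
                return "";
            }
        }

        // Narrower alternative: declare only the exception the call can
        // actually throw, so callers can detect and react to the failure.
        static String readConfigSafer(String path) throws IOException {
            return new String(Files.readAllBytes(Paths.get(path)));
        }

        public static void main(String[] args) {
            // Prints an empty line if the (hypothetical) file is missing,
            // because the risky variant hides the underlying exception.
            System.out.println(readConfigRisky("app.properties"));
        }
    }

A developer who pastes the risky variant without adapting it inherits the swallowed exception, which is precisely the kind of unmodified reuse reported by Wu et al. [68].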

However, addressing the objective of this research requires code snippet quality to be defined. We do so by assessing prior research and considering code snippet quality in relation to well-established and understood software quality measurements [35]. This has led to our consideration of code reliability and conformance to programming rules, readability, performance and security (refer to Section 2.1 for discussion of this issue). Given our multi-dimensional definition of quality, code snippets are extracted from Stack Overflow and analysed against these criteria. We then provide results at multiple levels of granularity, covering all violations (or errors), snippet-specific violations, and qualitative analysis assessing the implications of the presence of violations and the Stack Overflow community's efforts towards addressing these. The outcomes provided in this work are a survey of Stack Overflow code snippets' violations against these quality dimensions, as well as the types of violation evident in code provided by contributors. We also outline how the software development community may craft an agenda towards maintaining software quality, notwithstanding the utility that Stack Overflow provides.

The remaining sections of this paper are structured as follows. Section 2 reviews the literature relating to Stack Overflow and outlines the subsequent research questions. Section 3 introduces and discusses the methodology of this research. Section 4 presents the results of the study in relation to the associated research questions. Section 5 provides a discussion of the results along with their implications. Threats to the validity of the research are discussed in Section 6. Finally, Section 7 concludes the work and outlines potential areas for future research.

Section snippets

Background and research questions

In order to achieve the objective of this research, literature related to code quality on Stack Overflow is reviewed in Section 2.1, with particular emphasis on works that have evaluated code. Thereafter, we synthesise the related literature to identify gaps and outline our research questions in Section 2.2.

Methodology

In line with the open nature of our overarching question, what is the quality of code snippets provided in answers on Stack Overflow?, we take an exploratory approach to our analyses, starting with an exhaustive body of quantitative analysis before performing deeper qualitative analysis. This latter analysis considers the implications of quality violations, and the Stack Overflow community's efforts towards ensuring contributors' awareness of potential code quality shortcomings. We provide further details of our approach in the remainder of this section.

Results

In Fig. 6, we visualise the general patterns of outcomes for violations found by the PMD, Checkstyle and FindBugs tools, excluding violations introduced during pre-processing by the class declaration wrapper or the code snippet's unique identifier file name. As seen in Fig. 3, the results in Fig. 6 show that the code snippets that produce errors and those that do not have somewhat similar properties (refer to Fig. 6.a). The majority of the code …
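To make the pre-processing step concrete, the following minimal sketch (our own hypothetical reconstruction; names such as SnippetWrapper and Snippet_42 are assumptions, not the study's actual tooling) shows how a free-standing, statement-level snippet can be wrapped in a class declaration and written to a file named after its unique identifier, so that parser-based tools such as PMD and Checkstyle can process it:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class SnippetWrapper {

        // Wrap a statement-level snippet in a minimal class declaration;
        // the snippet's unique identifier becomes part of the class and
        // file name, so wrapper-induced violations can later be excluded
        // from the violation counts.
        static String wrap(String snippetId, String snippetBody) {
            return "public class Snippet_" + snippetId + " {\n"
                 + "    void run() throws Exception {\n"
                 + "        " + snippetBody + "\n"
                 + "    }\n"
                 + "}\n";
        }

        public static void main(String[] args) throws Exception {
            String body = "System.out.println(\"hello from a snippet\");";
            // The file name carries the snippet's unique identifier.
            Path out = Paths.get("Snippet_42.java");
            Files.write(out, wrap("42", body).getBytes());
        }
    }

Wrapping of this kind is what introduces the violations that Fig. 6 excludes: the wrapper class and the identifier-based file name are artefacts of the analysis pipeline rather than of the snippet itself.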

Discussion and implications

Q&A portals such as Stack Overflow are now central to the way software developers address their knowledge deficiencies when creating software [26]. Thus, there is increasing interest in understanding the quality of the content provided on such portals [25]. We set out to understand the quality of code snippets that are often provided in Stack Overflow posts. In order to do so, we operationalise code snippet quality along the dimensions of reliability and conformance to programming rules, readability, performance and security.

Threats

The tools used to assess Stack Overflow code snippets have certain limitations. However, these tools are accepted by the software engineering community [34], and have also been validated by academic research [7]. For instance, Ayewah and Pugh [7] evaluated FindBugs using a dataset from Google, finding that the tool was very effective at detecting bugs and that its use was considered beneficial in saving developers time and money. That said, we performed multiple rounds of manual checks.

Conclusion and future research

In this study we set out to answer the overarching question: what is the quality of code snippets provided in answers on Stack Overflow? We observed that while studies have expressed reservations about the quality of the content provided in Stack Overflow posts, there has been limited effort aimed at evaluating the quality of code snippets on this platform. This is undesirable, as evidence has shown that this platform is used heavily by developers for solving problems during software development.

Declaration of Competing Interest

The authors whose names are listed immediately above certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.

Acknowledgements

We thank Stack Overflow for granting us access to the data analysed in this study. Thanks to the reviewers for their detailed and insightful comments on an earlier version of this work. This work is funded by a University of Otago Commerce Research Grant Award (CRG-2019), accessed through the Otago Business School Research Committee.

References (71)

  • N. Ayewah et al., The Google FindBugs fixit
  • T. Bakota et al., A probabilistic software quality model
  • S. Baltes et al., Attribution required: Stack Overflow code snippets in GitHub projects
  • V. Bauer et al., An exploratory study on reuse at Google
  • F. Bi, Nissan app developer busted for copying code from Stack Overflow
  • R.P. Buse et al., Learning a metric for code readability, IEEE Trans. Softw. Eng. (2010)
  • R.P. Buse et al., A metric for software readability
  • U. Campos et al., Mining rule violations in JavaScript code snippets
  • CAST, Software Intelligence for digital leaders
  • H. Cavusoglu et al., Can gamification motivate voluntary contributions? The case of StackOverflow Q&A community
  • F. Chen et al., Crowd debugging
  • J. Cohen, Statistical Power Analysis for the Behavioral Sciences (2013)
  • W. Cunningham, The WyCash portfolio management system, ACM SIGPLAN OOPS Messenger (1993)
  • M. di Biase et al., The delta maintainability model: measuring maintainability of fine-grained code changes
  • M. Duijn et al., Quality questions need quality code: classifying code fragments on Stack Overflow
  • S. Ercan et al., Predicting answering times on Stack Overflow
  • N.A. Ernst et al., Measure it? Manage it? Ignore it? Software practitioners and technical debt
  • F. Fischer et al., Stack Overflow considered harmful? The impact of copy&paste on Android application security
  • A.L. Ginsca et al., User profiling for answer quality assessment in Q&A communities
  • R. Gupta et al., Learning from gurus: analysis and modeling of reopened questions on Stack Overflow
  • S. Haefliger et al., Code reuse in open source software, Manag. Sci. (2008)
  • I. Heitlager, A practical model for measuring maintainability – a preliminary...
  • O.R. Holsti, Content Analysis for the Social Sciences and Humanities (1969)
  • J. Holvitie et al., Co-existence of the ‘Technical Debt’ and ‘Software Legacy’ concepts
  • Y. Jin et al., Quick trigger on Stack Overflow: a study of gamification-influenced member tendencies
