Understanding Stack Overflow code quality: A recommendation of caution

https://doi.org/10.1016/j.scico.2020.102516

Abstract

Community Question and Answer (CQA) platforms use the power of online groups to solve problems or gain information. While these websites host useful information, it is critical that the details provided on these platforms are of high quality, and that users can trust the information. This is particularly necessary for software development, given the ubiquitous use of software across all sectors of contemporary society. Stack Overflow is the leading CQA platform for programmers, with a community comprising over 10 million contributors. While research confirms the popularity of Stack Overflow, concerns have been raised about the quality of the answers provided to questions on the platform. Code snippets often contained in these answers have been investigated; however, the quality of these artefacts remains unclear. This could be problematic for the software engineering community, as evidence has shown that Stack Overflow snippets are frequently used in both open-source and commercial software. This research fills this gap by evaluating the quality of code snippets on Stack Overflow. We explored various aspects of code snippet quality, including reliability and conformance to programming rules, readability, performance and security. Outcomes show variation in the quality of Stack Overflow code snippets across the different dimensions; however, overall, quality issues in Stack Overflow snippets were not always severe. Vigilance is encouraged for those reusing Stack Overflow code snippets.

Introduction

Community Question and Answer (CQA) platforms harness the power of the crowd (i.e., online communities) to solve problems [32]. Platforms such as Stack Overflow and Yahoo! Answers provide a service that benefits those who look to the internet to answer questions or find particular information [56]. In particular, such platforms benefit software practitioners seeking information, as it is likely that many other people have faced a similar problem, and so a relevant question may already have been asked and suitably answered. On the other hand, these platforms also allow new questions to be created, to which experts can lend their specific experience, solving a problem and gaining the respect of their peers in the community.

While these websites host useful information, it is critical that the information provided on these platforms is of good quality, and that users can trust it. This justifies a research agenda aimed at providing this form of quality assurance. However, as this research area is still developing, even the terminologies used to identify such platforms are inconsistent. Srba and Bielikova [61] found that multiple terms are used to refer to CQA platforms. For example, while this paper refers to such platforms as CQA, it is common for them to be labelled Q&A, social Q&A and community forums [55], [57]. Beyond the terms used to identify CQA portals, Krüger et al. [41] conducted a secondary study of CQA papers and suggested that the quality of questions and answers is a central challenge of CQA systems. In addition, Srba and Bielikova [61] completed a CQA survey and noted the preservation of long-term sustainability as a key area for future research on these platforms. For CQA platforms to have long-term sustainability, however, establishing the quality of these sites is key to ensuring that users trust the platform and feel that they can continue to participate. Other approaches for encouraging the sustainability of CQA platforms by inspiring user participation include providing tools and employing gamification techniques [33].

Stack Overflow has been noted as a successful platform due to its user participation, suggesting that its employment of gamification techniques indeed contributes to its prominence [16]. However, there are questions surrounding the quality of answers on Stack Overflow [25], [64]. In addition, given that developers use many of the code snippets posted on Stack Overflow during development [62], it is important to evaluate the quality of these artefacts. In particular, while the Stack Overflow community's collective surveillance may help to identify and correct errors in code snippets on the platform, and users may appropriately adapt code snippets to their specific problem or task, this is not always the case. Wu et al. [68] examined how Stack Overflow code snippets were used in open-source projects and found that in only 44% of the files containing Stack Overflow snippets had the snippets been modified prior to reuse. In fact, Bi [11] has shown that even the throwing/catching of generic exceptions is missed by software developers reusing Stack Overflow code snippets. We thus set out to understand the quality of code snippets that are often provided in Stack Overflow posts. The findings of this research could help direct future research on Stack Overflow, by motivating investigations that provide mechanisms to further scrutinise the aspects of quality that Stack Overflow code snippets fail to meet. Additionally, developers selecting code snippets from Stack Overflow during software development will gain a better understanding of the potential limitations of the code they reuse.
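To illustrate the kind of hazard described above, consider the following minimal sketch (our own hypothetical example, not drawn from the study's dataset) of a snippet-style method that catches the generic Exception type, a pattern that static analysis rules such as PMD's AvoidCatchingGenericException are designed to flag:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class ConfigReader {

        // Pattern common in copied snippets: the generic Exception type is
        // caught and the failure is silently swallowed, hiding I/O errors.
        static String readConfigRisky(String path) {
            try {
                return new String(Files.readAllBytes(Paths.get(path)));
            } catch (Exception e) { // overly broad catch, flagged by analysers
                return "";
            }
        }

        // Narrower alternative: declare only the exception the call can
        // actually throw, so callers can detect and react to the failure.
        static String readConfigSafer(String path) throws IOException {
            return new String(Files.readAllBytes(Paths.get(path)));
        }

        public static void main(String[] args) {
            // Prints an empty line if the (hypothetical) file is missing,
            // because the risky variant hides the underlying exception.
            System.out.println(readConfigRisky("app.properties"));
        }
    }

A developer who pastes the risky variant without adapting it inherits the swallowed exception, which is precisely the kind of unmodified reuse reported by Wu et al. [68].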

However, addressing the objective of this research requires code snippet quality to be defined. We do so by assessing prior research and considering code snippet quality in relation to well-established and understood software quality measurements [35]. This has led to our consideration of code reliability and conformance to programming rules, readability, performance and security (refer to Section 2.1 for discussion of this issue). Given our multi-dimensional definition of quality, code snippets are extracted from Stack Overflow and analysed against these criteria. We then provide results at multiple levels of granularity, covering all violations (or errors), snippet-specific violations, and qualitative analysis assessing the implications of the presence of violations and the Stack Overflow community's efforts towards addressing these. The outcomes provided in this work are a survey of Stack Overflow code snippets' violations against these quality dimensions, as well as the types of violation evident in code provided by contributors. We also outline how the software development community may craft an agenda towards maintaining software quality, notwithstanding the utility that Stack Overflow provides.

The remaining sections of this paper are structured as follows. Section 2 reviews the literature relating to Stack Overflow and outlines the subsequent research questions. Section 3 introduces and discusses the methodology of this research. Section 4 presents the results of the study in relation to the associated research questions. Section 5 provides a discussion of the results along with their implications. Threats to the validity of the research are discussed in Section 6. Finally, Section 7 concludes the work and outlines potential areas for future research.

Section snippets

Background and research questions

In order to achieve the objective of this research, literature related to code quality on Stack Overflow is reviewed in Section 2.1, with particular emphasis on works that have evaluated code. Thereafter, we synthesise the related literature to identify gaps and outline our research questions in Section 2.2.

Methodology

In line with the open nature of our overarching question, what is the quality of code snippets provided in answers on Stack Overflow?, we take an exploratory approach to our analyses, starting with an exhaustive body of quantitative analysis before performing deeper qualitative analysis. This latter analysis considers the implications of quality violations, and the Stack Overflow community's efforts towards ensuring contributors' awareness of potential code quality shortcomings. We provide further details of our approach in the remainder of this section.

Results

In Fig. 6, we visualise the general patterns of outcomes for violations found by the PMD, Checkstyle and FindBugs tools, excluding violations introduced during pre-processing by the class declaration wrapper or the code snippet's unique identifier file name. As seen in Fig. 3, the results in Fig. 6 show that the code snippets that produce errors and those that do not have somewhat similar properties (refer to Fig. 6.a). The majority of the code …
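To make the pre-processing step concrete, the following minimal sketch (our own hypothetical reconstruction; names such as SnippetWrapper and Snippet_42 are assumptions, not the study's actual tooling) shows how a free-standing, statement-level snippet can be wrapped in a class declaration and written to a file named after its unique identifier, so that parser-based tools such as PMD and Checkstyle can process it:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class SnippetWrapper {

        // Wrap a statement-level snippet in a minimal class declaration;
        // the snippet's unique identifier becomes part of the class and
        // file name, so wrapper-induced violations can later be excluded
        // from the violation counts.
        static String wrap(String snippetId, String snippetBody) {
            return "public class Snippet_" + snippetId + " {\n"
                 + "    void run() throws Exception {\n"
                 + "        " + snippetBody + "\n"
                 + "    }\n"
                 + "}\n";
        }

        public static void main(String[] args) throws Exception {
            String body = "System.out.println(\"hello from a snippet\");";
            // The file name carries the snippet's unique identifier.
            Path out = Paths.get("Snippet_42.java");
            Files.write(out, wrap("42", body).getBytes());
        }
    }

Wrapping of this kind is what introduces the violations that Fig. 6 excludes: the wrapper class and the identifier-based file name are artefacts of the analysis pipeline rather than of the snippet itself.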

Discussion and implications

Q&A portals such as Stack Overflow are now central to the way software developers address their knowledge deficiencies when creating software [26]. Thus, there is increasing interest in understanding the quality of the content provided on such portals [25]. We set out to understand the quality of code snippets that are often provided in Stack Overflow posts. In order to do so, we operationalise code snippet quality along the dimensions of reliability and conformance to programming rules, readability, performance and security.

Threats

The tools used to assess Stack Overflow code snippets have certain limitations. However, these tools are accepted by the software engineering community [34], and have also been validated by academic research [7]. For instance, Ayewah and Pugh [7] evaluated FindBugs using a dataset from Google, finding that the tool was very effective at detecting bugs and that its use was considered beneficial in saving developers time and money. That said, we performed multiple rounds of manual checks.

Conclusion and future research

In this study we set out to answer the overarching question: what is the quality of code snippets provided in answers on Stack Overflow? We observed that while studies have expressed reservations about the quality of the content provided in Stack Overflow posts, there has been limited effort aimed at evaluating the quality of code snippets on this platform. This is undesirable, as evidence has shown that this platform is used heavily by developers for solving problems during software development.

Declaration of Competing Interest

The authors whose names are listed immediately above certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.

Acknowledgements

We thank Stack Overflow for granting us access to the data analysed in this study. Thanks to the reviewers for their detailed and insightful comments on an earlier version of this work. This work is funded by a University of Otago Commerce Research Grant Award (CRG-2019), accessed through the Otago Business School Research Committee.

References (71)

  • N. Ayewah et al., The Google FindBugs fixit
  • T. Bakota et al., A probabilistic software quality model
  • S. Baltes et al., Attribution required: Stack Overflow code snippets in GitHub projects
  • V. Bauer et al., An exploratory study on reuse at Google
  • F. Bi, Nissan app developer busted for copying code from Stack Overflow
  • R.P. Buse et al., Learning a metric for code readability, IEEE Trans. Softw. Eng. (2010)
  • R.P. Buse et al., A metric for software readability
  • U. Campos et al., Mining rule violations in JavaScript code snippets
  • CAST, Software Intelligence for digital leaders
  • H. Cavusoglu et al., Can gamification motivate voluntary contributions? The case of StackOverflow Q&A community
  • F. Chen et al., Crowd debugging
  • J. Cohen, Statistical Power Analysis for the Behavioral Sciences (2013)
  • W. Cunningham, The WyCash portfolio management system, ACM SIGPLAN OOPS Messenger (1993)
  • M. di Biase et al., The delta maintainability model: measuring maintainability of fine-grained code changes
  • M. Duijn et al., Quality questions need quality code: classifying code fragments on Stack Overflow
  • S. Ercan et al., Predicting answering times on Stack Overflow
  • N.A. Ernst et al., Measure it? Manage it? Ignore it? Software practitioners and technical debt
  • F. Fischer et al., Stack Overflow considered harmful? The impact of copy&paste on Android application security
  • A.L. Ginsca et al., User profiling for answer quality assessment in Q&A communities
  • R. Gupta et al., Learning from gurus: analysis and modeling of reopened questions on Stack Overflow
  • S. Haefliger et al., Code reuse in open source software, Manag. Sci. (2008)
  • I. Heitlager, A practical model for measuring maintainability – a preliminary...
  • O.R. Holsti, Content Analysis for the Social Sciences and Humanities (1969)
  • J. Holvitie et al., Co-existence of the ‘Technical Debt’ and ‘Software Legacy’ concepts
  • Y. Jin et al., Quick trigger on Stack Overflow: a study of gamification-influenced member tendencies
