Understanding the factors for fast answers in technical Q&A websites

An empirical study of four stack exchange websites

Empirical Software Engineering

Abstract

Technical questions and answers (Q&A) websites accumulate a significant amount of knowledge from users. Developers are especially active on these Q&A websites, since developers are constantly facing new development challenges that require help from other experts. Over the years, Q&A website designers have derived several incentive systems (e.g., gamification) to encourage users to answer questions that are posted by others. However, the current incentive systems primarily focus on the quantity and quality of the answers instead of encouraging the rapid answering of questions. Improving the speed of getting an answer can significantly improve the user experience and increase user engagement on such Q&A websites. In this paper, we explore how one may improve the current incentive systems to motivate fast answering of questions. We use a logistic regression model to analyze 46 factors along four dimensions (i.e., question, asker, answer, and answerer dimension) in order to understand the relationship between the studied factors and the time needed to get an accepted answer. We conduct our study on the four most popular (i.e., with the most questions) Q&A Stack Exchange websites: Stack Overflow, Mathematics, Ask Ubuntu, and Super User. We find that i) factors in the answerer dimension have the strongest effect on the time needed to get an accepted answer, after controlling for other factors; ii) the current incentive system does not recognize non-frequent answerers, who often answer questions that frequent answerers are not able to answer. Such questions that are answered by non-frequent answerers are as important (i.e., have a similar range of scores) as those that are answered by frequent answerers; iii) the current incentive system motivates frequent answerers well, but such frequent answerers tend to answer short questions.
Our findings suggest that Q&A website designers should improve their incentive systems to motivate non-frequent answerers to be more active and to answer questions fast, in order to shorten the waiting time to receive an answer (especially for questions that require specific knowledge that frequent answerers might not possess). In addition, the question answering incentive system needs to factor in the value and difficulty of answering the questions (e.g., providing more rewards to harder questions or questions that remain unanswered for a long period of time).

Notes

  1. http://stackoverflow.com/

  2. http://stackoverflow.com/help/whats-reputation

  3. http://stackoverflow.com/help/whats-reputation

  4. http://stackoverflow.com/questions/3481828/how-to-split-a-string-in-java, last accessed Sep 21st, 2017.

  5. https://cran.r-project.org/web/packages/tm/index.html

  6. https://stackexchange.com/sites

  7. http://stackoverflow.com/

  8. http://math.stackexchange.com/

  9. http://askubuntu.com/

  10. http://superuser.com/

  11. https://archive.org/details/stackexchange

  12. http://sail.cs.queensu.ca/replication/AnswerSpeedStackExchange/Index.html

  13. https://cran.r-project.org/web/packages/rms/index.html

  14. https://superuser.com/questions/950009/cortana-search-is-not-finding-applications-on-windows-10/

  15. http://stackoverflow.com/questions/28553722/pyplot-error-in-julia-type-pyobject-has-no-field-set-yscale

  16. https://meta.stackoverflow.com/questions/295688/how-to-highlight-difficult-or-old-questions-and-then-reward-the-answers; the post received 56 up votes, 10 favorite votes, and 1,189 views

  17. https://cran.r-project.org/web/packages/rms/index.html

  18. https://cran.r-project.org/web/packages/rms/index.html

References

  • Adamic LA, Zhang J, Bakshy E, Ackerman MS (2008) Knowledge sharing and yahoo answers: everyone knows something. In: Proceedings of the 17th international conference on world wide web, WWW ’08, pp 665–674

  • Aizawa A (2003) An information-theoretic perspective of tf–idf measures. Inf. Process. Manage. 39(1):45–65

  • Anderson A, Huttenlocher D, Kleinberg J, Leskovec J (2012) Discovering value from community activity on focused question answering sites: a case study of stack overflow. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’12, pp 850–858

  • Anderson A, Huttenlocher D, Kleinberg J, Leskovec J (2013) Steering user behavior with badges. In: Proceedings of the 22nd international conference on world wide web, WWW ’13, pp 95–106

  • Antin J, Churchill EF (2011) Badges in social media: a social psychological perspective. In: CHI 2011 Gamification workshop proceedings (Vancouver, BC, Canada, 2011)

  • Asaduzzaman M, Mashiyat AS, Roy CK, Schneider KA (2013) Answering questions about unanswered questions of stack overflow. In: Proceedings of the 10th working conference on mining software repositories, MSR ’13, pp 97–100

  • Bird C, Nagappan N, Murphy B, Gall H, Devanbu P (2011) Don’t touch my code!: Examining the effects of ownership on software quality. In: Proceedings of the 19th ACM SIGSOFT symposium and the 13th european conference on foundations of software engineering, pp 4–14

  • Briggs A, Clark T, Wolstenholme J, Clark P (2003) Missing... presumed at random: cost analysis of incomplete data. Health Econ 12(5):377–393

  • Bulmer M (1979) Principles of statistics. Dover books on mathematics series. Dover Publications, New York

  • Cavusoglu H, Li Z, Huang K-W (2015) Can gamification motivate voluntary contributions?: the case of stackoverflow Q&A community. In: Proceedings of the 18th ACM conference companion on computer supported cooperative work & social computing, CSCW’15 companion, pp 171–174

  • Chambers JM (1991) Statistical models in S. CRC Press, Inc., Boca Raton

  • Chen T-H, Thomas SW, Nagappan M, Hassan AE (2012) Explaining software defects using topic models. In: Proceedings of the 9th IEEE working conference on mining software repositories, MSR ’12, pp 189–198

  • Cliff N (1993) Dominance statistics: ordinal analyses to answer ordinal questions. Psychol Bull 114:494–509

  • Colburn L (2016) SLI systems granted auto complete patent. https://www.sli-systems.com/sli-systems-granted-auto-complete-patent

  • Cornea R, Weininger N (2014) Providing autocomplete suggestions. US Patent 8,645,825

  • Dunn OJ (1961) Multiple comparisons among means. J Am Stat Assoc 56:52–64

  • Efron B (1986) How biased is the apparent error rate of a prediction rule? J Am Stat Assoc 81(394):461–470

  • Freedman D (2005) Statistical models: theory and practice. Cambridge University Press, Cambridge

  • Han J (2005) Data mining: concepts and techniques. Morgan Kaufmann Publishers Inc, San Francisco

  • Harrell FE Jr (2006) Regression modeling strategies. Springer-Verlag New York, Inc., Secaucus

  • Linares-Vásquez M, Bavota G, Di Penta M, Oliveto R, Poshyvanyk D (2014) How do api changes trigger stack overflow discussions? A study on the android sdk. In: Proceedings of the 22nd international conference on program comprehension, ICPC 2014, New York, NY, USA, pp 83–94

  • McIntosh S, Kamei Y, Adams B, Hassan AE (2016) An empirical study of the impact of modern code review practices on software quality. Empir Softw Eng 21(5):2146–2189

  • Miller HJ, Han J (2001) Geographic data mining and knowledge discovery. Taylor & Francis Inc, New York

  • Mockus A (2008) Missing data in software engineering. Springer, London, pp 185–200

  • Moore D, Maccabe G, Craig B (2009) Introduction to the practice of statistics. W.H. Freeman and Company, San Francisco

  • Ponzanelli L, Mocci A, Bacchelli A, Lanza M, Fullerton D (2014a) Improving low quality stack overflow post detection. In: 30th IEEE international conference on software maintenance and evolution, Victoria, BC, Canada, September 29–October 3, 2014, pp 541–544

  • Ponzanelli L, Bavota G, Di Penta M, Oliveto R, Lanza M (2014b) Mining stackoverflow to turn the IDE into a self-confident programming prompter. In: Proceedings of the 11th working conference on mining software repositories, MSR ’14, pp 102–111

  • Ponzanelli L, Mocci A, Bacchelli A, Lanza M (2014c) Understanding and classifying the quality of technical forum questions. In: 14th international conference on quality software, pp 343–352

  • Rahman MM, Roy CK (2015) An insight into the unresolved questions at stack overflow. In: Proceedings of the 12th working conference on mining software repositories, MSR ’15, pp 426–429

  • San Pedro J, Karatzoglou A (2014) Question recommendation for collaborative question answering systems with RankSLDA. In: Proceedings of the 8th ACM conference on recommender systems. ACM, pp 193–200

  • Squire M (2015) Should we move to stack overflow? Measuring the utility of social media for developer support. In: Proceedings of the 37th international conference on software engineering, pp 219–228

  • StackOverflow (2016) Developer survey results 2016. http://stackoverflow.com/research/user-survey-2016

  • Thongtanunam P, McIntosh S, Hassan AE, Iida H (2016) Revisiting code ownership and its relationship with software quality in the scope of modern code review. In: Proceedings of the 38th international conference on software engineering, pp 1039–1050

  • Treude C, Barzilay O, Storey M-A (2011) How do programmers ask and answer questions on the web? (nier track). In: Proceedings of the 33rd international conference on software engineering, pp 804–807

  • Vasilescu B, Serebrenik A, Devanbu P, Filkov V (2014) How social Q&A sites are changing knowledge sharing in open source software communities. In: Proceedings of the 17th ACM conference on computer supported cooperative work & social computing, CSCW ’14, pp 342–354

  • Wang S, Lo D, Jiang L (2013) An empirical study on developer interactions in stackoverflow. In: Proceedings of the 28th annual ACM symposium on applied computing, SAC ’13, Coimbra, Portugal, March 18–22, 2013, pp 1019–1024

  • Wang S, Lo D, Vasilescu B, Serebrenik A (2014) Entagrec: an enhanced tag recommendation system for software information sites. In: Proceedings of the international conference on software maintenance and evolution, pp 291–300

  • Xia X, Lo D, Wang X, Zhou B (2013) Tag recommendation in software information sites. In: Proceedings of the 10th working conference on mining software repositories, MSR ’13, pp 287–296

  • Yang L, Bao S, Lin Q, Wu X, Han D, Su Z, Yu Y (2011) Analyzing and predicting not-answered questions in community-based question answering services. In: Proceedings of the twenty-fifth AAAI conference on artificial intelligence, AAAI’11. AAAI Press, pp 1273–1278

  • Yao Y, Tong H, Xie T, Akoglu L, Xu F, Lu J (2013) Want a good answer? Ask a good question first! arXiv:1311.6876

  • Zhou M, Mockus A (2011) Does the initial environment impact the future of developers? In: Proceedings of the 33rd international conference on software engineering, ICSE ’11, pp 271–280

Author information

Corresponding author

Correspondence to Shaowei Wang.

Additional information

Communicated by: Per Runeson

Appendix A: Model Building and Analysis Process

In this appendix, we present the details of our model building process.

Figure 12 shows an overview of our model building process. We use the R package rms (see Note 17) as the implementation of our logistic regression model. Below, we describe each step of the process.

Fig. 12 An overview of our model construction and analysis approaches

1. Label Assignment

Since we use a classification model to understand the impact of the studied factors on answering speed, we first need to select the questions that are used to build the model and assign a label (i.e., fast-answered question or slow-answered question) to each of them.

As the results in Section 5 show, more than half of the questions were answered within one hour. Thus, the time needed to answer a question is very close (i.e., within minutes) for most questions. Such skewness in the data would have a negative impact on the resulting model (i.e., increase bias).

Figure 13 presents the percentage of questions that receive an accepted answer within a time window around the median cut-off point (i.e., the median of TimeToGetAcceptedAnswer). The number of questions near the median cut-off point is notably large. For example, on Stack Overflow, 10.8% (6,027) of the questions receive an accepted answer within 5 minutes of the median value of TimeToGetAcceptedAnswer (i.e., up to 5 minutes earlier or later). If we widen the time window to 20 minutes, 53.4% (29,834) of the questions fall within it. In other words, more than half of the questions on Stack Overflow land near the boundary, which would likely introduce a large amount of noise into our model.

Fig. 13 The percentage of questions that received answers within a time window of median(TimeToGetAcceptedAnswer) ± x minutes
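The quantity plotted in Fig. 13 can be illustrated with a minimal Python sketch. The function name `share_near_median` and the toy answer times are hypothetical and are not taken from the study's data; the sketch only shows how the share of questions within median ± x minutes could be computed.

```python
from statistics import median


def share_near_median(times_minutes, window_minutes):
    """Fraction of questions answered within median +/- window_minutes."""
    cut = median(times_minutes)
    near = sum(1 for t in times_minutes if abs(t - cut) <= window_minutes)
    return near / len(times_minutes)


# Hypothetical answer times in minutes (illustration only).
toy_times = [2, 5, 8, 12, 15, 18, 30, 45, 600, 14400]

for x in (5, 20):
    print(f"window of ±{x} min: {share_near_median(toy_times, x):.0%}")
```

The wider the window, the larger the share of questions that sit next to the cut-off, which is why a median split would label many near-identical questions differently.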

To reduce such noise, we sort the questions by the time needed to get an accepted answer, and then label the top 20% of questions (i.e., those answered most quickly) as fast-answered questions and the bottom 20% as slow-answered questions. This approach fits the goals of our study, since a difference of a few minutes should not be used to distinguish between a fast-answered question and a slow-answered question. The mean values of TimeToGetAcceptedAnswer for the two groups are shown in Table 10. We observe that the fast-answered questions were answered within 0.1 hours on average, while slow-answered questions needed at least 10 days to be answered.

Table 10 The comparison of mean values of TimeToGetAcceptedAnswer between fast-answered and slow-answered questions
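The labeling step can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the function name `label_fast_slow`, the tuple layout, and the toy data are assumptions, and how the original study rounds the 20% cut or breaks ties is not stated in the text.

```python
def label_fast_slow(questions, fraction=0.2):
    """Return (fast, slow): the quickest and the slowest `fraction` of questions.

    `questions` is a list of (question_id, hours_to_accepted_answer) pairs.
    Questions in the middle of the distribution are left unlabeled.
    """
    ranked = sorted(questions, key=lambda q: q[1])
    k = int(len(ranked) * fraction)
    fast = ranked[:k]
    slow = ranked[-k:] if k > 0 else []
    return fast, slow


# Hypothetical questions (id, hours until the accepted answer).
toy_questions = [
    ("q1", 0.05), ("q2", 300.0), ("q3", 0.4), ("q4", 1.2), ("q5", 0.08),
    ("q6", 52.0), ("q7", 0.9), ("q8", 240.0), ("q9", 3.0), ("q10", 0.6),
]

fast, slow = label_fast_slow(toy_questions)
print("fast:", [qid for qid, _ in fast])
print("slow:", [qid for qid, _ in slow])
```

Dropping the middle 60% trades sample size for a cleaner separation between the two classes.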

2. Normality Adjustment

When building a logistic regression model, it is preferable for the explanatory variables to be approximately normally distributed in order to produce a more stable and robust model (Freedman 2005). In our case, most of the studied factors are skewed: all of them are considered highly skewed (i.e., the skewness is larger than 1) (Bulmer 1979) except for Tag_Level_Difference, Tag_Number, Q_Title_Popularity, Mean_Down_Votes, Median_Down_Votes, and Sum_Down_Votes. Therefore, we apply a logarithmic transformation, $\ln(x + 1)$, to all the studied factors to reduce skewness.
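A small Python sketch of this step is given below. It assumes the moment coefficient of skewness (third central moment divided by the 1.5 power of the variance); the text does not say which skewness estimator was used, and the values in `raw` are hypothetical.

```python
import math


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def log_transform(values):
    """Apply ln(x + 1); math.log1p stays accurate for x close to 0."""
    return [math.log1p(v) for v in values]


# Hypothetical right-skewed factor values (illustration only).
raw = [0, 1, 1, 2, 3, 5, 8, 40, 250]

print("skewness before:", round(skewness(raw), 2))
print("skewness after: ", round(skewness(log_transform(raw)), 2))
```

Adding 1 before taking the logarithm keeps zero-valued factors (e.g., zero votes) defined, since ln(0 + 1) = 0.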

3. Correlation & Redundancy Analysis

We remove correlated and redundant factors using the following steps: i) removing factors with zero variance; ii) removing highly correlated factors; and iii) removing redundant factors.

We first remove factors with zero variance, since these factors do not contribute to the model. For example, the variance of Median_Down_Votes on Super User is 0, which means that Median_Down_Votes takes a single value (0 in this case) for every question in the studied Super User data (top 20% and bottom 20%).

Highly correlated factors can cause multicollinearity problems in our model. Thus, following prior studies (Thongtanunam et al. 2016; McIntosh et al. 2016), we perform a correlation analysis to remove highly correlated factors using a variable clustering analysis technique. We construct a hierarchical overview of the correlation among the factors and select one factor from each cluster of highly correlated variables, i.e., |ρ| > 0.7 (Thongtanunam et al. 2016).

After this step, 28, 28, 26, and 27 factors remain in the Stack Overflow, Mathematics, Ask Ubuntu, and Super User data, respectively (see the remaining factors in Table 4).
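The first two filtering steps can be approximated with the Python sketch below. It is a simplification and not the procedure used in the study: instead of hierarchical variable clustering, it applies a greedy pairwise filter on Spearman's ρ, keeping whichever factor of a correlated pair is encountered first. The factor values are hypothetical; only the factor names come from the text.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def drop_correlated(factors, threshold=0.7):
    """Skip zero-variance factors, then keep a factor only if |rho| with
    every factor kept so far does not exceed `threshold`."""
    kept = []
    for name, values in factors.items():
        if len(set(values)) <= 1:
            continue  # zero variance: the factor carries no signal
        if all(abs(spearman(values, factors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical values for four factors (illustration only).
toy_factors = {
    "Sum_Down_Votes": [1, 4, 2, 8, 5, 7],
    "Mean_Down_Votes": [0.5, 2.1, 1.0, 4.2, 2.4, 3.9],
    "Tag_Number": [3, 1, 3, 2, 1, 2],
    "Median_Down_Votes": [0, 0, 0, 0, 0, 0],
}

print(drop_correlated(toy_factors))
```

In this toy input, Mean_Down_Votes is dropped because it ranks the questions exactly like Sum_Down_Votes, and Median_Down_Votes is dropped because it is constant.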

Correlation analysis reduces multicollinearity among the factors, but it may not detect all of the redundant factors (i.e., factors that do not have a unique signal relative to the other factors). We remove redundant factors by using the redun function in the R package rms (see Note 18) with the default $R^2$ threshold of 0.9. However, no factors were removed in this step. The final factors are presented in Table 4.

4. Non-linear Term Allocation

When building a logistic regression model, some factors potentially share non-linear relationships with the response variable. However, logistic regression models are mainly used for modeling linear relationships. Thus, following prior studies (Thongtanunam et al. 2016; McIntosh et al. 2016), we use restricted cubic splines (Harrell 2006) to add non-linear terms of factors into the model. We measure the non-linear relationship by calculating the Spearman multiple $\rho^2$ between the dependent variable $y$ and the linear and quadratic forms of each factor ($x_i$, $x_i^2$). A large $\rho^2$ indicates that there is a high chance of a non-linear relationship between a factor and the response variable, which indicates that the factor should be assigned more degrees of freedom. By observing the rough clustering of the factors according to their Spearman multiple $\rho^2$ values across the four websites, we divide the factors into four groups (see Figure 14). We give factors in the first, second, and third groups five, four, and three degrees of freedom, respectively.

Fig. 14 Dotplot of the Spearman multiple $\rho^2$ of each factor in the four studied websites. The larger the $\rho^2$ value, the more likely the factor has a non-linear relationship with the response variable. The first, second, and third groups of factors (categorized by the $\rho^2$ value) are highlighted with red circle, green triangle, and blue plus, respectively
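To make the Spearman multiple $\rho^2$ concrete, the Python sketch below computes it as the $R^2$ of an ordinary least-squares regression of rank(y) on rank(x) and rank(x)². This follows the commonly documented definition of the statistic and is assumed, not confirmed, to match the computation used in the study; the function names and toy data are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def centered(a):
    m = sum(a) / len(a)
    return [t - m for t in a]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def spearman_multiple_rho2(x, y):
    """R^2 of regressing rank(y) on rank(x) and rank(x)^2 (with intercept)."""
    rx, ry = ranks(x), ranks(y)
    u = centered(rx)
    v = centered([r * r for r in rx])
    w = centered(ry)
    suu, svv, suv = dot(u, u), dot(v, v), dot(u, v)
    suw, svw, sww = dot(u, w), dot(v, w), dot(w, w)
    det = suu * svv - suv * suv          # 2x2 normal equations
    b1 = (suw * svv - svw * suv) / det
    b2 = (svw * suu - suw * suv) / det
    return (b1 * suw + b2 * svw) / sww


# A monotone relationship and a U-shaped one (hypothetical data).
print(spearman_multiple_rho2([1, 2, 3, 4, 5, 6, 7], [2, 4, 6, 8, 10, 12, 14]))
print(spearman_multiple_rho2([-3, -2, -1, 0, 1, 2, 3], [9, 4, 1, 0, 1, 4, 9]))
```

The U-shaped example is the interesting case: the ordinary Spearman correlation is zero there, while the quadratic rank term still captures most of the relationship, which is the signal used to decide where spline terms are worth spending.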

5. Logistic Regression Model Building

Finally, after selecting the factors and specifying the non-linear terms of the factors, we build our regression models using the preprocessed data. When building the model, we consider the text-related question factors, asker factors, and answer factors as control variables by including them in the model, an approach that is commonly used in regression models (Miller and Han 2001; Bird et al. 2011; Chen et al. 2012). We use the lrm function in the R package rms as the implementation of the logistic regression model and the rcs function in rms as the implementation of restricted cubic splines.
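For readers unfamiliar with restricted cubic splines, the Python sketch below builds the spline basis for a single value using the truncated power form described by Harrell. It is an illustration written for this appendix rather than the rms code: the function name `rcs_basis`, the knot positions, and the normalisation by the squared knot range are assumptions.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for one value x.

    With k knots this returns k - 1 columns: x itself plus k - 2 non-linear
    terms that are constrained to be linear beyond the outermost knots.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def cube(z):
        return max(z, 0.0) ** 3          # truncated cubic (z)+^3

    scale = (t[-1] - t[0]) ** 2          # normalisation choice (assumption)
    gap = t[-1] - t[-2]
    cols = [float(x)]
    for j in range(k - 2):
        term = (cube(x - t[j])
                - cube(x - t[-2]) * (t[-1] - t[j]) / gap
                + cube(x - t[-1]) * (t[-2] - t[j]) / gap)
        cols.append(term / scale)
    return cols


# Hypothetical knot positions on a log-transformed factor.
example_knots = [0, 1, 2, 3, 4]
print(rcs_basis(2.5, example_knots))
```

Each column of the basis becomes one coefficient in the logistic regression, so a factor with more knots is allowed a more flexible curve at the cost of more parameters.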

6. Model assessment

We use AUC and bootstrapping to assess the explanatory power of the logistic regression model (i.e., ability of the model to capture the relationship between the explanatory variables and the response variable). AUC is the area under the Receiver Operating Characteristic (ROC) curve (Han 2005). The area under ROC curve is often used as a measure of the quality of classification models. A random classifier has an AUC of 0.5, while the AUC for a perfect classifier is equal to 1. In practice, most of the regression models have an AUC between 0.5 and 1.
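As a concrete illustration, AUC can be computed without drawing the ROC curve, using its equivalent interpretation as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count as one half). The Python sketch below uses hypothetical labels and scores, with 1 standing for a fast-answered question.

```python
def auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = fast-answered) and predicted probabilities.
labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.5, 0.1, 0.8]
print(round(auc(labels, scores), 3))
```

The pairwise form is quadratic in the number of records, so it suits small illustrations; rank-based formulas are preferable for datasets of the size studied here.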

Since AUC can be an overestimation (i.e., higher than it actually is) if the model is overfitted to the data, we further evaluate the stability of our model. Similar to prior work (McIntosh et al. 2016; Thongtanunam et al. 2016), we reduce such overestimation by using a bootstrap-derived approach (Efron 1986). The steps of the bootstrap-derived approach are listed below:

  1. From the original dataset with n records (i.e., 55,853, 70,336, 7,134, and 10,776 for Stack Overflow, Mathematics, Ask Ubuntu, and Super User, respectively), select a bootstrap sample, i.e., a random sample of n records drawn with replacement.

  2. Build a model on the bootstrap sample, using the same allocation of knots as was used in the original dataset.

  3. Apply the model that is built using the bootstrap sample to both the bootstrap sample and the original dataset, and calculate the AUC on each.

  4. The optimism is the difference between the AUC on the bootstrap sample and the AUC on the original dataset. Note that optimism is signed (i.e., it is not an absolute value): a positive sign indicates that the AUC of the original sample is larger than that of the bootstrap sample; a negative sign indicates that the AUC of the bootstrap sample is larger than that of the original sample.

The above process is repeated 1,000 times and the average (mean) optimism is calculated. Small optimism values indicate that the model does not suffer from overfitting.
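The four steps above can be sketched in Python as follows. This is a sketch under stated assumptions, not the study's code: `fit_stub` is a hypothetical one-feature stand-in for the logistic regression fit, the data are synthetic, the example uses 200 repetitions instead of 1,000 to keep it quick, and the sign of the optimism follows the convention stated in step 4 (original minus bootstrap).

```python
import random


def auc(labels, scores):
    """AUC via pairwise comparison; ties count as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def fit_stub(rows):
    """Stand-in for a model fit (assumption): score a record by its single
    feature, signed so that the positive class scores higher on average."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    sign = 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0
    return lambda x: sign * x


def mean_optimism(rows, fit, repeats=1000, seed=0):
    """Average signed optimism over `repeats` bootstrap samples."""
    rng = random.Random(seed)
    n = len(rows)
    total, used = 0.0, 0
    for _ in range(repeats):
        boot = [rows[rng.randrange(n)] for _ in range(n)]      # step 1
        if len({y for _, y in boot}) < 2:
            continue  # AUC is undefined without both classes
        model = fit(boot)                                      # step 2
        auc_boot = auc([y for _, y in boot], [model(x) for x, _ in boot])
        auc_orig = auc([y for _, y in rows], [model(x) for x, _ in rows])
        total += auc_orig - auc_boot                           # steps 3-4
        used += 1
    return total / used


# Synthetic data: one feature, shifted by one unit for the positive class.
data_rng = random.Random(42)
toy_rows = [(data_rng.gauss(y, 1.0), y) for y in [0, 1] * 50]

optimism = mean_optimism(toy_rows, fit_stub, repeats=200)
print("mean optimism:", round(optimism, 4))
```

Because the stand-in model has almost no free parameters, its optimism should be close to zero; a flexible model with many spline terms would be expected to show a larger gap.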

7. Explanatory Variables Analysis

After our model assessment step, if the AUC value is high and the optimism value is low (i.e., our model can explain the TimeToGetAcceptedAnswer well with low bias), we can then use the model to study the impact of each factor on the TimeToGetAcceptedAnswer. We measure the impact of each factor on the TimeToGetAcceptedAnswer using the Wald $\chi^2$ test (Chambers 1991). The Wald $\chi^2$ test is commonly used in biostatistics (Harrell 2006) and software engineering (McIntosh et al. 2016; Thongtanunam et al. 2016) research to understand the impact of factors in a model.
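For reference, the general textbook form of the Wald statistic is given below. The notation is introduced here for illustration and does not appear in the original text: $\hat{\beta}_S$ denotes the estimated coefficients that belong to one factor (several coefficients when the factor has spline terms), and $\hat{V}_S$ denotes their estimated covariance matrix.

```latex
W_S = \hat{\beta}_S^{\top} \, \hat{V}_S^{-1} \, \hat{\beta}_S ,
\qquad
W_S \sim \chi^2_{|S|} \ \text{under } H_0 : \beta_S = 0 .

% Special case of a single coefficient:
W_j = \left( \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)} \right)^{2}
```

A larger value of the statistic indicates that dropping the factor's terms would change the fitted model more, which is why it can serve as a measure of a factor's impact.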

Cite this article

Wang, S., Chen, TH. & Hassan, A.E. Understanding the factors for fast answers in technical Q&A websites. Empir Software Eng 23, 1552–1593 (2018). https://doi.org/10.1007/s10664-017-9558-5

  • DOI: https://doi.org/10.1007/s10664-017-9558-5
