The structure and dynamics of knowledge network in domain-specific Q&A sites: a case study of stack overflow

Empirical Software Engineering

Abstract

Programming-specific Q&A sites (e.g., Stack Overflow) are used extensively by software developers for knowledge sharing and acquisition. Because questions and answers cross-reference one another, knowledge diffuses through the Q&A site and forms a large knowledge network. (Users also reference URLs external to the Q&A site; in this paper, URL sharing refers to internal URLs within the Q&A site unless otherwise stated.) In Stack Overflow, why do developers share URLs? How does the community respond to the knowledge being shared? What are the unique topological and semantic properties of the resulting knowledge network? Has this knowledge network become stable? If so, how does it reach stability? Answering these questions can help the software engineering community better understand the knowledge diffusion process in programming-specific Q&A sites like Stack Overflow, thereby enabling more effective knowledge sharing, knowledge use, and knowledge representation and search in the community. Previous work has focused on analyzing user activities in Q&A sites or mining the textual content of these sites. In this article, we present a methodology for analyzing URL sharing activities in Stack Overflow. We use an open coding method to analyze why users share URLs in Stack Overflow, and we develop a set of quantitative analysis methods to study the structural and dynamic properties of the emergent knowledge network. We also identify system designs, community norms, and social behavior theories that help explain our empirical findings. Through this study, we obtain an in-depth understanding of the knowledge diffusion process in Stack Overflow and identify the implications of URL sharing behavior for Q&A site design, for developers who use crowdsourced knowledge in Stack Overflow, and for future research on knowledge representation and search.

Notes

  1. In network science, assortative mixing is a bias in favor of connections between nodes with similar characteristics; disassortative mixing is a bias in favor of connections between nodes with dissimilar characteristics (Newman 2002).

  2. Modularity is one measure of network structure. A detected module is also called a community or a cluster. Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules.

  3. We do not list the detailed answer-to-question distribution, since the focus of this work is on those non-isolated knowledge units and their associations.

  4. How do I ask a good question? http://stackoverflow.com/help/how-to-ask

  5. How does Stack Overflow handle spam? http://meta.stackexchange.com/q/2765

  6. Visit the web service we built http://knowledge-so.appspot.com to see more examples.

  7. The results of other top N knowledge units are similar. Visit our web service at http://knowledge-so.appspot.com to see more results.
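
Notes 1 and 2 define assortative mixing and modularity only in words. The following is a minimal sketch of how the two measures can be computed on a toy undirected graph; it is an illustration, not the authors' implementation. It assumes node degree as the "characteristic" for assortative mixing (the note does not say which characteristic the article uses) and the standard modularity formula for an unweighted, undirected graph. The function names, the toy graphs `TWO_TRIANGLES` and `STAR`, and the partition `BY_TRIANGLE` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def degrees(edges):
    """Return a dict mapping each node to its degree in an undirected graph."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def degree_assortativity(edges):
    """Pearson correlation between the degrees at the two ends of an edge.

    Each undirected edge is counted in both directions. Positive values
    indicate assortative mixing, negative values disassortative mixing.
    Assumption: degree is the node characteristic being compared.
    """
    deg = degrees(edges)
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("undefined when all nodes have the same degree")
    return cov / sqrt(var_x * var_y)


def modularity(edges, membership):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over modules c of  L_c / m - (d_c / (2m))^2, where m is the
    number of edges, L_c the number of edges inside module c, and d_c the
    sum of the degrees of the nodes in module c.
    """
    m = len(edges)
    deg = degrees(edges)
    internal = defaultdict(int)      # edges with both ends in the same module
    total_degree = defaultdict(int)  # sum of node degrees per module
    for node, d in deg.items():
        total_degree[membership[node]] += d
    for u, v in edges:
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


# Toy data (hypothetical): two triangles joined by one bridge edge, and a star.
TWO_TRIANGLES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
BY_TRIANGLE = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
STAR = [(0, 1), (0, 2), (0, 3)]

if __name__ == "__main__":
    print("modularity:", modularity(TWO_TRIANGLES, BY_TRIANGLE))
    print("assortativity (two triangles):", degree_assortativity(TWO_TRIANGLES))
    print("assortativity (star):", degree_assortativity(STAR))
```

On the two-triangle graph, splitting the nodes by triangle gives Q = 5/14 (about 0.357), while putting all nodes in one module gives Q = 0, which matches the description in note 2: dense connections inside modules and sparse connections between them raise modularity. The star graph, where a high-degree hub connects only to degree-one leaves, has a degree assortativity of -1, the extreme case of disassortative mixing.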

References

  • Allamanis M., Sutton C. (2013) Why, when, and what: analyzing stack overflow questions by topic, type, and code. In: Proceedings of the 10th working conference on mining software repositories. IEEE Press, pp 53–56

  • Amer-Yahia S., Bonchi F., Castillo C., Feuerstein E., Mendez-Diaz I., Zabala P. (2014) Composite retrieval of diverse and complementary bundles

  • Anderson A., Huttenlocher D., Kleinberg J., Leskovec J. (2012) Discovering value from community activity on focused question answering sites: a case study of stack overflow. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 850–858

  • Bajaj K., Pattabiraman K., Mesbah A. (2014) Mining questions asked by web developers. In: Proceedings of the 11th working conference on mining software repositories. ACM, pp 112–121

  • Barabási A-L, Albert R. (1999) Emergence of scaling in random networks. Science 286(5439):509–512

  • Barua A., Thomas S.W., Hassan A.E. (2014) What are developers talking about? An analysis of topics and trends in stack overflow. Empir Softw Eng 19(3):619–654

  • Bastian M., Heymann S., Jacomy M., et al. (2009) Gephi: an open source software for exploring and manipulating networks. ICWSM 8:361–362

  • Blei D.M., Ng A.Y., Jordan M.I. (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022

  • Blondel V.D., Guillaume J-L, Lambiotte R., Lefebvre E. (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):10008

  • Bordes A., Gabrilovich E. (2014) Constructing and mining web-scale knowledge graphs: KDD 2014 tutorial. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1967–1967

  • Broder A., Kumar R., Maghoul F., Raghavan P., Rajagopalan S., Stata R., Tomkins A., Wiener J. (2000) Graph structure in the web. Comput Netw 33(1):309–320

  • Cattuto C., Loreto V., Pietronero L. (2007) Semiotic dynamics and collaborative tagging. Proc Natl Acad Sci 104(5):1461–1464

  • Clauset A., Shalizi C.R., Newman M.E. (2009) Power-law distributions in empirical data. SIAM Rev 51(4):661–703

  • Cucerzan S. (2007) Large-scale named entity disambiguation based on wikipedia data. EMNLP-CoNLL 7:708–716

  • Ferré S., Hermann A. (2011) Semantic search: reconciling expressive querying and exploratory search. In: The Semantic Web – ISWC 2011, pp 177–192

  • Fourney A., Morris M.R. (2013) Enhancing technical Q&A forums with citehistory. In: Proceedings of the seventh international conference on Weblogs and social media, ICWSM 2013

  • Fu W-T, Kannampallil T., Kang R., He J. (2010) Semantic imitation in social tagging. ACM Transactions on Computer-Human Interaction (TOCHI) 17(3):12

  • Fugelstad P., Dwyer P., Filson Moses J., Kim J., Mannino C.A., Terveen L., Snyder M. (2012) What makes users rate (share, tag, edit...)?: predicting patterns of participation in online communities. In: Proceedings of the ACM 2012 conference on computer supported cooperative work. ACM, pp 969–978

  • Golder S.A., Huberman B.A. (2006) Usage patterns of collaborative tagging systems. J Inf Sci 32(2):198–208

  • Gómez C., Cleary B., Singer L. (2013) A study of innovation diffusion through link sharing on stack overflow. In: 10th IEEE working conference on mining software repositories (MSR). IEEE, pp 81–84

  • Guerrouj L., Azad S., Rigby P.C. (2015) The influence of App churn on App success and StackOverflow discussions. In: IEEE 22nd international conference on software analysis, evolution and reengineering (SANER), pp 321–330

  • Halpin H., Robu V., Shepherd H. (2007) The complex dynamics of collaborative tagging. In: Proceedings of the 16th international conference on World Wide Web. ACM, pp 211–220

  • Hassan A.E., Holt R.C. (2004) The small world of software reverse engineering. In: Proceedings 11th working conference on reverse engineering, pp 278–283

  • Haveliwala T.H. (2002) Topic-sensitive pagerank. In: Proceedings of the 11th international conference on World Wide Web. ACM, pp 517–526

  • Java A., Song X., Finin T., Tseng B. (2007) Why we twitter: understanding microblogging usage and communities. In: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis. ACM, pp 56–65

  • Lin J. (1991) Divergence measures based on the shannon entropy. IEEE Trans Inf Theory 37(1):145–151

  • Louridas P., Spinellis D., Vlachos V. (2008) Power laws in software. ACM Trans Softw Eng Methodol 18(1):2:1–2:26

  • Mamykina L., Manoim B., Mittal M., Hripcsak G., Hartmann B. (2011) Design lessons from the fastest Q&A site in the West. In: Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, pp 2857–2866

  • Manning C.D., Raghavan P., Schütze H. (2008) Introduction to information retrieval. Cambridge University Press, New York. ISBN 0521865719, 9780521865715

  • Marchionini G. (2006) Exploratory search: from finding to understanding. Commun ACM 49(4):41–46

  • Maslov S., Sneppen K., Zaliznyak A. (2004) Detection of topological patterns in complex networks: correlation profile of the internet. Physica A: Statistical Mechanics and its Applications 333:529–540

  • Meusel R., Vigna S., Lehmberg O., Bizer C. (2014) Graph structure in the web – revisited: a trick of the heavy tail. In: Proceedings of the companion publication of the 23rd international conference on World Wide Web. International World Wide Web Conferences Steering Committee, pp 427–432

  • Newman M.E. (2002) Assortative mixing in networks. Phys Rev Lett 89(20):208701

  • Novielli N., Calefato F., Lanubile F. (2015) The challenges of sentiment detection in the social programmer ecosystem. In: Proceedings of the 7th international workshop on social software engineering. ACM, pp 33–40

  • Pal A., Chang S., Konstan J.A. (2012) Evolution of experts in question answering communities. In: ICWSM

  • Parnin C., Treude C., Grammel L., Storey M-A (2012) Crowd documentation: exploring the coverage and the dynamics of api discussions on stack overflow. Georgia Institute of Technology, Tech. Rep

  • Pressman R. (2010) Software engineering: a practitioner's approach, 7th edn. McGraw-Hill, Inc., New York

  • Preusse J., Kunegis J., Thimm M., Staab S., Gottron T. (2013) Structural dynamics of knowledge networks. In: ICWSM

  • Rahman M.M., Yeasmin S., Roy C.K. (2014) Towards a context-aware ide-based meta search engine for recommendation about programming errors and exceptions. In: 2014 software evolution week-IEEE conference on software maintenance, reengineering and reverse engineering (CSMR-WCRE). IEEE, pp 194–203

  • Rosen C., Shihab E. (2015) What are mobile developers asking about? a large scale study using stack overflow. Empir Softw Eng:1–32

  • Squire M. (2015) Should we move to stack overflow?: measuring the utility of social media for developer support. In: Proceedings of the 37th international conference on software engineering - volume 2. ICSE ’15. IEEE Press, Piscataway, pp 219–228

  • Subramanian S., Inozemtseva L., Holmes R. (2014) Live API documentation. In: Proceedings of the 36th international conference on software engineering. ICSE 2014. ISBN 978-1-4503-2756-5. ACM, New York, pp 643–652

  • Sunshine J., Herbsleb J.D., Aldrich J. (2015) Searching the state space: a qualitative study of API protocol usability. In: Proceedings of the 22nd international conference on program comprehension. ICPC

  • Tjong Kim Sang E.F., De Meulder F. (2003) Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the seventh conference on natural language learning at HLT-NAACL 2003, volume 4. Association for Computational Linguistics, pp 142–147

  • Treude C., Barzilay O., Storey M-A (2011) How do programmers ask and answer questions on the web?: Nier track. In: 2011 33rd international conference on software engineering (ICSE). IEEE, pp 804–807

  • Vasilescu B., Serebrenik A., Devanbu P., Filkov V. (2014) How social Q&A sites are changing knowledge sharing in open source software communities. In: Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing. ACM, pp 342–354

  • Wagner C., Singer P., Strohmaier M., Huberman B.A. (2014) Semantic stability in social tagging streams. In: Proceedings of the 23rd international conference on World Wide Web. International World Wide Web Conferences Steering Committee, pp 735–746

  • Wang S., Lo D., Jiang L. (2013) An empirical study on developer interactions in stackoverflow. ACM

  • Wang S., Lo D., Vasilescu B., Serebrenik A. (2014) Entagrec: an enhanced tag recommendation system for software information sites. In: 2014 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 291–300

  • Ye D., Xing Z., Foo C.Y., Ang Z.Q., Li J., Kapre N. (2016) Software-specific named entity recognition in software engineering social content. In: The 23rd IEEE international conference on software analysis, evolution and reengineering (SANER 2016). Accepted, to appear. Preprint available at: http://yedeheng.weebly.com/uploads/5/0/3/9/50390459/saner2016.pdf

  • Zhang J., Ackerman M.S., Adamic L. (2007) Expertise networks in online communities: structure and algorithms. In: Proceedings of the 16th international conference on World Wide Web. ACM, pp 221–230

Corresponding author

Correspondence to Deheng Ye.

Additional information

Communicated by: Helen Sharp

Cite this article

Ye, D., Xing, Z. & Kapre, N. The structure and dynamics of knowledge network in domain-specific Q&A sites: a case study of stack overflow. Empir Software Eng 22, 375–406 (2017). https://doi.org/10.1007/s10664-016-9430-z

  • DOI: https://doi.org/10.1007/s10664-016-9430-z
