Abstract
Programming-specific Q&A sites (e.g., Stack Overflow) are used extensively by software developers for knowledge sharing and acquisition. Because questions and answers cross-reference one another, knowledge diffuses through the Q&A site and forms a large knowledge network. (Users also reference URLs external to the Q&A site; in this paper, URL sharing refers to internal URLs within the Q&A site unless otherwise stated.) In Stack Overflow, why do developers share URLs? How does the community respond to the knowledge being shared? What are the unique topological and semantic properties of the resulting knowledge network? Has this knowledge network become stable? If so, how does it reach stability? Answering these questions can help the software engineering community better understand the knowledge diffusion process in programming-specific Q&A sites like Stack Overflow, thereby enabling more effective knowledge sharing, knowledge use, and knowledge representation and search in the community. Previous work has focused on analyzing user activities in Q&A sites or mining the textual content of these sites. In this article, we present a methodology to analyze URL sharing activities in Stack Overflow. We use an open coding method to analyze why users share URLs in Stack Overflow, and we develop a set of quantitative analysis methods to study the structural and dynamic properties of the emergent knowledge network. We also identify system designs, community norms, and social behavior theories that help explain our empirical findings. Through this study, we obtain an in-depth understanding of the knowledge diffusion process in Stack Overflow and expose the implications of URL sharing behavior for Q&A site design, for developers who use crowdsourced knowledge in Stack Overflow, and for future research on knowledge representation and search.
Notes
In network science, assortative mixing is a bias in favor of connections between nodes with similar characteristics; disassortative mixing is a bias in favor of connections between dissimilarly characterized nodes (Newman 2002).
Modularity is one measure of network structure. A detected module is also called a community or a cluster. Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules.
We do not list the detailed answer-to-question distribution, since the focus of this work is on those non-isolated knowledge units and their associations.
How do I ask a good question? http://stackoverflow.com/help/how-to-ask
How does Stack Overflow handle spam? http://meta.stackexchange.com/q/2765
Visit the web service we built at http://knowledge-so.appspot.com to see more examples.
The results of other top N knowledge units are similar. Visit our web service at http://knowledge-so.appspot.com to see more results.
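The notes on assortative mixing and modularity above can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not the authors' implementation: the toy graph (two triangles joined by one bridge edge), the community assignment, and the function names `degree_assortativity` and `modularity` are all hypothetical. It computes the degree assortativity coefficient as the Pearson correlation of the degrees at the two ends of each edge, and the modularity of a given partition of an undirected, unweighted graph.

```python
from collections import defaultdict


def degrees(edges):
    """Return {node: degree} for an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of each edge.

    Positive values indicate assortative mixing by degree, negative
    values indicate disassortative mixing.
    """
    deg = degrees(edges)
    # Count every undirected edge in both directions so the measure is symmetric.
    pairs = [(deg[u], deg[v]) for u, v in edges]
    pairs += [(deg[v], deg[u]) for u, v in edges]
    n = len(pairs)
    mean = sum(x for x, _ in pairs) / n
    cov = sum((x - mean) * (y - mean) for x, y in pairs) / n
    var = sum((x - mean) ** 2 for x, _ in pairs) / n
    if var == 0:
        raise ValueError("undefined when all nodes have the same degree")
    return cov / var


def modularity(edges, community_of):
    """Modularity Q of a partition: sum over communities c of
    (intra-community edges / m) - (total degree of c / 2m)^2."""
    m = len(edges)
    deg = degrees(edges)
    intra = defaultdict(int)    # edges with both ends in the same community
    deg_sum = defaultdict(int)  # total degree per community
    for u, v in edges:
        if community_of[u] == community_of[v]:
            intra[community_of[u]] += 1
    for node, d in deg.items():
        deg_sum[community_of[node]] += d
    return sum(intra[c] / m - (deg_sum[c] / (2 * m)) ** 2 for c in deg_sum)


# Hypothetical toy network: two triangles connected by a single bridge (2, 3).
TOY_EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
TOY_COMMUNITIES = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

if __name__ == "__main__":
    print("assortativity:", degree_assortativity(TOY_EDGES))
    print("modularity:", modularity(TOY_EDGES, TOY_COMMUNITIES))
```

On this toy graph the two triangles form dense modules joined by one sparse connection, so the partition has positive modularity, while the degree correlation is slightly negative because the two bridge nodes have higher degree than most of their neighbors.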
References
Allamanis M., Sutton C. (2013) Why, when, and what: analyzing stack overflow questions by topic, type, and code. In: Proceedings of the 10th working conference on mining software repositories. IEEE Press, pp 53–56
Amer-Yahia S., Bonchi F., Castillo C., Feuerstein E., Mendez-Diaz I., Zabala P. (2014) Composite retrieval of diverse and complementary bundles
Anderson A., Huttenlocher D., Kleinberg J., Leskovec J. (2012) Discovering value from community activity on focused question answering sites: a case study of stack overflow. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 850–858
Bajaj K., Pattabiraman K., Mesbah A. (2014) Mining questions asked by web developers. In: Proceedings of the 11th working conference on mining software repositories. ACM, pp 112–121
Barabási A-L, Albert R. (1999) Emergence of scaling in random networks. Science 286(5439):509–512
Barua A., Thomas S.W., Hassan A.E. (2014) What are developers talking about? An analysis of topics and trends in stack overflow. Empir Softw Eng:619–654
Bastian M., Heymann S., Jacomy M., et al. (2009) Gephi: an open source software for exploring and manipulating networks. ICWSM 8:361–362
Blei D.M., Ng A.Y., Jordan M.I. (2003) Latent dirichlet allocation. J Mach Learn Res:993–1022
Blondel V.D., Guillaume J-L, Lambiotte R., Lefebvre E. (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):10008
Bordes A., Gabrilovich E. (2014) Constructing and mining web-scale knowledge graphs: KDD 2014 tutorial. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 1967–1967
Broder A., Kumar R., Maghoul F., Raghavan P., Rajagopalan S., Stata R., Tomkins A., Wiener J. (2000) Graph structure in the web. Comput Netw 33(1):309–320
Cattuto C., Loreto V., Pietronero L. (2007) Semiotic dynamics and collaborative tagging. Proc Natl Acad Sci 104(5):1461–1464
Clauset A., Shalizi C.R., Newman M.E. (2009) Power-law distributions in empirical data. SIAM Rev 51(4):661–703
Cucerzan S. (2007) Large-scale named entity disambiguation based on wikipedia data. EMNLP-CoNLL 7:708–716
Ferré S., Hermann A. (2011) Semantic search: reconciling expressive querying and exploratory search. In: The Semantic Web – ISWC 2011, pp 177–192
Fourney A., Morris M.R. (2013) Enhancing technical Q&A forums with citehistory. In: Proceedings of the seventh international conference on Weblogs and social media, ICWSM 2013
Fu W-T, Kannampallil T., Kang R., He J. (2010) Semantic imitation in social tagging. ACM Transactions on Computer-Human Interaction (TOCHI) 17(3):12
Fugelstad P., Dwyer P., Filson Moses J., Kim J., Mannino C.A., Terveen L., Snyder M. (2012) What makes users rate (share, tag, edit...)?: predicting patterns of participation in online communities. In: Proceedings of the ACM 2012 conference on computer supported cooperative work. ACM, pp 969–978
Golder S.A., Huberman B.A. (2006) Usage patterns of collaborative tagging systems. J Inf Sci 32(2):198–208
Gómez C., Cleary B., Singer L. (2013) A study of innovation diffusion through link sharing on stack overflow. In: 10th IEEE working conference on mining software repositories (MSR). IEEE, pp 81–84
Guerrouj L., Azad S., Rigby P.C. (2015) The influence of App churn on App success and StackOverflow discussions. In: IEEE 22nd international conference on software analysis, evolution and reengineering (SANER), pp 321–330
Halpin H., Robu V., Shepherd H. (2007) The complex dynamics of collaborative tagging. In: Proceedings of the 16th international conference on World Wide Web. ACM, pp 211–220
Hassan A.E., Holt R.C. (2004) The small world of software reverse engineering. In: Proceedings 11th working conference on reverse engineering, pp 278–283
Haveliwala T.H. (2002) Topic-sensitive pagerank. In: Proceedings of the 11th international conference on World Wide Web. ACM, pp 517–526
Java A., Song X., Finin T., Tseng B. (2007) Why we twitter: understanding microblogging usage and communities. In: Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis. ACM, pp 56–65
Lin J. (1991) Divergence measures based on the shannon entropy. IEEE Trans Inf Theory 37(1):145–151
Louridas P., Spinellis D., Vlachos V. (2008) Power laws in software. ACM Trans Softw Eng Methodol 18(1):2:1–2:26
Mamykina L., Manoim B., Mittal M., Hripcsak G., Hartmann B. (2011) Design lessons from the fastest Q&A site in the West. In: Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, pp 2857–2866
Manning C.D., Raghavan P., Schütze H. (2008) Introduction to information retrieval. Cambridge University Press, New York. ISBN 0521865719, 9780521865715
Marchionini G. (2006) Exploratory search: from finding to understanding. Commun ACM 49(4):41–46
Maslov S., Sneppen K., Zaliznyak A. (2004) Detection of topological patterns in complex networks: correlation profile of the internet. Physica A: Statistical Mechanics and its Applications 333:529–540
Meusel R., Vigna S., Lehmberg O., Bizer C. (2014) Graph structure in the web – revisited: a trick of the heavy tail. In: Proceedings of the companion publication of the 23rd international conference on World wide web companion. International World Wide Web Conferences Steering Committee, pp 427–432
Newman M.E. (2002) Assortative mixing in networks. Phys Rev Lett 89(20):208701
Novielli N., Calefato F., Lanubile F. (2015) The challenges of sentiment detection in the social programmer ecosystem. In: Proceedings of the 7th international workshop on social software engineering. ACM, pp 33–40
Pal A., Chang S., Konstan J.A. (2012) Evolution of experts in question answering communities. In: ICWSM
Parnin C., Treude C., Grammel L., Storey M-A (2012) Crowd documentation: exploring the coverage and the dynamics of api discussions on stack overflow. Georgia Institute of Technology, Tech. Rep
Pressman R. (2010) Software engineering: a practitioner's approach, 7th edn. McGraw-Hill, Inc., New York
Preusse J., Kunegis J., Thimm M., Staab S., Gottron T. (2013) Structural dynamics of knowledge networks. In: ICWSM
Rahman M.M., Yeasmin S., Roy C.K. (2014) Towards a context-aware ide-based meta search engine for recommendation about programming errors and exceptions. In: 2014 software evolution week-IEEE conference on software maintenance, reengineering and reverse engineering (CSMR-WCRE). IEEE, pp 194–203
Rosen C., Shihab E. (2015) What are mobile developers asking about? a large scale study using stack overflow. Empir Softw Eng:1–32
Squire M. (2015) Should we move to stack overflow?: measuring the utility of social media for developer support. In: Proceedings of the 37th international conference on software engineering - volume 2. ICSE ’15. IEEE Press, Piscataway, pp 219–228
Subramanian S., Inozemtseva L., Holmes R. (2014) Live API documentation. In: Proceedings of the 36th international conference on software engineering. ICSE 2014. ISBN 978-1-4503-2756-5. ACM, New York, pp 643–652
Sunshine J., Herbsleb J.D., Aldrich J. (2015) Searching the state space: a qualitative study of api protocol usability. In: Proceedings of the 22nd international conference on program comprehension. ICPC
Tjong Kim Sang E.F., De Meulder F. (2003) Introduction to the CoNLL-2003 shared task: language-independent named entity recognition. In: Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003-Volume 4. Association for Computational Linguistics, pp 142–147
Treude C., Barzilay O., Storey M-A (2011) How do programmers ask and answer questions on the web?: NIER track. In: 2011 33rd international conference on software engineering (ICSE). IEEE, pp 804–807
Vasilescu B., Serebrenik A., Devanbu P., Filkov V. (2014) How social Q&A sites are changing knowledge sharing in open source software communities. In: Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing. ACM, pp 342–354
Wagner C., Singer P., Strohmaier M., Huberman B.A. (2014) Semantic stability in social tagging streams. In: Proceedings of the 23rd international conference on World Wide Web. International World Wide Web Conferences Steering Committee, pp 735–746
Wang S., Lo D., Jiang L. (2013) An empirical study on developer interactions in stackoverflow. ACM
Wang S., Lo D., Vasilescu B., Serebrenik A. (2014) Entagrec: an enhanced tag recommendation system for software information sites. In: 2014 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 291–300
Ye D., Xing Z., Foo C.Y., Ang Z.Q., Li J., Kapre N. (2016) Software-specific named entity recognition in software engineering social content. In: The 23rd IEEE international conference on software analysis, evolution and reengineering (SANER 2016). Accepted, to appear. Preprint available at: http://yedeheng.weebly.com/uploads/5/0/3/9/50390459/saner2016.pdf
Zhang J., Ackerman M.S., Adamic L. (2007) Expertise networks in online communities: structure and algorithms. In: Proceedings of the 16th international conference on World Wide Web. ACM, pp 221–230
Additional information
Communicated by: Helen Sharp
Cite this article
Ye, D., Xing, Z. & Kapre, N. The structure and dynamics of knowledge network in domain-specific Q&A sites: a case study of stack overflow. Empir Software Eng 22, 375–406 (2017). https://doi.org/10.1007/s10664-016-9430-z
DOI: https://doi.org/10.1007/s10664-016-9430-z