Analysis and Modeling of Activity-Selection Behavior in Collaborative Knowledge-Building

Transactions on Computational Collective Intelligence XXXVI

Abstract

People do not behave uniformly in their social lives, nor is their behavior entirely arbitrary. Rather, it depends on factors such as their skills, motives, and backgrounds. Our analysis shows that such behavior also prevails on the Stack Exchange websites. We collect and analyze data on over 5.3 million users from 156 Stack Exchange websites. On these websites, users’ diverse behavior shows up in the different activities they choose to perform as well as in how they stimulate each other to contribute more. Using the insights gained from this empirical analysis together with classical cognitive theories, we build a general cognitive model of the user interaction behavior that emerges in collaborative knowledge-building setups. Analysis of the model indicates that, for any given collaborative system, there is an optimal distribution of users across its activities that leads to maximum knowledge generation. We also apply the model to Stack Exchange websites and identify the under-represented activities.

Note

A preliminary version of this work was published as a poster at The Web Conference 2020 [13] and in the proceedings of the 12th International Conference on Computational Collective Intelligence (ICCCI) 2020 [11].

Notes

  1. http://area51.stackexchange.com/.

  2. https://archive.org/download/stackexchange.

  3. The analysis does not consider users who created an account but never contributed to the website in any way. A large number of users on these websites create an account but remain passive knowledge consumers. On Stack Exchange websites, the average fraction of users who did not contribute at all in questioning, answering or voting was found to be 54.10%.

  4. To focus on the effect of user distribution on the amount of knowledge produced, we assume that the number of users remains fixed over time. Nevertheless, the outcomes of the model may be used even when the number of users keeps changing, by evaluating the given system over small time windows and considering the average number of users present in each window.

  5. https://meta.stackexchange.com/questions/141648/what-is-the-association-bonus.

References

  1. Adamic, L.A., Zhang, J., Bakshy, E., Ackerman, M.S.: Knowledge sharing and Yahoo answers: everyone knows something. In: Proceedings of the 17th International Conference on World Wide Web, pp. 665–674. ACM (2008)

  2. Agrawal, R., Rajagopalan, S., Srikant, R., Xu, Y.: Mining newsgroups using networks arising from social behavior. In: Proceedings of the 12th International Conference on World Wide Web, pp. 529–535. ACM (2003)

  3. Anderson, A., Huttenlocher, D., Kleinberg, J., Leskovec, J.: Steering user behavior with badges. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 95–106. ACM (2013)

  4. Anthony, D., Smith, S.W., Williamson, T.: Explaining Quality in Internet Collective Goods: Zealots and Good Samaritans in the Case of Wikipedia. Dartmouth College, Hanover (2005)

  5. Arazy, O., Lifshitz-Assaf, H., Nov, O., Daxenberger, J., Balestra, M., Cheshire, C.: On the ‘how’ and ‘why’ of emergent role behaviors in Wikipedia. In: Conference on Computer-Supported Cooperative Work and Social Computing, vol. 35 (2017)

  6. Arazy, O., Nov, O., Patterson, R., Yeo, L.: Information quality in Wikipedia: the effects of group composition and task conflict. J. Manag. Inf. Syst. 27(4), 71–98 (2011)

  7. Arazy, O., Ortega, F., Nov, O., Yeo, L., Balila, A.: Functional roles and career paths in Wikipedia. In: Proceedings of the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing, pp. 1092–1105. ACM (2015)

  8. Aubin, J.P.: Applied Functional Analysis, vol. 47. Wiley, Hoboken (2011)

  9. Chhabra, A., Iyengar, S.S.: Characterizing the triggering phenomenon in Wikipedia. In: Proceedings of the 14th International Symposium on Open Collaboration, p. 11. ACM (2018)

  10. Chhabra, A., Iyengar, S.: How does knowledge come by? arXiv preprint arXiv:1705.06946 (2017)

  11. Chhabra, A., Iyengar, S.R.S., Saini, J.S., Malik, V.: Activity-selection behavior and optimal user-distribution in Q&A websites. In: Nguyen, N.T., Hoang, B.H., Huynh, C.P., Hwang, D., Trawiński, B., Vossen, G. (eds.) ICCCI 2020. LNCS (LNAI), vol. 12496, pp. 853–865. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-63007-2_67

  12. Chhabra, A., Iyengar, S., Saini, P., Bhat, R.S.: Presence of an ecosystem: a catalyst in the knowledge building process in crowdsourced annotation environments. In: Proceedings of the 2015 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2015) (2015)

  13. Chhabra, A., RS Iyengar, S.: Activity-selection behavior of users in stackexchange websites. In: Companion Proceedings of the Web Conference 2020, pp. 105–106 (2020)

  14. Cress, U., Kimmerle, J.: A systemic and cognitive view on collaborative knowledge building with wikis. Int. J. Comput.-Support. Collab. Learn. 3(2), 105–122 (2008). https://doi.org/10.1007/s11412-007-9035-z

  15. Erickson, L., Petrick, I., Trauth, E.: Hanging with the right crowd: matching crowdsourcing need to crowd characteristics (2012)

  16. Fisher, D., Smith, M., Welser, H.T.: You are who you talk to: detecting roles in usenet newsgroups. In: Proceedings of the 39th Annual Hawaii International Conference on System Sciences, HICSS 2006, vol. 3, pp. 59b–59b. IEEE (2006)

  17. Fisher, K., Lipson, J.I.: Information processing interpretation of errors in college science learning. Instr. Sci. 14(1), 49–74 (1985)

  18. Fugelstad, P., et al.: What makes users rate (share, tag, edit...)?: predicting patterns of participation in online communities. In: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, pp. 969–978. ACM (2012)

  19. Furtado, A., Andrade, N., Oliveira, N., Brasileiro, F.: Contributor profiles, their dynamics, and their importance in five Q&A sites. In: Proceedings of the 2013 Conference on Computer Supported Cooperative Work, pp. 1237–1252. ACM (2013)

  20. Golder, S.A., Donath, J.: Social roles in electronic communities. Internet Res. 5, 19–22 (2004)

  21. Gruenfeld, D.H., Mannix, E.A., Williams, K.Y., Neale, M.A.: Group composition and decision making: how member familiarity and information distribution affect process and performance. Organ. Behav. Hum. Decis. Process. 67(1), 1–15 (1996)

  22. Guzdial, M., Rick, J., Kerimbaev, B.: Recognizing and supporting roles in CSCW. In: Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, pp. 261–268. ACM (2000)

  23. Hanrahan, B.V., Convertino, G., Nelson, L.: Modeling problem difficulty and expertise in stackoverflow. In: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work Companion, pp. 91–94. ACM (2012)

  24. He, J., Tan, A.H., Tan, C.L., Sung, S.Y.: On quantitative evaluation of clustering systems. In: Wu, W., Xiong, H., Shekhar, S. (eds.) Clustering and Information Retrieval. NETA, vol. 11, pp. 105–133. Springer, Boston (2004). https://doi.org/10.1007/978-1-4613-0227-8_4

  25. Heaps, H.S.: Information Retrieval, Computational and Theoretical Aspects. Academic Press, Cambridge (1978)

  26. Hong, L., Page, S.E.: Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proc. Natl. Acad. Sci. U.S.A. 101(46), 16385–16389 (2004)

  27. Iacopini, I., Milojević, S., Latora, V.: Network dynamics of innovation processes. Phys. Rev. Lett. 120(4), 048301 (2018)

  28. Jehn, K.A., Northcraft, G.B., Neale, M.A.: Why differences make a difference: a field study of diversity, conflict and performance in workgroups. Adm. Sci. Q. 44(4), 741–763 (1999)

  29. Just, M.A., Carpenter, P.A.: A theory of reading: from eye fixations to comprehension. Psychol. Rev. 87(4), 329 (1980)

  30. Kobren, A., Tan, C.H., Ipeirotis, P., Gabrilovich, E.: Getting more for less: optimized crowdsourcing with dynamic tasks and goals. In: Proceedings of the 24th International Conference on World Wide Web, pp. 592–602. International World Wide Web Conferences Steering Committee (2015)

  31. Kriplean, T., Beschastnikh, I., McDonald, D.W.: Articulations of WikiWork: uncovering valued work in Wikipedia through barnstars. In: Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work, pp. 47–56. ACM (2008)

  32. Liu, J., Ram, S.: Who does what: collaboration patterns in the Wikipedia and their impact on article quality. ACM Trans. Manag. Inf. Syst. (TMIS) 2(2), 11 (2011)

  33. Luhmann, N.: Social Systems. Stanford University Press (1995)

  34. Mamykina, L., Manoim, B., Mittal, M., Hripcsak, G., Hartmann, B.: Design lessons from the fastest Q&A site in the west. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 2857–2866. ACM (2011)

  35. Marengo, L., Zeppini, P.: The arrival of the new. J. Evol. Econ. 26(1), 171–194 (2016)

  36. Minsky, M.: Frame-System Theory. Thinking: Readings in Cognitive Science, pp. 355–376 (1977)

  37. Movshovitz-Attias, D., Movshovitz-Attias, Y., Steenkiste, P., Faloutsos, C.: Analysis of the reputation system and user contributions on a question answering website: stackoverflow. In: 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 886–893. IEEE (2013)

  38. Nam, K.K., Ackerman, M.S., Adamic, L.A.: Questions in, knowledge in?: a study of Naver’s question answering community. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 779–788. ACM (2009)

  39. Norman, D.A.: Categorization of action slips. Psychol. Rev. 88(1), 1 (1981)

  40. Pal, A., Chang, S., Konstan, J.A.: Evolution of experts in question answering communities. In: ICWSM (2012)

  41. Piaget, J.: Piaget’s Theory. Springer, Heidelberg (1976)

  42. Piaget, J.: The Development of Thought: Equilibration of Cognitive Structures. (Trans A. Rosin). Viking (1977)

  43. Rezgui, A., Crowston, K.: Stigmergic coordination in Wikipedia. In: Proceedings of the 14th International Symposium on Open Collaboration, p. 19. ACM (2018)

  44. Rumelhart, D.E.: Understanding Understanding. Memories, Thoughts and Emotions: Essays in Honor of George Mandler, pp. 257–275 (1991)

  45. Secretan, J.: Stigmergic dimensions of online creative interaction. Cogn. Syst. Res. 21, 65–74 (2013)

  46. Tausczik, Y.R., Pennebaker, J.W.: Participation in an online mathematics community: differentiating motivations to add. In: Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work, pp. 207–216 (2012)

  47. Tria, F., Loreto, V., Servedio, V.D.P., Strogatz, S.H.: The dynamics of correlated novelties. Sci. Rep. 4, 5890 (2014)

  48. Turner, T.C., Smith, M.A., Fisher, D., Welser, H.T.: Picturing usenet: mapping computer-mediated collective action. J. Comput.-Mediated Commun. 10(4), JCMC1048 (2005)

  49. Welser, H.T., et al.: Finding social roles in Wikipedia. In: Proceedings of the 2011 iConference, pp. 122–129. ACM (2011)

  50. Welser, H.T., Gleave, E., Fisher, D., Smith, M.: Visualizing the signatures of social roles in online discussion groups. J. Soc. Struct. 8(2), 1–32 (2007)

  51. Yang, D., Halfaker, A., Kraut, R.E., Hovy, E.H.: Who did what: editor role identification in Wikipedia. In: ICWSM, pp. 446–455 (2016)

  52. Yang, J., Tao, K., Bozzon, A., Houben, G.-J.: Sparrows and owls: characterisation of expert behaviour in stackoverflow. In: Dimitrova, V., Kuflik, T., Chin, D., Ricci, F., Dolog, P., Houben, G.-J. (eds.) UMAP 2014. LNCS, vol. 8538, pp. 266–277. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08786-3_23

Acknowledgments

This work was supported by WOS-A (Women Scientists - A), Department of Science and Technology, India [SR/WOS-A/ET-1058/2014] and CSRI (Cognitive Science Research Initiative), Department of Science and Technology, India [SR/CSRI/344/2016].

Contributions

A.C. and S.R.S.I. designed the project. A.C. and V.M. collected the data and performed the experiments. A.C. analyzed the results. A.C. and J.S.S. worked on the model. A.C. wrote the manuscript. A.C. and S.R.S.I. reviewed the manuscript.

A Appendix

A.1 Stack Exchange Data Set Statistics

Table 2. Data Set Statistics (The websites are sorted as per their creation time)

A.2 Stack Exchange Policies Regarding Commenting and Editing

As per Stack Exchange rules, users require at least 50 reputation points to unlock commenting on questions and answers that they do not own. This policy discourages spam comments by casual users and emphasizes that Stack Exchange is a Q&A portal rather than a discussion forum such as ‘ubuntuforums.org’, where even comments such as ‘Thanks, that was useful’, ‘I agree’, or ‘I have the same problem’ are allowed as answers. Moreover, commenting is meant to help better understand a question or an answer. It essentially attaches a small discussion thread to a question or an answer, which the Stack Exchange community discourages. One may then ask why some Uni-C are nevertheless encountered in commenting. The reason lies in two more Stack Exchange policies. First, if a user earns 200 reputation points on any one Stack Exchange site, that user automatically gets an association bonus of 100 on every site, enabling them to contribute across any activity on any of the Stack Exchange websites. Second, Stack Exchange automatically converts trivial answers containing a link to another question in the network into comments on the question. For these reasons, we find a small number of Uni-C in commenting.

On the other hand, the smaller number of Uni-C in editing is presumably due to the requirement that, until a user gathers 2000 reputation points, their edits are likely to be rejected, i.e., they cannot actually edit the content; they can only suggest edits. Additionally, there is an upper limit of 1000 reputation points that can be gained by editing others’ content. Beyond this, a user cannot earn more reputation by editing. This further discourages users from becoming Uni-C in editing.

A.3 Uni-C, Bi-C and Tri-C in Stack Exchange Websites

Table 3. Percentage of Uni-C, Bi-C and Tri-C across websites. (The websites are sorted as per their creation time.)

A.4 Proportion of Uni-C Across the Activities in Stack Exchange Websites

Table 4. Percentage of Uni-C across the activities. (The websites are sorted as per their creation time.)

It should be noted that, as per Stack Exchange policies, users require at least 15 reputation points to be able to vote. The presence of Uni-C in voting is explained by the association bonus (see Note 5), whereby users who have at least 200 reputation points on any Stack Exchange website get a bonus of 100 on each new Stack Exchange website on which they register, in addition to the 1 reputation point that they normally get upon registering. This gives them a total of 101 reputation points automatically, enabling them to upvote or downvote content on the new website despite having made no contribution in questioning or answering there.

Further, the presence of less than 1% (i.e., 0.34%) Uni-C in voting on Stack Overflow suggests that users gain bonus reputation points on other websites due to their contribution on Stack Overflow rather than the other way around, as it is the oldest website.
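The reputation arithmetic described in A.2 and A.4 can be illustrated with a minimal Python sketch. The thresholds (15 to vote, 50 to comment, 2000 to edit directly, a bonus of 100 once 200 points are held on any site) are taken from the text above; the function and constant names are hypothetical, and the sketch is not a reproduction of Stack Exchange's actual implementation.

```python
# Thresholds as stated in the appendix text; names are illustrative only.
ASSOCIATION_THRESHOLD = 200   # reputation needed on any one site
ASSOCIATION_BONUS = 100       # bonus granted on each newly joined site
SIGNUP_REPUTATION = 1         # reputation granted on registration
PRIVILEGE_THRESHOLDS = {"vote": 15, "comment": 50, "edit": 2000}


def starting_reputation(reputation_on_other_sites):
    """Reputation a user starts with on a newly joined site."""
    has_bonus = any(r >= ASSOCIATION_THRESHOLD for r in reputation_on_other_sites)
    return SIGNUP_REPUTATION + (ASSOCIATION_BONUS if has_bonus else 0)


def unlocked_privileges(reputation):
    """Activities (sorted by name) available at a given reputation."""
    return sorted(
        name for name, needed in PRIVILEGE_THRESHOLDS.items() if reputation >= needed
    )
```

Under these assumptions, a user with 250 points on one site starts a new site with 101 points, which unlocks voting and commenting but not direct editing. This is consistent with the Uni-C observed in voting and commenting.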

A.5 Method Used for Finding the Optimal k

To verify the optimal value of k, we use the method proposed by He et al. [24]. The authors compute two parameters, ‘Cluster compactness (CMP)’ and ‘Cluster separation (SEP)’, where CMP captures the intra-cluster distances and SEP captures the inter-cluster distances. The formulae for CMP and SEP are given below:

Cluster Compactness (CMP):

$$CMP = \frac{1}{C}\sum_{i=1}^{C}\frac{v(c_i)}{v(X)}$$

where

$$v(X) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}d^2(x_i, \bar{x})}$$

Cluster Separation (SEP):

$$SEP = \frac{1}{C(C-1)}\sum_{i=1}^{C}\sum_{j=1, j\ne i}^{C}\exp\left(-\frac{d^2(x_{c_i}, x_{c_j})}{2\sigma^2}\right)$$

By construction, a smaller value of SEP indicates a larger inter-cluster distance. The clusters should also be compact, which corresponds to a small value of CMP. Therefore, at the optimal value of k, both CMP and SEP should be minimal. The authors suggest combining them into another parameter, OCQ (Overall Cluster Quality), which is given as:

$$OCQ(\alpha) = \alpha \cdot CMP + (1 - \alpha) \cdot SEP$$

where \(\alpha \) indicates the relative weight assigned to inter-cluster and intra-cluster distances and lies between 0 and 1. A value of 1/2 for \(\alpha \) indicates equal weight for both CMP and SEP. For our analysis, we considered \(\alpha \) to be 1/2.

We used this method to compute the optimal k for all the websites. Table 5 shows each value of k along with the number of websites for which that value was found optimal. For most of the websites, k = 3 was the optimal value.

Table 5. Number of websites with the given ideal k. For 73.71% of the websites, the ideal k value was found to be 3.
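The CMP, SEP and OCQ measures defined above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' code. It assumes Euclidean distance for d, takes v(c_i) to be the same deviation formula applied to the members of cluster c_i, takes x_{c_i} to be the centroid of cluster c_i, and treats sigma as a free parameter (default 1.0). All function names are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def deviation(points):
    """v(.) from the text: root mean squared distance to the mean."""
    c = centroid(points)
    return math.sqrt(sum(sq_dist(p, c) for p in points) / len(points))


def cluster_quality(points, labels, sigma=1.0, alpha=0.5):
    """Return (CMP, SEP, OCQ) for one clustering of `points`."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    groups = list(clusters.values())
    C = len(groups)
    if C < 2:
        raise ValueError("SEP needs at least two clusters")

    v_X = deviation(points)
    cmp_score = sum(deviation(g) / v_X for g in groups) / C

    cents = [centroid(g) for g in groups]
    sep_score = sum(
        math.exp(-sq_dist(cents[i], cents[j]) / (2 * sigma ** 2))
        for i in range(C)
        for j in range(C)
        if i != j
    ) / (C * (C - 1))

    ocq = alpha * cmp_score + (1 - alpha) * sep_score
    return cmp_score, sep_score, ocq
```

To select k with this sketch, one would cluster the data for each candidate k, call `cluster_quality` on the resulting labels, and keep the k with the smallest OCQ.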

A.6 User-Distribution Obtained For Stack Exchange Websites

Table 6. Percentage Distribution of questioners, answerers and voters for each website.

A.7 Model Parameters for the Systems Studied in Chapter 3

The values in the matrix T were chosen uniformly at random between 0.00007 and 0.005 making sure that \(\rho (NT) < 1\).

System 1: n = 100, \(K_c(\infty )\) = 14289.74, \(\mathcal {D}\) = (22, 35, 43)

$$ T = \left[ {\begin{array}{ccc} 0.00045596 &{} 0.00435622 &{}0.00287159\\ 0.00382782 &{} 0.00076362 &{} 0.00499575\\ 0.00399529&{} 0.00348565 &{} 0.0018039\\ \end{array}}\right] $$

System 2: n = 100, \(K_c(\infty )\) = 13331.63, \(\mathcal {D}\) = (19, 51, 30)

$$ T = \left[ {\begin{array}{ccc} 0.00078797 &{} 0.00359952 &{} 0.00363374 \\ 0.00194002 &{} 0.0018636 &{} 0.00456399 \\ 0.0026924 &{} 0.00233821 &{} 0.00057316 \\ \end{array}}\right] $$

System 3: n = 200, \(K_c(\infty )\) = 46239.11, \(\mathcal {D}\) = (64.5, 18.5, 17)

$$ T = \left[ {\begin{array}{ccc} 0.00257957 &{} 0.00330136 &{} 0.00199484 \\ 0.0033283 &{} 0.00098843 &{} 0.00397102\\ 0.00491812 &{} 0.0016636 &{} 0.00101264\\ \end{array}}\right] $$

System 4: n = 200, \(K_c(\infty )\) = 37732.05, \(\mathcal {D}\) = (44, 44, 12)

$$ T = \left[ {\begin{array}{ccc} 0.00025509 &{} 0.00491508 &{} 0.00236663\\ 0.00365481 &{} 0.00047938 &{} 0.00425228\\ 0.0033825 &{} 0.0004316 &{} 0.00066393\\ \end{array}}\right] $$

Systems with Self-triggering = 0: n = 100 in all three systems.

System 1: \(K_c(\infty )\) = 12536.23, \(\mathcal {D}\) = (38, 32, 30)

$$ T = \left[ {\begin{array}{ccc} 0. &{} 0.00265766 &{} 0.00448791 \\ 0.00400489 &{} 0. &{} 0.00111834\\ 0.00194009 &{} 0.00390131 &{} 0. \\ \end{array}}\right] $$

System 2: \(K_c(\infty )\) = 11114.29, \(\mathcal {D}\) = (39, 37, 24)

$$ T = \left[ {\begin{array}{ccc} 0. &{} 0.00299839 &{} 0.00122855\\ 0.00070994 &{} 0. &{} 0.00168737\\ 0.00149221 &{} 0.00082065 &{} 0. \\ \end{array}}\right] $$

System 3: \(K_c(\infty )\) = 11924.27, \(\mathcal {D}\) = (34, 28, 38)

$$ T = \left[ {\begin{array}{ccc} 0. &{} 0.00264557 &{} 0.00119586\\ 0.00115856 &{} 0. &{} 0.00398131\\ 0.00448212 &{} 0.00097011 &{} 0. \\ \end{array}}\right] $$
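The sampling rule stated at the start of this section (entries of T drawn uniformly from [0.00007, 0.005], subject to \(\rho (NT) < 1\)) can be sketched as below. The matrix N is not defined in this appendix. The sketch assumes it is a diagonal matrix holding the number of users in each activity, which is an assumption made purely for illustration. The spectral radius is estimated by power iteration, which is adequate for the small non-negative matrices listed here. All function names are hypothetical.

```python
import random


def spectral_radius(M, iters=1000, tol=1e-12):
    """Estimate the largest eigenvalue magnitude of a non-negative
    square matrix M (list of rows) by power iteration."""
    n = len(M)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = max(abs(x) for x in w)
        if new_lam == 0.0:
            return 0.0
        v = [x / new_lam for x in w]
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


def scale_rows(counts, T):
    """Compute N T, assuming N = diag(counts) (an assumption, see lead-in)."""
    return [[counts[i] * T[i][j] for j in range(len(T))] for i in range(len(T))]


def sample_T(counts, low=0.00007, high=0.005, rng=None, max_tries=10000):
    """Rejection-sample T with uniform entries until rho(N T) < 1."""
    rng = rng or random.Random()
    k = len(counts)
    for _ in range(max_tries):
        T = [[rng.uniform(low, high) for _ in range(k)] for _ in range(k)]
        if spectral_radius(scale_rows(counts, T)) < 1.0:
            return T
    raise RuntimeError("no admissible T found within max_tries")
```

Under the stated assumption, the check can also be run against the listed systems, for example by passing the System 1 matrix and the user counts (22, 35, 43) to `scale_rows` and `spectral_radius`.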

Copyright information

© 2021 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Chhabra, A., Iyengar, S.R.S., Saini, J.S., Malik, V. (2021). Analysis and Modeling of Activity-Selection Behavior in Collaborative Knowledge-Building. In: Nguyen, N.T., Kowalczyk, R., Motylska-Kuźma, A., Mercik, J. (eds) Transactions on Computational Collective Intelligence XXXVI. Lecture Notes in Computer Science(), vol 13010. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-64563-5_7

  • DOI: https://doi.org/10.1007/978-3-662-64563-5_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-64562-8

  • Online ISBN: 978-3-662-64563-5

  • eBook Packages: Computer Science (R0)
