Abstract
Collaborations such as Wikipedia are a key part of the value of the modern Internet. At the same time, there is concern that these collaborations are threatened by high levels of member withdrawal. In this paper we borrow ideas from topic analysis to study editor activity on Wikipedia over time using latent space analysis, which offers insight into the evolving patterns of editor behaviour. This latent space representation reveals a number of different categories of editor (e.g. Technical Experts, Social Networkers), and we show that it provides a signal that predicts an editor’s departure from the community. We also show that long-term editors generally have more diversified edit preferences and experience relatively smooth evolution in their editor profiles, while short-term editors generally distribute their contributions at random among the namespaces and categories of articles and experience considerable fluctuation in the evolution of their editor profiles.
Notes
- 1.
Common work archetypes refer to the types of contribution that users make in online platforms, e.g. answering questions in Q&A sites and editing main pages in Wikipedia.
- 2.
- 3.
At the time we collected data for this work, there were 22 macro/top-categories: http://en.wikipedia.org/wiki/Category:Main_topic_classifications.
- 4.
- 5.
The category graph is a directed one due to the nature of category-subcategory structure.
- 6.
- 7.
Available at: http://www.cs.princeton.edu/~blei/topicmodeling.html.
- 8.
We experimented with different numbers of topics \(k \in [5, 45]\) in steps of 5 on the quarterly dataset using Non-negative Matrix Factorization (NMF) clustering, and then employed the mean pairwise normalized pointwise mutual information (NPMI) and mean pairwise Jaccard similarity (MPJ), as suggested by [11], to assess the coherence and generality of the topics for different values of \(k\). To cluster the quarterly data matrix efficiently, we used the fast alternating least squares variant of NMF introduced by Lin [10]. To produce deterministic results and avoid a poor local minimum, we used the Non-negative Double Singular Value Decomposition (NNDSVD) strategy [5] to choose initial factors for NMF. We found that, overall, the run with 10 topics generates more coherent and general topics, and thus provides more interpretable and expressive results in terms of topic interpretation and overlap between different topics.
- 9.
In Wikipedia, bots are generally programs or scripts that make repetitive automated or semi-automated edits without the necessity of human decision-making: http://en.wikipedia.org/wiki/Wikipedia:Bot_policy.
- 10.
For implementations of the test in R and Python, see: http://jpktd.blogspot.ie/2013/03/multiple-comparison-and-tukey-hsd-or_25.html.
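The two topic-quality measures named in note 8 can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy feature names, and the choice to estimate probabilities from document-level co-occurrence are all assumptions made for illustration. A "descriptor" here is the list of top-ranked features for one topic, and a "document" is the set of features present in one profile.

```python
from itertools import combinations
from math import log


def jaccard(a, b):
    """Jaccard similarity of two term collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_jaccard(descriptors):
    """Average Jaccard similarity over all unordered pairs of topic descriptors."""
    pairs = list(combinations(descriptors, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def npmi(term_a, term_b, documents):
    """Normalized pointwise mutual information of two terms.

    Probabilities are estimated from document-level co-occurrence
    (an assumption of this sketch). The result lies in [-1, 1].
    """
    n = len(documents)
    p_a = sum(term_a in d for d in documents) / n
    p_b = sum(term_b in d for d in documents) / n
    p_ab = sum(term_a in d and term_b in d for d in documents) / n
    if p_ab == 0.0:
        return -1.0  # convention: the terms never co-occur
    if p_ab == 1.0:
        return 1.0  # limiting value: the terms co-occur in every document
    return log(p_ab / (p_a * p_b)) / -log(p_ab)


def mean_pairwise_npmi(descriptor, documents):
    """Average NPMI over all unordered term pairs within one topic descriptor."""
    pairs = list(combinations(descriptor, 2))
    if not pairs:
        return 0.0
    return sum(npmi(a, b, documents) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical descriptors and profiles, for illustration only.
    descriptors = [
        ["article", "talk", "user"],
        ["article", "template", "category"],
    ]
    documents = [
        {"article", "talk"},
        {"article", "talk", "user"},
        {"template", "category"},
        {"article", "template", "category"},
    ]
    print("MPJ:", mean_pairwise_jaccard(descriptors))
    for d in descriptors:
        print(d, "mean NPMI:", mean_pairwise_npmi(d, documents))
```

In a sweep over candidate values of k, one would compute both scores for the descriptors produced at each k and compare them; a lower mean pairwise Jaccard value indicates less overlap between topic descriptors, and a higher mean NPMI indicates that the terms within a descriptor tend to co-occur.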
References
Ahmed, A., Low, Y., Aly, M., Josifovski, V., Smola, A.J.: Scalable distributed inference of dynamic user interests for behavioral targeting. In: Proceedings of KDD, pp. 114–122. ACM (2011)
Ahmed, A., Xing, E.P.: Timeline: a dynamic hierarchical dirichlet process model for recovering birth/death and evolution of topics in text stream. In: Proceedings of UAI, pp. 20–29 (2010)
Blei, D.M., Lafferty, J.D.: Dynamic topic models. In: Proceedings of ICML, pp. 113–120 (2006)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)
Boutsidis, C., Gallopoulos, E.: SVD based initialization: a head start for non-negative matrix factorization. In: Pattern Recognition (2008)
Chan, J., Hayes, C., Daly, E.M.: Decomposing discussion forums using user roles. In: Proceedings of ICWSM, pp. 215–218 (2010)
Danescu-Niculescu-Mizil, C., West, R., Jurafsky, D., Leskovec, J., Potts, C.: No country for old members: user lifecycle and linguistic change in online communities. In: Proceedings of WWW, pp. 307–318. Rio de Janeiro, Brazil (2013)
Furtado, A., Andrade, N., Oliveira, N., Brasileiro, F.: Contributor profiles, their dynamics, and their importance in five Q&A sites. In: Proceedings of CSCW, pp. 1237–1252, Texas (2013)
Hulpus, I., Hayes, C., Karnstedt, M., Greene, D.: Unsupervised graph-based topic labelling using dbpedia. In: Proceedings of WSDM, pp. 465–474. ACM (2013)
Lin, C.: Projected gradient methods for non-negative matrix factorization. Neural Comput. 19(10), 2756–2779 (2007)
O’Callaghan, D., Greene, D., Carthy, J., Cunningham, P.: An analysis of the coherence of descriptors in topic modeling. Expert Syst. Appl. 42(13), 5645–5657 (2015)
Panciera, K., Halfaker, A., Terveen, L.: Wikipedians are born, not made: a study of power editors on wikipedia. In: Proceedings of GROUP, pp. 51–60. ACM (2009)
Jin, Y., Zhang, S., Zhao, Y., Chen, H., Sun, J., Zhang, Y., Chen, C.: Mining and information integration practice for chinese bibliographic database of life sciences. In: Perner, P. (ed.) ICDM 2013. LNCS, vol. 7987, pp. 1–10. Springer, Heidelberg (2013)
Tukey, J.W.: Comparing individual means in the analysis of variance. Biometrics 5(2), 99–114 (1949)
Wang, X., McCallum, A.: Topics over time: a non-Markov continuous-time model of topical trends. In: Proceedings of KDD, pp. 424–433. ACM (2006)
Wei, C.P., Chiu, I.T.: Turning telecommunications call details to churn prediction: a data mining approach. Expert Syst. Appl. 23(2), 103–112 (2002)
Welser, H.T., Cosley, D., Kossinets, G., Lin, A., Dokshin, F., Gay, G., Smith, M.: Finding social roles in wikipedia. In: Proceedings of iConference, pp. 122–129. ACM (2011)
Acknowledgements
This work is supported by Science Foundation Ireland (SFI) under Grant No. SFI/12/RC/2289 (Insight Centre for Data Analytics). Xiangju Qin is funded by University College Dublin and China Scholarship Council (UCD-CSC Joint Scholarship 2011).
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Qin, X., Greene, D., Cunningham, P. (2016). A Latent Space Analysis of Editor Lifecycles in Wikipedia. In: Atzmueller, M., Chin, A., Janssen, F., Schweizer, I., Trattner, C. (eds) Big Data Analytics in the Social and Ubiquitous Context. SENSEML MUSE MSM 2015 2014 2014. Lecture Notes in Computer Science, vol 9546. Springer, Cham. https://doi.org/10.1007/978-3-319-29009-6_3
DOI: https://doi.org/10.1007/978-3-319-29009-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29008-9
Online ISBN: 978-3-319-29009-6
eBook Packages: Computer Science, Computer Science (R0)