DOI: 10.1145/3479986.3479995
research-article

Extracting and Visualizing User Engagement on Wikipedia Talk Pages

Published: 15 October 2021

Abstract

As Wikipedia has grown in popularity, it has become important to investigate its diverse user community and collaborative editorial base. Although all user data, from traffic to user edits, are available for download under a free and open license, the scale of this data makes it difficult to work with.
In this paper, we demonstrate how consumer hardware can be used to create a local database of Wikipedia’s full edit history from its public XML data dumps. Using this database, we create and present the first visualizations of how editing on talk pages differs between user groups. Our visualizations demonstrate that low-quality edits are primarily performed by IP users, rather than blocked users, and that overall engagement with talk pages has plateaued over the last 10 years across all user groups. Finally, as an example of a future research direction, we investigate the feasibility of classifying blocked users using this dataset. We demonstrate the difficulty of this task and find that our approach did not provide sufficient information to classify them; additional data or a more advanced model would be needed.
We anticipate that our visualizations and data extraction process will be of interest to the community and will provide researchers with the tools needed to use Wikipedia’s valuable data when resources are limited.
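
The extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a MediaWiki XML export in which each `<page>` element carries `<ns>`, `<title>`, and `<revision>` children, and it uses only Python's standard library. The table name `edit`, its columns, and the helper names are hypothetical, chosen purely for illustration.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

TALK_NAMESPACE = "1"  # in MediaWiki, namespace 1 is "Talk"

# Tiny hand-written sample in the shape of a MediaWiki XML export (not real data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title><ns>0</ns><id>1</id>
    <revision><id>100</id><timestamp>2021-01-01T00:00:00Z</timestamp>
      <contributor><username>Alice</username><id>7</id></contributor></revision>
  </page>
  <page>
    <title>Talk:Example</title><ns>1</ns><id>2</id>
    <revision><id>101</id><timestamp>2021-01-02T00:00:00Z</timestamp>
      <contributor><username>Alice</username><id>7</id></contributor></revision>
    <revision><id>102</id><timestamp>2021-01-03T00:00:00Z</timestamp>
      <contributor><ip>192.0.2.1</ip></contributor></revision>
  </page>
</mediawiki>"""


def local(tag):
    """Strip the XML namespace prefix: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def child_text(elem, name):
    """Return the text of the first direct child called `name`, or None."""
    for child in elem:
        if local(child.tag) == name:
            return child.text
    return None


def load_talk_edits(xml_stream, conn):
    """Stream a dump and store one row per talk-page revision (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS edit ("
        "rev_id INTEGER PRIMARY KEY, page_title TEXT, "
        "timestamp TEXT, editor TEXT, is_ip INTEGER)"
    )
    count = 0
    # iterparse yields each element once it is fully read, so the whole
    # dump never has to be held in memory at the same time.
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        if child_text(elem, "ns") == TALK_NAMESPACE:
            title = child_text(elem, "title")
            for rev in elem:
                if local(rev.tag) != "revision":
                    continue
                contributor = next(
                    (c for c in rev if local(c.tag) == "contributor"), None
                )
                username = ip = None
                if contributor is not None:
                    username = child_text(contributor, "username")
                    ip = child_text(contributor, "ip")
                conn.execute(
                    "INSERT OR REPLACE INTO edit VALUES (?, ?, ?, ?, ?)",
                    (int(child_text(rev, "id")), title,
                     child_text(rev, "timestamp"), username or ip,
                     int(ip is not None)),
                )
                count += 1
        elem.clear()  # drop the finished page's children to free memory
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
total = load_talk_edits(io.BytesIO(SAMPLE), conn)
# Edits per user group: is_ip = 1 marks anonymous (IP) editors.
by_group = conn.execute(
    "SELECT is_ip, COUNT(*) FROM edit GROUP BY is_ip ORDER BY is_ip"
).fetchall()
print(total, by_group)
```

Real dumps are distributed compressed; a bz2 file could be streamed by passing `bz2.open(path, "rb")` in place of the in-memory sample. The sketch holds one page's revisions in memory at a time, so pages with very long histories would need revisions cleared individually as they are processed.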


Published In

OpenSym '21: Proceedings of the 17th International Symposium on Open Collaboration
September 2021
136 pages
ISBN:9781450385008
DOI:10.1145/3479986
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Wikipedia
  2. classification
  3. data extraction
  4. data visualization
  5. talk pages

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

OpenSym 2021

Acceptance Rates

Overall Acceptance Rate 108 of 195 submissions, 55%
