DOI: 10.1145/2462932.2462948
research-article

Classifying Wikipedia articles using network motif counts and ratios

Published: 27 August 2012

Abstract

Because the production of Wikipedia articles is a collaborative process, the edit network around an article can tell us something about the quality of that article. Articles that have received little attention will have sparse networks; at the other end of the spectrum, articles that are Wikipedia battlegrounds will have very crowded networks. In this paper we evaluate the idea of characterizing an edit network as a vector of motif counts that can be used in clustering and classification. Our immediate objective is not to develop a powerful classifier but to assess how much signal network motifs carry. We show that this motif count vector representation is effective for classifying articles on the Wikipedia quality scale. We further show that ratios of motif counts can effectively overcome normalization problems when comparing networks of radically different sizes.
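The idea of a motif count vector and of motif ratios can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which motifs are counted, so the three undirected motifs used here (edges, open wedges, triangles) and the function names `motif_counts` and `motif_ratios` are assumptions chosen for illustration only.

```python
from itertools import combinations


def motif_counts(edges):
    """Count edges, open wedges (2-paths) and triangles in an undirected graph.

    The motif set is an illustrative assumption, not the one used in the paper.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    closed = 0  # neighbour pairs that are themselves linked
    wedges = 0  # neighbour pairs that are not linked (open 2-paths)
    for nbrs in adj.values():
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                closed += 1
            else:
                wedges += 1
    # Every triangle is seen once from each of its three corners.
    return {"edge": n_edges, "wedge": wedges, "triangle": closed // 3}


def motif_ratios(counts):
    """Turn raw counts into size-independent ratios (hypothetical choice of ratios)."""
    three_node = counts["wedge"] + counts["triangle"]
    return {
        "triangle_per_three_node": counts["triangle"] / three_node if three_node else 0.0,
        "wedge_per_edge": counts["wedge"] / counts["edge"] if counts["edge"] else 0.0,
    }


# A toy "edit network" and a network made of two disjoint copies of it.
small = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
large = small + [(u + "2", v + "2") for u, v in small]

small_counts = motif_counts(small)
large_counts = motif_counts(large)
print(small_counts, motif_ratios(small_counts))
print(large_counts, motif_ratios(large_counts))
```

In this toy case the raw counts of the larger network are twice those of the smaller one, while the ratios are identical, which is the kind of normalization effect the abstract attributes to motif ratios.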

Published In

WikiSym '12: Proceedings of the Eighth Annual International Symposium on Wikis and Open Collaboration
August 2012
295 pages
ISBN:9781450316057
DOI:10.1145/2462932
General Chair: Cliff Lampe
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Wikipedia quality
  2. edit networks

Qualifiers

  • Research-article

Acceptance Rates

WikiSym '12 Paper Acceptance Rate 21 of 37 submissions, 57%;
Overall Acceptance Rate 69 of 145 submissions, 48%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)4
  • Downloads (Last 6 weeks)0
Reflects downloads up to 30 Jan 2025

Cited By

  • (2023) Automatic Quality Assessment of Wikipedia Articles—A Systematic Literature Review. ACM Computing Surveys 56:4 (1-37). DOI: 10.1145/3625286. Online publication date: 10-Nov-2023
  • (2022) Assessing Information Quality of Wikipedia Articles Through Google’s E-A-T Model. IEEE Access 10 (52196-52209). DOI: 10.1109/ACCESS.2022.3172962. Online publication date: 2022
  • (2022) Understanding the characteristics of COVID-19 misinformation communities through graphlet analysis. Online Social Networks and Media 27 (100178). DOI: 10.1016/j.osnem.2021.100178. Online publication date: Jan-2022
  • (2021) Structural Analysis of Wikigraph to Investigate Quality Grades of Wikipedia Articles. Companion Proceedings of the Web Conference 2021 (584-590). DOI: 10.1145/3442442.3452345. Online publication date: 19-Apr-2021
  • (2021) Measuring Quality of Wikipedia Articles by Feature Fusion‐based Stack Learning. Proceedings of the Association for Information Science and Technology 58:1 (206-217). DOI: 10.1002/pra2.449. Online publication date: 13-Oct-2021
  • (2020) Assessing temporal and spatial features in detecting disruptive users on Reddit. Proceedings of the 12th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (892-896). DOI: 10.1109/ASONAM49781.2020.9381426. Online publication date: 7-Dec-2020
  • (2019) Quality Assessment of Peer-Produced Content in Knowledge Repositories Using Big Data and Social Networks. ACM SIGMIS Database: the DATABASE for Advances in Information Systems 50:4 (28-51). DOI: 10.1145/3371041.3371045. Online publication date: 1-Nov-2019
  • (2019) Understanding the Signature of Controversial Wikipedia Articles through Motifs in Editor Revision Networks. Companion Proceedings of The 2019 World Wide Web Conference (1180-1187). DOI: 10.1145/3308560.3316754. Online publication date: 13-May-2019
  • (2018) Graphlet-orbit Transitions (GoT): A fingerprint for temporal network comparison. PLOS ONE 13:10 (e0205497). DOI: 10.1371/journal.pone.0205497. Online publication date: 18-Oct-2018
  • (2018) Social-collaborative determinants of content quality in online knowledge production systems: comparing Wikipedia and Stack Overflow. Social Network Analysis and Mining 8:1. DOI: 10.1007/s13278-018-0512-3. Online publication date: 5-May-2018
