Wikipedia Revision Graph Extraction Based on N-Gram Cover

  • Conference paper
Web-Age Information Management (WAIM 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7419)

Abstract

During the past decade, mass collaboration systems have emerged and thrived on the World-Wide Web, generating a large volume of user content. Wikipedia, one such system, allows users to add and edit articles in its encyclopedic knowledge base, and a vast number of revisions have been contributed. Wikipedia maintains a linear, timestamped edit history for each article, which contains valuable information on how the article has evolved. However, meaningful features of revision evolution, such as branching and reverts, are implicit and need to be reconstructed. Moreover, the existence of merges from multiple ancestors indicates that the edit history should be modeled as a directed acyclic graph. To address these issues, we propose a revision graph extraction method based on n-gram cover that effectively finds branches and reverts. We evaluate the accuracy of our method by comparing its output with manually constructed revision graphs.
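
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under our own assumptions, not the authors' method. It works as follows:

- Each revision is represented by its set of word n-grams.
- The primary parent of a revision is the earlier revision that covers the largest fraction of those n-grams.
- A revision whose n-gram set exactly matches an older, non-adjacent revision is flagged as a revert.
- An extra parent is added when it supplies enough n-grams that are not yet covered, which yields a merge and therefore a DAG.

The names `ngrams`, `cover` and `build_revision_graph` are hypothetical. So are the tie-breaking rule, the default `n=3` and the `merge_gain` threshold.

```python
# Hypothetical sketch of n-gram-cover revision graph extraction.
# NOT the authors' algorithm: tokenisation, n, tie-breaking and the
# merge_gain threshold are illustrative assumptions.
from typing import Dict, List, Set, Tuple

Gram = Tuple[str, ...]


def ngrams(text: str, n: int = 3) -> Set[Gram]:
    """Word n-grams of one revision (whitespace tokens; texts shorter than n give an empty set)."""
    words = text.split()
    return {tuple(words[k:k + n]) for k in range(len(words) - n + 1)}


def cover(child: Set[Gram], parent: Set[Gram]) -> float:
    """Fraction of the child's n-grams that already occur in the parent."""
    return len(child & parent) / len(child) if child else 0.0


def build_revision_graph(
    revisions: List[str], n: int = 3, merge_gain: float = 0.1
) -> Tuple[Dict[int, List[int]], Dict[int, int]]:
    """Return (parents, reverts) for revisions given in timestamp order.

    parents[i] lists ancestor indices of revision i; edges only point back
    in time, so the result is a DAG. reverts[i] = j marks revision i as
    restoring the exact n-gram set of an older, non-adjacent revision j.
    merge_gain is assumed to be > 0.
    """
    grams = [ngrams(r, n) for r in revisions]
    parents: Dict[int, List[int]] = {0: []} if revisions else {}
    reverts: Dict[int, int] = {}
    for i in range(1, len(revisions)):
        # Primary parent: highest cover of revision i; ties go to the candidate
        # best covered by revision i, then to the most recent candidate.
        best = max(
            range(i),
            key=lambda j: (cover(grams[i], grams[j]), cover(grams[j], grams[i]), j),
        )
        parents[i] = [best]
        if best != i - 1 and grams[i] == grams[best]:
            reverts[i] = best  # identical content to an older, non-adjacent revision
        # Merge: greedily add parents that contribute n-grams not yet covered.
        covered = grams[i] & grams[best]
        while grams[i]:
            rest = [j for j in range(i) if j not in parents[i]]
            if not rest:
                break
            k = max(rest, key=lambda j: (len((grams[i] & grams[j]) - covered), j))
            gain = len((grams[i] & grams[k]) - covered) / len(grams[i])
            if gain < merge_gain:
                break
            parents[i].append(k)
            covered |= grams[i] & grams[k]
    return parents, reverts


# Toy history: 1 branches from 0, 2 reverts to 0, 3 edits 2, 4 merges 1 and 3.
history = [
    "alpha beta gamma delta",
    "alpha beta gamma delta epsilon zeta",
    "alpha beta gamma delta",
    "alpha beta gamma delta eta theta",
    "alpha beta gamma delta epsilon zeta eta theta",
]
parents, reverts = build_revision_graph(history, n=2)
print(parents)  # {0: [], 1: [0], 2: [0], 3: [2], 4: [1, 3]}
print(reverts)  # {2: 0}
```

Cover is asymmetric by design. It asks how much of the newer revision is explained by a candidate ancestor. The reverse cover serves only as a tie-breaker, so an exact restoration of older content wins over a later revision that merely contains it. Comparing every earlier revision is quadratic in history length, so a practical implementation would likely restrict or index the candidates.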

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, J., Iwaihara, M. (2012). Wikipedia Revision Graph Extraction Based on N-Gram Cover. In: Bao, Z., et al. Web-Age Information Management. WAIM 2012. Lecture Notes in Computer Science, vol 7419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33050-6_4

  • DOI: https://doi.org/10.1007/978-3-642-33050-6_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33049-0

  • Online ISBN: 978-3-642-33050-6

  • eBook Packages: Computer Science, Computer Science (R0)