Abstract
Over the past decade, mass collaboration systems have emerged and thrived on the World-Wide Web, generating large volumes of user content. One such system, Wikipedia, lets users add and edit articles in an encyclopedic knowledge base, and a vast number of revisions have been contributed. Wikipedia maintains a linear, timestamped record of the edit history of each article, which holds valuable information on how the article has evolved. However, meaningful features of revision evolution such as branching and reverts are implicit in this record and need to be reconstructed. In addition, the existence of merges from multiple ancestors indicates that the edit history should be modeled as a directed acyclic graph. To address these issues, we propose a revision graph extraction method based on n-gram cover that effectively finds branching and reverts. We evaluate the accuracy of our method by comparing its output with manually constructed revision graphs.
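The abstract does not define n-gram cover precisely, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that the cover of a revision by an earlier one is the fraction of the later revision's word n-grams that also occur in the earlier one, and that each revision takes the earlier revision with the highest cover as its single parent. The names `word_ngrams`, `cover`, and `extract_parents`, the choice of word trigrams, and the tie-breaking rule are all illustrative assumptions. Merges from multiple ancestors, which would require more than one parent per node, are not handled here.

```python
from typing import List, Optional, Set, Tuple

NGram = Tuple[str, ...]


def word_ngrams(text: str, n: int = 3) -> Set[NGram]:
    """Return the set of word n-grams in a revision's text."""
    words = text.split()
    if len(words) < n:
        # Very short texts: fall back to a single shorter gram (assumption).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def cover(child: Set[NGram], candidate: Set[NGram]) -> float:
    """Fraction of the child's n-grams that also occur in the candidate."""
    if not child:
        return 0.0
    return len(child & candidate) / len(child)


def extract_parents(revisions: List[str], n: int = 3) -> List[Optional[int]]:
    """Pick one parent per revision from the linear, timestamp-ordered history.

    The parent is the earlier revision with the highest cover; ties go to
    the most recent candidate. The first revision has no parent.
    """
    grams = [word_ngrams(r, n) for r in revisions]
    parents: List[Optional[int]] = [None]
    for i in range(1, len(revisions)):
        best, best_score = None, -1.0
        for j in range(i - 1, -1, -1):  # scan candidates, most recent first
            score = cover(grams[i], grams[j])
            if score > best_score:
                best, best_score = j, score
        parents.append(best)
    return parents


history = [
    "the quick brown fox jumps over the lazy dog",        # 0: initial text
    "the quick brown fox jumps over the lazy dog today",  # 1: small addition
    "VANDALISM spam spam spam",                           # 2: vandalism
    "the quick brown fox jumps over the lazy dog today",  # 3: revert to 1
]
print(extract_parents(history))  # [None, 0, 1, 1]
```

In this toy history, revision 3 is attached to revision 1 rather than to its immediate predecessor, which exposes the revert, and revision 1 ends up with two children, which is the kind of branching that the linear record hides.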
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, J., Iwaihara, M. (2012). Wikipedia Revision Graph Extraction Based on N-Gram Cover. In: Bao, Z., et al. Web-Age Information Management. WAIM 2012. Lecture Notes in Computer Science, vol 7419. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33050-6_4
DOI: https://doi.org/10.1007/978-3-642-33050-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33049-0
Online ISBN: 978-3-642-33050-6
eBook Packages: Computer Science (R0)