Abstract
The collaborative efforts of users in social media services such as Wikipedia have led to an explosion in user-generated content and how to automatically tag the quality of the content is an eminent concern now. Actually each article is usually undergoing a series of revision phases and the articles of different quality classes exhibit specific revision cycle patterns. We propose to Assess Quality based on Revision History (AQRH) for a specific domain as follows. First, we borrow Hidden Markov Model (HMM) to turn each article’s revision history into a revision state sequence. Then, for each quality class its revision cycle patterns are extracted and are clustered into quality corpora. Finally, article’s quality is thereby gauged by comparing the article’s state sequence with the patterns of pre-classified documents in probabilistic sense. We conduct experiments on a set of Wikipedia articles and the results demonstrate that our method can accurately and objectively capture web article’s quality.
The work is fully supported by National Natural Science Foundation of China under grants 61003040, 60903181 and China 973 program of No. 20100471353.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Dasu, T., Johnson, T., Muthukrishnan, S., Shkapenyuk, V.: Mining database structure; or, how to build a data quality browser. In: Proc. of SIGMOD 2002, pp. 240–251 (2002)
Dalip, D.H., Cristo, M., Calado, P.: Automatic quality assessment of content created collaboratively by web communities: A case study of wikipedia. In: Proc. of JCDL 2009, pp. 295–304 (2009)
Aebi, D., Perrochon, L.: Towards improving data quality. In: Proc. of the International Conference on Information Systems and Management of Data, pp. 273–281 (1993)
Wang, R.Y., Kon, H.B., Madnick, S.E.: Data quality requirements analysis and modeling. In: Proc. of the Ninth International Conference on Data Engineering, pp. 670–677 (1993)
Bouzeghoub, M., Peralta, V.: A framework for analysis of data freshness. In: Proc. of 2004 International Information Quality Conference on Information System, pp. 59–67 (2004)
Pernici, B., Scannapieco, M.: Data quality in web information systems. In: Spaccapietra, S., March, S.T., Kambayashi, Y. (eds.) ER 2002. LNCS, vol. 2503, pp. 397–413. Springer, Heidelberg (2002)
Macdonald, N., Frase, L., Gingrich, P., Keenan, S.: The writer’s workbench: computer aids for text analysis. IEEE Transactions on Communications 30(1), 105–110 (1982)
Foltz, P.W.: Supporting content-based feedback in on-line writing evaluation with lsa. Interactive Learning Environments 8(2), 111–127 (2000)
Rassbach, L., Pincock, T., Mingus, B.: Exploring the feasibility of automatically rating online article quality (2008)
Stvilia, B., Twidle, B., Smith, M.C.: Assessing information quality of a community-based encyclopedia. In: Proc. of the International Conference on Information Quality, pp. 442–454 (2005)
Zeng, H., Alhossaini, M.A., Ding, L.: Computing trust from revision history. In: Proc. of the 2006 International Conference on Privacy, Security and Trust: Bridge the Gap Between PST Technologies and Business Services (2006)
Zeng, H., Alhossaini, M.A., Fikes, R., McGuinness, D.L: mining revision history to assess trustworthiness of article fragments. In: Proc. of International conference on Collaborative Computing: Networking, Applications and Worksharing, pp. 1–10 (2009)
Rabiner, L.R.: A tutorial on hidden markov models and selected applications in speech recognition. Proc. of IEEE, 257–286 (1989)
Baum, L.E., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of markov chains. Ann. Math. Statist. 41(1), 164–171 (1970)
Ding, B., Lo, D., Han, J., Khoo, S.C.: Efficient mining of closed repetitive gapped subsequences from a sequence database. In: Proc. of 2009 ICDE, pp. 1024–1035 (2009)
Xie, X.L., Beni, G.: A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence 13(8), 841–847 (1991)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Han, J., Wang, C., Jiang, D. (2011). Probabilistic Quality Assessment Based on Article’s Revision History. In: Hameurlain, A., Liddle, S.W., Schewe, KD., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2011. Lecture Notes in Computer Science, vol 6861. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23091-2_50
Download citation
DOI: https://doi.org/10.1007/978-3-642-23091-2_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23090-5
Online ISBN: 978-3-642-23091-2
eBook Packages: Computer ScienceComputer Science (R0)