Probabilistic Quality Assessment Based on Article’s Revision History

Han, Jingyu; Wang, Chuandong; Jiang, Dawei

doi:10.1007/978-3-642-23091-2_50

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6861)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

The collaborative efforts of users in social media services such as Wikipedia have led to an explosion of user-generated content, and automatically tagging the quality of that content is now a pressing concern. Each article typically undergoes a series of revision phases, and articles of different quality classes exhibit distinct revision cycle patterns. We propose to Assess Quality based on Revision History (AQRH) for a specific domain as follows. First, we use a Hidden Markov Model (HMM) to turn each article’s revision history into a revision state sequence. Then, for each quality class, its revision cycle patterns are extracted and clustered into quality corpora. Finally, an article’s quality is gauged by comparing its state sequence, in a probabilistic sense, with the patterns of pre-classified documents. We conduct experiments on a set of Wikipedia articles, and the results demonstrate that our method can accurately and objectively capture a web article’s quality.
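
The first step of the abstract, turning a revision history into a revision state sequence with an HMM, can be illustrated with a minimal Viterbi decoding sketch. This is not the authors' implementation: the state names, observation symbols and all probabilities below are hypothetical placeholders chosen only for illustration, and the abstract does not say how the model parameters are defined or estimated.

```python
import math

# All names and numbers here are illustrative assumptions, not from the paper.
STATES = ["building", "polishing", "stable"]

# Probability that an article starts in each hidden revision state.
START = {"building": 0.8, "polishing": 0.15, "stable": 0.05}

# TRANS[a][b]: probability of moving from state a to state b.
TRANS = {
    "building":  {"building": 0.6, "polishing": 0.3, "stable": 0.1},
    "polishing": {"building": 0.2, "polishing": 0.5, "stable": 0.3},
    "stable":    {"building": 0.1, "polishing": 0.2, "stable": 0.7},
}

# EMIT[s][o]: probability of observing edit type o while in state s.
EMIT = {
    "building":  {"big_insert": 0.7,  "small_edit": 0.2,  "revert": 0.1},
    "polishing": {"big_insert": 0.2,  "small_edit": 0.7,  "revert": 0.1},
    "stable":    {"big_insert": 0.05, "small_edit": 0.35, "revert": 0.6},
}


def viterbi(observations):
    """Return the most likely hidden state sequence for the observed edits."""
    if not observations:
        return []
    # score[s]: best log-probability of any state path ending in s so far.
    score = {
        s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
        for s in STATES
    }
    back = []  # one dict per later step: best predecessor of each state
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            )
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Follow the back-pointers from the best final state to the first step.
    path = [max(STATES, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    history = ["big_insert", "big_insert", "small_edit", "small_edit", "revert"]
    print(viterbi(history))
```

Log-probabilities are used so that long revision histories do not underflow. The later steps in the abstract (extracting revision cycle patterns per quality class, clustering them, and the probabilistic comparison) are not covered by this sketch.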

This work is fully supported by the National Natural Science Foundation of China under grants 61003040 and 60903181, and by the China 973 program under No. 20100471353.

Author information

Authors and Affiliations

School of Computer Science and Technology, Nanjing University of Posts and Telecommunications, Nanjing, 210003, P.R. China
Jingyu Han & Chuandong Wang
School of Computing, National University of Singapore, 119077, Singapore
Dawei Jiang

Editor information

Editors and Affiliations

IRIT Institut de Recherche en Informatique de Toulouse, Paul Sabatier University, 118, route de Narbonne, 31062, Toulouse Cedex, France
Abdelkader Hameurlain
Brigham Young University, 784 TNRB, 84602, Provo, UT, USA
Stephen W. Liddle
Software Competence Center Hagenberg and Johannes Kepler University Linz, Softwarepark 21, 4232, Hagenberg, Austria
Klaus-Dieter Schewe
School of Information Technology and Electrical Engineering, University of Queensland, QLD 4072, Brisbane, Australia
Xiaofang Zhou

About this paper

Cite this paper

Han, J., Wang, C., Jiang, D. (2011). Probabilistic Quality Assessment Based on Article’s Revision History. In: Hameurlain, A., Liddle, S.W., Schewe, KD., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2011. Lecture Notes in Computer Science, vol 6861. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23091-2_50

DOI: https://doi.org/10.1007/978-3-642-23091-2_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23090-5
Online ISBN: 978-3-642-23091-2
eBook Packages: Computer Science (R0)
