Abstract
Nowadays user-generated content (UGC) such as Wikipedia, is emerging on the web at an explosive rate, but its data quality varies dramatically. How to effectively rate the article’s quality is the focus of research and industry communities. Considering that each quality class demonstrates its specific characteristics on different quality dimensions, we propose to learn the web quality corpus by taking different quality dimensions into consideration. Each article is regarded as an aggregation of sections and each section’s quality is modelled using Dynamic Bayesian Network(DBN) with reference to accuracy, completeness and consistency. Each quality class is represented by three dimension corpora, namely accuracy corpus, completeness corpus and consistency corpus. Finally we propose two schemes to compute quality ranking. Experiments show our approach performs well.
This work is fully supported by National Natural Science Foundation of China under grant No.61003040, 60903181.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Hu, M., Lim, E.P., Sun, A.: Measuring article quality in wikipedia: Models and evaluation. In: Proc. of the sixteenth CIKM, pp. 243–252 (2007)
Aebi, D., Perrochon, L.: Towards improving data quality. In: Proc. of the International Conference on Information Systems and Management of Data, pp. 273–281 (1993)
Wang, R.Y., Kon, H.B., Madnick, S.E.: Data quality requirements analysis and modeling. In: Proc. of the Ninth International Conference on Data Engineering, pp. 670–677 (1993)
Pernici, B., Scannapieco, M.: Data quality in web information systems. In: Spaccapietra, S., March, S.T., Kambayashi, Y. (eds.) ER 2002. LNCS, vol. 2503, pp. 397–413. Springer, Heidelberg (2002)
Stvilia, B., Twidle, M.B., Smith, L.C.: Assessing information quality of a community-based encyclopedia. In: Proc. of the International Conference on Information Quality, pp. 442–454 (2005)
Rassbach, L., Pincock, T., Mingus, B.: Exploring the feasibility of automatically rating online article quality (2008)
Dalip, D.H., Cristo, M., Calado, P.: Automatic quality assessment of content created collaboratively by web communities: A case study of wikipedia. In: Proc. of JCDL 2009, pp. 295–304 (2009)
Zeng, H., Alhossaini, M.A., Ding, L.: Computing trust from revision history. In: Proc. of the 2006 International Conference on Privacy, Security and Trust: Bridge the Gap Between PST Technologies and Business Services (2006)
Zeng, H., Alhossaini, M.A., Fikes, R., McGuinness, D.L.: Mining revision history to assess trustworthiness of article fragments. In: Proc. of International Conference on Collaborative Computing: Networking, Applications and Worksharing, pp. 1–10 (2009)
Pipino, L.L., Lee, Y.W., Wang, R.Y.: Data quality assessment. Communications of the ACM 45(4), 211–218 (2002)
Batini, C., Cappiello, C., Francalanci, C., Maurino, A.: Methodologies for data quality assessment and improvement. ACM Computing Surveys 41(3), 1–52 (2009)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Han, J., Fu, X., Chen, K., Wang, C. (2011). Web Article Quality Assessment in Multi-dimensional Space. In: Wang, H., Li, S., Oyama, S., Hu, X., Qian, T. (eds) Web-Age Information Management. WAIM 2011. Lecture Notes in Computer Science, vol 6897. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23535-1_20
Download citation
DOI: https://doi.org/10.1007/978-3-642-23535-1_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23534-4
Online ISBN: 978-3-642-23535-1
eBook Packages: Computer ScienceComputer Science (R0)