Abstract
As weblogging continues to grow in popularity, the automatic quality assessment of user-generated content is increasingly the focus of scientific and commercial discussion. This paper examines Web Mining and machine learning methods for this purpose. Based on automatically detectable features, several blog-specific quality models are trained using machine learning methods. Data from several thousand blogs in three languages were collected. The efficiency of the models is assessed, and the most useful attributes are identified. Thus, this work points to the characteristics of high-quality blogs and develops basic ideas for their automatic analysis.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hellmann, R., Griesbaum, J., Mandl, T. (2010). Quality in Blogs: How to Find the Best User Generated Content. In: Abramowicz, W., Tolksdorf, R. (eds) Business Information Systems. BIS 2010. Lecture Notes in Business Information Processing, vol 47. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12814-1_5
DOI: https://doi.org/10.1007/978-3-642-12814-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12813-4
Online ISBN: 978-3-642-12814-1
eBook Packages: Computer Science, Computer Science (R0)