Quality in Blogs: How to Find the Best User Generated Content

Hellmann, Rafael; Griesbaum, Joachim; Mandl, Thomas

doi:10.1007/978-3-642-12814-1_5


Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 47)

Included in the following conference series:

International Conference on Business Information Systems


Abstract

As the popularity of weblogging continues to grow, the automatic quality assessment of user-generated content is increasingly becoming a focus of scientific and commercial discussion. This paper examines Web mining and machine learning methods for this purpose. Based on automatically detectable features, various blog-specific quality models are trained using machine learning methods. Data from several thousand blogs in three languages was collected. Along with the assessment of the models' efficiency, the most useful attributes are identified. Thus, this work points to the characteristics of high-quality blogs and develops basic ideas for their automatic analysis.
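The abstract does not name the features or learning algorithms used, so the following is only a minimal sketch of the general idea: extract a few automatically detectable features from a blog page and rank them by how strongly they correlate with a quality label. The feature set (link, image, and word counts), the function names, and the toy pages and labels are all assumptions made for illustration, and plain Pearson correlation stands in for whatever attribute selection the paper actually applies.

```python
from html.parser import HTMLParser
from statistics import mean, pstdev


class BlogFeatureParser(HTMLParser):
    """Counts a few surface features of a page (hypothetical feature set)."""

    def __init__(self):
        super().__init__()
        self.links = 0
        self.images = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1
        elif tag == "img":
            self.images += 1

    def handle_data(self, data):
        # Whitespace-separated tokens in text nodes serve as a crude word count.
        self.words += len(data.split())


def extract_features(html):
    """Return the feature dictionary for one blog page given as an HTML string."""
    parser = BlogFeatureParser()
    parser.feed(html)
    parser.close()
    return {"links": parser.links, "images": parser.images, "words": parser.words}


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def rank_features(pages, labels):
    """Rank features by absolute correlation with the quality labels."""
    rows = [extract_features(page) for page in pages]
    scores = {
        name: abs(correlation([row[name] for row in rows], labels))
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy data: 1 = rated high quality, 0 = rated low quality (invented labels).
    pages = [
        "<a>x</a><a>y</a>",
        "<p>a b</p>",
        "<a>x</a><a>y</a><a>z</a>",
        "<p>a b</p>",
    ]
    labels = [1, 0, 1, 0]
    for name, score in rank_features(pages, labels):
        print(f"{name}: {score:.3f}")
```

In a real setting the ranked features would feed a trained classifier and the ranking would be validated on held-out blogs; the sketch stops at the ranking step to stay self-contained.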




Author information

Authors and Affiliations

University of Hildesheim, Germany
Rafael Hellmann, Joachim Griesbaum & Thomas Mandl


Editor information

Editors and Affiliations

Department of Information Systems, Poznań University of Economics, Al. Niepodległości 10, 61-875, Poznań, Poland
Witold Abramowicz
Institut für Informatik, Freie Universität Berlin, Takustr. 9, 14195, Berlin, Germany
Robert Tolksdorf


About this paper

Cite this paper

Hellmann, R., Griesbaum, J., Mandl, T. (2010). Quality in Blogs: How to Find the Best User Generated Content. In: Abramowicz, W., Tolksdorf, R. (eds) Business Information Systems. BIS 2010. Lecture Notes in Business Information Processing, vol 47. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12814-1_5


DOI: https://doi.org/10.1007/978-3-642-12814-1_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12813-4
Online ISBN: 978-3-642-12814-1
eBook Packages: Computer Science, Computer Science (R0)
