Quality in Blogs: How to Find the Best User Generated Content

  • Conference paper
Business Information Systems (BIS 2010)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 47)

Abstract

As the popularity of weblogging continues to grow, the automatic quality assessment of user generated content is increasingly becoming a focus of scientific and commercial discussion. This paper examines Web Mining and machine learning methods for this purpose. Based on automatically detectable features, various blog-specific quality models are trained using machine learning methods. Data was collected from several thousand blogs in three languages. The efficiency of the models is assessed and the most useful attributes are identified. Thus, this work points to the characteristics of high-quality blogs and develops basic ideas for their automatic analysis.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hellmann, R., Griesbaum, J., Mandl, T. (2010). Quality in Blogs: How to Find the Best User Generated Content. In: Abramowicz, W., Tolksdorf, R. (eds) Business Information Systems. BIS 2010. Lecture Notes in Business Information Processing, vol 47. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12814-1_5

  • DOI: https://doi.org/10.1007/978-3-642-12814-1_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12813-4

  • Online ISBN: 978-3-642-12814-1

  • eBook Packages: Computer Science, Computer Science (R0)
