
Clustering of Blog Sites Using Collective Wisdom

Computational Social Network Analysis

Part of the book series: Computer Communications and Networks (CCN)

Abstract

The blogosphere is expanding at an unprecedented speed. A better understanding of the blogosphere can greatly facilitate the development of the social Web to serve the needs of users, service providers, and advertisers. One important task in this process is the clustering of blog sites. Although many traditional clustering methods exist, they are not designed to take into account the blogosphere’s unique characteristics, so clustering blog sites presents new challenges. A prominent feature of the social Web is that many enthusiastic bloggers voluntarily write, tag, and catalog their posts in order to reach the widest possible audience who will share their thoughts and appreciate their ideas. In the process, a new kind of collective wisdom is generated. The objective of this work is to make use of this collective wisdom in the clustering of blog sites. To that end, we study how clustering with collective wisdom can be achieved and compare its performance with that of representative traditional clustering methods. We present statistical and visual results, report our findings, and outline opportunities for future research, along with an estimated timeline, for extending this work to many real-world applications.


Notes

  1. http://vlado.fmf.uni-lj.si/pub/networks/pajek/

  2. http://www.blogcatalog.com/

  3. Owing to space constraints, and for the purposes of the analysis presented here, we limit the labels in this chart to those that have at least 1,000 blog sites.

  4. Pajek was used to create the visualizations.


Author information

Correspondence to Nitin Agarwal.

Copyright information

© 2010 Springer-Verlag London Limited

About this chapter

Cite this chapter

Agarwal, N., Galan, M., Liu, H., Subramanya, S. (2010). Clustering of Blog Sites Using Collective Wisdom. In: Abraham, A., Hassanien, AE., Snášel, V. (eds) Computational Social Network Analysis. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-1-84882-229-0_5

  • DOI: https://doi.org/10.1007/978-1-84882-229-0_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84882-228-3

  • Online ISBN: 978-1-84882-229-0

  • eBook Packages: Computer Science, Computer Science (R0)
