Abstract
The blogosphere is expanding at an unprecedented speed. A better understanding of the blogosphere can greatly facilitate the development of the social Web to serve the needs of users, service providers, and advertisers. One important task in this process is the clustering of blog sites. Although a good number of traditional clustering methods exist, they are not designed to take into account the blogosphere's unique characteristics, and clustering blog sites therefore presents new challenges. A prominent feature of the social Web is that many enthusiastic bloggers voluntarily write, tag, and catalog their posts in order to reach the widest possible audience, one that will share their thoughts and appreciate their ideas. In the process, a new kind of collective wisdom is generated. The objective of this work is to make use of this collective wisdom in the clustering of blog sites. To this end, we study how clustering with collective wisdom can be achieved and compare its performance with that of representative traditional clustering methods. We present statistical and visual results, report our findings, and outline opportunities for future research, with an estimated timeline for extending this work to many real-world applications.
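To make the idea of clustering with collective wisdom more concrete, here is a minimal sketch. It is not the chapter's method. It assumes that each blog site is represented only by the labels (tags) its blogger assigned, and it groups sites by single-link merging on the cosine similarity of their tag-count vectors. The site names, tags, and the 0.5 similarity threshold are all hypothetical.

```python
import math
from collections import Counter

def tag_vector(tags):
    """Count how often each (lower-cased) tag was assigned to a blog site."""
    return Counter(t.lower() for t in tags)

def cosine(a, b):
    """Cosine similarity between two sparse tag-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_by_tags(sites, threshold=0.5):
    """Single-link grouping: two sites end up in the same cluster if a chain of
    pairs with tag-vector cosine >= threshold connects them (union-find)."""
    names = list(sites)
    vectors = {n: tag_vector(sites[n]) for n in names}
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the cluster representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Sort for a deterministic result.
    return sorted(sorted(g) for g in groups.values())

# Hypothetical blog sites and their bloggers' tags (illustration only).
sites = {
    "techblog": ["gadgets", "technology", "reviews"],
    "devnotes": ["technology", "programming", "reviews"],
    "greenkitchen": ["food", "recipes", "vegan"],
    "bakeday": ["food", "recipes", "baking"],
}
print(cluster_by_tags(sites))
```

With these toy tags, each pair of sites that shares two of its three tags has a cosine similarity of about 0.67. The two technology blogs and the two food blogs therefore form separate clusters. A traditional text-based baseline would instead build the vectors from post content rather than from the labels bloggers chose.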
Notes
1. http://vlado.fmf.uni-lj.si/pub/networks/pajek/
2. http://www.blogcatalog.com/
3. Because of space constraints, and for the analysis presented here, this chart is limited to labels that have at least 1,000 blog sites.
4. Pajek was used to create the visualizations.
Copyright information
© 2010 Springer-Verlag London Limited
About this chapter
Cite this chapter
Agarwal, N., Galan, M., Liu, H., Subramanya, S. (2010). Clustering of Blog Sites Using Collective Wisdom. In: Abraham, A., Hassanien, AE., Snášel, V. (eds) Computational Social Network Analysis. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-1-84882-229-0_5
DOI: https://doi.org/10.1007/978-1-84882-229-0_5
Publisher Name: Springer, London
Print ISBN: 978-1-84882-228-3
Online ISBN: 978-1-84882-229-0
eBook Packages: Computer Science, Computer Science (R0)