Clustering of Blog Sites Using Collective Wisdom

Agarwal, Nitin; Galan, Magdiel; Liu, Huan; Subramanya, Shankar

doi:10.1007/978-1-84882-229-0_5

Part of the book series: Computer Communications and Networks (CCN)

Abstract

The blogosphere is expanding at an unprecedented speed. A better understanding of the blogosphere can greatly facilitate the development of the social Web to serve the needs of users, service providers, and advertisers. One important task in this process is the clustering of blog sites. Although a good number of traditional clustering methods exist, they are not designed to take into account the blogosphere’s unique characteristics, so clustering blog sites presents new challenges. A prominent feature of the social Web is that many enthusiastic bloggers voluntarily write, tag, and catalog their posts in order to reach the widest possible audience who will share their thoughts and appreciate their ideas. In the process, a new kind of collective wisdom is generated. The objective of this work is to make use of this collective wisdom in the clustering of blog sites. To that end, we study how clustering with collective wisdom can be achieved and compare its performance with that of representative traditional clustering methods. Herein, we present statistical and visual results, report our findings, and outline opportunities for future research, along with an estimated timeline, for extending this work to many real-world applications.

Notes

1. http://vlado.fmf.uni-lj.si/pub/networks/pajek/
2. http://www.blogcatalog.com/
3. Owing to space constraints and for the purposes of the analysis presented here, we limit the labels in this chart to those that have at least 1,000 blog sites.
4. Pajek was used to create the visualizations.

Author information

Authors and Affiliations

Arizona State University, Tempe, AZ, 85287, USA
Nitin Agarwal, Magdiel Galan, Huan Liu & Shankar Subramanya

Corresponding author

Correspondence to Nitin Agarwal.

Editor information

Editors and Affiliations

Scientific Network for Innovation &, Machine Intelligence Research Labs (MIR), Auburn, 98071-2259, U.S.A.
Ajith Abraham
Fac. Computers & Information, Cairo University, Ahmed Zewal Street 5, Giza, 12613, Egypt
Aboul-Ella Hassanien
Dept. Computer Science, Technical University Ostrava, Tr. 17. Listopadu 15, Ostrava, 708 33, Czech Republic
Vaclav Snášel

About this chapter

Cite this chapter

Agarwal, N., Galan, M., Liu, H., Subramanya, S. (2010). Clustering of Blog Sites Using Collective Wisdom. In: Abraham, A., Hassanien, AE., Snášel, V. (eds) Computational Social Network Analysis. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-1-84882-229-0_5

DOI: https://doi.org/10.1007/978-1-84882-229-0_5
Published: 06 November 2009
Publisher Name: Springer, London
Print ISBN: 978-1-84882-228-3
Online ISBN: 978-1-84882-229-0
eBook Packages: Computer Science, Computer Science (R0)
