Abstract
The blogosphere, the name given to the universe of all blog sites, now holds a tremendous amount of user-generated data. The ease and simplicity of creating blog posts, together with their free-form and unedited nature, have made the blogosphere a rich and unique source of data, which has attracted people and companies across disciplines to exploit it for varied purposes. The large volume of data calls for appropriate automated techniques for searching the blogosphere and mining useful inferences from it. The data contained in posts from a large number of users across geographic, demographic and cultural boundaries provide a rich opportunity not only for commercial exploitation but also for cross-cultural psychological and sociological research. This paper attempts to present the broader picture in and around this theme, chart the academic and technological framework required for the purpose, and report initial results of experimental work that demonstrate the plausibility of the idea.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Singh, V.K. (2010). Mining the Blogosphere for Sociological Inferences. In: Ranka, S., et al. Contemporary Computing. IC3 2010. Communications in Computer and Information Science, vol 94. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14834-7_51
DOI: https://doi.org/10.1007/978-3-642-14834-7_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14833-0
Online ISBN: 978-3-642-14834-7
eBook Packages: Computer Science, Computer Science (R0)