Mining the Blogosphere for Sociological Inferences

  • Conference paper
Contemporary Computing (IC3 2010)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 94)

Abstract

The blogosphere, the universe of all blog sites, now holds a tremendous amount of user-generated data. The ease and simplicity of creating blog posts, together with their free-form and unedited nature, have made the blogosphere a rich and unique source of data, which has attracted people and companies across disciplines to exploit it for varied purposes. The large volume of data requires appropriate automated techniques for searching the blogosphere and mining useful inferences from it. The valuable data contained in posts from a large number of users across geographic, demographic and cultural boundaries provides a rich opportunity not only for commercial exploitation but also for cross-cultural psychological and sociological research. This paper presents the broader picture in and around this theme, charts the required academic and technological framework for the purpose, and presents initial results of experimental work that demonstrate the plausibility of the idea.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Singh, V.K. (2010). Mining the Blogosphere for Sociological Inferences. In: Ranka, S., et al. Contemporary Computing. IC3 2010. Communications in Computer and Information Science, vol 94. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14834-7_51

  • DOI: https://doi.org/10.1007/978-3-642-14834-7_51

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14833-0

  • Online ISBN: 978-3-642-14834-7

  • eBook Packages: Computer Science, Computer Science (R0)
