Chinese Blog Clustering by Hidden Sentiment Factors

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5678)

Abstract

In the Web age, blogs have become a major platform for people to express their opinions and sentiments. Traditional blog clustering methods usually group blogs by keywords, stories, or timelines, and so ignore the opinions and emotions expressed in the articles. This paper presents a novel method based on Probabilistic Latent Semantic Analysis (PLSA) that models hidden emotion factors, and proposes an emotion-oriented clustering approach that groups Chinese blogs by their sentiment similarity. Extensive experiments on real-world blog datasets covering different topics show that the approach can cluster Chinese blogs into sentiment-coherent groups, allowing better organization and easier navigation.
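
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: standard PLSA fitted by EM on a document-by-word count matrix, with each document then assigned to its dominant latent factor. The function name `plsa`, the toy count matrix, the number of factors, and the argmax grouping step are all assumptions made for illustration; the paper's actual sentiment-word features and clustering procedure are not described on this page and may differ.

```python
import numpy as np

def plsa(counts, n_factors, n_iter=50, seed=0):
    """Fit standard PLSA by EM (illustrative helper, not the paper's code).

    counts: (docs, words) array of term counts.
    Returns P(z|d) with shape (docs, factors) and P(w|z) with shape (factors, words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random initialisation, normalised so each row is a distribution.
    p_z_d = rng.random((n_docs, n_factors))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_factors, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]      # (docs, factors, words)
        post /= post.sum(axis=1, keepdims=True) + eps

        # M-step: re-estimate both distributions from expected counts.
        expected = counts[:, None, :] * post              # (docs, factors, words)
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z

# Hypothetical toy data: 4 blog posts x 4 sentiment words (made-up counts).
counts = np.array([
    [5, 4, 0, 0],
    [4, 6, 0, 0],
    [0, 0, 5, 3],
    [0, 0, 4, 5],
], dtype=float)

p_z_d, p_w_z = plsa(counts, n_factors=2)

# Assumed grouping step: label each post with its most probable hidden factor.
labels = p_z_d.argmax(axis=1)
print(labels)
```

Grouping by the dominant factor is the simplest stand-in for a clustering step. A similarity-based clustering over the rows of `p_z_d` (for example, k-means on the factor distributions) would be closer in spirit to clustering "according to the sentiment similarities", but that too is a guess at the method rather than a description of it.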

This work is supported by the National Natural Science Foundation of China (No. 60573090, 60703068, 60673139) and the National High-Tech Development Program (2008AA01Z146).

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Feng, S., Wang, D., Yu, G., Yang, C., Yang, N. (2009). Chinese Blog Clustering by Hidden Sentiment Factors. In: Huang, R., Yang, Q., Pei, J., Gama, J., Meng, X., Li, X. (eds) Advanced Data Mining and Applications. ADMA 2009. Lecture Notes in Computer Science, vol 5678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03348-3_16

  • DOI: https://doi.org/10.1007/978-3-642-03348-3_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03347-6

  • Online ISBN: 978-3-642-03348-3

  • eBook Packages: Computer Science, Computer Science (R0)