Sentiment Analysis on Chinese Health Forums: A Preliminary Study of Different Language Models

  • Conference paper
Smart Health (ICSH 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9545)

Abstract

Sentiment analysis on Chinese health forums is challenging because of language, platform, and domain characteristics. Our research investigates the impact of three factors on sentiment analysis: sentiment polarity distribution, language models, and model settings. We manually labeled a large sample of Chinese health forum posts and found an extremely unbalanced distribution, with a very small percentage of negative posts; we also found that a balanced training set could produce higher accuracy than the unbalanced one. Hybrid approaches that combine multiple language-model-based approaches to sentiment analysis performed better than the individual approaches. Finally, we evaluated the effects of different model settings and improved the overall accuracy by using the hybrid approaches in their optimal settings. Findings from this preliminary study provide deeper insight into the problem of sentiment analysis on Chinese health forums and will inform future sentiment analysis studies.
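The abstract does not say how the training set was balanced or how the individual approaches were combined. As a rough illustration only, the sketch below assumes random undersampling of the larger class and a weighted average of per-model positive-class probabilities. The function names (`undersample`, `hybrid_score`, `classify`), the weights, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def undersample(posts, labels, seed=0):
    """Return (post, label) pairs with every class cut down to the size
    of the smallest class. Assumption: random undersampling; the paper
    may have balanced its training set differently."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for post, label in zip(posts, labels):
        by_label[label].append(post)
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for label, group in by_label.items():
        for post in rng.sample(group, smallest):
            balanced.append((post, label))
    rng.shuffle(balanced)
    return balanced


def hybrid_score(scores, weights=None):
    """Weighted average of per-model probabilities that a post is positive.
    Assumption: simple linear combination of model outputs."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def classify(scores, weights=None, threshold=0.5):
    """Label a post from the combined score (threshold is illustrative)."""
    if hybrid_score(scores, weights) >= threshold:
        return "positive"
    return "negative"


if __name__ == "__main__":
    # Toy data: 8 positive posts and 2 negative ones, mimicking the skew
    # toward non-negative posts described in the abstract.
    posts = [f"post-{i}" for i in range(10)]
    labels = ["positive"] * 8 + ["negative"] * 2
    print(len(undersample(posts, labels)))
    print(classify([0.9, 0.2], weights=[1, 3]))
```

Undersampling discards majority-class posts, so on a small corpus oversampling the minority class may be a reasonable alternative; which of the two better matches the study's setup cannot be determined from the abstract.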

Acknowledgments

This work was supported by the National High-tech R&D Program of China (Grant No. SS2015AA020102), National Basic Research Program of China (Grant No. 2011CB302302), the 1000-Talent program, and the Tsinghua University Initiative Scientific Research Program. We appreciate the research assistance provided by Qingbo Cao, Yanshen Yin, and Xinhuan Chen at Tsinghua University.

Author information

Corresponding author

Correspondence to Yan Zhang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, Y., Zhang, Y., Xu, J., Xing, C., Chen, H. (2016). Sentiment Analysis on Chinese Health Forums: A Preliminary Study of Different Language Models. In: Zheng, X., Zeng, D., Chen, H., Leischow, S. (eds) Smart Health. ICSH 2015. Lecture Notes in Computer Science, vol 9545. Springer, Cham. https://doi.org/10.1007/978-3-319-29175-8_7

  • DOI: https://doi.org/10.1007/978-3-319-29175-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29174-1

  • Online ISBN: 978-3-319-29175-8

  • eBook Packages: Computer Science, Computer Science (R0)
