Sentiment Analysis on Chinese Health Forums: A Preliminary Study of Different Language Models

Zhang, Yan; Zhang, Yong; Xu, Jennifer; Xing, Chunxiao; Chen, Hsinchun

doi:10.1007/978-3-319-29175-8_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9545)

Included in the following conference series: ICSH

Abstract

Sentiment analysis on Chinese health forums is challenging because of the characteristics of the language, the platform, and the domain. Our research investigates the impact of three factors on sentiment analysis: sentiment polarity distribution, language models, and model settings. We manually labeled a large sample of Chinese health forum posts and found the polarity distribution to be extremely unbalanced, with only a very small percentage of negative posts; training on a balanced set produced higher accuracy than training on the unbalanced one. We also found that hybrid approaches, which combine multiple language-model-based methods for sentiment analysis, performed better than the individual approaches. Finally, we evaluated the effects of different model settings and improved the overall accuracy by using the hybrid approaches in their optimal settings. Findings from this preliminary study provide deeper insight into the problem of sentiment analysis on Chinese health forums and will inform future sentiment analysis studies.
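The two ideas in the abstract, a balanced training set and a hybrid of several models, can be illustrated with a minimal sketch. The abstract does not say how the balanced set was constructed or how the individual models were combined, so everything below is an assumption for illustration: random undersampling of the majority class stands in for the balancing step, and a weighted average of per-model positive-class probabilities stands in for the hybrid. The function names `balance_by_undersampling`, `hybrid_score`, and `hybrid_predict` are hypothetical.

```python
import random
from collections import defaultdict


def balance_by_undersampling(samples, seed=0):
    """Return a class-balanced subset of (text, label) pairs.

    Assumption: balancing is done by randomly undersampling every class
    down to the size of the rarest class. The paper may have used a
    different procedure.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    # Size of the rarest class (e.g. negative posts in the abstract).
    n = min(len(group) for group in by_label.values())
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], n))
    rng.shuffle(balanced)
    return balanced


def hybrid_score(scores, weights=None):
    """Weighted average of per-model positive-class probabilities.

    Assumption: each individual model outputs a probability in [0, 1]
    that a post is positive; the hybrid is a simple linear combination.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total


def hybrid_predict(scores, weights=None, threshold=0.5):
    """Map the combined score to a polarity label."""
    combined = hybrid_score(scores, weights)
    return "positive" if combined >= threshold else "negative"


# Toy data: nine positive posts and two negative posts.
posts = [("p%d" % i, "positive") for i in range(9)]
posts += [("n0", "negative"), ("n1", "negative")]
balanced = balance_by_undersampling(posts)
label_counts = defaultdict(int)
for _, label in balanced:
    label_counts[label] += 1
```

In this toy run `balanced` holds two posts per class. Undersampling discards most of the majority-class data, which is one reason a study might also try other balancing strategies; the weights passed to `hybrid_score` would normally be tuned on held-out data.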

Acknowledgments

This work was supported by the National High-tech R&D Program of China (Grant No. SS2015AA020102), National Basic Research Program of China (Grant No. 2011CB302302), the 1000-Talent program, and the Tsinghua University Initiative Scientific Research Program. We appreciate the research assistance provided by Qingbo Cao, Yanshen Yin, and Xinhuan Chen at Tsinghua University.

Author information

Authors and Affiliations

Research Institute of Information Technology, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China
Yan Zhang, Yong Zhang, Chunxiao Xing & Hsinchun Chen
Computer Information Systems, Bentley University, Waltham, USA
Jennifer Xu
MIS Department, University of Arizona, Tucson, USA
Hsinchun Chen

Corresponding author

Correspondence to Yan Zhang.

Editor information

Editors and Affiliations

Institute of Automation, Bldg. 1004, Chinese Academy of Sciences, Beijing, China
Xiaolong Zheng
University of Arizona, Tucson, Arizona, USA
Daniel Dajun Zeng
University of Arizona, Phoenix, USA
Hsinchun Chen
Mayo Clinic, Scottsdale, USA
Scott J. Leischow

About this paper

Cite this paper

Zhang, Y., Zhang, Y., Xu, J., Xing, C., Chen, H. (2016). Sentiment Analysis on Chinese Health Forums: A Preliminary Study of Different Language Models. In: Zheng, X., Zeng, D., Chen, H., Leischow, S. (eds) Smart Health. ICSH 2015. Lecture Notes in Computer Science, vol 9545. Springer, Cham. https://doi.org/10.1007/978-3-319-29175-8_7

DOI: https://doi.org/10.1007/978-3-319-29175-8_7
Published: 20 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29174-1
Online ISBN: 978-3-319-29175-8
eBook Packages: Computer Science, Computer Science (R0)
