Analyzing sentiments in Web 2.0 social media data in Chinese: experiments on business and marketing related Chinese Web forums

Information Technology and Management

Abstract

Web 2.0 has brought a huge amount of user-generated social media data that contains rich information about people’s opinions and ideas about various products, services, and ongoing social and political events. Many companies have started to look into this new type of data and to leverage it to understand their customers in order to develop better business strategies and services. As a nation with rapid economic growth in recent years, China has become visible and started to play an important role in global business and the global economy. Also, given the large number of Chinese Internet users, a considerable number of opinions about Chinese business and markets have been expressed on social media sites. Thus, it is of interest to explore and understand this user-generated content in Chinese. In this study, we develop an integrated framework to analyze user sentiments from Chinese social media sites by leveraging sentiment analysis techniques. Based on the framework, we conduct experiments on two popular Chinese Web forums, both related to business and marketing. By utilizing Elastic Net together with a rich body of feature representations, we achieve the highest F-measures of 84.4% and 86.7% for the two data sets, respectively. We also demonstrate the interpretability of Elastic Net by discussing the top-ranked features with positive or negative sentiments.
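To make the approach described in the abstract more concrete, the following is a minimal sketch of an elastic-net-regularized logistic regression for Chinese sentiment classification, with an F-measure computation and a ranking of features by weight. It is not the authors' implementation. The eight example posts, the choice of character bigrams as the only feature type, the proximal gradient solver, and all hyperparameter values (`lam`, `alpha`, `lr`, `epochs`) are assumptions made purely for illustration.

```python
import math
from collections import Counter

def char_ngrams(text, n=2):
    """Count overlapping character n-grams (needs no word segmentation)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def soft_threshold(value, t):
    """Proximal operator of the L1 penalty: shrink value toward zero by t."""
    if value > t:
        return value - t
    if value < -t:
        return value + t
    return 0.0

def train_elastic_net(docs, labels, lam=0.01, alpha=0.5, lr=0.5, epochs=200):
    """Logistic regression with the penalty
    lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2),
    fitted by proximal gradient descent. The bias b is not penalized."""
    feats = [char_ngrams(d) for d in docs]
    vocab = sorted({g for f in feats for g in f})
    w = {g: 0.0 for g in vocab}
    b = 0.0
    n = len(docs)
    for _ in range(epochs):
        grad = {g: 0.0 for g in vocab}
        grad_b = 0.0
        for f, y in zip(feats, labels):
            z = b + sum(w[g] * c for g, c in f.items())
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(positive)
            err = p - y
            grad_b += err / n
            for g, c in f.items():
                grad[g] += err * c / n
        b -= lr * grad_b
        for g in vocab:
            # Gradient step on the smooth part (log loss + L2 term) ...
            step = w[g] - lr * (grad[g] + lam * (1 - alpha) * w[g])
            # ... followed by the L1 shrinkage step.
            w[g] = soft_threshold(step, lr * lam * alpha)
    return w, b

def predict(w, b, text):
    """Return 1 (positive) if the linear score is above zero, else 0."""
    z = b + sum(w.get(g, 0.0) * c for g, c in char_ngrams(text).items())
    return 1 if z > 0 else 0

def f_measure(y_true, y_pred):
    """F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Invented toy posts (1 = positive, 0 = negative); not data from the study.
docs = ["质量很好", "服务很好", "非常满意", "很好用",
        "质量很差", "服务很差", "非常失望", "很难用"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_elastic_net(docs, labels)
preds = [predict(w, b, d) for d in docs]
print("F-measure on the toy training set:", f_measure(labels, preds))

# Rank features by weight: the largest positive and negative weights are
# the n-grams the model associates most strongly with each sentiment.
ranked = sorted(w.items(), key=lambda kv: kv[1])
print("most negative:", ranked[:3])
print("most positive:", ranked[-3:])
```

Because the model is linear, each learned weight can be read directly as evidence for positive or negative sentiment, which is the sense in which the top-ranked features are interpretable. The L1 part of the penalty can drive uninformative weights to exactly zero, while the L2 part keeps correlated features from being dropped arbitrarily. A realistic experiment would evaluate on held-out data rather than on the training posts.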

Acknowledgment

This work is supported by the NSF Computer and Network Systems (CNS) Program, “(CRI: CRD) Developing a Dark Web Collection and Infrastructure for Computational and Social Sciences,” (CNS-0709338).

Author information

Corresponding author

Correspondence to Yulei Zhang.

About this article

Cite this article

Fan, L., Zhang, Y., Dang, Y. et al. Analyzing sentiments in Web 2.0 social media data in Chinese: experiments on business and marketing related Chinese Web forums. Inf Technol Manag 14, 231–242 (2013). https://doi.org/10.1007/s10799-013-0160-2

  • DOI: https://doi.org/10.1007/s10799-013-0160-2
