A hybrid model for opinion mining based on domain sentiment dictionary

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Sentiment classification is an application of sentiment analysis, a popular research field in natural language processing (NLP). It assigns documents to categories according to the sentiment they express. A sentiment classification task proceeds in two steps: first, sentiment features are extracted from the documents; second, the documents are classified by some classifier. In the first step, a traditional way to extract sentiment features is to apply sentiment dictionaries. However, a sentiment word may carry different sentiment tendencies in different contexts, and traditional sentiment dictionaries do not account for this, so a wrong tendency may be assigned to a sentiment word. In our research, we find that sentiment words lose this ambiguity when they are associated with the nearby aspects and entities in a document. We therefore propose a three-layer sentiment dictionary that associates each sentiment word with its corresponding entity and aspect, reducing its multiple meanings. In the second step, many classification models, such as support vector machines (SVM) and gradient boosting decision trees (GBDT), can classify documents according to the extracted sentiment words. However, each classifier has its own weaknesses. We apply a stacking-based hybrid model that combines SVM and GBDT to overcome their individual weaknesses and reach higher performance. The hybrid model has two layers, and the output of the first layer becomes the input of the second: the first layer produces a classification result from each classifier, and the second layer automatically learns how to select the most probable one as the final result. The experimental results show that our hybrid model outperforms the baseline single models.
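
The abstract does not specify the internal format of the three-layer dictionary. The sketch below assumes a nested mapping from entity, to aspect, to sentiment word, to polarity, with a context-free lexicon as a fallback when no entity/aspect entry matches. `DOMAIN_DICT`, `GENERAL_LEXICON`, `polarity` and `document_features` are hypothetical names, and the entries (such as "high" being negative for a hotel's price but positive for a phone's resolution) are illustrative, not taken from the paper.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical three-layer dictionary: entity -> aspect -> sentiment word -> polarity.
# +1 means positive, -1 negative. Entries are illustrative, not from the paper.
DOMAIN_DICT: Dict[str, Dict[str, Dict[str, int]]] = {
    "phone": {
        "battery": {"long": 1, "short": -1},
        "resolution": {"high": 1, "low": -1},
    },
    "hotel": {
        "price": {"high": -1, "low": 1},
        "room": {"small": -1, "clean": 1},
    },
}

# Hypothetical context-free lexicon, consulted only when no (entity, aspect) entry matches.
GENERAL_LEXICON: Dict[str, int] = {"good": 1, "bad": -1, "high": 1}


def polarity(word: str, entity: Optional[str] = None, aspect: Optional[str] = None) -> int:
    """Return the polarity of `word`, preferring the entity/aspect-specific entry.

    Falls back to GENERAL_LEXICON, and returns 0 for unknown words.
    """
    if entity is not None and aspect is not None:
        words = DOMAIN_DICT.get(entity, {}).get(aspect, {})
        if word in words:
            return words[word]
    return GENERAL_LEXICON.get(word, 0)


def document_features(triples: Iterable[Tuple[str, str, str]]) -> Tuple[int, int]:
    """Count positive and negative sentiment words in a document.

    `triples` are (entity, aspect, sentiment word) tuples assumed to be
    extracted upstream; the result is a minimal feature vector for a classifier.
    """
    pos = neg = 0
    for entity, aspect, word in triples:
        p = polarity(word, entity, aspect)
        if p > 0:
            pos += 1
        elif p < 0:
            neg += 1
    return pos, neg


if __name__ == "__main__":
    # "high" flips polarity depending on the entity and aspect it is attached to.
    print(polarity("high", "hotel", "price"))       # -1
    print(polarity("high", "phone", "resolution"))  # 1
    print(polarity("high"))                         # 1 (general lexicon fallback)
    doc = [("hotel", "price", "high"), ("hotel", "room", "small"), ("phone", "battery", "long")]
    print(document_features(doc))                   # (1, 2)
```

The positive and negative counts form a deliberately minimal feature vector. Per the abstract, features of this kind are then classified by SVM and GBDT in the first layer, whose outputs feed a second-layer learner; for that stacking step, scikit-learn's `sklearn.ensemble.StackingClassifier` is one readily available implementation, although the paper does not state which library it used.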

Figs. 1–8

References

  1. Cambria E (2016) Affective computing and sentiment analysis. IEEE Intell Syst 31(2):102–107

  2. Cavnar WB, Trenkle JM et al (1994) N-gram-based text categorization. Ann Arbor MI 48113(2):161–175

  3. Dong Z, Dong Q (2006) Hownet and the computation of meaning. World Scientific, Singapore

  4. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232

  5. Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38(4):367–378

  6. Fu Z, Huang F, Sun X, Vasilakos A, Yang C-N (2016) Enabling semantic search based on conceptual graphs over encrypted outsourced data. IEEE Trans Serv Comput PP:1–1

  7. Fu Z, Ren K, Shu J, Sun X, Huang F (2016) Enabling personalized search over encrypted outsourced data with efficiency improvement. IEEE Trans Parallel Distrib Syst 27(9):2546–2559

  8. Fu Z, Wu X, Guan C, Sun X, Ren K (2016) Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement. IEEE Trans Inf Forensics Secur 11(12):2706–2716

  9. Goldberg Y, Levy O (2014) word2vec explained: deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722

  10. Hofmann T (1999) Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’99. ACM, New York, NY, USA, pp 50–57

  11. Ko Y (2012) A study of term weighting schemes using class information for text classification. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval, ACM, pp 1029–1030

  12. Lan M, Tan CL, Su J, Lu Y (2009) Supervised and traditional term weighting methods for automatic text categorization. IEEE Trans Pattern Anal Mach Intell 31(4):721–735

  13. Leopold E, Kindermann J (2002) Text categorization with support vector machines. How to represent texts in input space? Mach Learn 46(1–3):423–444

  14. Liu B (2012) Sentiment analysis and opinion mining. Synth Lect Hum Lang Technol 5(1):1–167

  15. Liu B, Hu M, Cheng J (2005) Opinion observer: analyzing and comparing opinions on the web. In: Proceedings of the 14th international conference on world wide web, WWW ’05. ACM, New York, NY, USA, pp 342–351

  16. Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: NIPS’13 Proceedings of the 26th international conference on neural information processing systems, vol 2, 5–10 Dec 2013, Lake Tahoe, Nevada, pp 3111–3119

  17. Paik JH (2013) A novel tf-idf weighting scheme for effective ranking. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval, SIGIR ’13. ACM, New York, NY, USA, pp 343–352

  18. Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2):1–135

  19. Papadimitriou CH, Tamaki H, Raghavan P, Vempala S (1998) Latent semantic indexing: a probabilistic analysis. In: Proceedings of the seventeenth ACM SIGACT–SIGMOD–SIGART symposium on principles of database systems, ACM, pp 159–168

  20. Quan X, Wenyin L, Qiu B (2011) Term weighting schemes for question categorization. IEEE Trans Pattern Anal Mach Intell 33(5):1009–1021

  21. Rabiner L, Juang B (1986) An introduction to hidden Markov models. IEEE ASSP Mag 3(1):4–16

  22. Sparck Jones K (1972) A statistical interpretation of term specificity and its application in retrieval. J Doc 28(1):11–21

  23. Wang G, Hao J, Ma J, Jiang H (2011) A comparative assessment of ensemble learning for credit scoring. Expert Syst Appl 38(1):223–230

  24. Wang T, Cai Y, Leung H, Cai Z, Min H (2015) Entropy-based term weighting schemes for text categorization in VSM. In: Tools with artificial intelligence (ICTAI), 2015 IEEE 27th international conference. IEEE, Vietri sul Mare, Italy, pp 325–332

  25. Xia Z, Wang X, Sun X, Wang Q (2016) A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data. IEEE Trans Parallel Distrib Syst 27(2):340–352

  26. Xue B, Fu C, Shaobin Z (2014) A study on sentiment computing and classification of sina weibo with word2vec. In: Big Data (BigData Congress), 2014 IEEE international congress. IEEE, Anchorage, AK, USA, pp 358–363

  27. Yang K, Cai Y, Huang D, Li J, Zhou Z, Lei X (2017) An effective hybrid model for opinion mining and sentiment analysis. In: Big data and smart computing (BigComp), 2017 IEEE international conference. IEEE, Jeju, South Korea, pp 465–466

Acknowledgements

This work is supported by the Fundamental Research Funds for the Central Universities, SCUT (No. 2017ZD048), Tiptop Scientific and Technical Innovative Youth Talents of Guangdong special support program (No. 2015TQ01X633), Science and Technology Planning Project of Guangdong Province, China (No. 2017B050506004), Science and Technology Program of Guangzhou (International Science & Technology Cooperation Program No. 201704030076), and the Internal Research Grant (RG 66/2016-2017) and the Funding Support to ECS Proposal (RG 23/2017-2018R) of The Education University of Hong Kong.

Author information

Corresponding author

Correspondence to Yi Cai.

Additional information

A preliminary version of this article was published in ASC 2017, held in conjunction with BigComp 2017 [27].

About this article

Cite this article

Cai, Y., Yang, K., Huang, D. et al. A hybrid model for opinion mining based on domain sentiment dictionary. Int. J. Mach. Learn. & Cyber. 10, 2131–2142 (2019). https://doi.org/10.1007/s13042-017-0757-6

  • DOI: https://doi.org/10.1007/s13042-017-0757-6
