Abstract
Sentiment classification is an application of sentiment analysis, which is a popular research field in NLP. It can classify documents into different categories according to their sentiments. For a sentiment classification task, the first step is to extract sentimental features from documents, and then classify them using some classifiers. In the first step, a traditional way to extract sentimental features is to apply sentiment dictionaries. However, sentiment words may have different sentiment tendencies in different contexts, and traditional sentiment dictionaries does not consider this situation where wrong sentiment tendencies may be selected for sentiment words. In our research, we find that sentiment words will not have diverse meanings when they associate with the nearby aspects and entities in documents. Then, we propose a three layers sentiment dictionary, which can associate sentiment words with the corresponding entities and aspects together to reduce their multiple meanings. In the second step of the sentiment classification task, many classification models, such as SVM, GBDT, can be used to classify documents according to the extracted sentiment words. However, different classifiers have different weaknesses. A Stacking-based hybrid model is applied to combine SVM and GBDT together to overcome their weaknesses and reach higher performance. This hybrid model contains two layers, and the output of the first layer will become the input of the second layer. The first layer will generate different classification results according to different classifiers, while the second layer will automatically learn how to select a probable one as the final result. The experimental results show that our hybrid model outperforms the baseline single models.
Similar content being viewed by others
References
Cambria E (2016) Affective computing and sentiment analysis. IEEE Intell Syst 31(2):102–107
Cavnar WB, Trenkle JM et al (1994) N-gram-based text categorization. Ann Arbor MI 48113(2):161–175
Dong Z, Dong Q (2006) Hownet and the computation of meaning. World Scientific, Singapore
Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38(4):367–378
Fu Z, Huang F, Sun X, Vasilakos A, Yang C-N (2016) Enabling semantic search based on conceptual graphs over encrypted outsourced data. IEEE Trans Serv Comput PP:1–1
Fu Z, Ren K, Shu J, Sun X, Huang F (2016) Enabling personalized search over encrypted outsourced data with efficiency improvement. IEEE Trans Parallel Distrib Syst 27(9):2546–2559
Fu Z, Wu X, Guan C, Sun X, Ren K (2016) Toward efficient multi-keyword fuzzy search over encrypted outsourced data with accuracy improvement. IEEE Trans Inf Forensics Secur 11(12):2706–2716
Goldberg Y, Levy O (2014) word2vec explained: deriving Mikolov et al.’s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722
Hofmann T (1999) Probabilistic latent semantic indexing. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’99. ACM, New York, NY, USA, pp 50–57
Ko Y (2012) A study of term weighting schemes using class information for text classification. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval, ACM, pp 1029–1030
Lan M, Tan CL, Su J, Lu Y (2009) Supervised and traditional term weighting methods for automatic text categorization. Pattern Anal Mach Intell IEEE Trans 31(4):721–735
Leopold E, Kindermann J (2002) Text categorization with support vector machines. How to represent texts in input space? Mach Learn 46(1–3):423–444
Liu Bing (2012) Sentiment analysis and opinion mining. Synth Lect Hum Lang Technol 5(1):1–167
Liu B, Hu M, Cheng J (2005) Opinion observer: analyzing and comparing opinions on the web. In: Proceedings of the 14th international conference on world wide web, WWW ’05. ACM, New York, NY, USA, pp 342–351
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: NIPS’13 Proceedings of the 26th international conference on neural information processing systems, vol 2, 5–10 Dec 2013, Lake Tahoe, Nevada, pp 3111–3119
Paik JH (2013) A novel tf-idf weighting scheme for e ective ranking. In: Proceedings of the 36th international ACM SIGIR conference on research and development in information retrieval, SIGIR ’13. ACM, New York, NY, USA, pp 343–352
Pang B, Lee L (2008) Opinion mining and sentiment analysis. Found Trends Inf Retr 2(1–2):1–135
Papadimitriou CH, Tamaki H, Raghavan P, Vempala S (1998) Latent semantic indexing: a probabilistic analysis. In: Proceedings of the seventeenth ACM SIGACT–SIGMOD–SIGART symposium on principles of database systems, ACM, pp 159–168
Quan X, Wenyin L, Qiu B (2011) Term weighting schemes for question categorization. Pattern Anal Mach Intell IEEE Trans 33(5):1009–1021
Rabiner L, Juang B (1986) An introduction to hidden Markov models. IEEE ASSP Mag 3(1):4–16
Sparck Jones K (1972) A statistical interpretation of term specificity and its application in retrieval. J Doc 28(1):11–21
Wang G, Hao J, Ma J, Jiang H (2011) A comparative assessment of ensemble learning for credit scoring. Expert Syst Appl 38(1):223–230
Wang T, Cai Y, Leung H, Cai Z, Min H (2015) Entropy-based term weighting schemes for text categorization in VSM. In: Tools with artificial intelligence (ICTAI), 2015 IEEE 27th international conference. IEEE, Vietri sul Mare, Italy, pp 325–332
Xia Z, Wang X, Sun X, Wang Q (2016) A secure and dynamic multi-keyword ranked search scheme over encrypted cloud data. IEEE Trans Parallel Distrib Syst 27(2):340–352
Xue B, Fu C, Shaobin Z (2014) A study on sentiment computing and classification of sina weibo with word2vec. In: Big Data (BigData Congress), 2014 IEEE international congress. IEEE, Anchorage, AK, USA, pp 358–363
Yang K, Cai Y, Huang D, Li J, Zhou Z, Lei X (2017) An effective hybrid model for opinion mining and sentiment analysis. In: Big data and smart computing (BigComp), 2017 IEEE international conference. IEEE, Jeju, South Korea, pp 465–466
Acknowledgements
This work is supported by the Fundamental Research Funds for the Central Universities, SCUT (No. 2017ZD048), Tiptop Scientific and Technical Innovative Youth Talents of Guangdong special support program (No. 2015TQ01X633), Science and Technology Planning Project of Guangdong Province, China (No. 2017B050506004), Science and Technology Program of Guangzhou (International Science & Technology Cooperation Program No. 201704030076), and the Internal Research Grant (RG 66/2016-2017) and the Funding Support to ECS Proposal (RG 23/2017-2018R) of The Education University of Hong Kong.
Author information
Authors and Affiliations
Corresponding author
Additional information
The preliminary version of this article has been published in ASC 2017 conjunction with BIGCOMP 2017 [27].
Rights and permissions
About this article
Cite this article
Cai, Y., Yang, K., Huang, D. et al. A hybrid model for opinion mining based on domain sentiment dictionary. Int. J. Mach. Learn. & Cyber. 10, 2131–2142 (2019). https://doi.org/10.1007/s13042-017-0757-6
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13042-017-0757-6