Abstract
Sentiment analysis has emerged as an active research field due to the rapid growth of user-generated content on the Internet. This research area analyzes the opinions and attitudes of the masses toward products, movies, topics, individuals, and services. Various machine learning and text mining algorithms have been used for sentiment analysis and classification. Recent research concludes that domain-specific lexicons perform significantly better than domain-independent lexicons. The proposed research aims to improve the performance of general-purpose lexicons by utilizing machine learning algorithms. A semi-supervised framework based on Multi-Objective Model Selection (“MOMS”) is introduced to determine feature weights by incorporating SentiWordNet, a well-known general-purpose sentiment lexicon. The feature weights are learned by a support vector machine, and the classification performance is enhanced by the Multi-Objective Model Selection procedure. A subjectivity criterion is used to select the desired features, and the effects of feature selection with respect to part-of-speech information are studied comprehensively. Experimental evaluation is performed on seven benchmark datasets, which include the Large Movie Review Dataset, the Multi-Domain Sentiment Dataset, and the Cornell movie review dataset. The proposed approach is compared with state-of-the-art techniques, lexicon-based approaches, and other methods for sentiment analysis. The proposed framework achieves high performance when compared to other research in this field.
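As a rough illustration of subjectivity-based feature selection with part-of-speech filtering, the sketch below keeps only lexicon entries whose subjectivity meets a threshold. It relies on the SentiWordNet convention that the objectivity of an entry equals one minus the sum of its positive and negative scores. Everything else is an assumption made for the example: the toy lexicon and its scores are invented, the names `TOY_LEXICON` and `select_features` and the threshold value are hypothetical, and the polarity difference returned here is only a placeholder for the weights that the framework learns with a support vector machine.

```python
from typing import Dict, FrozenSet, List, Tuple

# Toy stand-in for a SentiWordNet-style lexicon, keyed by (word, POS tag).
# Values are (positive score, negative score). These numbers are invented
# for illustration and are NOT taken from SentiWordNet.
TOY_LEXICON: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("excellent", "a"): (0.75, 0.0),
    ("boring", "a"): (0.0, 0.625),
    ("movie", "n"): (0.0, 0.0),
    ("slowly", "r"): (0.0, 0.125),
}


def subjectivity(pos_score: float, neg_score: float) -> float:
    """Subjectivity is 1 - objectivity, i.e. positive + negative score."""
    return pos_score + neg_score


def select_features(
    tokens: List[Tuple[str, str]],
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
    allowed_pos: FrozenSet[str] = frozenset({"a", "r", "v", "n"}),
) -> Dict[str, float]:
    """Return {word: polarity} for tokens that pass the POS and subjectivity filters."""
    selected: Dict[str, float] = {}
    for word, pos in tokens:
        if pos not in allowed_pos:
            continue  # skip parts of speech excluded from the study
        scores = TOY_LEXICON.get((word, pos))
        if scores is None:
            continue  # word is not covered by the lexicon
        if subjectivity(*scores) >= threshold:
            # Placeholder initial weight: positive minus negative score.
            selected[word] = scores[0] - scores[1]
    return selected


review = [("excellent", "a"), ("boring", "a"), ("movie", "n"), ("slowly", "r")]
features = select_features(review)
adjectives_only = select_features(review, allowed_pos=frozenset({"a"}))
nouns_only = select_features(review, allowed_pos=frozenset({"n"}))
```

Varying `allowed_pos` in this way is one simple means of comparing how adjectives, adverbs, verbs, and nouns each contribute to the selected feature set.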
Notes
https://wordnet.princeton.edu/ [Last Accessed: July 27, 2015].
https://www.jspell.com/java-spell-checker.html [Last Accessed: July 27, 2015].
http://nlp.stanford.edu/IR-book/html/htmledition/dropping-common-terms-stop-words-1.html [Last Accessed: July 27, 2015].
http://www.interopia.com/education/all-question-words-in-english/ [Last Accessed: July 27, 2015].
http://www.noslang.com/dictionary [Last Accessed: July 27, 2015].
http://nlp.stanford.edu/software/tagger.shtml [Last Accessed: July 28, 2015].
www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html [Last Accessed: July 28, 2015].
http://download.joachims.org/svm_light/current/svm_light_windows64.zip [Last Accessed: August 9, 2015].
Ethics declarations
Conflict of Interest
Farhan Hassan Khan, Usman Qamar, and Saba Bashir declare that they have no conflict of interest.
Informed Consent
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Declaration of Helsinki 1975, as revised in 2008. Additional informed consent was obtained from all patients for whom identifying information is included in this article.
Human and Animal Rights
This article does not contain any studies with human or animal subjects performed by any of the authors.
Cite this article
Khan, F.H., Qamar, U. & Bashir, S. Multi-Objective Model Selection (MOMS)-based Semi-Supervised Framework for Sentiment Analysis. Cogn Comput 8, 614–628 (2016). https://doi.org/10.1007/s12559-016-9386-8
DOI: https://doi.org/10.1007/s12559-016-9386-8