Document-level sentiment classification using hybrid machine learning approach

  • Regular Paper
Knowledge and Information Systems

Abstract

Users and customers commonly share comments or reviews about products on social networking sites. An analyst must process these reviews properly to obtain meaningful information from them. Classifying the sentiment associated with a review is one such processing step. Reviews are usually written as text, and when text reviews are processed, each word of a review is treated as a feature. Feature selection is therefore needed to pick the best features from the set of all features. In this paper, a machine learning algorithm, the support vector machine (SVM), is used to select the best features from the training data. These features are then given as input to an artificial neural network (ANN) for further processing. Performance evaluation parameters, namely precision, recall, F-measure, and accuracy, are used to evaluate the proposed approach on two datasets: the IMDb dataset and the polarity dataset.
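The four evaluation parameters named in the abstract can be illustrated with a minimal sketch. It assumes binary sentiment labels with 1 for positive and 0 for negative; the function name `evaluate`, the `Metrics` container, and the toy labels are hypothetical and are not taken from the paper or its datasets.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float
    recall: float
    f_measure: float
    accuracy: float


def evaluate(y_true, y_pred, positive=1):
    """Compute precision, recall, F-measure and accuracy for binary labels.

    Assumption: `positive` marks the positive-sentiment class; every other
    label value is treated as negative.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return Metrics(precision, recall, f_measure, accuracy)


# Toy labels for illustration only (not results from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
result = evaluate(y_true, y_pred)
print(result)
```

On these toy labels there are 2 true positives, 1 false positive, 2 false negatives, and 3 true negatives, so precision is 2/3, recall is 1/2, F-measure is 4/7, and accuracy is 5/8.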

Author information

Correspondence to Abinash Tripathy.

About this article

Cite this article

Tripathy, A., Anand, A. & Rath, S.K. Document-level sentiment classification using hybrid machine learning approach. Knowl Inf Syst 53, 805–831 (2017). https://doi.org/10.1007/s10115-017-1055-z

  • DOI: https://doi.org/10.1007/s10115-017-1055-z
