An improved algorithm for sentiment analysis based on maximum entropy

  • Methodologies and Application
Soft Computing

Abstract

Sentiment analysis is an important field of study in natural language processing. Achieving high classification accuracy on massive and irregular data is a major challenge in sentiment analysis. To address this problem, a novel maximum entropy-PLSA model is proposed. In this model, we first use probabilistic latent semantic analysis (PLSA) to extract seed emotion words from Wikipedia and the training corpus. Features are then extracted from these seed emotion words and used as the input for training the maximum entropy model. The test set is processed in the same way and passed to the maximum entropy model for emotion classification. The training set and the test set are divided by the K-fold method. The maximum entropy classification based on PLSA classifies words using important emotional classification features, such as the relevance of words and parts of speech in the context, the relevance to degree adverbs, and the similarity to benchmark emotional words. The experiments show that the proposed classification method outperforms the compared methods.
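
As a rough illustration of the classification stage described in the abstract, the sketch below trains a maximum entropy (log-linear) classifier on binary features and evaluates it with K-fold splits. It is not the authors' implementation. The PLSA seed-word extraction is skipped entirely, the feature strings (`seed:good`, `adv:very`, `pos:JJ`), the toy data, and all function names are hypothetical, and plain gradient ascent is an assumed stand-in for whatever optimizer the paper uses.

```python
import math
from collections import defaultdict


def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size


def predict_proba(weights, labels, features):
    """Maximum entropy form: p(y | x) = exp(sum_f w[y, f]) / Z(x)."""
    scores = {y: sum(weights[(y, f)] for f in features) for y in labels}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def train_maxent(samples, labels, epochs=200, lr=0.5):
    """Fit weights by batch gradient ascent on the conditional log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for features, gold in samples:
            probs = predict_proba(weights, labels, features)
            for f in features:
                grad[(gold, f)] += 1.0          # observed feature count
                for y in labels:
                    grad[(y, f)] -= probs[y]    # expected feature count
        for key, g in grad.items():
            weights[key] += lr * g / len(samples)
    return weights


def cross_validate(data, labels, k):
    """Return the fraction of held-out samples classified correctly."""
    correct = 0
    for train_idx, test_idx in k_fold_splits(len(data), k):
        weights = train_maxent([data[i] for i in train_idx], labels)
        for i in test_idx:
            features, gold = data[i]
            probs = predict_proba(weights, labels, features)
            correct += max(probs, key=probs.get) == gold
    return correct / len(data)


# Hypothetical toy data: each sample is (set of active features, label).
LABELS = ["pos", "neg"]
DATA = [
    ({"seed:good", "pos:JJ"}, "pos"),
    ({"seed:great", "adv:very"}, "pos"),
    ({"seed:good", "adv:very"}, "pos"),
    ({"seed:great", "pos:JJ"}, "pos"),
    ({"seed:bad", "pos:JJ"}, "neg"),
    ({"seed:awful", "adv:very"}, "neg"),
    ({"seed:bad", "adv:very"}, "neg"),
    ({"seed:awful", "pos:JJ"}, "neg"),
]

if __name__ == "__main__":
    print(cross_validate(DATA, LABELS, k=4))
```

In this toy setup the seed-word features carry the label signal, while the adverb and part-of-speech features are shared across both classes. That loosely mirrors the role the abstract assigns to seed emotion words, but it says nothing about the accuracy the paper reports.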



Acknowledgements

This work is supported by the National Natural Science Foundation under Grant Nos. 61762037, 61640217, 41402290 and 61462028; the Science and Technology Support Program of Jiangxi Province under Grant No. 20151BBE50055; the Science and Technology Project of the Education Department of Jiangxi Province under Grant No. GJJ150541; and the Nanchang City Sensor Network and Compressed Sensing Knowledge Innovation Team under Grant No. 2016T75.

Author information

Corresponding author

Correspondence to Xin Xie.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Xie, X., Ge, S., Hu, F. et al. An improved algorithm for sentiment analysis based on maximum entropy. Soft Comput 23, 599–611 (2019). https://doi.org/10.1007/s00500-017-2904-0

  • DOI: https://doi.org/10.1007/s00500-017-2904-0
