Constructing Maximum Entropy Language Models for Movie Review Subjectivity Analysis

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

Document subjectivity analysis has become an important aspect of web text content mining. The problem resembles traditional text categorization, so many established classification techniques can be adapted to it. One significant difference, however, is that estimating the subjectivity of a document well requires more language or semantic information. This paper therefore focuses on two questions: how to extract useful and meaningful language features, and how to construct appropriate language models efficiently for this task. For the first, we apply a Global-Filtering and Local-Weighting strategy to select and evaluate language features drawn from n-grams of different orders and within various distance windows. For the second, we adopt Maximum Entropy (MaxEnt) modeling to build our language model framework. Besides the classical MaxEnt models, we construct two improved variants, with Gaussian and exponential priors respectively. Detailed experiments show that, with well-selected and well-weighted language features, MaxEnt models with exponential priors are significantly more suitable for the text subjectivity analysis task.
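
For orientation, the three model families named in the abstract can be written in their standard textbook form. This is a sketch under assumptions, not the paper's own notation: the symbols x (document), y (class label), f_i (feature functions), \lambda_i (feature weights), \sigma_i^2 (Gaussian prior variances) and \alpha_i (exponential prior rates) are introduced here for illustration only, and the paper's exact formulation may differ.

```latex
% Conditional MaxEnt model over class labels y given a document x
p_\Lambda(y \mid x) =
  \frac{\exp\bigl(\sum_i \lambda_i f_i(x, y)\bigr)}
       {\sum_{y'} \exp\bigl(\sum_i \lambda_i f_i(x, y')\bigr)}

% Classical MaxEnt: maximize the conditional log-likelihood of the training set D
L(\Lambda) = \sum_{(x, y) \in D} \log p_\Lambda(y \mid x),
\qquad
\hat{\Lambda}_{\text{classical}} = \arg\max_{\Lambda} L(\Lambda)

% Gaussian prior (zero mean, variance \sigma_i^2): a quadratic penalty on each weight
\hat{\Lambda}_{\text{Gauss}} = \arg\max_{\Lambda}
  \Bigl[ L(\Lambda) - \sum_i \frac{\lambda_i^2}{2\sigma_i^2} \Bigr]

% Exponential prior (rate \alpha_i, support \lambda_i \ge 0): a linear penalty on each weight
\hat{\Lambda}_{\text{exp}} = \arg\max_{\Lambda \ge 0}
  \Bigl[ L(\Lambda) - \sum_i \alpha_i \lambda_i \Bigr]
```

In this form the Gaussian prior shrinks every weight toward zero smoothly, whereas the linear penalty of the exponential prior can drive many weights exactly to zero, which is one commonly cited reason it behaves differently on sparse n-gram feature sets.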

Author information

Corresponding author

Correspondence to Bo Chen.

Additional information

Supported by the National Natural Science Foundation of China under Grant Nos. 60475007 and 60675001, the Key Project of Chinese Ministry of Education under Grant No. 02029, and the Foundation of Chinese Ministry of Education for Century Spanning Talent.

The movie review dataset comes from http://www.cs.cornell.edu/people/pabo/movie-review-data, offered by Pang et al. at Cornell.

About this article

Cite this article

Chen, B., He, H. & Guo, J. Constructing Maximum Entropy Language Models for Movie Review Subjectivity Analysis. J. Comput. Sci. Technol. 23, 231–239 (2008). https://doi.org/10.1007/s11390-008-9125-z

  • DOI: https://doi.org/10.1007/s11390-008-9125-z
