Constructing Maximum Entropy Language Models for Movie Review Subjectivity Analysis

Chen, Bo; He, Hui; Guo, Jun

doi:10.1007/s11390-008-9125-z

Regular Paper
Published: 05 April 2008

Volume 23, pages 231–239, (2008)

Journal of Computer Science and Technology

Bo Chen¹,
Hui He¹ &
Jun Guo¹

Abstract

Document subjectivity analysis has become an important aspect of web text content mining. The problem is similar to traditional text categorization, so many related classification techniques can be adapted to it. There is one significant difference, however: estimating the subjectivity of a document well requires more language or semantic information. This paper therefore focuses on two aspects. The first is how to extract useful and meaningful language features; the second is how to construct appropriate language models efficiently for this task. For the first issue, we apply a Global-Filtering and Local-Weighting strategy to select and evaluate language features drawn from n-grams of different orders and within various distance windows. For the second issue, we adopt Maximum Entropy (MaxEnt) modeling to construct our language model framework. Besides classical MaxEnt models, we also construct two kinds of improved models, with Gaussian and exponential priors respectively. Detailed experiments show that, with well-selected and well-weighted language features, MaxEnt models with exponential priors are significantly more suitable for the text subjectivity analysis task.
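
To make the difference between the two priors concrete, the following is a minimal sketch and not the authors' implementation. It assumes a conditional MaxEnt classifier with class-specific feature weights trained by batch gradient ascent. A Gaussian prior is taken to contribute a penalty proportional to each weight, and an exponential prior is taken to contribute a constant penalty together with a non-negativity constraint on the weights (handled here by clipping at zero). The function names, the toy data, and all hyperparameter values (`strength`, `lr`, `epochs`) are hypothetical choices made for illustration.

```python
import math

def class_probs(w, x):
    # p(c | x) is proportional to exp(sum_j w[c][j] * x[j]).
    # Scores are shifted by their maximum for numerical stability.
    scores = [sum(wj * xj for wj, xj in zip(wc, x)) for wc in w]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def train_maxent(X, y, n_classes=2, prior="none", strength=0.05,
                 lr=0.5, epochs=300):
    """Batch gradient ascent on the (penalized) conditional log-likelihood.

    prior="gaussian":    gradient gets -strength * w (quadratic penalty).
    prior="exponential": gradient gets -strength, and weights are clipped
                         at zero (assumed non-negativity constraint).
    """
    n, d = len(X), len(X[0])
    w = [[0.0] * d for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * d for _ in range(n_classes)]
        for x, label in zip(X, y):
            p = class_probs(w, x)
            for c in range(n_classes):
                # Observed minus expected count of each (class, feature) pair.
                err = (1.0 if c == label else 0.0) - p[c]
                for j in range(d):
                    grad[c][j] += err * x[j] / n
        for c in range(n_classes):
            for j in range(d):
                g = grad[c][j]
                if prior == "gaussian":
                    g -= strength * w[c][j]
                elif prior == "exponential":
                    g -= strength
                w[c][j] += lr * g
                if prior == "exponential":
                    w[c][j] = max(0.0, w[c][j])
    return w

def predict(w, x):
    p = class_probs(w, x)
    return p.index(max(p))

# Toy data (hypothetical): columns are [subjective cue, objective cue, noise];
# label 1 = subjective, label 0 = objective.
X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
y = [1, 1, 0, 0]

w_gauss = train_maxent(X, y, prior="gaussian")
w_exp = train_maxent(X, y, prior="exponential")
print("gaussian:   ", w_gauss)
print("exponential:", w_exp)
```

In this sketch the exponential prior pushes every weight that the data do not support to exactly zero, while the Gaussian prior only shrinks weights toward zero and allows negative values. That kind of built-in sparsity is one plausible reason such a prior might suit large n-gram feature sets, although the abstract itself does not state the mechanism.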

References

Das S R, Chen M Y. Yahoo! for Amazon: Sentiment extraction from small talk on the web. Working paper, Santa Clara University, Available at http://scumis.scu.edu/srdas/chat.pdf.
Chesley P, Vincent B, Xu L, Srihari R. Using verbs and adjectives to automatically classify blog sentiment. In Proc. Computational Approaches to Analyzing Weblogs: Papers from the 2006 Spring Symposium, Nicolov N, Salvetti F, Liberman M, Martin J H (eds.), AAAI Press, Menlo Park, CA, Technical Report SS-06-03, 2006, pp.27–29.
Gamon M. Sentiment classification on customer feedback data: Noisy data, large feature vectors, and the role of language analysis. In Proc. 20th Int. Conf. Computational Linguistics, Geneva, CH, 2004, pp.841–847.
Kennedy A, Inkpen D. Sentiment classification of movie and product reviews using contextual valence shifters. Computational Intelligence, 2006, 22(2): 110–125.
Berger A L, Della Pietra S A, Della Pietra V J. A maximum entropy approach to natural language processing. Computational Linguistics, 1996, 22(1): 39–71.
Rosenfeld R. A maximum entropy approach to adaptive statistical language modeling. Computer Speech and Language, 1996, 10: 187–228.
Sebastiani F. Machine learning in automated text categorization: A survey. Tech. Rep. IEI-B4-31-1999, Istituto di Elaborazione dell'Informazione, Consiglio Nazionale delle Ricerche, Pisa, IT, 1999.
Yang Y. An evaluation of statistical approaches to text categorization. Journal of Information Retrieval, 1999, 1: 69–90.
Pang B, Lee L, Vaithyanathan S. Thumbs up? Sentiment classification using machine learning techniques. In Proc. Conf. Empirical Methods in Natural Language Processing, Philadelphia, US, 2002, pp.79–86.
Pang B, Lee L. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In Proc. 42nd Meeting of the Association for Computational Linguistics, Barcelona, ES, 2004, pp.271–278.
Chen B, He H, Guo J. Language feature mining for document subjectivity analysis. In Proc. 1st Int. Symp. Data, Privacy, & E-Commerce, Chengdu, China, November 1–3, 2007, pp.62–67.
Huang X D, Alleva F, Hon H W, Hwang M Y, Lee K F, Rosenfeld R. The SPHINX-II speech recognition system: An overview. Computer Speech and Language, 1993, 2: 137–148.
Salton G, Buckley C. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 1988, 24(5): 513–523.
Della Pietra S A, Della Pietra V J, Lafferty J. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(4): 380–393.
Bahl L, Jelinek F, Mercer R. A maximum likelihood approach to continuous speech recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1983, 5(2): 179–190.
Chen S F, Goodman J. An empirical study of smoothing techniques for language modeling. Tech. Rep. TR-10-98, Harvard University, 1998.
Berger A. Convexity, maximum likelihood and all that, 1996. http://www.cs.cmu.edu/afs/cs/user/aberger/www/ps/con-vex.ps.
Chen S F, Rosenfeld R. A Gaussian prior for smoothing maximum entropy models. Tech. Rep. CMU-CS-99-108, Carnegie Mellon University, 1999.
Kazama J, Tsujii J. Evaluation and extension of maximum entropy models with inequality constraints. In Proc. EMNLP 2003, 2003, pp.137–144.
Goodman J. Exponential priors for maximum entropy models. Microsoft Research Tech. Rep., 2003.
Cormack G. TREC 2006 spam track overview. In Proc. TREC 2006, Gaithersburg, MD, 2006.

Author information

Authors and Affiliations

Pattern Recognition and Intelligent System Laboratory, School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, 100876, China
Bo Chen, Hui He & Jun Guo

Corresponding author

Correspondence to Bo Chen.

Additional information

Supported by the National Natural Science Foundation of China under Grant Nos. 60475007 and 60675001, the Key Project of Chinese Ministry of Education under Grant No. 02029, and the Foundation of Chinese Ministry of Education for Century Spanning Talent.

This dataset comes from http://www.cs.cornell.edu/people/pabo/movie-review-data, offered by Pang et al. at Cornell.

About this article

Cite this article

Chen, B., He, H. & Guo, J. Constructing Maximum Entropy Language Models for Movie Review Subjectivity Analysis. J. Comput. Sci. Technol. 23, 231–239 (2008). https://doi.org/10.1007/s11390-008-9125-z

Revised: 23 September 2007
Published: 05 April 2008
Issue Date: March 2008
DOI: https://doi.org/10.1007/s11390-008-9125-z
