Abstract
Sentiment Analysis aims to determine the overall sentiment orientation of a given input text. One motivation for research in this area is the need for consumer-related industries to extract public opinion from online sources such as blogs, discussion boards, and reviews. Estimating sentiment orientation in text involves extracting sentiment-rich phrases and aggregating their sentiment orientation. Sentiment-rich phrases are typically identified using manually selected part-of-speech (PoS) patterns. In this paper, we present an algorithm for the automated discovery of PoS patterns from sentiment-rich background data. PoS patterns are selected by applying standard feature selection heuristics: Information Gain (IG), Chi-Squared (CHI) score, and Document Frequency (DF). Experimental results on two real-world datasets suggest that classification accuracy is significantly better with DF-selected patterns than with patterns selected by IG or CHI. Importantly, we also found that DF-selected patterns yield classifier accuracy comparable to that of manually selected patterns.
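As a rough illustration of the three selection heuristics named above, the following minimal sketch scores PoS bi-tag patterns by DF, IG, and CHI using their standard textbook definitions for a binary class label. It is not the authors' implementation: the function names (`score_patterns`, `top_k`), the representation of a document as a list of PoS tag pairs, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def score_patterns(docs, labels):
    """Score each pattern by DF, IG and CHI (hypothetical helper).

    docs   -- list of documents, each a list of PoS bi-tag tuples
    labels -- list of 0/1 class labels, one per document
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos

    # Per-class document frequencies (a pattern counts once per document).
    df_pos, df_neg = Counter(), Counter()
    for patterns, y in zip(docs, labels):
        for p in set(patterns):
            (df_pos if y == 1 else df_neg)[p] += 1

    base = entropy([n_pos, n_neg])
    scores = {}
    for p in set(df_pos) | set(df_neg):
        a = df_pos[p]      # positive documents containing p
        b = df_neg[p]      # negative documents containing p
        c = n_pos - a      # positive documents without p
        d = n_neg - b      # negative documents without p
        df = a + b
        # Information gain: class entropy minus entropy conditioned on p.
        ig = (base
              - (df / n) * entropy([a, b])
              - ((n - df) / n) * entropy([c, d]))
        # Chi-squared statistic for the 2x2 contingency table.
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        chi = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        scores[p] = {"df": df, "ig": ig, "chi": chi}
    return scores


def top_k(scores, metric, k):
    """Return the k best patterns under one metric (ties broken by name)."""
    return sorted(scores, key=lambda p: (-scores[p][metric], p))[:k]


# Toy data (invented for illustration): two positive and two negative documents.
docs = [
    [("JJ", "NN"), ("RB", "JJ")],
    [("JJ", "NN"), ("RB", "VB")],
    [("JJ", "NN"), ("NN", "NN")],
    [("NN", "NN"), ("RB", "VB")],
]
labels = [1, 1, 0, 0]

scores = score_patterns(docs, labels)
for metric in ("df", "ig", "chi"):
    print(metric, top_k(scores, metric, 2))
```

On this toy data, DF ranks the most frequent pattern first regardless of class, whereas IG and CHI rank first the pattern whose presence best separates the two classes. The toy data says nothing about which heuristic performs better in practice.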
© 2008 Springer-Verlag London Limited
About this paper
Cite this paper
Mukras, R., Wiratunga, N., Lothian, R. (2008). Selecting Bi-Tags for Sentiment Analysis of Text. In: Bramer, M., Coenen, F., Petridis, M. (eds) Research and Development in Intelligent Systems XXIV. SGAI 2007. Springer, London. https://doi.org/10.1007/978-1-84800-094-0_14
DOI: https://doi.org/10.1007/978-1-84800-094-0_14
Publisher Name: Springer, London
Print ISBN: 978-1-84800-093-3
Online ISBN: 978-1-84800-094-0
eBook Packages: Computer Science (R0)