Selecting Bi-Tags for Sentiment Analysis of Text

Mukras, Rahman; Wiratunga, Nirmalie; Lothian, Robert

doi:10.1007/978-1-84800-094-0_14

Conference paper

Abstract

Sentiment Analysis aims to determine the overall sentiment orientation of a given input text. One motivation for research in this area is the need for consumer-related industries to extract public opinion from online portals such as blogs, discussion boards, and reviews. Estimating the sentiment orientation of a text involves extracting sentiment-rich phrases and aggregating their sentiment orientations. Sentiment-rich phrases are typically identified using manually selected part-of-speech (PoS) patterns. In this paper we present an algorithm for the automated discovery of PoS patterns from sentiment-rich background data. PoS patterns are selected by applying standard feature selection heuristics: Information Gain (IG), the Chi-Squared (CHI) score, and Document Frequency (DF). Experimental results on two real-world datasets suggest that classification accuracy is significantly better with DF-selected patterns than with IG- or CHI-selected patterns. Importantly, we also found that DF-selected patterns yield classifier accuracy comparable to that of manually selected patterns.
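To make the selection step concrete, here is a minimal Python sketch of scoring candidate PoS patterns with the three heuristics named above. It is not the authors' implementation. It assumes that every document has already been PoS-tagged (Penn Treebank-style tags are used for illustration) and labelled with a sentiment class, and it treats a "bi-tag" as a pair of adjacent tags. DF, IG and CHI follow the usual document-level definitions from text-categorization feature selection. The function names, the input format, the max-over-classes aggregation of CHI, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def bi_tags(tags):
    """Set of adjacent PoS-tag pairs in one tagged document.

    Reading a 'bi-tag' as two adjacent tags is an assumption for illustration.
    """
    return {(tags[i], tags[i + 1]) for i in range(len(tags) - 1)}


def entropy(counts):
    """Shannon entropy (bits) of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def score_bi_tags(docs):
    """Score every bi-tag by DF, IG and CHI.

    docs: list of (tag_sequence, sentiment_label) pairs (hypothetical format).
    Returns {bi_tag: (df, ig, chi)}.
    """
    n = len(docs)
    class_totals = Counter(label for _, label in docs)
    classes = sorted(class_totals)
    # present[t][c] = number of documents of class c that contain bi-tag t
    present = {}
    for tags, label in docs:
        for t in bi_tags(tags):
            present.setdefault(t, Counter())[label] += 1

    h_c = entropy([class_totals[c] for c in classes])
    scores = {}
    for t, with_t in present.items():
        df = sum(with_t.values())  # DF: number of documents containing t
        without_t = [class_totals[c] - with_t[c] for c in classes]
        # IG: reduction in class entropy once we know whether t occurs
        ig = (h_c
              - (df / n) * entropy([with_t[c] for c in classes])
              - ((n - df) / n) * entropy(without_t))
        # CHI: 2x2 contingency statistic per class, aggregated by max (assumed)
        chi = 0.0
        for c in classes:
            a = with_t[c]                   # class c, contains t
            b = df - a                      # other classes, contains t
            c_only = class_totals[c] - a    # class c, lacks t
            d = n - a - b - c_only          # other classes, lacks t
            denom = (a + c_only) * (b + d) * (a + b) * (c_only + d)
            if denom:
                chi = max(chi, n * (a * d - c_only * b) ** 2 / denom)
        scores[t] = (df, ig, chi)
    return scores


def select_top_k(scores, k, metric):
    """Keep the k best bi-tags under 'df', 'ig' or 'chi' (ties broken by tag)."""
    idx = {"df": 0, "ig": 1, "chi": 2}[metric]
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1][idx], kv[0]))
    return [t for t, _ in ranked[:k]]


# Toy, made-up tagged documents purely for illustration.
docs = [
    (["DT", "JJ", "NN", "VBZ", "JJ"], "pos"),
    (["DT", "JJ", "NN", "VBD", "RB", "JJ"], "neg"),
    (["PRP", "VBZ", "RB", "JJ"], "neg"),
    (["DT", "NN", "VBZ", "JJ"], "pos"),
]
scores = score_bi_tags(docs)
print(select_top_k(scores, 3, "ig"))  # [('NN', 'VBZ'), ('RB', 'JJ'), ('VBZ', 'JJ')]
print(select_top_k(scores, 2, "df"))  # [('DT', 'JJ'), ('JJ', 'NN')]
```

On this toy data, DF favours frequent but class-neutral pairs such as DT JJ, while IG and CHI favour pairs confined to a single class. The example only shows how the heuristics rank patterns differently. It says nothing about the accuracy results reported in the paper, where DF-selected patterns performed best.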

Author information

Authors and Affiliations

School of Computing, The Robert Gordon University, St Andrew Street, Aberdeen, AB25 1HG, UK
Rahman Mukras, Nirmalie Wiratunga & Robert Lothian

Editor information

Editors and Affiliations

Faculty of Technology, University of Portsmouth, Portsmouth, UK
Max Bramer BSc, PhD, CEng, CITP, FBCS, FIET, FRSA, FHEA
Department of Computer Science, University of Liverpool, Liverpool, UK
Frans Coenen BSc, PhD
University of Greenwich, UK
Miltos Petridis DipEng, MBA, PhD, MBCS, AMBA

About this paper

Cite this paper

Mukras, R., Wiratunga, N., Lothian, R. (2008). Selecting Bi-Tags for Sentiment Analysis of Text. In: Bramer, M., Coenen, F., Petridis, M. (eds) Research and Development in Intelligent Systems XXIV. SGAI 2007. Springer, London. https://doi.org/10.1007/978-1-84800-094-0_14

DOI: https://doi.org/10.1007/978-1-84800-094-0_14
Publisher Name: Springer, London
Print ISBN: 978-1-84800-093-3
Online ISBN: 978-1-84800-094-0
eBook Packages: Computer Science, Computer Science (R0)