Selecting Bi-Tags for Sentiment Analysis of Text

  • Conference paper

Abstract

Sentiment analysis aims to determine the overall sentiment orientation of a given input text. One motivation for research in this area is the need for consumer-related industries to extract public opinion from online sources such as blogs, discussion boards, and reviews. Estimating sentiment orientation in text involves extracting sentiment-rich phrases and aggregating their sentiment orientation. Sentiment-rich phrases are typically identified using manually selected part-of-speech (PoS) patterns. In this paper we present an algorithm for the automated discovery of PoS patterns from sentiment-rich background data. Here PoS patterns are selected by applying standard feature selection heuristics: Information Gain (IG), Chi-Squared (CHI) score, and Document Frequency (DF). Experimental results from two real-world datasets suggest that classification accuracy is significantly better with DF-selected patterns than with patterns selected by IG or the CHI score. Importantly, we also found that DF-selected patterns yield classifier accuracy comparable to that of manually selected patterns.
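The abstract names the three selection heuristics but this preview does not show how they are applied, so the following Python sketch is only an illustration under stated assumptions, not the authors' algorithm. It assumes that a bi-tag is a pair of adjacent PoS tags, that each background document arrives already tagged and carries a binary label ("pos" or "neg"), and that DF, CHI, and IG take their standard text-categorization definitions over a 2x2 pattern/class contingency table. The names `bitags`, `score_patterns`, and `top_k`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def bitags(tags):
    """Return the set of adjacent PoS-tag pairs found in one tagged document."""
    return {(a, b) for a, b in zip(tags, tags[1:])}


def entropy(*counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def score_patterns(docs, labels):
    """Score every bi-tag by DF, chi-squared, and information gain.

    docs   : list of PoS-tag sequences, one per document (assumed pre-tagged)
    labels : list of "pos" / "neg" labels (assumed binary sentiment classes)
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == "pos")
    n_neg = n - n_pos
    df_pos, df_neg = Counter(), Counter()
    for tags, y in zip(docs, labels):
        (df_pos if y == "pos" else df_neg).update(bitags(tags))

    scores = {}
    for p in set(df_pos) | set(df_neg):
        a, b = df_pos[p], df_neg[p]      # documents containing p, per class
        c, d = n_pos - a, n_neg - b      # documents lacking p, per class
        df = a + b
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        chi = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        ig = (entropy(n_pos, n_neg)
              - (df / n) * entropy(a, b)
              - ((n - df) / n) * entropy(c, d))
        scores[p] = {"df": df, "chi": chi, "ig": ig}
    return scores


def top_k(scores, metric, k):
    """Return the k patterns with the highest value of the chosen metric."""
    return sorted(scores, key=lambda p: scores[p][metric], reverse=True)[:k]


# Toy background data (invented for illustration only).
docs = [["JJ", "NN", "VB"], ["JJ", "NN"], ["RB", "JJ", "NN"], ["DT", "NN"]]
labels = ["pos", "pos", "neg", "neg"]
scores = score_patterns(docs, labels)
best_by_df = top_k(scores, "df", 1)
```

Ranking by `"df"`, `"chi"`, or `"ig"` with `top_k` gives three alternative pattern sets, which is the kind of comparison the abstract reports. Note that DF ignores the class labels entirely, whereas CHI and IG reward patterns whose presence differs between classes.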

Copyright information

© 2008 Springer-Verlag London Limited

About this paper

Cite this paper

Mukras, R., Wiratunga, N., Lothian, R. (2008). Selecting Bi-Tags for Sentiment Analysis of Text. In: Bramer, M., Coenen, F., Petridis, M. (eds) Research and Development in Intelligent Systems XXIV. SGAI 2007. Springer, London. https://doi.org/10.1007/978-1-84800-094-0_14

  • DOI: https://doi.org/10.1007/978-1-84800-094-0_14

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-093-3

  • Online ISBN: 978-1-84800-094-0

  • eBook Packages: Computer Science, Computer Science (R0)
