
Parallel Sentiment Polarity Classification Method with Substring Feature Reduction

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7867)

Abstract

Sentiment analysis is an important problem in machine learning that aims to identify the emotion expressed in a text corpus. It is a difficult task, however, especially on large-scale data, where feature reduction is needed. In this paper, we propose a parallel feature reduction algorithm for sentiment polarity classification based on a substring method. Specifically, the proposed algorithm relies on parallel computing on the Hadoop platform. It is evaluated on a large data set, with the K-nearest neighbor (KNN) and Rocchio algorithms used as classifiers. Experimental results show that the proposed algorithm outperforms other commonly used methods in both classification performance and computational cost.
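The abstract does not spell out how substrings are selected or how the Hadoop jobs are organized. The sketch below is a minimal single-machine illustration of the general pipeline it describes, not the authors' algorithm. It rests on several assumptions. Features are character substrings of length 2 to 3. Feature reduction keeps only substrings whose document frequency reaches a threshold (`min_df`). Documents are weighted with TF-IDF and compared with cosine similarity. The MapReduce "map" and "reduce" phases are emulated by plain Python functions that run sequentially; on Hadoop they would run as distributed tasks. All function names (`map_substrings`, `reduce_document_frequency`, `select_features`, `knn_predict`, `rocchio_centroids`, and so on) are hypothetical. The Rocchio classifier is the simplified centroid-only form (no negative-example term).

```python
import math
from collections import Counter, defaultdict


def map_substrings(doc_id, text, min_len=2, max_len=3):
    """Map phase (hypothetical): emit (substring, doc_id) once for every
    distinct character substring of length min_len..max_len in a document."""
    seen = set()
    for n in range(min_len, max_len + 1):
        for i in range(len(text) - n + 1):
            seen.add(text[i:i + n])
    for sub in seen:
        yield sub, doc_id


def reduce_document_frequency(pairs):
    """Reduce phase (hypothetical): document frequency of each substring."""
    df = Counter()
    for sub, _doc_id in pairs:
        df[sub] += 1
    return df


def select_features(docs, min_df=2, min_len=2, max_len=3):
    """Feature reduction (assumed criterion): keep substrings occurring in at
    least min_df documents and return their IDF weights, keyed by substring."""
    pairs = (pair for doc_id, text in enumerate(docs)
             for pair in map_substrings(doc_id, text, min_len, max_len))
    df = reduce_document_frequency(pairs)
    n_docs = len(docs)
    return {s: math.log(n_docs / c) for s, c in df.items() if c >= min_df}


def count_overlapping(text, sub):
    """Term frequency of a substring, counting overlapping occurrences."""
    return sum(1 for i in range(len(text) - len(sub) + 1)
               if text.startswith(sub, i))


def vectorize(text, idf):
    """Sparse TF-IDF vector over the reduced substring features."""
    vec = {}
    for sub, weight in idf.items():
        tf = count_overlapping(text, sub)
        if tf:
            vec[sub] = tf * weight
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(query, train_vecs, labels, k=3):
    """K-nearest neighbor: majority label among the k most similar documents."""
    ranked = sorted(zip(train_vecs, labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _vec, label in ranked[:k])
    return votes.most_common(1)[0][0]


def rocchio_centroids(train_vecs, labels):
    """Simplified Rocchio: one mean (centroid) vector per class."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(train_vecs, labels):
        for k, v in vec.items():
            sums[label][k] += v
    return {label: {k: v / counts[label] for k, v in s.items()}
            for label, s in sums.items()}


def rocchio_predict(query, centroids):
    """Assign the class whose centroid is most cosine-similar to the query."""
    return max(centroids, key=lambda label: cosine(query, centroids[label]))


if __name__ == "__main__":
    # Toy data for illustration only; not the data set used in the paper.
    docs = ["good", "good day", "bad", "bad day"]
    labels = ["pos", "pos", "neg", "neg"]
    idf = select_features(docs, min_df=2)
    train = [vectorize(d, idf) for d in docs]
    query = vectorize("good movie", idf)
    print(knn_predict(query, train, labels, k=3),
          rocchio_predict(query, rocchio_centroids(train, labels)))  # prints: pos pos
```

The document-frequency threshold is only one plausible reduction criterion. It is used here because each document's substrings can be generated independently in the map phase, and the counts combine with a simple sum in the reduce phase. That independence is what makes this kind of step a natural fit for Hadoop. The paper's actual substring selection rule may differ.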




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhang, Y., Xiang, X., Yin, C., Shang, L. (2013). Parallel Sentiment Polarity Classification Method with Substring Feature Reduction. In: Li, J., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7867. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40319-4_11


  • DOI: https://doi.org/10.1007/978-3-642-40319-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40318-7

  • Online ISBN: 978-3-642-40319-4

  • eBook Packages: Computer Science, Computer Science (R0)
