Abstract
Sentiment analysis is an important problem in machine learning that aims to identify the emotion expressed in a corpus. It is a difficult task, however, especially on large-scale data, where feature reduction is needed. In this paper, we propose a parallel feature reduction algorithm for sentiment polarity classification based on a substring method. The algorithm runs in parallel on the Hadoop platform. We evaluate it on a large data set, using a K-nearest neighbor algorithm and a Rocchio algorithm for classification. Experimental results show that the proposed algorithm outperforms other commonly used methods in both classification performance and computational cost.
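To make the pipeline named in the abstract concrete, the following is a minimal single-process sketch, not the authors' algorithm. It assumes character n-grams as a stand-in for substring features, a document-frequency threshold as a stand-in for feature reduction, and a Rocchio-style classifier that assigns a text to the class whose centroid is closest by cosine similarity. All function names and the parameters `n` and `min_df` are hypothetical, and the Hadoop parallelisation described in the paper is not reproduced here.

```python
from collections import Counter
import math


def substrings(text, n=3):
    """Character n-grams of a text (assumed stand-in for substring features)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def select_features(docs, n=3, min_df=2):
    """Keep only substrings that occur in at least min_df documents."""
    df = Counter()
    for text in docs:
        df.update(set(substrings(text, n)))  # count each substring once per document
    return {s for s, count in df.items() if count >= min_df}


def vectorize(text, vocab, n=3):
    """Sparse term-frequency vector restricted to the selected substrings."""
    return Counter(s for s in substrings(text, n) if s in vocab)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_rocchio(docs, labels, vocab, n=3):
    """Build one centroid per class: the mean of that class's document vectors."""
    sums, counts = {}, Counter(labels)
    for text, label in zip(docs, labels):
        sums.setdefault(label, Counter()).update(vectorize(text, vocab, n))
    return {
        label: {k: v / counts[label] for k, v in vec.items()}
        for label, vec in sums.items()
    }


def predict(text, centroids, vocab, n=3):
    """Assign the label of the centroid most similar to the text's vector."""
    vec = vectorize(text, vocab, n)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```

In this sketch the reduction step is the `min_df` filter: substrings seen in only one document are dropped before any vectors are built, which shrinks the feature space that the classifier has to compare against. A K-nearest neighbor classifier could reuse `vectorize` and `cosine` unchanged, comparing against individual training vectors instead of class centroids.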
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, Y., Xiang, X., Yin, C., Shang, L. (2013). Parallel Sentiment Polarity Classification Method with Substring Feature Reduction. In: Li, J., et al. Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2013. Lecture Notes in Computer Science, vol 7867. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40319-4_11
DOI: https://doi.org/10.1007/978-3-642-40319-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40318-7
Online ISBN: 978-3-642-40319-4
eBook Packages: Computer Science (R0)