Abstract
Analyzing people’s feelings and emotions in social media has become a major concern for both academic researchers and commercial companies. The sentiment lexicon plays a crucial role in most sentiment analysis applications. However, existing thesaurus-based lexicon-building methods suffer from coverage problems when faced with the new words and new meanings that arise in social media. Nowadays, millions of users share their opinions on different aspects of life every day in microblogs. In this paper, a novel method based on occurrence probability with emoticons is presented to learn candidate sentiment words from massive microblog data, and the accuracy of the learned lexicon is further improved by using the whole microblog space as the corpus. Extensive experiments were conducted on real-world datasets with different topics. The results show that the proposed method is able to extract emerging words, and that the learned lexicon outperforms two well-known Chinese lexicons in classifying the sentiments in microblogs.
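The abstract does not give the scoring formula, so the following is only a minimal illustrative sketch of the general idea of using emoticons as noisy sentiment labels, not the authors' actual method. It assumes posts are already tokenized (Chinese word segmentation is out of scope here), and the emoticon seed sets, the function name `emoticon_polarity_scores`, the `min_count` threshold, and the score `(p - n) / (p + n)` are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical emoticon seed sets (assumption: the paper's own sets are not given here).
POSITIVE_EMOTICONS = {":)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":'("}


def emoticon_polarity_scores(posts, min_count=2):
    """Score words by how often they co-occur with positive vs. negative emoticons.

    posts: iterable of token lists (already segmented).
    Returns a dict mapping word -> score in [-1, 1]; a positive score means the
    word appears mostly in posts carrying positive emoticons.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for tokens in posts:
        token_set = set(tokens)
        has_pos = bool(token_set & POSITIVE_EMOTICONS)
        has_neg = bool(token_set & NEGATIVE_EMOTICONS)
        if has_pos == has_neg:
            continue  # skip posts with no emoticon or with mixed emoticons
        words = token_set - POSITIVE_EMOTICONS - NEGATIVE_EMOTICONS
        (pos_counts if has_pos else neg_counts).update(words)

    scores = {}
    for word in set(pos_counts) | set(neg_counts):
        p, n = pos_counts[word], neg_counts[word]
        if p + n < min_count:
            continue  # too rare to estimate a co-occurrence probability
        scores[word] = (p - n) / (p + n)
    return scores


# Toy posts, invented purely for illustration.
posts = [
    ["great", "movie", ":)"],
    ["great", "day", ":D"],
    ["awful", "traffic", ":("],
    ["awful", "service", ":'("],
    ["movie", ":("],
]
scores = emoticon_polarity_scores(posts)
print(scores)
```

On this toy input, "great" scores 1.0, "awful" scores -1.0, and "movie" scores 0.0 because it appears once with each polarity; words seen only once are dropped by the `min_count` filter. Words with scores near the extremes would be the candidate sentiment words in such a scheme.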
Project supported by the State Key Development Program for Basic Research of China (Grant No. 2011CB302200-G), State Key Program of National Natural Science of China (Grant No. 61033007), National Natural Science Foundation of China (Grant No. 61100026, 60973019), and the Fundamental Research Funds for the Central Universities (N100704001).
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Feng, S., Wang, L., Xu, W., Wang, D., Yu, G. (2012). Unsupervised Learning Chinese Sentiment Lexicon from Massive Microblog Data. In: Zhou, S., Zhang, S., Karypis, G. (eds) Advanced Data Mining and Applications. ADMA 2012. Lecture Notes in Computer Science, vol 7713. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35527-1_3
DOI: https://doi.org/10.1007/978-3-642-35527-1_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35526-4
Online ISBN: 978-3-642-35527-1
eBook Packages: Computer Science, Computer Science (R0)