Abstract
The availability of a labeled corpus is of great importance for emotion classification tasks. Because manual labeling is too time-consuming, hashtags have been used as naturally annotated labels to obtain large amounts of labeled training data from microblogs. However, inconsistency and noise in the annotation can degrade data quality and, in turn, the performance of a classifier trained on that data. In this paper, we propose a classification framework that allows naturally annotated data to be used as additional training data and employs a k-NN graph based data cleaning method to remove noise once enough noisy data has accumulated. Evaluation on the NLP&CC2013 Chinese Weibo emotion classification dataset shows that our approach achieves 15.8% better performance than directly using the noisy data without noise filtering. After adding the filtered hashtag data to an existing high-quality training set, performance increases by 3.7% compared to using the high-quality training data alone.
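The abstract does not spell out the cleaning procedure, so the following is only a minimal sketch of one common form of k-NN based noise filtering, not the authors' method. It assumes cosine similarity over feature vectors and keeps a sample only when a chosen share of its k nearest neighbours carry the same label. The function name `knn_label_filter`, the parameters `k` and `min_agreement`, and the toy data are all hypothetical.

```python
import numpy as np


def knn_label_filter(features, labels, k=3, min_agreement=0.5):
    """Return a boolean mask: True where a sample's label agrees with its k-NN neighbourhood.

    Assumption: cosine similarity and a simple agreement threshold; the paper
    may use a different similarity measure or decision rule.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)

    # Cosine similarity between all pairs of samples.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a sample is not its own neighbour

    # Indices of the k most similar samples for each row.
    neighbours = np.argsort(-sim, axis=1)[:, :k]

    # Fraction of neighbours that carry the same label as the sample itself.
    agreement = (labels[neighbours] == labels[:, None]).mean(axis=1)
    return agreement >= min_agreement


# Toy data (hypothetical): two tight clusters, with sample 3 mislabelled.
toy_features = [
    [1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [1.0, 0.05],
    [0.0, 1.0], [0.1, 0.9], [0.05, 0.95], [0.05, 1.0],
]
toy_labels = ["happy", "happy", "happy", "sad", "sad", "sad", "sad", "sad"]

mask = knn_label_filter(toy_features, toy_labels, k=3, min_agreement=0.5)
cleaned_labels = [label for label, keep in zip(toy_labels, mask) if keep]
```

In this toy run, sample 3 sits among "happy" neighbours while labelled "sad", so the mask drops it and keeps the other seven samples. In a real pipeline the feature vectors would come from whatever text representation the classifier uses, and `k` and `min_agreement` would need tuning on held-out data.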
Acknowledgement
We thank the anonymous reviewers for their helpful comments. The work presented in this paper is supported by Hong Kong Polytechnic University (PolyU RTVU and CERG PolyU 15211/14E) and the National Natural Science Foundation of China (project number: 6127229).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Li, M., Lu, Q., Gui, L., Long, Y. (2016). Towards Scalable Emotion Classification in Microblog Based on Noisy Training Data. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2016, CCL 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_33
DOI: https://doi.org/10.1007/978-3-319-47674-2_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47673-5
Online ISBN: 978-3-319-47674-2
eBook Packages: Computer Science (R0)