Towards Scalable Emotion Classification in Microblog Based on Noisy Training Data

  • Conference paper
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD 2016, CCL 2016)

Abstract

The availability of labeled corpora is of great importance for emotion classification tasks. Because manual labeling is too time-consuming, hashtags have been used as naturally annotated labels to obtain large amounts of labeled training data from microblogs. However, inconsistency and noise in the annotation can degrade data quality and, in turn, the performance of a classifier trained on it. In this paper, we propose a classification framework that allows naturally annotated data to be used as additional training data and employs a k-NN graph based data cleaning method to remove noise once enough noisy data has accumulated. Evaluation on the NLP&CC2013 Chinese Weibo emotion classification dataset shows that our approach achieves 15.8% better performance than directly using the noisy data without noise filtering. After adding the filtered hashtag-labeled data to an existing high-quality training set, performance increases by 3.7% compared with using the high-quality training data alone.
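The abstract does not spell out the cleaning procedure, so the following is only a minimal sketch of one generic k-NN label-consistency filter, not the authors' method. It assumes bag-of-words vectors, cosine similarity, and a rule that keeps a sample when at least `min_agreement` of its `k` nearest neighbours carry the same label. The names `cosine`, `knn_filter`, `k`, and `min_agreement`, as well as the toy English posts, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_filter(texts, labels, k=2, min_agreement=0.5):
    """Return indices of samples whose label agrees with their k nearest neighbours.

    Assumption: a sample is treated as noise when fewer than `min_agreement`
    of its k most similar samples share its (hashtag-derived) label.
    """
    vectors = [Counter(t.split()) for t in texts]
    kept = []
    for i, vec in enumerate(vectors):
        # Similarity to every other sample; ties are broken by index.
        sims = [(cosine(vec, other), j) for j, other in enumerate(vectors) if j != i]
        sims.sort(key=lambda pair: (-pair[0], pair[1]))
        neighbours = [j for _, j in sims[:k]]
        agree = sum(labels[j] == labels[i] for j in neighbours)
        if neighbours and agree / len(neighbours) >= min_agreement:
            kept.append(i)
    return kept


# Toy data (made up): sample 2 reads as "happy" but carries a noisy "sad" label.
texts = [
    "so happy today great day",
    "happy great news today",
    "great day so happy",
    "so sad terrible loss",
    "terrible sad news loss",
    "sad terrible day loss",
]
labels = ["happy", "happy", "sad", "sad", "sad", "sad"]

kept = knn_filter(texts, labels, k=2)
print(kept)  # [0, 1, 3, 4, 5]
```

In this toy run the mislabeled sample is dropped because both of its nearest neighbours are labeled "happy". A real pipeline would need a tokenizer suited to Chinese text and values of `k` and `min_agreement` tuned on held-out data.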

Notes

  1. http://tcci.ccf.org.cn/conference/2013/pages/page04_tdata.html

  2. http://open.weibo.com/

  3. http://scikit-learn.org/stable/

  4. http://ir.dlut.edu.cn

Acknowledgement

We thank the anonymous reviewers for their helpful comments. The work presented in this paper is supported by Hong Kong Polytechnic University (PolyU RTVU and CERG PolyU 15211/14E) and the National Natural Science Foundation of China (project number: 6127229).

Author information

Corresponding author

Correspondence to Minglei Li.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Li, M., Lu, Q., Gui, L., Long, Y. (2016). Towards Scalable Emotion Classification in Microblog Based on Noisy Training Data. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD 2016, CCL 2016. Lecture Notes in Computer Science, vol 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_33

  • DOI: https://doi.org/10.1007/978-3-319-47674-2_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47673-5

  • Online ISBN: 978-3-319-47674-2

  • eBook Packages: Computer Science, Computer Science (R0)
