A Sentence Vector Based Over-Sampling Method for Imbalanced Emotion Classification

Chen, Tao; Xu, Ruifeng; Lu, Qin; Liu, Bin; Xu, Jun; Yao, Lin; He, Zhenyu

doi:10.1007/978-3-642-54903-8_6

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Imbalanced training data poses a serious problem for text classification based on supervised learning. The problem becomes more serious in emotion classification tasks with multiple emotion categories, as the training data can be quite skewed. This paper presents a novel over-sampling method that forms additional sum sentence vectors for minority classes in order to improve emotion classification on imbalanced data. First, a large corpus is used to train a continuous skip-gram model that produces word vectors, using the word/POS pair as the unit of each word vector. The sentence vectors of the training data are then constructed as the sum of their word/POS vectors. New minority-class training samples are then generated by randomly adding two sentence vectors from the corresponding class until every class has the same number of training samples, so that the classifiers can be trained on a fully balanced training dataset. Evaluation on the NLP&CC2013 Chinese micro blog emotion classification dataset shows that the resulting classifier achieves 48.4% average precision, an improvement of 11.9 percentage points over the state-of-the-art performance on this dataset (36.5%). This result shows that the proposed over-sampling method can effectively address the problem of data imbalance and thus achieve much improved performance for emotion classification.
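The over-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `word_vectors` dictionary (standing in for a trained skip-gram model keyed by word/POS units), the choice to draw two distinct sentence vectors when a class has at least two members, and the fixed random seed are all assumptions made for illustration.

```python
import numpy as np


def sentence_vector(tokens, word_vectors, dim):
    """Sum the vectors of the word/POS units in one sentence.

    `word_vectors` is a hypothetical dict mapping a word/POS unit
    (e.g. "happy/a") to a NumPy vector; unknown units are skipped.
    """
    vec = np.zeros(dim)
    for tok in tokens:
        if tok in word_vectors:
            vec += word_vectors[tok]
    return vec


def oversample_by_vector_sum(X, y, seed=0):
    """Balance classes by adding sums of two same-class sentence vectors.

    Each synthetic sample is the sum of two randomly chosen sentence
    vectors from the same minority class. Sampling continues until every
    class has as many samples as the largest class.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    labels, counts = np.unique(y, return_counts=True)
    target = int(counts.max())

    parts_X, parts_y = [X], [y]
    for label, count in zip(labels, counts):
        count = int(count)
        deficit = target - count
        if deficit == 0:
            continue
        members = X[y == label]
        synthetic = np.empty((deficit, X.shape[1]))
        for k in range(deficit):
            # Assumption: pick two distinct vectors when possible; a class
            # with a single sample is paired with itself.
            i, j = rng.choice(count, size=2, replace=bool(count < 2))
            synthetic[k] = members[i] + members[j]
        parts_X.append(synthetic)
        parts_y.append(np.full(deficit, label))

    return np.concatenate(parts_X), np.concatenate(parts_y)


# Toy usage: class 0 has four sentence vectors, class 1 has two.
X = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0],
              [0.0, 1.0], [0.0, 2.0]])
y = np.array([0, 0, 0, 0, 1, 1])
X_bal, y_bal = oversample_by_vector_sum(X, y)
```

After the call, both classes contain four samples, and the balanced arrays can be passed to any standard classifier. The original samples come first in the output, followed by the synthetic ones.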

Author information

Authors and Affiliations

Key Laboratory of Network Oriented Intelligent Computation, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
Tao Chen, Ruifeng Xu, Bin Liu, Jun Xu & Zhenyu He
Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
Qin Lu
Peking University Shenzhen Graduate School, Shenzhen, Guangdong, China
Lin Yao

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Av. Juan Dios Bátiz, Col. Nueva Industrial Vallejo, 07738, Mexico D.F, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Chen, T. et al. (2014). A Sentence Vector Based Over-Sampling Method for Imbalanced Emotion Classification. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_6

DOI: https://doi.org/10.1007/978-3-642-54903-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science, Computer Science (R0)
