Abstract
Text categorization, or text classification, is one of the key tasks for representing the semantic information of documents. Traditional deep learning models for text categorization are generally time-consuming on large-scale datasets, owing to slow convergence, or rely heavily on pre-trained word vectors. Motivated by fully convolutional networks in the field of image processing, we introduce fully convolutional layers to substantially reduce the number of parameters in the text classification model. We propose a character-level model for short-text classification that integrates a convolutional neural network, a bidirectional gated recurrent unit, and a highway network with fully connected layers, capturing both the global and the local textual semantics while converging quickly. In addition, an error-minimized extreme learning machine is incorporated into the proposed model to further improve classification accuracy. Extensive experiments show that our approach achieves state-of-the-art performance compared with existing methods on large-scale text datasets.
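As background for the character-level setting described above, models of this kind typically quantize each input character against a fixed alphabet before the convolutional layers see it. The abstract does not state the paper's actual alphabet or sequence length, so the values below (a lowercase-plus-digits-plus-punctuation alphabet and a fixed length of 1014, in the style of Zhang et al.'s character-level CNN) are illustrative assumptions, not the authors' configuration:

```python
# Illustrative character quantization for a character-level classifier.
# The alphabet and fixed input length are assumptions borrowed from
# character-level CNN conventions; the paper's preprocessing may differ.
import string

ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + "\n"
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}
MAX_LEN = 1014  # assumed fixed input length

def quantize(text, max_len=MAX_LEN):
    """Map text to a max_len x |ALPHABET| one-hot matrix.

    Characters outside the alphabet become all-zero rows, and the
    sequence is truncated or zero-padded to max_len.
    """
    rows = []
    for ch in text.lower()[:max_len]:
        row = [0] * len(ALPHABET)
        idx = CHAR_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    # zero-pad short inputs up to the fixed length
    rows.extend([0] * len(ALPHABET) for _ in range(max_len - len(rows)))
    return rows

matrix = quantize("Hello, world!")
print(len(matrix))  # fixed number of rows regardless of input length
```

The resulting matrix is what the convolutional layers would consume; the recurrent and highway components then operate on the convolutional feature maps.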
Acknowledgements
This work is supported by “the Fundamental Research Funds for the Central Universities” (No. 2017XKQY082).
Cite this article
Liu, B., Zhou, Y. & Sun, W. Character-level text classification via convolutional neural network and gated recurrent unit. Int. J. Mach. Learn. & Cyber. 11, 1939–1949 (2020). https://doi.org/10.1007/s13042-020-01084-9