Abstract
The paper presents deep learning models for tweet classification. Our approach is based on the long short-term memory (LSTM) recurrent neural network and is therefore expected to capture long-term dependencies among words. We first focus on the binary classification task. The basic model, called LSTM-TC, takes word embeddings as inputs, uses an LSTM to derive a semantic tweet representation, and applies logistic regression to predict the tweet label. The basic LSTM-TC model, like other deep learning models, requires a large amount of well-labeled training data to achieve good performance. To address this challenge, we further develop an improved model, called LSTM-TC*, that incorporates a large amount of weakly labeled data for classifying tweets. Finally, we extend both models to the multiclass classification task, yielding LSTM-Multi-TC and LSTM-Multi-TC*. We present two approaches to constructing the weakly labeled data: one is based on hashtag information, and the other is based on the prediction output of a traditional classifier that does not require a large amount of well-labeled training data. Our LSTM-TC* and LSTM-Multi-TC* models first learn tweet representations from the weakly labeled data, and then train the classifiers on the small amount of well-labeled data. Experimental results show that: (1) the proposed methods can be successfully used for tweet classification and outperform existing state-of-the-art methods; and (2) pre-training tweet representations on weakly labeled tweets can significantly improve the accuracy of tweet classification.
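The LSTM-TC pipeline described above (word embeddings → LSTM → logistic regression on the final hidden state) can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: all dimensions, initializations, and the choice of the final hidden state as the tweet representation are assumptions for the sketch, and training (and the LSTM-TC* pre-training step) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMTC:
    """Minimal LSTM-TC sketch: embeddings -> LSTM -> logistic regression."""

    def __init__(self, vocab_size, embed_dim, hidden_dim):
        self.E = rng.normal(0, 0.1, (vocab_size, embed_dim))  # word embeddings
        # gate parameters for input, forget, cell, and output gates, stacked
        self.W = rng.normal(0, 0.1, (4 * hidden_dim, embed_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.w_lr = rng.normal(0, 0.1, hidden_dim)  # logistic-regression weights
        self.b_lr = 0.0
        self.h_dim = hidden_dim

    def forward(self, token_ids):
        h = np.zeros(self.h_dim)  # hidden state
        c = np.zeros(self.h_dim)  # cell state
        for t in token_ids:  # run the LSTM over the tweet's tokens
            z = self.W @ np.concatenate([self.E[t], h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
            h = sigmoid(o) * np.tanh(c)                   # update hidden state
        # final hidden state serves as the tweet representation;
        # logistic regression on top predicts the binary label
        return sigmoid(self.w_lr @ h + self.b_lr)

model = LSTMTC(vocab_size=100, embed_dim=8, hidden_dim=16)
p = model.forward([3, 17, 42, 9])  # probability of the positive class
```

For the multiclass variants (LSTM-Multi-TC), the logistic-regression head would be replaced by a softmax over class scores; the recurrent part is unchanged.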
References
Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: International conference on learning representations, 7–9 May 2015
Bamman D, Smith NA (2015) Contextualized sarcasm detection on twitter. In: Proceedings of the ninth international AAAI conference on web and social media, pp 574–577
Bastien F, Lamblin P, Pascanu R, Bergstra J, Goodfellow IJ, Bergeron A, Bouchard N, Bengio Y (2012) Theano: new features and speed improvements. In: The deep learning workshop, NIPS 2012
Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166. https://doi.org/10.1109/72.279181
Bengio Y, Ducharme R, Vincent P, Jauvin C (2003) A neural probabilistic language model. J Mach Learn Res 3:1137–1155
Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. TPAMI 35(8):1798–1828
Blacoe W, Lapata M (2012) A comparison of vector-based representations for semantic composition. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL ’12), pp 546–556
Bonchi F, Hajian S, Mishra B, Ramazzotti D (2017) Exposing the probabilistic causal structure of discrimination. Int J Data Sci Anal 3(1):1–21
Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’16). ACM, New York, pp 785–794. https://doi.org/10.1145/2939672.2939785
Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2493–2537
Denil M, Demiraj A, Kalchbrenner N, Blunsom P, de Freitas N (2014) Modelling, visualising and summarising documents with a single convolutional neural network. University of Oxford
Feldman M, Friedler SA, Moeller J, Scheidegger C, Venkatasubramanian S (2015) Certifying and removing disparate impact. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’15), pp 259–268
Gambhir M, Gupta V (2017) Recent automatic text summarization techniques: a survey. Artif Intell Rev 47(1):1–66
Gers F, Schmidhuber J, Cummins F (1999) Learning to forget: continual prediction with LSTM. In: Ninth international conference on artificial neural networks ICANN, pp 850–855
Graves A (2013) Generating sequences with recurrent neural networks. arXiv:1308.0850 [cs]
Hajian S, Domingo-Ferrer J (2013) A methodology for direct and indirect discrimination prevention in data mining. TKDE 25(7):1445–1459
Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Irsoy O, Cardie C (2014) Opinion mining with deep recurrent neural networks. In: Proceedings of the conference on empirical methods in natural language processing, pp. 720–728
Iyyer M, Enns P, Boyd-Graber J, Resnik P (2014) Political ideology detection using recursive neural networks. In: Proceedings of the 52nd annual meeting of the Association for Computational Linguistics, pp 1113–1122
Joshi A, Bhattacharyya P, Carman MJ (2017) Automatic sarcasm detection: a survey. ACM Comput Surv 50(5):73:1–73:22
Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 conference on empirical methods in natural language processing, pp 1746–1751
Lebret R, Collobert R (2014) Word embeddings through Hellinger PCA. In: Proceedings of the 14th conference of the European chapter of the Association for Computational Linguistics, pp 482–490
LeCun YA, Bottou L, Orr GB, Müller KR (2012) Efficient BackProp. In: Neural networks: tricks of the trade, 2nd edn. LNCS 7700. Springer, Berlin, pp 9–48
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: International conference on learning representations, January 2013
Mitchell J, Lapata M (2010) Composition in distributional models of semantics. Cogn Sci 34(8):1388–1429
Paulus R, Xiong C, Socher R (2018) A deep reinforced model for abstractive summarization. In: International conference on learning representations, 30 Apr–3 May 2018
Pedreschi D, Ruggieri S, Turini F (2013) The discovery of discrimination. In: Custers B, Calders T, Schermer B, Zarsky T (eds) Discrimination and privacy in the information society. Studies in applied philosophy, epistemology and rational ethics, vol 3. Springer, Berlin, Heidelberg, pp 91–108
Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543
Rajadesingan A, Zafarani R, Liu H (2015) Sarcasm detection on twitter: a behavioral modeling approach. In: Proceedings of the eighth ACM international conference on web search and data mining, pp 97–106
Romei A, Ruggieri S (2014) A multidisciplinary survey on discrimination analysis. Knowl Eng Rev 29(05):582–638
Rush AM, Chopra S, Weston J (2015) A neural attention model for abstractive sentence summarization. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 379–389
Shen Y, Jin R, Chen J, He X, Gao J, Deng L (2015) A deep embedding model for co-occurrence learning. arXiv:1504.02824 [cs]
Shen Y, Huang PS, Gao J, Chen W (2017) ReasoNet: learning to stop reading in machine comprehension. In: KDD, ACM, KDD ’17, pp 1047–1055
Socher R, Pennington J, Huang EH, Ng AY, Manning CD (2011) Semi-supervised recursive autoencoders for predicting sentiment distributions. In: Proceedings of the conference on empirical methods in natural language processing, pp 151–161
Socher R, Huval B, Manning CD, Ng AY (2012) Semantic compositionality through recursive matrix-vector spaces. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning, pp 1201–1211
Socher R, Perelygin A, Wu JY, Chuang J, Manning CD, Ng AY, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 1631–1642
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958
Tai KS, Socher R, Manning CD (2015) Improved semantic representations from tree-structured long short-term memory networks. In: Proceedings of the 53rd annual meeting of the association for computational linguistics, pp 1556–1566
Tang D, Wei F, Yang N, Zhou M, Liu T, Qin B (2014) Learning sentiment-specific word embedding for twitter sentiment classification. In: Proceedings of the 52nd annual meeting of the Association for Computational Linguistics, pp 1555–1565
Mikolov T, Karafiat M, Burget L, Cernocky JH, Khudanpur S (2010) Recurrent neural network based language model. In: INTERSPEECH
Turney PD (2014) Semantic composition and decomposition: from recognition to generation. arXiv:1405.7908 [cs]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008
Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408
Weston J, Ratle F, Mobahi H, Collobert R (2012) Deep learning via semi-supervised embedding. In: Montavon G, Orr GB, Müller KR (eds) Neural networks: tricks of the trade, vol 7700. Lecture notes in computer science. Springer, Berlin, pp 639–655
Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E (2016) Hierarchical attention networks for document classification. In: Proceedings of the 2016 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, pp 1480–1489
Yih Wt, Toutanova K, Platt J, Meek C (2011) Learning discriminative projections for text similarity measures. In: Proceedings of the fifteenth conference on computational natural language learning, pp 247–256
Yuan S, Wu X, Xiang Y (2016) Incorporating pre-training in long short-term memory networks for tweets classification. In: IEEE 16th international conference on data mining (ICDM), pp 1329–1334
Zeiler MD (2012) Adadelta: an adaptive learning rate method. arXiv:1212.5701 [cs]
Zhang X, Zhao J, LeCun Y (2015) Character-level convolutional networks for text classification. In: Advances in neural information processing systems, pp 649–657
Zhu X, Grefenstette E (2017) Deep learning for semantic composition. In: Proceedings of annual meeting of the Association for Computational Linguistics, Tutorial Abstracts, pp 6–7
Acknowledgements
This work is significantly extended from our previous work (Yuan et al. 2016), in which we proposed LSTM-TC and LSTM-TC* for the binary classification task. In this work, we further extend our framework to address the multiclass classification task. The authors acknowledge the support from the National Science Foundation (1646654) to Xintao Wu, and the National Natural Science Foundation of China (71571136) and the Research Project of Science and Technology Commission of Shanghai Municipality (14511108002, 16JC1403000) to Yang Xiang.
Cite this article
Yuan, S., Wu, X. & Xiang, Y. Incorporating pre-training in long short-term memory networks for tweet classification. Soc. Netw. Anal. Min. 8, 52 (2018). https://doi.org/10.1007/s13278-018-0530-1