Incorporating pre-training in long short-term memory networks for tweet classification

  • Original Article
  • Published in Social Network Analysis and Mining

Abstract

The paper presents deep learning models for tweet classification. Our approach is based on the long short-term memory (LSTM) recurrent neural network and is therefore expected to capture long-term dependencies among words. We first focus on the binary classification task. The basic model, called LSTM-TC, takes word embeddings as inputs, uses an LSTM to derive a semantic tweet representation, and applies logistic regression to predict the tweet label. Like other deep learning models, the basic LSTM-TC model requires a large amount of well-labeled training data to achieve good performance. To address this challenge, we develop an improved model, called LSTM-TC*, that incorporates a large amount of weakly labeled data for classifying tweets. Finally, we extend both models to the multiclass classification task, yielding LSTM-Multi-TC and LSTM-Multi-TC*. We present two approaches to constructing the weakly labeled data: one based on hashtag information, and the other based on the prediction output of a traditional classifier that does not need a large amount of well-labeled training data. Our LSTM-TC* and LSTM-Multi-TC* models first learn tweet representations from the weakly labeled data and then train the classifiers on the small amount of well-labeled data. Experimental results show that: (1) the proposed methods can be successfully used for tweet classification and outperform existing state-of-the-art methods; and (2) pre-training tweet representations on weakly labeled tweets significantly improves the accuracy of tweet classification.
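To make the two-stage design above concrete, the following is a minimal sketch, not the authors' implementation (their experiments were built on Theano). It assumes PyTorch; the names LSTMTC, pretrain_then_finetune, and weak_label_from_hashtags are hypothetical, and the weak-labeling rule and data loaders are placeholders for what the paper describes.

import re
import torch
import torch.nn as nn

def weak_label_from_hashtags(tweet_text, positive_tags, negative_tags):
    # Hypothetical hashtag-based weak labeling: a tweet mentioning only positive
    # (or only negative) hashtags receives that class as a weak label.
    tags = {t.lower() for t in re.findall(r"#(\w+)", tweet_text)}
    if tags & positive_tags and not tags & negative_tags:
        return 1
    if tags & negative_tags and not tags & positive_tags:
        return 0
    return None  # ambiguous or no hashtag: not usable as weakly labeled data

class LSTMTC(nn.Module):
    # Word embeddings -> LSTM -> logistic regression over the final hidden state.
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # could be initialized with word2vec vectors
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)     # num_classes=1 for the binary task

    def forward(self, token_ids):
        emb = self.embed(token_ids)        # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(emb)       # h_n: (1, batch, hidden_dim), the tweet representation
        return self.out(h_n.squeeze(0))    # logits; apply a sigmoid (or softmax) for probabilities

def pretrain_then_finetune(model, weak_loader, gold_loader, epochs=5, lr=1e-3):
    # Stage 1: learn representations from weakly labeled tweets.
    # Stage 2: reset the classifier head and train on the small well-labeled set.
    loss_fn = nn.BCEWithLogitsLoss()  # binary case; use CrossEntropyLoss for the multiclass variants

    def run(loader):
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            for token_ids, labels in loader:
                opt.zero_grad()
                loss = loss_fn(model(token_ids).squeeze(-1), labels.float())
                loss.backward()
                opt.step()

    run(weak_loader)              # pre-training on weak labels (e.g. hashtag-derived)
    model.out.reset_parameters()  # keep the embedding/LSTM weights, discard the weak-label head
    run(gold_loader)              # training on the well-labeled data
    return model

The essential point is the ordering: the embedding and LSTM weights carried over from the weak-label stage act as the pre-trained tweet representation, and only the classifier head is re-estimated on the well-labeled data.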

Notes

  1. https://code.google.com/archive/p/word2vec/.

  2. http://scikit-learn.org/stable/.

  3. http://xgboost.readthedocs.io/en/latest/.

  4. http://everydaysexism.com/, http://stemfeminist.com/.

  5. http://www.cs.cmu.edu/~ark/TweetNLP/.

References

  • Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: International conference on learning representations, 7–9 May 2015

  • Bamman D, Smith NA (2015) Contextualized sarcasm detection on twitter. In: Proceedings of the ninth international AAAI conference on web and social media, pp 574–577

  • Bastien F, Lamblin P, Pascanu R, Bergstra J, Goodfellow IJ, Bergeron A, Bouchard N, Bengio Y (2012) Theano: new features and speed improvements. In: The deep learning workshop, NIPS 2012

  • Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166. https://doi.org/10.1109/72.279181

  • Bengio Y, Ducharme R, Vincent P, Jauvin C (2003) A neural probabilistic language model. J Mach Learn Res 3:1137–1155

  • Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. TPAMI 35(8):1798–1828

  • Blacoe W, Lapata M (2012) A comparison of vector-based representations for semantic composition. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL ’12), pp 546–556

  • Bonchi F, Hajian S, Mishra B, Ramazzotti D (2017) Exposing the probabilistic causal structure of discrimination. Int J Data Sci Anal 3(1):1–21

  • Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’16). ACM, New York, pp 785–794. https://doi.org/10.1145/2939672.2939785

  • Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2493–2537

  • Denil M, Demiraj A, Kalchbrenner N, Blunsom P, de Freitas N (2014) Modelling, visualising and summarising documents with a single convolutional neural network. University of Oxford

  • Feldman M, Friedler SA, Moeller J, Scheidegger C, Venkatasubramanian S (2015) Certifying and removing disparate impact. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’15), pp 259–268

  • Gambhir M, Gupta V (2017) Recent automatic text summarization techniques: a survey. Artif Intell Rev 47(1):1–66

  • Gers F, Schmidhuber J, Cummins F (1999) Learning to forget: continual prediction with LSTM. In: Ninth international conference on artificial neural networks ICANN, pp 850–855

  • Graves A (2013) Generating sequences with recurrent neural networks. arXiv:1308.0850 [cs]

  • Hajian S, Domingo-Ferrer J (2013) A methodology for direct and indirect discrimination prevention in data mining. TKDE 25(7):1445–1459

  • Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554

  • Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  • Irsoy O, Cardie C (2014) Opinion mining with deep recurrent neural networks. In: Proceedings of the conference on empirical methods in natural language processing, pp 720–728

  • Iyyer M, Enns P, Boyd-Graber J, Resnik P (2014) Political ideology detection using recursive neural networks. In: Proceedings of the 52nd annual meeting of the Association for Computational Linguistics, pp 1113–1122

  • Joshi A, Bhattacharyya P, Carman MJ (2017) Automatic sarcasm detection: a survey. ACM Comput Surv 50(5):73:1–73:22

  • Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 conference on empirical methods in natural language processing, pp 1746–1751

  • Lebret R, Collobert R (2014) Word embeddings through Hellinger PCA. In: Proceedings of the 14th conference of the European chapter of the Association for Computational Linguistics, pp 482–490

  • LeCun YA, Bottou L, Orr GB, Müller KR (2012) Efficient BackProp. In: Neural networks: tricks of the trade, 2nd edn. LNCS 7700. Springer, Berlin, pp 9–48

  • LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444

  • Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: International conference on learning representations, January 2013

  • Mitchell J, Lapata M (2010) Composition in distributional models of semantics. Cogn Sci 34(8):1388–1429

  • Paulus R, Xiong C, Socher R (2018) A deep reinforced model for abstractive summarization. In: International conference on learning representations, 30 Apr–3 May 2018

  • Pedreschi D, Ruggieri S, Turini F (2013) The discovery of discrimination. In: Custers B, Calders T, Schermer B, Zarsky T (eds) Discrimination and privacy in the information society. Studies in applied philosophy, epistemology and rational ethics, vol 3. Springer, Berlin, Heidelberg, pp 91–108

  • Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543

  • Rajadesingan A, Zafarani R, Liu H (2015) Sarcasm detection on twitter: a behavioral modeling approach. In: Proceedings of the eighth ACM international conference on web search and data mining, pp 97–106

  • Romei A, Ruggieri S (2014) A multidisciplinary survey on discrimination analysis. Knowl Eng Rev 29(05):582–638

  • Rush AM, Chopra S, Weston J (2015) A neural attention model for abstractive sentence summarization. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 379–389

  • Shen Y, Jin R, Chen J, He X, Gao J, Deng L (2015) A deep embedding model for co-occurrence learning. arXiv:1504.02824 [cs]

  • Shen Y, Huang PS, Gao J, Chen W (2017) ReasoNet: learning to stop reading in machine comprehension. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’17), pp 1047–1055

  • Socher R, Pennington J, Huang EH, Ng AY, Manning CD (2011) Semi-supervised recursive autoencoders for predicting sentiment distributions. In: Proceedings of the conference on empirical methods in natural language processing, pp 151–161

  • Socher R, Huval B, Manning CD, Ng AY (2012) Semantic compositionality through recursive matrix-vector spaces. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning, pp 1201–1211

  • Socher R, Perelygin A, Wu JY, Chuang J, Manning CD, Ng AY, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 1631–1642

  • Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958

  • Tai KS, Socher R, Manning CD (2015) Improved semantic representations from tree-structured long short-term memory networks. In: Proceedings of the 53rd annual meeting of the association for computational linguistics, pp 1556–1566

  • Tang D, Wei F, Yang N, Zhou M, Liu T, Qin B (2014) Learning sentiment-specific word embedding for twitter sentiment classification. In: Proceedings of the 52nd annual meeting of the Association for Computational Linguistics, pp 1555–1565

  • Mikolov T, Karafiat M, Burget L, Cernocky JH, Khudanpur S (2010) Recurrent neural network based language model. In: INTERSPEECH

  • Turney PD (2014) Semantic composition and decomposition: from recognition to generation. arXiv:1405.7908 [cs]

  • Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008

  • Vincent P, Larochelle H, Lajoie I (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408

  • Weston J, Ratle F, Mobahi H, Collobert R (2012) Deep learning via semi-supervised embedding. In: Montavon G, Orr GB, Müller KR (eds) Neural networks: tricks of the trade, vol 7700. Lecture notes in computer science. Springer, Berlin, pp 639–655

  • Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E (2016) Hierarchical attention networks for document classification. In: Proceedings of the 2016 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, pp 1480–1489

  • Yih WT, Toutanova K, Platt J, Meek C (2011) Learning discriminative projections for text similarity measures. In: Proceedings of the fifteenth conference on computational natural language learning, pp 247–256

  • Yuan S, Wu X, Xiang Y (2016) Incorporating pre-training in long short-term memory networks for tweets classification. In: IEEE 16th international conference on data mining (ICDM), pp 1329–1334

  • Zeiler MD (2012) Adadelta: an adaptive learning rate method. arXiv:1212.5701 [cs]

  • Zhang X, Zhao J, LeCun Y (2015) Character-level convolutional networks for text classification. In: Advances in neural information processing systems, pp 649–657

  • Zhu X, Grefenstette E (2017) Deep learning for semantic composition. In: Proceedings of annual meeting of the Association for Computational Linguistics, Tutorial Abstracts, pp 6–7

Acknowledgements

This work significantly extends our previous work (Yuan et al. 2016), in which we proposed LSTM-TC and LSTM-TC* to deal with the binary classification task. In this work, we further extend our framework to address the multi-class classification task. The authors acknowledge the support from the National Science Foundation (1646654) to Xintao Wu, and the National Natural Science Foundation of China (71571136) and the Research Project of Science and Technology Commission of Shanghai Municipality (14511108002, 16JC1403000) to Yang Xiang.

Author information

Corresponding author

Correspondence to Xintao Wu.

About this article

Cite this article

Yuan, S., Wu, X. & Xiang, Y. Incorporating pre-training in long short-term memory networks for tweet classification. Soc. Netw. Anal. Min. 8, 52 (2018). https://doi.org/10.1007/s13278-018-0530-1
