Abstract
The paper presents deep learning models for tweet classification. Our approach is based on the long short-term memory (LSTM) recurrent neural network and is therefore expected to capture long-term dependencies among words. We first focus on the binary classification task. The basic model, called LSTM-TC, takes word embeddings as inputs, uses an LSTM to derive a semantic tweet representation, and applies logistic regression to predict the tweet label. The basic LSTM-TC model, like other deep learning models, requires a large amount of well-labeled training data to achieve good performance. To address this challenge, we further develop an improved model, called LSTM-TC*, that incorporates a large amount of weakly labeled data for classifying tweets. Finally, we extend both models to the multiclass classification task, yielding LSTM-Multi-TC and LSTM-Multi-TC*. We present two approaches to constructing the weakly labeled data: one is based on hashtag information, and the other is based on the prediction output of a traditional classifier that does not require a large amount of well-labeled training data. Our LSTM-TC* and LSTM-Multi-TC* models first learn tweet representations from the weakly labeled data, and then train the classifiers on the small amount of well-labeled data. Experimental results show that: (1) the proposed methods can be successfully used for tweet classification and outperform existing state-of-the-art methods; and (2) pre-training tweet representations on weakly labeled tweets can significantly improve the accuracy of tweet classification.
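The LSTM-TC pipeline described above (word embeddings → LSTM → logistic regression on the final hidden state) can be sketched in plain NumPy. This is a minimal illustration, not the paper's implementation: all dimensions, initializations, and the choice of the final hidden state as the tweet representation are assumptions for the sketch, and training (and the LSTM-TC* pre-training step) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMTC:
    """Minimal LSTM-TC sketch: embeddings -> LSTM -> logistic regression."""

    def __init__(self, vocab_size, embed_dim, hidden_dim):
        self.E = rng.normal(0, 0.1, (vocab_size, embed_dim))  # word embeddings
        # gate parameters for input, forget, cell, and output gates, stacked
        self.W = rng.normal(0, 0.1, (4 * hidden_dim, embed_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        self.w_lr = rng.normal(0, 0.1, hidden_dim)  # logistic-regression weights
        self.b_lr = 0.0
        self.h_dim = hidden_dim

    def forward(self, token_ids):
        h = np.zeros(self.h_dim)  # hidden state
        c = np.zeros(self.h_dim)  # cell state
        for t in token_ids:  # run the LSTM over the tweet's tokens
            z = self.W @ np.concatenate([self.E[t], h]) + self.b
            i, f, g, o = np.split(z, 4)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell state
            h = sigmoid(o) * np.tanh(c)                   # update hidden state
        # final hidden state serves as the tweet representation;
        # logistic regression on top predicts the binary label
        return sigmoid(self.w_lr @ h + self.b_lr)

model = LSTMTC(vocab_size=100, embed_dim=8, hidden_dim=16)
p = model.forward([3, 17, 42, 9])  # probability of the positive class
```

For the multiclass variants (LSTM-Multi-TC), the logistic-regression head would be replaced by a softmax over class scores; the recurrent part is unchanged.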
References
Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: International conference on learning representations, 7–9 May 2015
Bamman D, Smith NA (2015) Contextualized sarcasm detection on twitter. In: Proceedings of the ninth international AAAI conference on web and social media, pp 574–577
Bastien F, Lamblin P, Pascanu R, Bergstra J, Goodfellow IJ, Bergeron A, Bouchard N, Bengio Y (2012) Theano: new features and speed improvements. In: The deep learning workshop, NIPS 2012
Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166. https://doi.org/10.1109/72.279181
Bengio Y, Ducharme R, Vincent P, Jauvin C (2003) A neural probabilistic language model. J Mach Learn Res 3:1137–1155
Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. TPAMI 35(8):1798–1828
Blacoe W, Lapata M (2012) A comparison of vector-based representations for semantic composition. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL ’12), pp 546–556
Bonchi F, Hajian S, Mishra B, Ramazzotti D (2017) Exposing the probabilistic causal structure of discrimination. Int J Data Sci Anal 3(1):1–21
Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’16). ACM, New York, pp 785–794. https://doi.org/10.1145/2939672.2939785
Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P (2011) Natural language processing (almost) from scratch. J Mach Learn Res 12:2493–2537
Denil M, Demiraj A, Kalchbrenner N, Blunsom P, de Freitas N (2014) Modelling, visualising and summarising documents with a single convolutional neural network. University of Oxford
Feldman M, Friedler SA, Moeller J, Scheidegger C, Venkatasubramanian S (2015) Certifying and removing disparate impact. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining (KDD ’15), pp 259–268
Gambhir M, Gupta V (2017) Recent automatic text summarization techniques: a survey. Artif Intell Rev 47(1):1–66
Gers F, Schmidhuber J, Cummins F (1999) Learning to forget: continual prediction with LSTM. In: Ninth international conference on artificial neural networks ICANN, pp 850–855
Graves A (2013) Generating sequences with recurrent neural networks. arXiv:1308.0850 [cs]
Hajian S, Domingo-Ferrer J (2013) A methodology for direct and indirect discrimination prevention in data mining. TKDE 25(7):1445–1459
Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Irsoy O, Cardie C (2014) Opinion mining with deep recurrent neural networks. In: Proceedings of the conference on empirical methods in natural language processing, pp. 720–728
Iyyer M, Enns P, Boyd-Graber J, Resnik P (2014) Political ideology detection using recursive neural networks. In: Proceedings of the 52nd annual meeting of the Association for Computational Linguistics, pp 1113–1122
Joshi A, Bhattacharyya P, Carman MJ (2017) Automatic sarcasm detection: a survey. ACM Comput Surv 50(5):73:1–73:22
Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 conference on empirical methods in natural language processing, pp 1746–1751
Lebret R, Collobert R (2014) Word embeddings through Hellinger PCA. In: Proceedings of the 14th conference of the European chapter of the Association for Computational Linguistics, pp 482–490
LeCun YA, Bottou L, Orr GB, Müller KR (2012) Efficient BackProp. In: Neural networks: tricks of the trade, 2nd edn. LNCS 7700. Springer, Berlin, pp 9–48
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: International conference on learning representations, January 2013
Mitchell J, Lapata M (2010) Composition in distributional models of semantics. Cogn Sci 34(8):1388–1429
Paulus R, Xiong C, Socher R (2018) A deep reinforced model for abstractive summarization. In: International conference on learning representations, 30 Apr–3 May 2018
Pedreschi D, Ruggieri S, Turini F (2013) The discovery of discrimination. In: Custers B, Calders T, Schermer B, Zarsky T (eds) Discrimination and privacy in the information society. Studies in applied philosophy, epistemology and rational ethics, vol 3. Springer, Berlin, Heidelberg, pp 91–108
Pennington J, Socher R, Manning CD (2014) Glove: global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 1532–1543
Rajadesingan A, Zafarani R, Liu H (2015) Sarcasm detection on twitter: a behavioral modeling approach. In: Proceedings of the eighth ACM international conference on web search and data mining, pp 97–106
Romei A, Ruggieri S (2014) A multidisciplinary survey on discrimination analysis. Knowl Eng Rev 29(05):582–638
Rush AM, Chopra S, Weston J (2015) A neural attention model for abstractive sentence summarization. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 379–389
Shen Y, Jin R, Chen J, He X, Gao J, Deng L (2015) A deep embedding model for co-occurrence learning. arXiv:1504.02824 [cs]
Shen Y, Huang PS, Gao J, Chen W (2017) ReasoNet: learning to stop reading in machine comprehension. In: KDD, ACM, KDD ’17, pp 1047–1055
Socher R, Pennington J, Huang EH, Ng AY, Manning CD (2011) Semi-supervised recursive autoencoders for predicting sentiment distributions. In: Proceedings of the conference on empirical methods in natural language processing, pp 151–161
Socher R, Huval B, Manning CD, Ng AY (2012) Semantic compositionality through recursive matrix-vector spaces. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning, pp 1201–1211
Socher R, Perelygin A, Wu JY, Chuang J, Manning CD, Ng AY, Potts C (2013) Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 1631–1642
Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15:1929–1958
Tai KS, Socher R, Manning CD (2015) Improved semantic representations from tree-structured long short-term memory networks. In: Proceedings of the 53rd annual meeting of the association for computational linguistics, pp 1556–1566
Tang D, Wei F, Yang N, Zhou M, Liu T, Qin B (2014) Learning sentiment-specific word embedding for twitter sentiment classification. In: Proceedings of the 52nd annual meeting of the Association for Computational Linguistics, pp 1555–1565
Mikolov T, Karafiat M, Burget L, Cernocky JH, Khudanpur S (2010) Recurrent neural network based language model. In: INTERSPEECH
Turney PD (2014) Semantic composition and decomposition: from recognition to generation. arXiv:1405.7908 [cs]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5998–6008
Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408
Weston J, Ratle F, Mobahi H, Collobert R (2012) Deep learning via semi-supervised embedding. In: Montavon G, Orr GB, Müller KR (eds) Neural networks: tricks of the trade, vol 7700. Lecture notes in computer science. Springer, Berlin, pp 639–655
Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E (2016) Hierarchical attention networks for document classification. In: Proceedings of the 2016 conference of the North American chapter of the Association for Computational Linguistics: human language technologies, pp 1480–1489
Yih Wt, Toutanova K, Platt J, Meek C (2011) Learning discriminative projections for text similarity measures. In: Proceedings of the fifteenth conference on computational natural language learning, pp 247–256
Yuan S, Wu X, Xiang Y (2016) Incorporating pre-training in long short-term memory networks for tweets classification. In: IEEE 16th international conference on data mining (ICDM), pp 1329–1334
Zeiler MD (2012) Adadelta: an adaptive learning rate method. arXiv:1212.5701 [cs]
Zhang X, Zhao J, LeCun Y (2015) Character-level convolutional networks for text classification. In: Advances in neural information processing systems, pp 649–657
Zhu X, Grefenstette E (2017) Deep learning for semantic composition. In: Proceedings of annual meeting of the Association for Computational Linguistics, Tutorial Abstracts, pp 6–7
Acknowledgements
This work is significantly extended from our previous work (Yuan et al. 2016), in which we proposed LSTM-TC and LSTM-TC* for the binary classification task. In this work, we further extend our framework to address the multiclass classification task. The authors acknowledge the support from the National Science Foundation (1646654) to Xintao Wu, and the National Natural Science Foundation of China (71571136) and the Research Project of Science and Technology Commission of Shanghai Municipality (14511108002, 16JC1403000) to Yang Xiang.
Cite this article
Yuan, S., Wu, X. & Xiang, Y. Incorporating pre-training in long short-term memory networks for tweet classification. Soc. Netw. Anal. Min. 8, 52 (2018). https://doi.org/10.1007/s13278-018-0530-1