Abstract
Online social media is a powerful source of information that can influence users’ decisions. Because of the huge volume of data such media generate, much research has been devoted to automating text categorization. However, finding useful information that satisfies users’ needs is not an easy task. Many challenges remain, especially in short text categorization: in addition to categorization being a time-consuming and costly process, short messages contain misspellings, typos, and irony, and they lack context. To address these challenges, this article proposes GM-ShorT, a Generic framework for Multilingual Short Text Categorization based on a Convolutional Neural Network (CNN). GM-ShorT collects online social media data and uses them as input to a CNN that is combined with a word embedding mechanism to categorize short text messages. We explored several CNN architectures and show that GM-ShorT can be used for multilingual short text categorization with an accuracy 13.58% higher than that of classical approaches.
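To make the idea of a CNN combined with word embeddings more concrete, the following is a minimal sketch of a forward pass for a generic convolutional text classifier. It is not the GM-ShorT architecture, which the abstract does not specify. The function names, filter widths, dimensions, and random weights are all assumptions chosen for illustration, and a real model would learn its weights from labelled messages.

```python
import math
import random

def conv_max_pool(x, kernel, bias):
    """Slide one filter (width x embed_dim) over x, apply ReLU, keep the maximum."""
    width = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(x) - width + 1):
        score = bias
        for j in range(width):
            score += sum(a * b for a, b in zip(x[start + j], kernel[j]))
        best = max(best, score)
    return best

def text_cnn_forward(token_ids, embeddings, filters, w_out, b_out):
    """Return class probabilities for one short message given as token ids."""
    # Embedding lookup: each token id becomes a dense vector.
    x = [embeddings[t] for t in token_ids]
    # Zero-pad very short messages so the widest filter fits at least once.
    max_width = max(len(kernel) for kernel, _ in filters)
    while len(x) < max_width:
        x.append([0.0] * len(embeddings[0]))
    # One pooled feature per filter (convolution + max-over-time pooling).
    features = [conv_max_pool(x, kernel, bias) for kernel, bias in filters]
    # Fully connected layer followed by a numerically stable softmax.
    logits = [sum(f * w for f, w in zip(features, row)) + b
              for row, b in zip(w_out, b_out)]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = random.Random(0)
VOCAB, DIM, CLASSES = 50, 8, 3
WIDTHS = (2, 3, 4)  # assumed filter widths, two filters per width

def rand_matrix(rows, cols, scale):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

embeddings = rand_matrix(VOCAB, DIM, 1.0)
filters = [(rand_matrix(w, DIM, 0.1), 0.0) for w in WIDTHS for _ in range(2)]
w_out = rand_matrix(CLASSES, len(filters), 0.1)
b_out = [0.0] * CLASSES

probs = text_cnn_forward([3, 17, 42, 8, 5], embeddings, filters, w_out, b_out)
print(probs)
```

In this sketch the embedding table is the only language-dependent component, so swapping in embeddings trained for another language would leave the rest of the pipeline unchanged. Whether GM-ShorT handles multiple languages this way is not stated in the abstract.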
Cite this article
Enamoto, L., Weigang, L. & Filho, G.P.R. Generic framework for multilingual short text categorization using convolutional neural network. Multimed Tools Appl 80, 13475–13490 (2021). https://doi.org/10.1007/s11042-020-10314-9
DOI: https://doi.org/10.1007/s11042-020-10314-9