Generic framework for multilingual short text categorization using convolutional neural network

Enamoto, Liriam; Weigang, Li; Filho, Geraldo P. Rocha

doi:10.1007/s11042-020-10314-9

Published: 15 January 2021

Multimedia Tools and Applications, volume 80, pages 13475–13490 (2021)

Liriam Enamoto¹, Li Weigang¹ & Geraldo P. Rocha Filho¹

Abstract

Online social media is a powerful source of information that can influence users’ decisions. Because of the huge volume of data these media generate, much research has been devoted to automating text categorization. However, finding useful information that satisfies users’ needs is not an easy task. Short text categorization is especially challenging: besides being time-consuming and costly, it must cope with short messages that contain misspellings, typos, and irony and that lack context. To address these challenges, this article proposes GM-ShorT, a Generic framework for Multilingual Short Text Categorization based on a Convolutional Neural Network (CNN). GM-ShorT collects online social media data and uses them as input to a CNN that is combined with a word embedding mechanism to categorize short text messages. We explored several CNN architectures and show that GM-ShorT can be used for multilingual short text categorization, with accuracy 13.58% higher than that of other classical approaches.
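The abstract does not give the architecture details, so the following is only a minimal sketch of the general idea it describes: word embeddings fed to a convolutional layer with max-over-time pooling, followed by a softmax classifier. It is not the GM-ShorT implementation. Everything here is an assumption made for illustration: the function names, the filter widths (2, 3, 4), the embedding size, the three output classes, whitespace tokenization, and the random vectors that stand in for pretrained word embeddings. The weights are untrained, so the output probabilities carry no meaning beyond showing the data flow.

```python
import math
import random


def embed(tokens, table, dim, rng):
    """Look up each token's vector; unseen tokens get a small random vector.

    Assumption: random vectors stand in for pretrained word embeddings.
    """
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
    return [table[tok] for tok in tokens]


def conv_max_pool(vectors, filt, bias):
    """Slide one filter (width x dim) over the sequence, apply ReLU, then max-pool."""
    width, dim = len(filt), len(filt[0])
    # Pad very short messages with zero vectors so at least one window exists.
    padded = vectors + [[0.0] * dim] * max(0, width - len(vectors))
    best = 0.0  # starting at zero folds the ReLU into the running maximum
    for start in range(len(padded) - width + 1):
        act = bias
        for k in range(width):
            act += sum(w * x for w, x in zip(filt[k], padded[start + k]))
        best = max(best, act)
    return best


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_model(dim=8, widths=(2, 3, 4), n_classes=3, seed=0):
    """Build a hypothetical, untrained model: one filter per width plus a dense layer."""
    rng = random.Random(seed)

    def rand():
        return rng.uniform(-0.5, 0.5)

    filters = [([[rand() for _ in range(dim)] for _ in range(w)], 0.0) for w in widths]
    dense = [([rand() for _ in widths], 0.0) for _ in range(n_classes)]
    return {"dim": dim, "embeddings": {}, "filters": filters, "dense": dense}, rng


def classify(tokens, model, rng):
    """Forward pass: embeddings -> convolution + max pooling -> dense -> softmax."""
    vectors = embed(tokens, model["embeddings"], model["dim"], rng)
    features = [conv_max_pool(vectors, f, b) for f, b in model["filters"]]
    logits = [sum(w * x for w, x in zip(row, features)) + b for row, b in model["dense"]]
    return softmax(logits)


model, rng = init_model()
# Whitespace tokenization is an assumption; it does not suit every language.
probs = classify("terremoto agora em lisboa".split(), model, rng)
short_probs = classify(["ok"], model, rng)  # one-token message exercises the padding
```

Using several filter widths lets the pooled features respond to word n-grams of different lengths, which is one common way to cope with the little context a short message offers. In practice the filters, dense weights, and possibly the embeddings would be learned from labeled messages rather than drawn at random.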

Author information

Authors and Affiliations

Department of Computer Science, University of Brasília, Brasília, DF, Brazil
Liriam Enamoto, Li Weigang & Geraldo P. Rocha Filho

Corresponding author

Correspondence to Liriam Enamoto.

About this article

Cite this article

Enamoto, L., Weigang, L. & Filho, G.P.R. Generic framework for multilingual short text categorization using convolutional neural network. Multimed Tools Appl 80, 13475–13490 (2021). https://doi.org/10.1007/s11042-020-10314-9

Received: 21 November 2019
Revised: 26 August 2020
Accepted: 22 December 2020
Published: 15 January 2021
Issue Date: April 2021
DOI: https://doi.org/10.1007/s11042-020-10314-9
