Abstract
Social media offers policymakers a great opportunity to analyze and understand large volumes of online content for decision-making purposes. The opinions and experiences that people share on platforms such as Twitter are particularly valuable because of their volume, variety, and veracity. However, processing and retrieving useful information from natural language content is very challenging because of its ambiguity and complexity. Recent advances in Natural Language Understanding (NLU), in particular the Transformer architecture, solve sequence-to-sequence modeling tasks while handling long-range dependencies efficiently, and Transformer-based models have set new performance benchmarks across a wide variety of NLU tasks. In this paper, we apply Transformer-based sequence modeling to topic classification of short texts, namely tourist/user-posted tweets. We investigate multiple BERT-like state-of-the-art sequence modeling approaches on topic/target classification tasks using the Great Barrier Reef tweet dataset; the findings can be valuable for researchers working on classification with large datasets and a large number of target classes.
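The efficient handling of long-range dependencies mentioned in the abstract comes from the self-attention mechanism at the core of the Transformer encoder (Vaswani et al., 2017), which BERT-like models stack in every layer. A minimal NumPy sketch of scaled dot-product attention is shown below; the array shapes and function name are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Every output position is a weighted mix of all value vectors,
    so dependencies between distant tokens cost a single step.
    """
    d_k = Q.shape[-1]
    # Similarity of each query to every key, scaled for stable gradients.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable row-wise softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# Toy sequence of 4 token embeddings with dimension 8 (self-attention: Q = K = V).
X = np.random.rand(4, 8)
output, attn = scaled_dot_product_attention(X, X, X)
```

Each row of `attn` is a probability distribution over all input positions, which is why a single attention layer can relate any two tokens in a tweet regardless of their distance.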
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Mandal, R., Chen, J., Becken, S., Stantic, B. (2021). Empirical Study of Tweets Topic Classification Using Transformer-Based Language Models. In: Nguyen, N.T., Chittayasothorn, S., Niyato, D., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2021. Lecture Notes in Computer Science(), vol 12672. Springer, Cham. https://doi.org/10.1007/978-3-030-73280-6_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73279-0
Online ISBN: 978-3-030-73280-6
eBook Packages: Computer Science (R0)