
Empirical Study of Tweets Topic Classification Using Transformer-Based Language Models

  • Conference paper

In: Intelligent Information and Database Systems (ACIIDS 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12672)

Abstract

Social media opens up great opportunities for policymakers to analyze and understand large volumes of online content for decision-making purposes. People’s opinions and experiences on social media platforms such as Twitter are highly significant because of their volume, variety, and veracity. However, processing and retrieving useful information from natural-language content is very challenging because of its ambiguity and complexity. Recent advances in Natural Language Understanding (NLU), most notably the Transformer architecture, solve sequence-to-sequence modeling tasks while handling long-range dependencies efficiently, and Transformer-based models are setting new performance benchmarks across a wide variety of NLU tasks. In this paper, we apply Transformer-based sequence modeling to topic classification of short texts from tourist/user-posted tweets. Multiple BERT-like state-of-the-art sequence modeling approaches are investigated on a topic/target classification task over the Great Barrier Reef tweet dataset, and the obtained findings can be valuable for researchers working on classification with large datasets and a large number of target classes.
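The abstract's claim about handling long-range dependencies rests on the Transformer's core operation, scaled dot-product attention, followed (for classification) by pooling the token representations into a single vector fed to a linear head. The following is a minimal pure-Python sketch of that pipeline; the toy embeddings, the topic labels ("environment", "tourism"), and the head weights are illustrative assumptions, not values from the paper, which fine-tunes pre-trained BERT-like models rather than hand-built layers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V,
    computed row by row over lists of vectors."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        w = softmax(scores)  # each output is a convex mix of all tokens
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out

# Toy 3-token "tweet" with 4-dimensional embeddings.
# Self-attention uses the same matrix as queries, keys, and values.
X = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [1.0, 1.0, 0.0, 0.0]]
H = attention(X, X, X)

# Mean-pool token representations, then apply a fixed linear head over
# two hypothetical topic classes, as a fine-tuned classifier would.
pooled = [sum(col) / len(H) for col in zip(*H)]
W = [[0.5, -0.2, 0.1, 0.3],    # illustrative weights: "environment"
     [-0.1, 0.4, 0.2, -0.3]]   # illustrative weights: "tourism"
logits = [sum(wi * pi for wi, pi in zip(row, pooled)) for row in W]
probs = softmax(logits)
```

Because every token attends to every other token in a single step, the path length between any two positions is constant, which is what lets these models capture long-range dependencies more efficiently than recurrent architectures.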


Notes

  1. https://huggingface.co/.



Author information

Correspondence to Bela Stantic.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Mandal, R., Chen, J., Becken, S., Stantic, B. (2021). Empirical Study of Tweets Topic Classification Using Transformer-Based Language Models. In: Nguyen, N.T., Chittayasothorn, S., Niyato, D., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2021. Lecture Notes in Computer Science, vol 12672. Springer, Cham. https://doi.org/10.1007/978-3-030-73280-6_27


  • DOI: https://doi.org/10.1007/978-3-030-73280-6_27

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-73279-0

  • Online ISBN: 978-3-030-73280-6

  • eBook Packages: Computer Science (R0)
