Comparative Analysis of Different Classifiers on Crisis-Related Tweets: An Elaborate Study

Chapter in: Nature-Inspired Computation in Data Mining and Machine Learning

Part of the book series: Studies in Computational Intelligence (SCI, volume 855)

Abstract

Twitter is a micro-blogging platform that has gained a strong reputation in the last few years and offers a diverse source of real-time information about different events, often during mass crises. During any crisis, it is necessary to filter through a huge number of tweets rapidly to extract incident-related information. Different machine learning (ML) algorithms have been used to separate crisis-related tweets from non-crisis-related ones, a task of great importance in constructing an emergency management framework. These algorithms rely heavily on the datasets used and on different hyper-parameters, which need to be tuned to provide better performance. Hence, this paper focuses on: (1) different Natural Language Processing (NLP) techniques that make tweets suitable for applying ML algorithms, (2) hyper-parameter tuning of neural networks when used as classifiers on short messages such as tweets, and (3) a comparative analysis of different state-of-the-art ML algorithms (classifiers) that can be applied to categorize crisis-related tweets with higher accuracy. The experiments were conducted on six different crisis-related datasets, each consisting of approximately 10,000 tweets. The analysis shows that Support Vector Machines and Logistic Regression performed significantly better than Naive Bayes and Neural Networks (NN), with a very high accuracy of 96% (although this varies across datasets). With proper hyper-parameter tuning, NNs also showed promising results.
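To make focus (1) more concrete, the following is a minimal sketch of the kind of NLP preprocessing that can make a tweet suitable for a classifier. The abstract does not list the chapter's actual preprocessing steps, so everything here is an assumption for illustration: the function names `preprocess_tweet` and `bag_of_words`, the tiny `STOP_WORDS` set, and the choice of cleaning rules are all hypothetical and use only the Python standard library.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this
# sketch); a real pipeline would likely use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "at", "to", "of", "and", "rt"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def preprocess_tweet(text):
    """Return lowercase tokens with tweet-specific noise removed."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user mentions
    text = text.replace("#", " ")        # keep the hashtag word, drop the symbol
    text = NON_ALPHA_RE.sub(" ", text)   # drop digits, punctuation, and emoji
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def bag_of_words(tokens):
    """Count token occurrences, a simple feature representation for a classifier."""
    return Counter(tokens)


if __name__ == "__main__":
    tweet = "RT @user: Flooding in the city centre! #flood http://t.co/abc123"
    tokens = preprocess_tweet(tweet)
    print(tokens)
    print(bag_of_words(tokens))
```

Token counts like these could then be fed to any of the classifiers the abstract compares; which features the authors actually used is not stated on this page.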

Notes

  1. https://www.nltk.org/
  2. http://scikit-learn.org/stable/
  3. https://www.tensorflow.org/tutorials/representation/word2vec
  4. https://keras.io/
  5. https://www.tensorflow.org/

Author information

Correspondence to Sukanya Manna.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Manna, S., Nakai, H. (2020). Comparative Analysis of Different Classifiers on Crisis-Related Tweets: An Elaborate Study. In: Yang, XS., He, XS. (eds) Nature-Inspired Computation in Data Mining and Machine Learning. Studies in Computational Intelligence, vol 855. Springer, Cham. https://doi.org/10.1007/978-3-030-28553-1_4
