Comparative Analysis of Different Classifiers on Crisis-Related Tweets: An Elaborate Study

Manna, Sukanya; Nakai, Haruto

doi:10.1007/978-3-030-28553-1_4

Part of the book series: Studies in Computational Intelligence (SCI, volume 855)

Abstract

Twitter is a popular micro-blogging platform that has gained considerable prominence in the last few years and offers a diverse source of real-time information about different events, often during mass crises. During any crisis, it is necessary to filter through a huge number of tweets rapidly to extract incident-related information. Different machine learning (ML) algorithms have been used to separate crisis-related tweets from non-crisis-related ones, a task of great importance in constructing an emergency management framework. These algorithms rely heavily on the datasets used and on different hyper-parameters, which need to be tuned to provide better performance. Hence, this paper focuses on: (1) different Natural Language Processing (NLP) techniques to make tweets suitable for applying ML algorithms, (2) hyper-parameter tuning of neural networks when used as classifiers on short messages such as tweets, and (3) a comparative analysis of different state-of-the-art ML algorithms (classifiers) that can be applied to categorize crisis-related tweets with higher accuracy. The experiments were conducted on six different crisis-related datasets, each consisting of approximately 10,000 tweets. The analysis shows that Support Vector Machines and Logistic Regression performed significantly better than Naive Bayes and Neural Networks (NN), with a very high accuracy of 96% (although this varied across datasets). With proper hyper-parameter tuning, NN also showed promising results.
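
The workflow the abstract describes (clean the tweets, then classify them as crisis-related or not) can be illustrated with a minimal sketch. This is not the chapter's implementation, which is not visible in this preview. The cleaning rules, the `preprocess` and `NaiveBayes` names, and the toy tweets are all assumptions made for illustration. Naive Bayes is used only because it is one of the classifiers the abstract names and is short to write with the standard library.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical cleaning rules; the chapter's actual preprocessing is not shown here.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def preprocess(tweet):
    """Lowercase, drop URLs and @mentions, keep hashtag words, and tokenize."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return TOKEN_RE.findall(text)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up tweets for illustration only (not from the chapter's datasets).
train = [
    ("Flood waters rising near the bridge, people need rescue http://t.co/x", "crisis"),
    ("@user Earthquake damage reported downtown #emergency", "crisis"),
    ("Evacuation ordered as wildfire spreads #fire", "crisis"),
    ("Loving this new coffee place downtown", "other"),
    ("Watching the game tonight with friends", "other"),
    ("New phone arrived today, so happy", "other"),
]
model = NaiveBayes().fit([preprocess(t) for t, _ in train], [y for _, y in train])
print(model.predict(preprocess("Rescue teams needed, flood near downtown #emergency")))
```

A realistic comparison of the kind the abstract reports would swap in library implementations of each classifier and evaluate on held-out data. Six toy tweets say nothing about accuracy.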

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Santa Clara University, Santa Clara, CA, 95053, USA
Sukanya Manna & Haruto Nakai

Corresponding author

Correspondence to Sukanya Manna.

Editor information

Editors and Affiliations

School of Science and Technology, Middlesex University, London, UK
Xin-She Yang
College of Science, Xi’an Polytechnic University, Xi’an, China
Xing-Shi He

About this chapter

Cite this chapter

Manna, S., Nakai, H. (2020). Comparative Analysis of Different Classifiers on Crisis-Related Tweets: An Elaborate Study. In: Yang, XS., He, XS. (eds) Nature-Inspired Computation in Data Mining and Machine Learning. Studies in Computational Intelligence, vol 855. Springer, Cham. https://doi.org/10.1007/978-3-030-28553-1_4

DOI: https://doi.org/10.1007/978-3-030-28553-1_4
Published: 04 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28552-4
Online ISBN: 978-3-030-28553-1
eBook Packages: Intelligent Technologies and Robotics (R0)
