Abstract
Twitter is a popular micro-blogging platform that has gained considerable popularity in the last few years and offers a diverse source of real-time information about different events, often during mass crises. During any crisis, a huge number of tweets must be filtered rapidly to extract incident-related information. Different machine learning (ML) algorithms have been used to separate crisis-related tweets from non-crisis-related ones, a task of great importance in constructing an emergency management framework. These algorithms rely heavily on the datasets used and on different hyper-parameters, which need to be tuned to provide better performance. Hence, this paper focuses on: (1) different Natural Language Processing (NLP) techniques that make tweets suitable for applying ML algorithms, (2) hyper-parameter tuning of neural networks when used as classifiers on short messages such as tweets, and (3) a comparative analysis of different state-of-the-art ML algorithms (classifiers) that can be applied to categorize crisis-related tweets with higher accuracy. The experiments were run on six different crisis-related datasets, each consisting of approximately 10,000 tweets. The analysis shows that Support Vector Machines and Logistic Regression performed significantly better than Naive Bayes and Neural Networks (NN), reaching a very high accuracy of 96% (although results vary across datasets). With proper hyper-parameter tuning, NN also showed promising results.
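As an illustration of point (1), the following is a minimal sketch of the kind of tweet normalization that usually precedes classification. The abstract does not list the chapter's actual preprocessing steps, so the cleaning rules, the stop-word list, and the function name `preprocess_tweet` are all assumptions made for this example.

```python
import re

# Hypothetical cleaning rules; the abstract does not state the chapter's actual steps.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")

# A tiny illustrative stop-word list (an assumption, not taken from the chapter).
STOP_WORDS = {"a", "an", "the", "is", "are", "in", "on", "at", "of", "to", "and", "rt"}


def preprocess_tweet(text):
    """Return a list of normalized tokens for one tweet."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # drop links
    text = MENTION_RE.sub(" ", text)  # drop @user mentions
    text = text.replace("#", " ")     # keep the hashtag word, drop the '#'
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess_tweet("RT @user: Flood warning in #Queensland! http://t.co/abc"))
```

The resulting token lists could then be turned into bag-of-words or TF-IDF features for any of the classifiers compared in the chapter; that step is not shown here.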
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Manna, S., Nakai, H. (2020). Comparative Analysis of Different Classifiers on Crisis-Related Tweets: An Elaborate Study. In: Yang, XS., He, XS. (eds) Nature-Inspired Computation in Data Mining and Machine Learning. Studies in Computational Intelligence, vol 855. Springer, Cham. https://doi.org/10.1007/978-3-030-28553-1_4
DOI: https://doi.org/10.1007/978-3-030-28553-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28552-4
Online ISBN: 978-3-030-28553-1
eBook Packages: Intelligent Technologies and Robotics (R0)