Abstract
Detecting disasters from Twitter tweets has become an active research challenge, prominent enough to be the subject of a Kaggle competition. In this work, we explore and compare multiple classifiers applied to the data set from that competition. We also examine the usefulness of different preprocessing approaches and experimentally establish which preprocessor/classifier pairs are the most successful. Finally, we report initial steps toward combining the results of multiple classifiers into a meta-level classifier.
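The abstract describes two steps: scoring every preprocessor/classifier pair, then merging the outputs of several classifiers at a meta level. The sketch below illustrates that workflow under stated assumptions and is not the authors' code. It uses only the Python standard library. The preprocessors (`lowercase_tokens`, `strip_urls_mentions`), the three toy classifiers (multinomial Naive Bayes, a keyword lexicon, and a majority-class baseline), the index-modulo-k cross-validation, and the use of F1 as the score are all hypothetical choices made for illustration. The combination step is plain majority voting, which is only one possible meta-level rule.

```python
import re
from collections import Counter
from itertools import product
from math import log

# --- Hypothetical preprocessors: each maps a raw tweet to a list of tokens. ---
def lowercase_tokens(text: str) -> list[str]:
    """Lowercase the tweet and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def strip_urls_mentions(text: str) -> list[str]:
    """Drop URLs and @mentions, then tokenize as above."""
    return lowercase_tokens(re.sub(r"https?://\S+|@\w+", " ", text))

# --- Toy classifiers sharing a fit(docs, labels) / predict_one(tokens) interface. ---
class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing; labels are 0/1."""
    def fit(self, docs, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for tokens, y in zip(docs, labels):
            self.word_counts[y].update(tokens)
        self.vocab_size = len(set(self.word_counts[0]) | set(self.word_counts[1]))
        self.totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        return self

    def predict_one(self, tokens):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c in (0, 1):
            # Smoothed log prior plus smoothed log likelihood of every token.
            score = log((self.doc_counts[c] + 1) / (n_docs + 2))
            for t in tokens:
                score += log((self.word_counts[c][t] + 1) / (self.totals[c] + self.vocab_size))
            scores[c] = score
        return max(scores, key=scores.get)

class KeywordLexicon:
    """Predict 1 if the tweet has a token found in more positive than negative tweets."""
    def fit(self, docs, labels):
        pos, neg = Counter(), Counter()
        for tokens, y in zip(docs, labels):
            (pos if y == 1 else neg).update(set(tokens))
        self.lexicon = {t for t in pos if pos[t] > neg[t]}
        return self

    def predict_one(self, tokens):
        return int(any(t in self.lexicon for t in tokens))

class MajorityClass:
    """Baseline: always predict the most frequent training label."""
    def fit(self, docs, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict_one(self, tokens):
        return self.label

PREPROCESSORS = {"lowercase": lowercase_tokens, "strip_urls_mentions": strip_urls_mentions}
CLASSIFIERS = {"naive_bayes": NaiveBayes, "lexicon": KeywordLexicon, "majority": MajorityClass}

def f1(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def evaluate_pairs(texts, labels, preprocessors, classifiers, k=3):
    """Mean k-fold F1 for every (preprocessor, classifier) pair; fold = index % k."""
    results = {}
    for (p_name, prep), (c_name, make) in product(preprocessors.items(), classifiers.items()):
        docs = [prep(t) for t in texts]
        scores = []
        for fold in range(k):
            train = [i for i in range(len(docs)) if i % k != fold]
            test = [i for i in range(len(docs)) if i % k == fold]
            model = make().fit([docs[i] for i in train], [labels[i] for i in train])
            preds = [model.predict_one(docs[i]) for i in test]
            scores.append(f1([labels[i] for i in test], preds))
        results[(p_name, c_name)] = sum(scores) / k
    return results

def majority_vote(models, tokens):
    """Meta-level decision: label 1 only if more than half of the fitted models say 1."""
    votes = [m.predict_one(tokens) for m in models]
    return int(sum(votes) * 2 > len(votes))
```

For example, `results = evaluate_pairs(texts, labels, PREPROCESSORS, CLASSIFIERS)` followed by `max(results, key=results.get)` picks the best-scoring pair on a labelled sample. On a real data set one would normally use stratified folds and much stronger classifiers; the common `fit`/`predict_one` interface is what lets any classifier be paired with any preprocessor and fed into the vote.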
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Plakhtiy, M., Ganzha, M., Paprzycki, M. (2020). Comparing Performance of Classifiers Applied to Disaster Detection in Twitter Tweets – Preliminary Considerations. In: Bellatreche, L., Goyal, V., Fujita, H., Mondal, A., Reddy, P.K. (eds) Big Data Analytics. BDA 2020. Lecture Notes in Computer Science, vol 12581. Springer, Cham. https://doi.org/10.1007/978-3-030-66665-1_16
DOI: https://doi.org/10.1007/978-3-030-66665-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66664-4
Online ISBN: 978-3-030-66665-1
eBook Packages: Computer Science, Computer Science (R0)