Comparing Performance of Classifiers Applied to Disaster Detection in Twitter Tweets – Preliminary Considerations

Plakhtiy, Maryan; Ganzha, Maria; Paprzycki, Marcin

doi:10.1007/978-3-030-66665-1_16

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12581)

Included in the following conference series:

International Conference on Big Data Analytics

Abstract

Disaster “detection” based on Twitter tweets has become an interesting research challenge and has even become the subject of a Kaggle competition. In this work, we explore and compare multiple classifiers applied to the data set from that challenge. Moreover, we explore the usefulness of different preprocessing approaches. We experimentally establish the most successful pairs, each consisting of a preprocessor and a classifier. We also report on initial steps taken towards combining the results of multiple classifiers into a meta-level classifier.

Author information

Authors and Affiliations

Warsaw University of Technology, Warsaw, Poland
Maryan Plakhtiy & Maria Ganzha
Systems Research Institute Polish Academy of Sciences, Warsaw, Poland
Marcin Paprzycki

Authors

Maryan Plakhtiy
Maria Ganzha
Marcin Paprzycki

Corresponding author

Correspondence to Marcin Paprzycki.

Editor information

Editors and Affiliations

ISAE-ENSMA, Chasseneuil, France
Ladjel Bellatreche
Indraprastha Institute of Information Technology, New Delhi, India
Vikram Goyal
Iwate Prefectural University, Takizawa, Japan
Hamido Fujita
Ashoka University, Sonepat, India
Anirban Mondal
IIIT Hyderabad, Hyderabad, India
P. Krishna Reddy

About this paper

Cite this paper

Plakhtiy, M., Ganzha, M., Paprzycki, M. (2020). Comparing Performance of Classifiers Applied to Disaster Detection in Twitter Tweets – Preliminary Considerations. In: Bellatreche, L., Goyal, V., Fujita, H., Mondal, A., Reddy, P.K. (eds) Big Data Analytics. BDA 2020. Lecture Notes in Computer Science, vol 12581. Springer, Cham. https://doi.org/10.1007/978-3-030-66665-1_16

DOI: https://doi.org/10.1007/978-3-030-66665-1_16
Published: 03 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66664-4
Online ISBN: 978-3-030-66665-1
eBook Packages: Computer Science, Computer Science (R0)
