Abstract
During disasters, multimedia content posted on social media offers vital information. Reports of injured or deceased people, infrastructure damage, and missing or found people are among the types of information exchanged. While several studies have demonstrated the importance of both text and image content for disaster response, previous research has concentrated primarily on the text modality and has achieved limited success with multimodal approaches. Recent work on multimodal classification of disaster-related tweets relies on comparatively simple models such as KimCNN and VGG16. In this work, we take this further and apply state-of-the-art models for both text and image classification to improve multimodal classification of disaster-related tweets. Experiments were conducted on two classification tasks: first, detecting whether a tweet is informative; second, identifying the type of response needed. The multimodal analysis incorporates different methods of feature extraction from the textual corpus and pre-processing of the corresponding image corpus; several classification models are then trained and evaluated, and their performance is compared while tuning parameters to improve the results. Models such as XLNet, BERT, and RoBERTa for text classification, and ResNet, ResNeXt, and DenseNet for image classification, were trained and analyzed. Results show that the proposed multimodal architecture outperforms models trained on a single modality (text or image alone), and that the newer state-of-the-art models outperform the baseline models by a reasonable margin on both classification tasks.
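The abstract describes a late-fusion multimodal architecture: a text encoder (e.g. BERT) and an image encoder (e.g. ResNet) each produce a feature vector, and the two are combined before a classification head. As a minimal illustrative sketch of that fusion step (not the paper's actual implementation; the feature dimensions 768 and 2048 are merely the typical BERT and ResNet-50 output sizes, and the weights here are random placeholders):

```python
import numpy as np

def fuse_and_classify(text_feat, image_feat, W, b):
    """Concatenate per-modality features and apply a linear classifier head.

    text_feat:  (d_text,) vector, e.g. a BERT [CLS] embedding
    image_feat: (d_img,)  vector, e.g. ResNet pooled features
    W: (n_classes, d_text + d_img) weights; b: (n_classes,) bias
    Returns a softmax distribution over the classes.
    """
    fused = np.concatenate([text_feat, image_feat])  # late fusion
    logits = W @ fused + b
    exp = np.exp(logits - logits.max())              # stable softmax
    return exp / exp.sum()

# Toy example with hypothetical dimensions and random weights
rng = np.random.default_rng(0)
text_feat = rng.normal(size=768)     # BERT-base hidden size
image_feat = rng.normal(size=2048)   # ResNet-50 pooled output size
W = rng.normal(size=(2, 768 + 2048)) * 0.01
b = np.zeros(2)
probs = fuse_and_classify(text_feat, image_feat, W, b)  # informative vs. not
```

In the full architecture the concatenated vector would typically pass through one or more trained dense layers rather than a single random linear map; the sketch only shows how the two modalities are joined.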
Code Availability
All the code developed and used in this research is available at: https://github.com/adwaith007/disaster-response-cnn.
References
Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, vol 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, pp 4171–4186
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on computer vision and pattern recognition (CVPR), pp 770–778
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: 2017 IEEE Conference on computer vision and pattern recognition (CVPR), pp 2261–2269
Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, pp 1746–1751
Kumar A, Singh JP, Dwivedi YK, Rana NP (2020) A deep multi-modal neural network for informative twitter content classification during emergencies. Ann Oper Res:1–32
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2020) RoBERTa: A robustly optimized BERT pretraining approach. ICLR 2020 conference blind submission
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: Bengio Y, LeCun Y (eds) 1st international conference on learning representations, ICLR 2013, workshop track proceedings, Scottsdale, pp 1–12
Ofli F, Alam F, Imran M (2018) CrisisMMD: multimodal twitter datasets from natural disasters. In: International AAAI conference on web and social media, North America, pp 465–473
Ofli F, Alam F, Imran M (2020) Analysis of social media data using multimodal deep learning for disaster response. In: Hughes A, McNeill F, Zobel CW (eds) ISCRAM 2020 Conference proceedings - 17th international conference on information systems for crisis response and management. Virginia Tech, Blacksburg, pp 802–811
Sennrich R, Haddow B, Birch A (2016) Neural machine translation of rare words with subword units. In: Proceedings of the 54th annual meeting of the association for computational linguistics, vol 1 (Long Papers). Association for Computational Linguistics, Berlin, pp 1715–1725
Shu X, Qi G-J, Tang J, Wang J (2015) Weakly-shared deep transfer networks for heterogeneous-domain knowledge propagation. In: Proceedings of the 23rd ACM international conference on multimedia (MM ’15). Association for Computing Machinery, New York, pp 35–44
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Bengio Y, LeCun Y (eds) 3rd international conference on learning representations, ICLR 2015, conference track proceedings, San Diego
Singh JP, Dwivedi YK, Rana NP, Kumar A, Kapoor K (2019) Event classification and location prediction from tweets during disasters. Ann Oper Res 283:737–757
Tang J, Shu X, Li Z, Qi G-J, Wang J (2016) Generalized deep transfer networks for knowledge propagation in heterogeneous domains. ACM Trans Multimed Comput Commun Appl 12(4s), Article 68, 22 pages
Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: 2017 IEEE Conference on computer vision and pattern recognition (CVPR), pp 5987–5995
Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov R, Le QV (2019) XLNet: Generalized autoregressive pretraining for language understanding. In: 33rd conference on neural information processing systems (NeurIPS), Vancouver
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Additional information
Availability of Data and Material
All the datasets used for supporting the conclusions are available at: https://crisisnlp.qcri.org/crisismmd.
About this article
Cite this article
Adwaith, D., Abishake, A.K., Raghul, S.V. et al. Enhancing multimodal disaster tweet classification using state-of-the-art deep learning networks. Multimed Tools Appl 81, 18483–18501 (2022). https://doi.org/10.1007/s11042-022-12217-3