Enhancing multimodal disaster tweet classification using state-of-the-art deep learning networks

Published in Multimedia Tools and Applications

Abstract

During disasters, multimedia content on social media offers vital information: reports of injured or deceased people, infrastructure damage, and missing or found people are among the types of information exchanged. While several studies have demonstrated the importance of both text and image content for disaster response, previous research has concentrated primarily on the text modality and has had limited success with multimodal approaches. Recent work on multimodal classification of disaster-related tweets uses comparatively early models such as KimCNN and VGG16. In this work we take this line further and apply state-of-the-art models in both text and image classification to improve multimodal classification of disaster-related tweets. The research addresses two classification tasks: first, detecting whether a tweet is informative; second, identifying the humanitarian response required. We break the multimodal pipeline into stages: extracting features from the textual corpus with different methods, pre-processing the corresponding image corpus, then training several classification models, comparing their performance, and tuning their parameters to improve the results. XLNet, BERT and RoBERTa were trained and analyzed for text classification, and ResNet, ResNeXt and DenseNet for image classification. Results show that the proposed multimodal architecture outperforms models trained on a single modality (text or image alone), and that the newer state-of-the-art models outperform the baseline models by a reasonable margin on both classification tasks.
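The abstract does not spell out how the text and image branches are fused. As a rough illustration only, the sketch below shows one common late-fusion design: a pretrained transformer encodes the tweet text, a pretrained CNN encodes the image, and a small head classifies the concatenated features. Every concrete choice here (the roberta-base checkpoint, the ResNet-50 backbone, the 512-unit fusion layer, the dropout rate) is an assumption for illustration, not the authors' reported configuration.

```python
# A minimal late-fusion sketch (illustrative; not the paper's exact architecture).
import torch
import torch.nn as nn
from torchvision.models import resnet50
from transformers import AutoModel

class LateFusionClassifier(nn.Module):
    """Concatenate a transformer text embedding with a CNN image embedding,
    then classify the fused vector (informative vs. not, or humanitarian class)."""

    def __init__(self, text_model: str = "roberta-base", num_classes: int = 2):
        super().__init__()
        self.text_encoder = AutoModel.from_pretrained(text_model)
        cnn = resnet50(weights="IMAGENET1K_V1")
        cnn.fc = nn.Identity()  # drop the ImageNet head; keep the 2048-d features
        self.image_encoder = cnn
        fused_dim = self.text_encoder.config.hidden_size + 2048
        self.head = nn.Sequential(
            nn.Linear(fused_dim, 512),  # 512 is an illustrative choice
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, input_ids, attention_mask, pixel_values):
        text = self.text_encoder(input_ids=input_ids, attention_mask=attention_mask)
        text_vec = text.last_hidden_state[:, 0]       # embedding at the first token position
        image_vec = self.image_encoder(pixel_values)  # shape: (batch, 2048)
        return self.head(torch.cat([text_vec, image_vec], dim=1))
```

A forward pass takes a tokenized tweet plus a batch of 224×224 images and returns per-class logits; swapping roberta-base for a BERT or XLNet checkpoint, or ResNet-50 for a DenseNet or ResNeXt backbone, recreates the model combinations the abstract names.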

Code Availability

All the code developed and used in this research is available at: https://github.com/adwaith007/disaster-response-cnn.

References

  1. Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, vol 1 (Long and Short Papers). Association for Computational Linguistics, Minneapolis, pp 4171–4186

  2. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE Conference on computer vision and pattern recognition (CVPR), pp 770–778

  3. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: 2017 IEEE Conference on computer vision and pattern recognition (CVPR), pp 2261–2269

  4. Kim Y (2014) Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Doha, pp 1746–1751

  5. Kumar A, Singh JP, Dwivedi YK, Rana NP (2020) A deep multi-modal neural network for informative Twitter content classification during emergencies. Ann Oper Res 1–32

  6. Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692

  7. Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. In: Bengio Y, LeCun Y (eds) 1st international conference on learning representations, ICLR 2013, workshop track proceedings, Scottsdale, pp 1–12

  8. Ofli F, Alam F, Imran M (2018) CrisisMMD: multimodal twitter datasets from natural disasters. In: International AAAI conference on web and social media, North America, pp 465–473

  9. Ofli F, Alam F, Imran M (2020) Analysis of social media data using multimodal deep learning for disaster response. In: Hughes A, McNeill F, Zobel CW (eds) ISCRAM 2020 Conference proceedings - 17th international conference on information systems for crisis response and management. Virginia Tech, Blacksburg, pp 802–811

  10. Sennrich R, Haddow B, Birch A (2016) Neural machine translation of rare words with subword units. In: Proceedings of the 54th annual meeting of the association for computational linguistics, vol 1 (Long Papers). Association for Computational Linguistics, Berlin, pp 1715–1725

  11. Shu X, Qi G-J, Tang J, Wang J (2015) Weakly-shared deep transfer networks for heterogeneous-domain knowledge propagation. In: Proceedings of the 23rd ACM international conference on multimedia (MM ’15). Association for Computing Machinery, New York, pp 35–44

  12. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Bengio Y, LeCun Y (eds) 3rd international conference on learning representations, ICLR 2015, conference track proceedings, San Diego

  13. Singh JP, Dwivedi YK, Rana NP, Kumar A, Kapoor K (2019) Event classification and location prediction from tweets during disasters. Ann Oper Res 283:737–757

  14. Tang J, Shu X, Li Z, Qi G-J, Wang J (2016) Generalized deep transfer networks for knowledge propagation in heterogeneous domains. ACM Trans Multimed Comput Commun Appl 12(4s), Article 68

  15. Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: 2017 IEEE Conference on computer vision and pattern recognition (CVPR), pp 5987–5995

  16. Yang Z, Dai Z, Yang Y, Carbonell J, Salakhutdinov R, Le QV (2019) XLNet: Generalized autoregressive pretraining for language understanding. In: 33rd conference on neural information processing systems (NeurIPS), Vancouver

Author information

Corresponding author

Correspondence to Divakaran Adwaith.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Additional information

Availability of Data and Material

All datasets supporting the conclusions are available at: https://crisisnlp.qcri.org/crisismmd.
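For readers locating the data: CrisisMMD ships per-event TSV annotation files whose labels map onto the paper's two tasks (informativeness and humanitarian category). The sketch below is a minimal loading example; the file path and column names are assumptions based on the public v2.0 release and should be verified against the downloaded archive.

```python
# Minimal CrisisMMD loading sketch; path and column names are assumptions
# based on the public v2.0 release — verify against the actual archive.
import pandas as pd

ann = pd.read_csv(
    "CrisisMMD_v2.0/annotations/california_wildfires_final_data.tsv",
    sep="\t",
)

# Task 1: informative vs. not_informative (text-side label)
task1 = ann[["tweet_text", "image_path", "text_info"]].dropna()

# Task 2: humanitarian category (e.g. infrastructure_and_utility_damage)
task2 = ann[["tweet_text", "image_path", "text_human"]].dropna()

print(task1["text_info"].value_counts())
```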

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Adwaith, D., Abishake, A.K., Raghul, S.V. et al. Enhancing multimodal disaster tweet classification using state-of-the-art deep learning networks. Multimed Tools Appl 81, 18483–18501 (2022). https://doi.org/10.1007/s11042-022-12217-3
