
Exploring Fusion Strategies in Deep Learning Models for Multi-Modal Classification

  • Conference paper

Data Mining (AusDM 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1504)


Abstract

When used effectively in deep learning models for classification, multi-modal data can provide rich, complementary information and can represent complex situations. An essential step in multi-modal classification is data fusion, which aims to combine features from multiple modalities into a single joint representation. This study investigates how fusion mechanisms influence multi-modal data classification. We conduct experiments on four social media datasets and evaluate multi-modal models against several classification criteria. The results show that data quality and class distribution significantly influence the performance of the fusion strategies.
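The fusion step described in the abstract is often realized as either feature-level (early) fusion, which concatenates per-modality features into one joint vector, or decision-level (late) fusion, which combines per-modality predictions. The sketch below illustrates both in plain Python; the function names, dimensions, and the simple averaging rule are illustrative assumptions, not the specific strategies evaluated in the paper.

```python
# Illustrative sketch of two common fusion strategies for multi-modal
# classification. Feature vectors are plain Python lists; dimensions are
# toy values, not taken from the paper's models.

def early_fusion(text_feats, image_feats):
    """Feature-level fusion: concatenate per-modality features
    into a single joint representation."""
    return text_feats + image_feats

def late_fusion(text_probs, image_probs):
    """Decision-level fusion: average per-modality class
    probabilities (one simple combination rule among many)."""
    return [(t + i) / 2 for t, i in zip(text_probs, image_probs)]

# Toy example: a 3-dim text embedding and a 2-dim image embedding
# yield a 5-dim joint vector; two 2-class probability vectors are
# averaged into one fused prediction.
joint = early_fusion([0.1, 0.5, 0.2], [0.9, 0.3])
fused = late_fusion([0.7, 0.3], [0.5, 0.5])
```

In practice the joint vector from early fusion feeds a shared classifier head, whereas late fusion keeps a separate classifier per modality and merges only their outputs; which works better depends on factors such as data quality and class distribution, as the paper's experiments examine.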



Author information


Corresponding author

Correspondence to Duoyi Zhang.


Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Zhang, D., Nayak, R., Bashar, M.A. (2021). Exploring Fusion Strategies in Deep Learning Models for Multi-Modal Classification. In: Xu, Y., et al. Data Mining. AusDM 2021. Communications in Computer and Information Science, vol 1504. Springer, Singapore. https://doi.org/10.1007/978-981-16-8531-6_8


  • DOI: https://doi.org/10.1007/978-981-16-8531-6_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-8530-9

  • Online ISBN: 978-981-16-8531-6

  • eBook Packages: Computer Science, Computer Science (R0)
