
Exploring Fusion Strategies in Deep Learning Models for Multi-Modal Classification

  • Conference paper

Data Mining (AusDM 2021)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1504)


Abstract

When used effectively in deep learning models for classification, multi-modal data can provide rich, complementary information and can represent complex situations. An essential step in multi-modal classification is data fusion, which aims to combine features from multiple modalities into a single joint representation. This study investigates how fusion mechanisms influence multi-modal data classification. We conduct experiments on four social media datasets and evaluate multi-modal models against several classification criteria. The results show that data quality and class distribution significantly influence the performance of the fusion strategies.
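The fusion step described in the abstract is often realized as either feature-level (early) fusion, which concatenates per-modality features into one joint vector, or decision-level (late) fusion, which combines per-modality predictions. The sketch below illustrates both in plain Python; the function names, dimensions, and the simple averaging rule are illustrative assumptions, not the specific strategies evaluated in the paper.

```python
# Illustrative sketch of two common fusion strategies for multi-modal
# classification. Feature vectors are plain Python lists; dimensions are
# toy values, not taken from the paper's models.

def early_fusion(text_feats, image_feats):
    """Feature-level fusion: concatenate per-modality features
    into a single joint representation."""
    return text_feats + image_feats

def late_fusion(text_probs, image_probs):
    """Decision-level fusion: average per-modality class
    probabilities (one simple combination rule among many)."""
    return [(t + i) / 2 for t, i in zip(text_probs, image_probs)]

# Toy example: a 3-dim text embedding and a 2-dim image embedding
# yield a 5-dim joint vector; two 2-class probability vectors are
# averaged into one fused prediction.
joint = early_fusion([0.1, 0.5, 0.2], [0.9, 0.3])
fused = late_fusion([0.7, 0.3], [0.5, 0.5])
```

In practice the joint vector from early fusion feeds a shared classifier head, whereas late fusion keeps a separate classifier per modality and merges only their outputs; which works better depends on factors such as data quality and class distribution, as the paper's experiments examine.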



Author information


Corresponding author

Correspondence to Duoyi Zhang.


Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Zhang, D., Nayak, R., Bashar, M.A. (2021). Exploring Fusion Strategies in Deep Learning Models for Multi-Modal Classification. In: Xu, Y., et al. Data Mining. AusDM 2021. Communications in Computer and Information Science, vol 1504. Springer, Singapore. https://doi.org/10.1007/978-981-16-8531-6_8


  • DOI: https://doi.org/10.1007/978-981-16-8531-6_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-8530-9

  • Online ISBN: 978-981-16-8531-6

  • eBook Packages: Computer Science, Computer Science (R0)
