Abstract
When used effectively in deep learning models for classification, multi-modal data can provide rich, complementary information and represent complex situations. An essential step in multi-modal classification is data fusion, which combines features from multiple modalities into a single joint representation. This study investigates how fusion mechanisms influence multi-modal classification. We conduct experiments on four social media datasets and evaluate multi-modal models against several classification criteria. The results show that data quality and class distribution significantly influence the performance of the fusion strategies.
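The paper itself does not include code; as an illustration only, the following minimal sketch (assuming PyTorch; the class name, feature dimensions, and hyper-parameters are hypothetical, not taken from the paper) shows the simplest fusion strategy the abstract describes: concatenating per-modality feature vectors into a single joint representation that feeds a classifier.

import torch
import torch.nn as nn

class ConcatFusionClassifier(nn.Module):
    """Joint representation via simple feature concatenation (early fusion)."""

    def __init__(self, text_dim: int, image_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        # Project the concatenated modality features into a joint space,
        # then classify from that joint representation.
        self.fusion = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.5),
        )
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, text_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        # Fusion step: concatenate along the feature axis.
        joint = torch.cat([text_feat, image_feat], dim=-1)  # (batch, text_dim + image_dim)
        return self.classifier(self.fusion(joint))

# Example with illustrative dimensions (e.g., 300-d text embeddings, 4096-d image features).
model = ConcatFusionClassifier(text_dim=300, image_dim=4096, hidden_dim=512, num_classes=2)
logits = model(torch.randn(8, 300), torch.randn(8, 4096))  # shape: (8, 2)

Other fusion mechanisms, such as element-wise combination or attention-based weighting of the modalities, would replace the torch.cat step; the rest of the pipeline is unchanged.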
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, D., Nayak, R., Bashar, M.A. (2021). Exploring Fusion Strategies in Deep Learning Models for Multi-Modal Classification. In: Xu, Y., et al. (eds.) Data Mining. AusDM 2021. Communications in Computer and Information Science, vol. 1504. Springer, Singapore. https://doi.org/10.1007/978-981-16-8531-6_8
DOI: https://doi.org/10.1007/978-981-16-8531-6_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-8530-9
Online ISBN: 978-981-16-8531-6
eBook Packages: Computer Science, Computer Science (R0)