
Shot-Based Hybrid Fusion for Movie Genre Classification

  • Conference paper
  • First Online:
Image Analysis and Processing – ICIAP 2022 (ICIAP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13231)


Abstract

Multi-modal fusion methods for movie genre classification have been shown to outperform their single-modality counterparts. However, it remains challenging to design a fusion strategy for real-world scenarios, where missing data and weak labeling are common. Given the heterogeneity of the modalities, most existing works adopt late fusion strategies that process and train a model per modality and combine the results at the decision level. A major drawback of such strategies is the potential loss of cross-modality dependencies, which are important for understanding audiovisual content. In this paper, we introduce a Shot-based Hybrid Fusion Network (SHFN) for movie genre classification. It consists of single-modal feature fusion networks for video and audio, a multi-modal feature fusion network operating at the shot level, and a final late fusion stage for video-level decisions. An ablation study indicates that video contributes most, with a further performance gain from the additional audio modality. Experimental results on the LMTD-9 dataset demonstrate the effectiveness of our proposed method for movie genre classification: our best model outperforms the state-of-the-art method by 5.7% on AUPRC (micro).
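
As a reading aid, a minimal PyTorch sketch of the hybrid fusion idea described in the abstract might look as follows. This is not the authors' implementation: the layer choices, feature dimensions (e.g., 1024-dimensional per-shot video features and 128-dimensional per-shot audio features), and the class name ShotBasedHybridFusionSketch are all illustrative assumptions.

    # Hypothetical sketch of shot-based hybrid fusion, assuming pre-extracted
    # per-shot video and audio features; not the authors' code.
    import torch
    import torch.nn as nn

    class ShotBasedHybridFusionSketch(nn.Module):
        """Illustration of the idea in the abstract: per-modality fusion over
        shots, multi-modal fusion at the shot level, and a late fusion step
        that aggregates predictions into a video-level decision."""

        def __init__(self, video_dim=1024, audio_dim=128, hidden_dim=256, num_genres=9):
            super().__init__()
            # Single-modal feature fusion over the shot sequence (assumed: one GRU per modality).
            self.video_rnn = nn.GRU(video_dim, hidden_dim, batch_first=True)
            self.audio_rnn = nn.GRU(audio_dim, hidden_dim, batch_first=True)
            # Multi-modal feature fusion at the shot level (assumed: concatenation + MLP).
            self.shot_fusion = nn.Sequential(
                nn.Linear(2 * hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, num_genres),
            )
            # Per-modality classifiers feeding the late-fusion branch.
            self.video_head = nn.Linear(hidden_dim, num_genres)
            self.audio_head = nn.Linear(hidden_dim, num_genres)

        def forward(self, video_feats, audio_feats):
            # video_feats: (batch, num_shots, video_dim); audio_feats: (batch, num_shots, audio_dim)
            v, _ = self.video_rnn(video_feats)   # (batch, num_shots, hidden_dim)
            a, _ = self.audio_rnn(audio_feats)   # (batch, num_shots, hidden_dim)
            # Shot-level multi-modal fusion preserves cross-modality dependencies per shot.
            shot_logits = self.shot_fusion(torch.cat([v, a], dim=-1))
            # Late fusion: average shot-level and modality-level predictions
            # into a single video-level multi-label score.
            video_logits = self.video_head(v.mean(dim=1))
            audio_logits = self.audio_head(a.mean(dim=1))
            fused = (shot_logits.mean(dim=1) + video_logits + audio_logits) / 3.0
            return torch.sigmoid(fused)          # multi-label genre probabilities

For the reported metric, micro-averaged AUPRC over the binary genre labels can be computed with scikit-learn's average_precision_score(y_true, y_score, average='micro'); the exact evaluation protocol used in the paper may differ.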



Author information

Corresponding author

Correspondence to Tianyu Bi.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Bi, T., Jarnikov, D., Lukkien, J. (2022). Shot-Based Hybrid Fusion for Movie Genre Classification. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13231. Springer, Cham. https://doi.org/10.1007/978-3-031-06427-2_22


  • DOI: https://doi.org/10.1007/978-3-031-06427-2_22

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06426-5

  • Online ISBN: 978-3-031-06427-2

  • eBook Packages: Computer Science, Computer Science (R0)
