
Shot-Based Hybrid Fusion for Movie Genre Classification

  • Conference paper
  • First Online:
Image Analysis and Processing – ICIAP 2022 (ICIAP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13231)


Abstract

Multi-modal fusion methods for movie genre classification have been shown to outperform their single-modality counterparts. However, it remains challenging to design a fusion strategy for real-world scenarios, where missing data and weak labeling are common. Given the heterogeneity of the modalities, most existing works adopt late fusion strategies that process and train a model per modality and combine the results at the decision level. A major drawback of such strategies is the potential loss of cross-modality dependencies, which are important for understanding audiovisual content. In this paper, we introduce a Shot-based Hybrid Fusion Network (SHFN) for movie genre classification. It consists of single-modal feature fusion networks for video and audio, a multi-modal feature fusion network operating at the shot level, and a final late fusion stage for video-level decisions. An ablation study indicates that video contributes most, with a further performance gain from the additional audio modality. Experimental results on the LMTD-9 dataset demonstrate the effectiveness of our proposed method for movie genre classification: our best model outperforms the state-of-the-art method by 5.7% on AUPRC (micro).
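
As a reading aid, a minimal PyTorch sketch of the hybrid fusion idea described in the abstract might look as follows. This is not the authors' implementation: the layer choices, feature dimensions (e.g., 1024-dimensional per-shot video features and 128-dimensional per-shot audio features), and the class name ShotBasedHybridFusionSketch are all illustrative assumptions.

    # Hypothetical sketch of shot-based hybrid fusion, assuming pre-extracted
    # per-shot video and audio features; not the authors' code.
    import torch
    import torch.nn as nn

    class ShotBasedHybridFusionSketch(nn.Module):
        """Illustration of the idea in the abstract: per-modality fusion over
        shots, multi-modal fusion at the shot level, and a late fusion step
        that aggregates predictions into a video-level decision."""

        def __init__(self, video_dim=1024, audio_dim=128, hidden_dim=256, num_genres=9):
            super().__init__()
            # Single-modal feature fusion over the shot sequence (assumed: one GRU per modality).
            self.video_rnn = nn.GRU(video_dim, hidden_dim, batch_first=True)
            self.audio_rnn = nn.GRU(audio_dim, hidden_dim, batch_first=True)
            # Multi-modal feature fusion at the shot level (assumed: concatenation + MLP).
            self.shot_fusion = nn.Sequential(
                nn.Linear(2 * hidden_dim, hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, num_genres),
            )
            # Per-modality classifiers feeding the late-fusion branch.
            self.video_head = nn.Linear(hidden_dim, num_genres)
            self.audio_head = nn.Linear(hidden_dim, num_genres)

        def forward(self, video_feats, audio_feats):
            # video_feats: (batch, num_shots, video_dim); audio_feats: (batch, num_shots, audio_dim)
            v, _ = self.video_rnn(video_feats)   # (batch, num_shots, hidden_dim)
            a, _ = self.audio_rnn(audio_feats)   # (batch, num_shots, hidden_dim)
            # Shot-level multi-modal fusion preserves cross-modality dependencies per shot.
            shot_logits = self.shot_fusion(torch.cat([v, a], dim=-1))
            # Late fusion: average shot-level and modality-level predictions
            # into a single video-level multi-label score.
            video_logits = self.video_head(v.mean(dim=1))
            audio_logits = self.audio_head(a.mean(dim=1))
            fused = (shot_logits.mean(dim=1) + video_logits + audio_logits) / 3.0
            return torch.sigmoid(fused)          # multi-label genre probabilities

For the reported metric, micro-averaged AUPRC over the binary genre labels can be computed with scikit-learn's average_precision_score(y_true, y_score, average='micro'); the exact evaluation protocol used in the paper may differ.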



Author information

Corresponding author

Correspondence to Tianyu Bi.



Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Bi, T., Jarnikov, D., Lukkien, J. (2022). Shot-Based Hybrid Fusion for Movie Genre Classification. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13231. Springer, Cham. https://doi.org/10.1007/978-3-031-06427-2_22


  • DOI: https://doi.org/10.1007/978-3-031-06427-2_22

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06426-5

  • Online ISBN: 978-3-031-06427-2

  • eBook Packages: Computer Science, Computer Science (R0)
