ASTS: attention based spatio-temporal sequential framework for movie trailer genre classification

Multimedia Tools and Applications

Abstract

Automatic movie trailer genre classification can support multimedia search and personalized movie recommendation, but it is a challenging task because trailers contain diverse content and high-level sequential semantic concepts that follow the movie storyline. Traditional methods generally extract low-level features or consider only local sequential dependencies among trailer frames, ignoring the global high-level sequential semantic concepts. In this manuscript, we propose a novel and effective Attention based Spatio-temporal Sequential Framework (ASTS) for movie trailer genre classification. The proposed framework consists of two main modules: the spatio-temporal descriptive module and the attention-based sequential module. The spatio-temporal descriptive module adopts advanced convolutional neural networks to extract the spatio-temporal features of key trailer frames, capturing local spatio-temporal semantic features. The attention-based sequential module processes the resulting sequence of spatio-temporal feature representations to capture the global high-level sequential semantic concepts within the movie storyline. We crawl 14,415 labeled movie trailers from YouTube and integrate them into the public dataset MovieLens. Experimental results show that our proposed framework is superior to state-of-the-art methods.
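
As a rough illustration of how an attention mechanism can summarize a sequence of clip-level features into a single trailer-level vector, here is a minimal sketch in plain Python. It is not the authors' implementation: the additive scoring form, the names `attention_pool`, `W`, and `v`, the random inputs, and the reading of `D_h` as the feature size and `D_a` as the attention size are all assumptions made for illustration.

```python
import math
import random


def attention_pool(features, W, v):
    """Additive attention pooling over a sequence of feature vectors.

    features: list of T vectors, each of length D_h
    W:        D_a x D_h projection matrix, given as a list of rows
    v:        scoring vector of length D_a
    Returns (weights, context): T attention weights that sum to 1, and
    the weighted sum of the input vectors (length D_h).
    """
    scores = []
    for h in features:
        # Project the feature vector and squash it: u = tanh(W h)
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
        # Scalar relevance score: e = v . u
        scores.append(sum(a * b for a, b in zip(v, hidden)))

    # Numerically stable softmax over the sequence positions
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the inputs
    dim = len(features[0])
    context = [sum(a * h[j] for a, h in zip(weights, features)) for j in range(dim)]
    return weights, context


# Hypothetical sizes: 30 clip features of dimension 8, attention size 4
rng = random.Random(0)
T, D_h, D_a = 30, 8, 4
features = [[rng.gauss(0, 1) for _ in range(D_h)] for _ in range(T)]
W = [[rng.gauss(0, 0.1) for _ in range(D_h)] for _ in range(D_a)]
v = [rng.gauss(0, 0.1) for _ in range(D_a)]
weights, context = attention_pool(features, W, v)
```

In a trained model, `W` and `v` would be learned jointly with the classifier, so that clips more indicative of a genre receive larger weights. Here they are random and serve only to show the data flow.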

Notes

  1. https://www.netflix.com/cn/

  2. https://www.imdb.com/

  3. https://github.com/Marinyyt/MovieTrailer-14k

  4. k is set to 30, which is larger than the number of clips in most movie trailers. For trailers with more than 30 clips, we randomly select 30 of them.

  5. Many strategies and methods can be adopted to extract representative frames. In this paper, we use an interval sampling strategy and leave the exploration of other sampling methods for future work.

  6. The crawled dataset is publicly available at https://github.com/Marinyyt/MovieTrailer-14k

  7. We experimented with different settings for these parameters. The results show that varying Dh and Da has little effect on the performance of our method.
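
The clip cap in note 4 and the interval sampling in note 5 can be sketched as follows. This is a hedged illustration, not the authors' code: the function names `select_clips` and `interval_sample`, the choice to keep the selected clips in their original temporal order, and the number of frames per clip `n` are all assumptions that the notes do not specify.

```python
import random

K = 30  # maximum number of clips per trailer, as stated in note 4


def select_clips(clips, k=K, rng=None):
    """Return at most k clips.

    If a trailer has more than k clips, k of them are chosen at random.
    Keeping the chosen clips in temporal order is an assumption.
    """
    if len(clips) <= k:
        return list(clips)
    rng = rng or random.Random()
    chosen = sorted(rng.sample(range(len(clips)), k))
    return [clips[i] for i in chosen]


def interval_sample(frames, n):
    """Pick up to n frames at evenly spaced positions.

    One plausible reading of "interval sampling"; n is a hypothetical
    parameter, not a value taken from the paper.
    """
    if n <= 0 or not frames:
        return []
    if len(frames) <= n:
        return list(frames)
    step = len(frames) / n
    return [frames[int(i * step)] for i in range(n)]


# Usage with stand-in data: integers play the role of clips and frames
long_trailer = list(range(50))
kept = select_clips(long_trailer, rng=random.Random(0))
sampled = interval_sample(list(range(10)), 5)
```

Passing a seeded `random.Random` makes the random clip selection reproducible across runs, which helps when comparing experiments.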

Corresponding author

Correspondence to Ziyu Lu.

Cite this article

Yu, Y., Lu, Z., Li, Y. et al. ASTS: attention based spatio-temporal sequential framework for movie trailer genre classification. Multimed Tools Appl 80, 9749–9764 (2021). https://doi.org/10.1007/s11042-020-10125-y

  • DOI: https://doi.org/10.1007/s11042-020-10125-y
