ABSTRACT
In this paper, we explore the suitability of Convolutional Neural Networks (ConvNets) for multi-label genre classification of movie trailers. Assigning genres to movies is particularly challenging because genre is an immaterial feature that is not physically present in a movie frame, so off-the-shelf image detection models cannot be easily adapted to this context. Moreover, multi-label classification is harder than single-label classification, since one instance can be assigned to multiple classes at once. We propose a novel classification method built around an ultra-deep ConvNet with residual connections. Our approach extracts temporal information from image-based features before mapping trailers to genres. We compare it with the current state-of-the-art techniques for movie classification, which rely on well-known image descriptors and low-level handcrafted features. Results show that our method significantly outperforms the state of the art in this task, improving classification accuracy for all genres.
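The general idea of convolving over time on per-frame features and then scoring each genre independently can be illustrated with a toy sketch. This is not the architecture proposed in the paper (which is an ultra-deep residual ConvNet); it is a minimal pure-Python illustration under stated assumptions: the feature dimensions, filter sizes, random weights, genre names, and the 0.5 decision threshold are all hypothetical.

```python
import math
import random


def temporal_conv(features, kernels, bias):
    """Valid 1-D convolution along the time axis, followed by ReLU.

    features: list of T frame vectors, each of length D
    kernels:  list of F filters; each filter is K weight vectors of length D
    bias:     list of F floats
    Returns T-K+1 output vectors of length F.
    """
    k = len(kernels[0])
    out = []
    for t in range(len(features) - k + 1):
        window = features[t:t + k]
        row = []
        for f, kernel in enumerate(kernels):
            s = bias[f]
            for frame, weights in zip(window, kernel):
                s += sum(x * w for x, w in zip(frame, weights))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def max_over_time(maps):
    """Collapse the time axis by keeping each filter's maximum response."""
    return [max(col) for col in zip(*maps)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_genres(features, kernels, bias, out_w, out_b, genres, threshold=0.5):
    """Multi-label prediction: one independent sigmoid score per genre."""
    pooled = max_over_time(temporal_conv(features, kernels, bias))
    scores = {}
    for genre, w, b in zip(genres, out_w, out_b):
        scores[genre] = sigmoid(sum(p * wi for p, wi in zip(pooled, w)) + b)
    labels = [g for g in genres if scores[g] >= threshold]
    return scores, labels


# Hypothetical setup: 6 frames, 4 features per frame, 3 filters spanning 3 frames.
random.seed(0)
GENRES = ["action", "comedy", "drama"]  # illustrative names only
T, D, F, K = 6, 4, 3, 3
frames = [[random.random() for _ in range(D)] for _ in range(T)]
kernels = [[[random.uniform(-1, 1) for _ in range(D)] for _ in range(K)]
           for _ in range(F)]
bias = [0.0] * F
out_w = [[random.uniform(-1, 1) for _ in range(F)] for _ in GENRES]
out_b = [0.0] * len(GENRES)

scores, labels = predict_genres(frames, kernels, bias, out_w, out_b, GENRES)
print(scores, labels)
```

Because each genre gets its own sigmoid score instead of competing in a single softmax, a trailer may receive zero, one, or several genres, which is what distinguishes the multi-label setting from single-label classification.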
Index Terms
- Convolutions through time for multi-label movie genre classification