research-article

Convolutions through time for multi-label movie genre classification

Authors:
Jônatas Wehrmann, Pontifícia Universidade Católica do RS, Porto Alegre-RS, Brazil
Rodrigo C. Barros, Pontifícia Universidade Católica do RS, Porto Alegre-RS, Brazil

SAC '17: Proceedings of the Symposium on Applied Computing, April 2017, Pages 114–119
https://doi.org/10.1145/3019612.3019641

Published: 03 April 2017

ABSTRACT

In this paper, we explore the suitability of employing Convolutional Neural Networks (ConvNets) for multi-label movie trailer genre classification. Assigning genres to movies is a particularly challenging task because genre is an immaterial feature that is not physically present in a movie frame, so off-the-shelf image detection models cannot be easily adapted to this context. Moreover, multi-label classification is more challenging than single-label classification considering that one instance can be assigned to multiple classes at once. We propose a novel classification method that encapsulates an ultra-deep ConvNet with residual connections. Our approach extracts temporal information from image-based features prior to performing the mapping of trailers to genres. We compare our novel approach with the current state-of-the-art techniques for movie classification, which make use of well-known image descriptors and low-level handcrafted features. Results show that our method significantly outperforms the state-of-the-art in this task, improving the classification accuracy for all genres.
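
The abstract's core idea, extracting temporal information from per-frame image features before mapping a trailer to several genres at once, can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: it omits the ultra-deep residual ConvNet entirely and stands in random vectors for the frame features. The function names, the single convolutional layer, the max-over-time pooling, the independent sigmoid output per genre, and the 0.5 decision threshold are all assumptions made for illustration.

```python
import numpy as np


def conv_through_time(frame_features, kernels, bias):
    """Valid 1D convolution along the time axis, followed by ReLU.

    frame_features: (T, D) array, one D-dimensional feature vector per frame.
    kernels:        (K, W, D) array, K filters spanning W consecutive frames.
    bias:           (K,) array.
    Returns an array of shape (T - W + 1, K).
    """
    T, _ = frame_features.shape
    K, W, _ = kernels.shape
    out = np.empty((T - W + 1, K))
    for t in range(T - W + 1):
        window = frame_features[t:t + W]  # (W, D) slice of consecutive frames
        # Contract each filter with the window over both the W and D axes.
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def predict_genres(frame_features, kernels, bias, w_out, b_out, threshold=0.5):
    """Map one trailer to per-genre probabilities and binary labels."""
    hidden = conv_through_time(frame_features, kernels, bias)
    pooled = hidden.max(axis=0)               # max-over-time pooling, shape (K,)
    probs = sigmoid(pooled @ w_out + b_out)   # one independent score per genre
    return probs, probs >= threshold


# Hypothetical sizes: 20 frames, 8-dim features, 4 filters of width 3, 5 genres.
rng = np.random.default_rng(0)
T, D, K, W, G = 20, 8, 4, 3, 5
features = rng.normal(size=(T, D))            # stand-in for ConvNet frame features
kernels = rng.normal(scale=0.1, size=(K, W, D))
bias = np.zeros(K)
w_out = rng.normal(scale=0.1, size=(K, G))
b_out = np.zeros(G)

probs, labels = predict_genres(features, kernels, bias, w_out, b_out)
print(probs.round(3), labels)
```

Because each genre gets its own sigmoid rather than sharing a softmax, a trailer can be assigned zero, one, or several genres, which is what distinguishes the multi-label setting from single-label classification.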

Index Terms

1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Neural networks

Published in
SAC '17: Proceedings of the Symposium on Applied Computing
April 2017
2004 pages
ISBN: 9781450344869
DOI: 10.1145/3019612
Conference Chair: Sung Y. Shin, South Dakota State University
Program Chairs: Dongwan Shin, New Mexico Tech; Maria Lencastre, University of Pernambuco, Brazil
Copyright © 2017 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
convolution through time
deep neural networks
movie genre classification
multi-label classification
Acceptance Rates
Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%