Movie genre classification using binary relevance, label powerset, and machine learning classifiers

Kumar, Sanjay; Kumar, Nikhil; Dev, Aditya; Naorem, Siraz

doi:10.1007/s11042-022-13211-5


Published: 11 June 2022

Volume 82, pages 945–968 (2023)

Multimedia Tools and Applications

Sanjay Kumar (ORCID: orcid.org/0000-0002-8951-5996)¹, Nikhil Kumar¹, Aditya Dev¹ & Siraz Naorem¹


Abstract

Multi-label text classification (MLTC) assigns a text to more than one category and is used extensively in real-life problems. Such classification problems are challenging, and the factors they depend on change from problem to problem. Movie genre classification is a popular multi-label text classification problem because a movie may belong to multiple genres at the same time. Movie genres are typically classified from features such as the movie's plot, title, summary, and subtitles. In recent years, several neural-network-based approaches have been proposed for such problems, but they make the solution resource-intensive and time-consuming. In this paper, we propose a novel method of movie genre classification that combines problem transformation techniques, namely binary relevance (BR) and label powerset (LP), with text vectorizers and machine learning classifier models. Binary relevance converts a multi-label classification task into independent binary classification tasks, one per label, whereas label powerset transforms a multi-label problem into a multiclass problem in which a single multiclass classifier is trained on all unique label combinations found in the training data. We then apply two text vectorizers, Count Vectorizer (CV) and Term Frequency-Inverse Document Frequency (TF-IDF), to tokenize the textual data and build a word vocabulary, and employ four classifiers, namely Logistic Regression (LR), Multinomial Naive Bayes (MNB), K-Nearest Neighbor (KNN), and Support Vector Classifier (SVC), in combination with the different vectorizers and problem transformation methods. Combining the two problem transformation approaches, two text vectorizers, and four classifier models yields 16 different combinations for classifying movies into appropriate genres, and we test the effectiveness of each using k-fold cross-validation. Finally, we evaluate the performance of each combination on publicly available IMDb datasets, targeting 27 major parent genres, using several performance measures. The best result is obtained with the combination of label powerset (LP) as the problem transformation approach, TF-IDF as the text vectorizer, and support vector classifier (SVC) as the machine learning classifier model, which achieves an accuracy of 0.95 and an F1-score of 0.86.
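The two problem transformations described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy genre sets and the helper names `binary_relevance_targets` and `label_powerset_targets` are assumptions made for illustration, and only the Python standard library is used. The sketch shows how the same multi-label data becomes one 0/1 target vector per genre under binary relevance and a single class id per distinct genre combination under label powerset, and it enumerates the grid of combinations named in the abstract.

```python
from itertools import product

# Toy genre sets for four hypothetical movies (illustrative only,
# not taken from the IMDb data used in the paper).
samples = [
    {"Action", "Comedy"},
    {"Drama"},
    {"Action", "Comedy"},
    {"Comedy", "Drama"},
]
genres = sorted(set().union(*samples))  # all labels seen in the toy data


def binary_relevance_targets(label_sets, labels):
    """Binary relevance: one independent 0/1 target vector per label."""
    return {g: [int(g in s) for s in label_sets] for g in labels}


def label_powerset_targets(label_sets):
    """Label powerset: each distinct label combination becomes one class id."""
    class_of = {}
    targets = []
    for s in label_sets:
        key = tuple(sorted(s))
        # Assign the next free class id the first time a combination is seen.
        class_of.setdefault(key, len(class_of))
        targets.append(class_of[key])
    return targets, class_of


br = binary_relevance_targets(samples, genres)
lp, lp_classes = label_powerset_targets(samples)

# Experimental grid as listed in the abstract:
# 2 transformations x 2 vectorizers x 4 classifiers.
transformations = ["BR", "LP"]
vectorizers = ["CV", "TF-IDF"]
classifiers = ["LR", "MNB", "KNN", "SVC"]
grid = list(product(transformations, vectorizers, classifiers))

print(br)          # one binary target vector per genre
print(lp)          # one multiclass target per movie
print(lp_classes)  # mapping from genre combination to class id
print(len(grid))   # number of combinations in the grid
```

Under binary relevance a separate binary classifier would be trained on each vector in `br`; under label powerset a single multiclass classifier would be trained on `lp`. Note that label powerset can only predict genre combinations that appear in the training data, which is the trade-off behind the phrase "all unique label combinations found in the training data".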



Author information

Authors and Affiliations

Department of Computer Science and Engineering, Delhi Technological University, New Delhi, 110042, India
Sanjay Kumar, Nikhil Kumar, Aditya Dev & Siraz Naorem


Corresponding author

Correspondence to Sanjay Kumar.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest. The authors did not receive support from any organization for the submitted work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Kumar, S., Kumar, N., Dev, A. et al. Movie genre classification using binary relevance, label powerset, and machine learning classifiers. Multimed Tools Appl 82, 945–968 (2023). https://doi.org/10.1007/s11042-022-13211-5


Received: 18 January 2021
Revised: 22 February 2022
Accepted: 11 May 2022
Published: 11 June 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s11042-022-13211-5
