
A multi-modal approach to detect inappropriate cartoon video contents using deep learning networks

Published in: Multimedia Tools and Applications

Abstract

Children are more exposed than ever to all kinds of video content on the Internet. Consequently, several studies have proposed techniques to detect videos that can be harmful to children. However, to date, no attention has been given to cartoon characters and the language used in cartoons. To address this gap, we evaluate the effectiveness of using actual images of cartoon characters and the language used in cartoons to categorise videos as appropriate or inappropriate for children. We do so by developing a multi-modal classifier that combines the outputs of two deep learning networks: an LSTM for text analysis and a VGGNet for image analysis. Specifically, the LSTM network processes user comments and closed captions associated with a video, while the VGGNet network recognises cartoon characters. The LSTM model was trained and tested on a dataset of about 290,000 labelled text records, and the VGGNet model on a manually annotated dataset of 6,000 cartoon character images. Testing accuracies of 94% and 99% were obtained for the LSTM and VGGNet networks, respectively. The proposed approach was further evaluated on 50 actual YouTube videos intended for children, yielding an accuracy of 72% using the LSTM alone, 78% using the VGGNet alone, and 76% using the combined output of the two networks. We conclude that closed captions, user comments and images of cartoon characters are all useful in detecting videos that are unsafe for children and should be considered essential parameters when developing multimedia filtering tools.
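The combination step described above can be illustrated with a minimal late-fusion sketch: each modality-specific network produces a probability that a video is inappropriate, and the two scores are merged into a single verdict. The fusion rule (a weighted average) and the 0.5 decision threshold below are illustrative assumptions, not the paper's exact combination method.

```python
# Hedged sketch of late fusion over two modality scores.
# p_text  : probability of "inappropriate" from the text model (LSTM over
#           user comments and closed captions).
# p_image : probability of "inappropriate" from the image model (VGGNet
#           over cartoon character frames).
# The weighting scheme here is an assumption for illustration only.

def fuse(p_text: float, p_image: float, w_text: float = 0.5) -> str:
    """Combine the two per-modality scores with a weighted average
    and apply a fixed decision threshold."""
    p_combined = w_text * p_text + (1.0 - w_text) * p_image
    return "inappropriate" if p_combined >= 0.5 else "appropriate"

# Example: the text branch flags the video strongly, the image branch does not.
print(fuse(0.9, 0.2))   # weighted average 0.55 -> "inappropriate"
print(fuse(0.1, 0.2))   # weighted average 0.15 -> "appropriate"
```

In practice the weight could be tuned on a validation set; the paper's own evaluation (72% text-only, 78% image-only, 76% combined) suggests the image modality deserves at least equal weight.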





Author information

Correspondence to M. Y. Chuttur.



About this article


Cite this article

Chuttur, M.Y., Nazurally, A. A multi-modal approach to detect inappropriate cartoon video contents using deep learning networks. Multimed Tools Appl 81, 16881–16900 (2022). https://doi.org/10.1007/s11042-022-12709-2

