
A multi-modal approach to detect inappropriate cartoon video contents using deep learning networks

Published in: Multimedia Tools and Applications

Abstract

Children are more exposed than ever to all kinds of video content on the Internet. Consequently, several studies have proposed techniques to detect videos that can be harmful to children. However, to date, no attention has been given to cartoon characters and the language used in cartoons. To address this gap, we evaluate the effectiveness of using actual images of cartoon characters and the language used in cartoons to categorise videos as appropriate or inappropriate for children. We do so by developing a multi-modal classifier that combines the outputs of two deep learning networks: an LSTM for text analysis and a VGGNet for image analysis. Specifically, the LSTM network processes user comments and closed captions associated with a video, while the VGGNet network recognises cartoon characters. The LSTM model was trained and tested on a dataset of about 290,000 labelled text records, and the VGGNet model on a manually annotated dataset of 6,000 cartoon character images. Testing accuracies of 94% and 99% were obtained for the LSTM and VGGNet networks, respectively. The proposed approach was further evaluated on 50 actual YouTube videos intended for children, yielding an accuracy of 72% using the LSTM alone, 78% using the VGGNet alone, and 76% using the combined output of the two networks. We conclude that closed captions, user comments and images of cartoon characters are all useful in detecting videos that are unsafe for children and should be considered essential parameters when developing multimedia filtering tools.
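The combination step described above can be illustrated with a minimal late-fusion sketch: each modality-specific network produces a probability that a video is inappropriate, and the two scores are merged into a single verdict. The fusion rule (a weighted average) and the 0.5 decision threshold below are illustrative assumptions, not the paper's exact combination method.

```python
# Hedged sketch of late fusion over two modality scores.
# p_text  : probability of "inappropriate" from the text model (LSTM over
#           user comments and closed captions).
# p_image : probability of "inappropriate" from the image model (VGGNet
#           over cartoon character frames).
# The weighting scheme here is an assumption for illustration only.

def fuse(p_text: float, p_image: float, w_text: float = 0.5) -> str:
    """Combine the two per-modality scores with a weighted average
    and apply a fixed decision threshold."""
    p_combined = w_text * p_text + (1.0 - w_text) * p_image
    return "inappropriate" if p_combined >= 0.5 else "appropriate"

# Example: the text branch flags the video strongly, the image branch does not.
print(fuse(0.9, 0.2))   # weighted average 0.55 -> "inappropriate"
print(fuse(0.1, 0.2))   # weighted average 0.15 -> "appropriate"
```

In practice the weight could be tuned on a validation set; the paper's own evaluation (72% text-only, 78% image-only, 76% combined) suggests the image modality deserves at least equal weight.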





Author information

Correspondence to M. Y. Chuttur.



About this article


Cite this article

Chuttur, M.Y., Nazurally, A. A multi-modal approach to detect inappropriate cartoon video contents using deep learning networks. Multimed Tools Appl 81, 16881–16900 (2022). https://doi.org/10.1007/s11042-022-12709-2

