DOI: 10.1145/3284398.3284403

ToonNet: a cartoon image dataset and a DNN-based semantic classification system

Published: 02 December 2018

Abstract

Cartoon-style pictures appear almost everywhere in daily life, and numerous applications process them, so a dedicated dataset of cartoon images is valuable. In this paper, we first present ToonNet, a cartoon-style image recognition dataset. We construct the benchmark set from 4000 images in 12 classes collected from the Internet with minimal manual filtering, and extend this base dataset to 10000 images using several methods: snapshots of 3D models rendered with a cartoon shader, a 2D-3D-2D conversion procedure based on a cartoon-modeling method, and a hand-drawing stylization filter. We then describe how to build an effective neural network for semantic image classification on ToonNet, presenting three techniques for constructing the Deep Neural Network (DNN): IUS (Inputs Unified Stylization), which stylizes the inputs to reduce the complexity of hand-drawn cartoon images; FIN (Feature Inserted Network), which inserts intuitive and valuable global features into the network; and NPN (Network Plus Network), which combines multiple single networks into a new mixed network. Our experiments show the efficacy and generality of these network strategies: with them, classification accuracy reaches 78% (top-1) and 93% (top-3), an improvement of about 5% (top-1) over classical DNNs.
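The abstract reports results as top-1 and top-3 accuracy, the standard top-k classification metrics. A minimal sketch of how such a metric is computed (generic illustration code with made-up toy data, not from the paper):

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    correct = 0
    for row, label in zip(scores, labels):
        # indices of the k largest scores for this sample (ties broken by index order)
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            correct += 1
    return correct / len(labels)

# Toy example: 3 samples, 4 classes
scores = [
    [0.1, 0.6, 0.2, 0.1],  # top-1 prediction: class 1
    [0.5, 0.2, 0.2, 0.1],  # top-1 prediction: class 0
    [0.3, 0.3, 0.1, 0.3],  # top-1 prediction: class 0 (tie)
]
labels = [1, 2, 0]
print(top_k_accuracy(scores, labels, k=1))  # 0.666... (2 of 3 correct)
print(top_k_accuracy(scores, labels, k=3))  # 1.0 (true label always in top 3)
```

A top-3 score is always at least as high as top-1, which is why the paper's 93% (top-3) exceeds its 78% (top-1).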



Published In

VRCAI '18: Proceedings of the 16th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry
December 2018
200 pages
ISBN:9781450360876
DOI:10.1145/3284398

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. cartoon image recognition
  2. image dataset
  3. machine learning

Qualifiers

  • Research-article

Conference

VRCAI '18

Acceptance Rates

Overall acceptance rate: 51 of 107 submissions (48%)

Article Metrics

  • Downloads (last 12 months): 34
  • Downloads (last 6 weeks): 5

Reflects downloads up to 17 Feb 2025

Cited By

  • (2025) Deep Learning in Cartoon Moderation: Distinguishing Child-Friendly Content with CNN Architectures. Cognitive Computing and Cyber Physical Systems, 241-256. 10.1007/978-3-031-77075-3_20. Online publication date: 9-Feb-2025.
  • (2024) Bridge the gap between practical application scenarios and cartoon character detection: A benchmark dataset and deep learning model. Displays 84, 102793. 10.1016/j.displa.2024.102793. Online publication date: Sep-2024.
  • (2024) CCDaS: A Benchmark Dataset for Cartoon Character Detection in Application Scenarios. Digital Multimedia Communications, 369-381. 10.1007/978-981-97-3626-3_27. Online publication date: 21-Jun-2024.
  • (2022) Graph Jigsaw Learning for Cartoon Face Recognition. IEEE Transactions on Image Processing 31, 3961-3972. 10.1109/TIP.2022.3177952. Online publication date: 2022.
  • (2021) An Evaluation of Traditional and CNN-Based Feature Descriptors for Cartoon Pornography Detection. IEEE Access 9, 39910-39925. 10.1109/ACCESS.2021.3064392. Online publication date: 2021.
  • (2021) Understanding cartoon emotion using integrated deep neural network on large dataset. Neural Computing and Applications 34(24), 21481-21501. 10.1007/s00521-021-06003-9. Online publication date: 21-Apr-2021.
  • (2020) Dense feature pyramid network for cartoon dog parsing. The Visual Computer. 10.1007/s00371-020-01887-5. Online publication date: 9-Jul-2020.
