DOI: 10.1145/3284398.3284403

ToonNet: a cartoon image dataset and a DNN-based semantic classification system

Published: 02 December 2018

ABSTRACT

Cartoon-style pictures appear almost everywhere in daily life, and numerous applications work with them, so a dedicated dataset of cartoon images is valuable. In this paper, we first present ToonNet, a cartoon-style image recognition dataset. We construct the benchmark set from 4,000 images in 12 classes collected from the Internet with light manual filtering. We extend this base dataset to 10,000 images by several methods: snapshots of rendered 3D models with a cartoon shader, a 2D-3D-2D conversion procedure based on a cartoon-modeling method, and a hand-drawing stylization filter. We then describe how to build an effective neural network for image semantic classification on ToonNet. We present three techniques for building the Deep Neural Network (DNN): IUS (Inputs Unified Stylization), which stylizes the inputs to reduce the complexity of hand-drawn cartoon images; FIN (Feature Inserted Network), which inserts intuitive and informative global features into the network; and NPN (Network Plus Network), which combines multiple single networks into a new mixed network. Our experiments show the efficacy and generality of these strategies: with them, classification accuracy reaches 78% (top-1) and 93% (top-3), an improvement of about 5% in top-1 accuracy over classical DNNs.
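The abstract only sketches the FIN and NPN ideas; the paper itself gives the details. As a rough, hypothetical illustration of the two techniques, the PyTorch sketch below fuses a precomputed global feature vector with learned CNN features before classification (FIN) and averages the class logits of several such sub-networks (NPN). All class and layer names, sizes, and the choice of global feature are our own assumptions, not the authors' actual architecture.

```python
# Hypothetical sketch of the FIN and NPN ideas from the abstract.
# Layer sizes and the global-feature dimension are illustrative only.
import torch
import torch.nn as nn

class FeatureInsertedNet(nn.Module):
    """FIN sketch: concatenate a hand-crafted global feature vector
    (e.g., a color histogram) with learned CNN features before the
    classifier head."""
    def __init__(self, num_classes=12, global_feat_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(64 + global_feat_dim, num_classes)

    def forward(self, image, global_feat):
        learned = self.backbone(image)                   # (B, 64)
        fused = torch.cat([learned, global_feat], dim=1) # (B, 64 + G)
        return self.classifier(fused)

class NetworkPlusNetwork(nn.Module):
    """NPN sketch: combine several single networks into one mixed
    network by averaging their class logits."""
    def __init__(self, subnets):
        super().__init__()
        self.subnets = nn.ModuleList(subnets)

    def forward(self, image, global_feat):
        logits = [net(image, global_feat) for net in self.subnets]
        return torch.stack(logits, dim=0).mean(dim=0)

if __name__ == "__main__":
    x = torch.randn(2, 3, 224, 224)   # batch of cartoon images
    g = torch.randn(2, 64)            # precomputed global features
    model = NetworkPlusNetwork([FeatureInsertedNet() for _ in range(3)])
    print(model(x, g).shape)          # torch.Size([2, 12])
```

In practice the global feature fed to such a network could be any cheap, image-wide statistic (a color histogram is one plausible choice for cartoon imagery, whose palettes are typically flat); the paper should be consulted for the features the authors actually use.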


Published in
VRCAI '18: Proceedings of the 16th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry
December 2018, 200 pages
ISBN: 9781450360876
DOI: 10.1145/3284398
Copyright © 2018 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall Acceptance Rate: 51 of 107 submissions (48%)
