ABSTRACT
Cartoon-style pictures appear almost everywhere in daily life. Numerous applications process cartoon pictures, so a dataset of such pictures would be valuable to them. In this paper, we first present ToonNet, a cartoon-style image recognition dataset. We construct our benchmark set from 4,000 images in 12 classes, collected from the Internet with little manual filtering. We extend this base dataset to 10,000 images using several methods: snapshots of 3D models rendered with a cartoon shader, a 2D-3D-2D conversion procedure based on a cartoon-modeling method, and a hand-drawing stylization filter. We then describe how to build an effective neural network for semantic image classification based on ToonNet. We present three techniques for building the Deep Neural Network (DNN): IUS (Inputs Unified Stylization), which stylizes the inputs to reduce the complexity of hand-drawn cartoon images; FIN (Feature Inserted Network), which inserts intuitive and valuable global features into the network; and NPN (Network Plus Network), which combines multiple single networks into a new mixed network. Our experiments show the efficacy and generality of these network strategies. With these techniques, classification accuracy reaches 78% (top-1) and 93% (top-3), an improvement of about 5% (top-1) over classical DNNs.
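To make the reported metrics concrete, the following is a minimal sketch of how top-k accuracy could be computed for a mixed network in the spirit of NPN. The abstract does not say how NPN combines its member networks; averaging their class probabilities is an assumption made here for illustration only, and all function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def average_probabilities(per_network: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors of several networks for one image.

    Assumption: simple averaging stands in for the paper's NPN fusion rule,
    which the abstract does not specify.
    """
    n = len(per_network)
    num_classes = len(per_network[0])
    return [sum(p[c] for p in per_network) / n for c in range(num_classes)]


def top_k_classes(probs: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-probability classes, best first."""
    return sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]


def top_k_accuracy(all_probs: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of images whose true label is among the k best predictions."""
    hits = sum(1 for probs, label in zip(all_probs, labels)
               if label in top_k_classes(probs, k))
    return hits / len(labels)


# Toy data (hypothetical): two networks, two images, three classes.
net_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
net_b = [[0.3, 0.4, 0.3], [0.1, 0.3, 0.6]]
labels = [0, 1]

# Fuse the two networks image by image, then score the fused predictions.
fused = [average_probabilities([a, b]) for a, b in zip(net_a, net_b)]
top1 = top_k_accuracy(fused, labels, k=1)
top2 = top_k_accuracy(fused, labels, k=2)
```

In this toy case the second image is misclassified at top-1 but recovered at top-2, which is the same kind of gap as the one between the 78% (top-1) and 93% (top-3) figures reported above.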