DOI: 10.1145/3284398.3284403

ToonNet: a cartoon image dataset and a DNN-based semantic classification system

Published: 02 December 2018

ABSTRACT

Cartoon-style pictures appear almost everywhere in daily life, and numerous applications work with them, so a dedicated dataset of cartoon images is valuable. In this paper, we first present ToonNet, a cartoon-style image recognition dataset. We construct the benchmark set from 4,000 images in 12 classes collected from the Internet with light manual filtering. We extend this base dataset to 10,000 images by several methods: snapshots of rendered 3D models with a cartoon shader, a 2D-3D-2D conversion procedure based on a cartoon-modeling method, and a hand-drawing stylization filter. We then describe how to build an effective neural network for image semantic classification on ToonNet. We present three techniques for building the Deep Neural Network (DNN): IUS (Inputs Unified Stylization), which stylizes the inputs to reduce the complexity of hand-drawn cartoon images; FIN (Feature Inserted Network), which inserts intuitive and informative global features into the network; and NPN (Network Plus Network), which combines multiple single networks into a new mixed network. Our experiments show the efficacy and generality of these strategies: with them, classification accuracy reaches 78% (top-1) and 93% (top-3), an improvement of about 5% in top-1 accuracy over classical DNNs.
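The abstract only sketches the FIN and NPN ideas; the paper itself gives the details. As a rough, hypothetical illustration of the two techniques, the PyTorch sketch below fuses a precomputed global feature vector with learned CNN features before classification (FIN) and averages the class logits of several such sub-networks (NPN). All class and layer names, sizes, and the choice of global feature are our own assumptions, not the authors' actual architecture.

```python
# Hypothetical sketch of the FIN and NPN ideas from the abstract.
# Layer sizes and the global-feature dimension are illustrative only.
import torch
import torch.nn as nn

class FeatureInsertedNet(nn.Module):
    """FIN sketch: concatenate a hand-crafted global feature vector
    (e.g., a color histogram) with learned CNN features before the
    classifier head."""
    def __init__(self, num_classes=12, global_feat_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(64 + global_feat_dim, num_classes)

    def forward(self, image, global_feat):
        learned = self.backbone(image)                   # (B, 64)
        fused = torch.cat([learned, global_feat], dim=1) # (B, 64 + G)
        return self.classifier(fused)

class NetworkPlusNetwork(nn.Module):
    """NPN sketch: combine several single networks into one mixed
    network by averaging their class logits."""
    def __init__(self, subnets):
        super().__init__()
        self.subnets = nn.ModuleList(subnets)

    def forward(self, image, global_feat):
        logits = [net(image, global_feat) for net in self.subnets]
        return torch.stack(logits, dim=0).mean(dim=0)

if __name__ == "__main__":
    x = torch.randn(2, 3, 224, 224)   # batch of cartoon images
    g = torch.randn(2, 64)            # precomputed global features
    model = NetworkPlusNetwork([FeatureInsertedNet() for _ in range(3)])
    print(model(x, g).shape)          # torch.Size([2, 12])
```

In practice the global feature fed to such a network could be any cheap, image-wide statistic (a color histogram is one plausible choice for cartoon imagery, whose palettes are typically flat); the paper should be consulted for the features the authors actually use.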


Published in
VRCAI '18: Proceedings of the 16th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry
December 2018, 200 pages
ISBN: 9781450360876
DOI: 10.1145/3284398
Copyright © 2018 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall Acceptance Rate: 51 of 107 submissions (48%)
