Dense feature pyramid network for cartoon dog parsing

Original article, published in The Visual Computer.

Abstract

While traditional cartoon character drawings are simple for humans to create, they remain highly challenging for machines to interpret. Parsing alleviates this issue by providing fine-grained semantic segmentation of an image. Although parsing is well studied on natural images, research on cartoon parsing is very sparse. Due to the lack of available datasets and the diversity of artwork styles, cartoon character parsing is more difficult than the well-known human parsing task. In this paper, we study one type of cartoon instance: cartoon dogs. We introduce a novel dataset for cartoon dog parsing and create a new deep convolutional neural network (DCNN) to tackle the problem. Our dataset contains 965 precisely annotated cartoon dog images with seven semantic part labels. Our new model, called dense feature pyramid network (DFPnet), builds on recent popular semantic segmentation techniques to handle cartoon dog parsing efficiently. We achieve a mean intersection over union (mIoU) of 68.39%, a mean accuracy of 79.4% and a pixel accuracy of 93.5% on our cartoon dog validation set. Our method outperforms state-of-the-art models for similar tasks trained on our dataset: CE2P for single human parsing and Mask R-CNN for instance segmentation. We hope this work can serve as a starting point for future research on digital artwork understanding with DCNNs. Our DFPnet and dataset will be publicly available.
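The abstract names the core idea behind DFPnet: a feature-pyramid decoder whose levels are densely fused before a per-pixel classifier. As a purely illustrative sketch of that general idea, not the authors' exact architecture, the snippet below shows a PyTorch decoder with dense top-down connections. All module names, channel widths and the four-stage backbone layout (a common ResNet-50 shape) are assumptions.

```python
# Purely illustrative sketch of an FPN-style decoder with dense top-down
# connections; NOT the authors' exact DFPnet. Module names, channel widths
# and the four-stage backbone layout are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFPNDecoder(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), mid=256, num_classes=8):
        super().__init__()
        # 1x1 lateral convs project each backbone stage to a common width
        self.laterals = nn.ModuleList(nn.Conv2d(c, mid, 1) for c in in_channels)
        # level i is fused with ALL coarser levels, hence the growing input width
        self.fuse = nn.ModuleList(
            nn.Conv2d(mid * (len(in_channels) - i), mid, 3, padding=1)
            for i in range(len(in_channels))
        )
        self.classifier = nn.Conv2d(mid, num_classes, 1)  # 7 parts + background

    def forward(self, feats):
        # feats: backbone feature maps ordered finest -> coarsest
        lats = [lat(f) for lat, f in zip(self.laterals, feats)]
        fused = None
        for i in range(len(lats) - 1, -1, -1):
            # upsample every coarser level to level i's resolution, then fuse
            coarser = [F.interpolate(lats[j], size=lats[i].shape[-2:],
                                     mode='bilinear', align_corners=False)
                       for j in range(i + 1, len(lats))]
            fused = self.fuse[i](torch.cat([lats[i]] + coarser, dim=1))
        return self.classifier(fused)  # logits at 1/4 of the input resolution

if __name__ == "__main__":
    # four dummy feature maps at strides 4, 8, 16, 32 of a 256x256 input
    feats = [torch.randn(1, c, 256 // s, 256 // s)
             for c, s in zip((256, 512, 1024, 2048), (4, 8, 16, 32))]
    print(DenseFPNDecoder()(feats).shape)  # torch.Size([1, 8, 64, 64])
```

The dense fusion, where each pyramid level concatenates every coarser level rather than only its immediate neighbor, is the defining trait the model's name suggests; the exact wiring in the published DFPnet may differ.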

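For reference, the three scores reported in the abstract (pixel accuracy, mean accuracy and mIoU) are the standard semantic segmentation metrics and can all be derived from a single confusion matrix. Below is a minimal NumPy sketch, assuming integer label maps with eight classes (the seven part labels plus background); the function names are ours, not from the paper.

```python
# Minimal sketch of the reported metrics, assuming integer label maps
# with 8 classes (seven semantic part labels + background).
import numpy as np

def confusion_matrix(gt, pred, num_classes=8):
    # rows = ground-truth class, columns = predicted class
    idx = gt.reshape(-1) * num_classes + pred.reshape(-1)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def parsing_metrics(cm):
    tp = np.diag(cm).astype(float)
    pixel_acc = tp.sum() / cm.sum()                    # fraction of correctly labeled pixels
    mean_acc = np.nanmean(tp / cm.sum(axis=1))         # per-class recall, averaged
    iou = tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp)  # per-class intersection over union
    return pixel_acc, mean_acc, np.nanmean(iou)        # mIoU = mean of per-class IoU
```

Accumulating the confusion matrix over the whole validation set before averaging is the usual evaluation protocol, and is presumably how the reported numbers were obtained.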


Author information


Corresponding author

Correspondence to Jerome Wan.

Ethics declarations

Funding

This work was partially supported by the National Key Research and Development Program of China (2018YFB1004902) and the National Natural Science Foundation of China (61772329, 61373085).

Conflict of interest

The authors declare that they have no conflict of interest.

Availability of data and material

The data that support the findings of this study will be openly available at https://github.com/Jer7/Dense-Feature-Pyramid-network

Code availability

The code that supports the findings of this study will be openly available at https://github.com/Jer7/Dense-Feature-Pyramid-network

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Wan, J., Mougeot, G. & Yang, X. Dense feature pyramid network for cartoon dog parsing. Vis Comput 36, 2471–2483 (2020). https://doi.org/10.1007/s00371-020-01887-5

