Abstract
While traditional cartoon character drawings are simple for humans to create, they remain highly challenging for machines to interpret. Parsing, the fine-grained semantic segmentation of an image into its constituent parts, is one way to alleviate this issue. Although well studied on natural images, research on cartoon parsing is very sparse. Owing to the lack of available datasets and the diversity of artwork styles, cartoon character parsing is more difficult than the well-known human parsing task. In this paper, we study one type of cartoon instance: cartoon dogs. We introduce a novel dataset for cartoon dog parsing and create a new deep convolutional neural network (DCNN) to tackle the problem. Our dataset contains 965 precisely annotated cartoon dog images with seven semantic part labels. Our new model, called dense feature pyramid network (DFPnet), builds on recent popular techniques in semantic segmentation to efficiently handle cartoon dog parsing. We achieve a mean IoU (mIoU) of 68.39%, a mean accuracy of 79.4% and a pixel accuracy of 93.5% on our cartoon dog validation set. Our method outperforms state-of-the-art models for similar tasks trained on our dataset: CE2P for single human parsing and Mask R-CNN for instance segmentation. We hope this work can serve as a starting point for future research toward digital artwork understanding with DCNNs. Our DFPnet and dataset will be publicly available.
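For readers who wish to reproduce the evaluation, the three reported numbers follow the standard semantic-segmentation definitions of pixel accuracy, mean per-class accuracy and mean IoU. The plain-NumPy sketch below computes them from an accumulated confusion matrix; it is an illustration under stated assumptions, not code from the paper — the eight-class label set (background plus the seven part labels) and all function names are ours.

    import numpy as np

    # Assumed label set: background plus the seven semantic part labels
    # mentioned in the abstract (the exact part names are not given here).
    NUM_CLASSES = 8

    def confusion_matrix(pred, gt, num_classes=NUM_CLASSES):
        """Accumulate a (num_classes x num_classes) confusion matrix from
        flattened integer label maps (rows: ground truth, cols: prediction)."""
        valid = (gt >= 0) & (gt < num_classes)
        idx = num_classes * gt[valid].astype(np.int64) + pred[valid]
        return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

    def parsing_metrics(conf):
        """Pixel accuracy, mean per-class accuracy and mean IoU."""
        tp = np.diag(conf).astype(np.float64)
        pixel_acc = tp.sum() / conf.sum()
        with np.errstate(divide="ignore", invalid="ignore"):
            per_class_acc = tp / conf.sum(axis=1)                          # recall per class
            per_class_iou = tp / (conf.sum(axis=1) + conf.sum(axis=0) - tp)
        # nanmean skips classes absent from the ground truth
        return pixel_acc, np.nanmean(per_class_acc), np.nanmean(per_class_iou)

    # Example usage on random label maps (the 512x512 resolution is arbitrary):
    pred = np.random.randint(0, NUM_CLASSES, size=(512, 512)).ravel()
    gt = np.random.randint(0, NUM_CLASSES, size=(512, 512)).ravel()
    pa, ma, miou = parsing_metrics(confusion_matrix(pred, gt))

In practice the confusion matrix would be summed over the whole validation set before the metrics are taken, so that small images do not dominate the per-class averages.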
References
de Juan, C.N., Bodenheimer, B.: Re-using traditional animation: methods for semi-automatic segmentation and inbetweening. In: Proceedings of the 2006 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, SCA '06, pp. 223–232. Eurographics Association, Goslar, DEU (2006)
Entem, E., Parakkat, A.D., Barthe, L., Muthuganapathy, R., Cani, M.P.: Automatic structuring of organic shapes from a single drawing. Comput. Graph. 81, 125–139 (2019)
Weng, C., Curless, B., Kemelmacher-Shlizerman, I.: Photo wake-up: 3D character animation from a single photo. In: CoRR (2018). arXiv:1812.02246
Entem, E., Barthe, L., Cani, M.P., Cordier, F., van de Panne, M.: Modeling 3D animals from a side-view sketch. Comput. Graph. 46(C), 221–230 (2015). https://doi.org/10.1016/j.cag.2014.09.037
Feng, L., Yang, X., Xiao, S.: In: 2017 IEEE Virtual Reality (VR), pp. 195–204 (2017). https://doi.org/10.1109/VR.2017.7892247
Yang, L., Song, Q., Wang, Z., Jiang, M.: Parsing R-CNN for instance-level human analysis. In: CoRR (2018). arXiv:1811.12596
Liu, T., Ruan, T., Huang, Z., Wei, Y., Wei, S., Zhao, Y., Huang, T.: Devil in the details: towards accurate single and multiple human parsing. In: CoRR (2018). arXiv:1809.05996
LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541 (1989). https://doi.org/10.1162/neco.1989.1.4.541
Ogawa, T., Otsubo, A., Narita, R., Matsui, Y., Yamasaki, T., Aizawa, K.: Object detection for comics using Manga109 annotations. In: CoRR (2018). arXiv:1803.08670
Guérin, C., Rigaud, C., Mercier, A., Ammar-Boudjelal, F., Bertet, K., Bouju, A., Burie, J., Louis, G., Ogier, J., Revel, A.: eBDtheque: a representative database of comics. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1145–1149 (2013). https://doi.org/10.1109/ICDAR.2013.232
Mottaghi, R., Chen, X., Liu, X., Cho, N.G., Lee, S.W., Fidler, S., Urtasun, R., Yuille, A.: The role of context for object detection and semantic segmentation in the wild. In: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '14, pp. 891–898. IEEE Computer Society, Washington, DC, USA (2014). https://doi.org/10.1109/CVPR.2014.119
Everingham, M., Eslami, S.M., Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes challenge: a retrospective. Int. J. Comput. Vis. 111(1), 98 (2015). https://doi.org/10.1007/s11263-014-0733-5
Cordts, M., Omran, M., Ramos, S., Rehfeld, T., Enzweiler, M., Benenson, R., Franke, U., Roth, S., Schiele, B.: The cityscapes dataset for semantic urban scene understanding. In: CoRR (2016). arXiv:1604.01685
Lin, T., Maire, M., Belongie, S.J., Bourdev, L.D., Girshick, R.B., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: CoRR (2014). arXiv:1405.0312
Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder–decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481 (2017). https://doi.org/10.1109/TPAMI.2016.2644615
Chen, L., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder–decoder with atrous separable convolution for semantic image segmentation. In: CoRR (2018). arXiv:1802.02611
Yamaguchi, K., Kiapour, M.H., Ortiz, L.E., Berg, T.L.: Parsing clothing in fashion photographs. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3570–3577 (2012). https://doi.org/10.1109/CVPR.2012.6248101
Chen, X., Mottaghi, R., Liu, X., Fidler, S., Urtasun, R., Yuille, A.L.: Detect what you can: detecting and representing objects using holistic models and body parts. In: CoRR (2014). arXiv:1406.2031
Liang, X., Gong, K., Shen, X., Lin, L.: Look into person: joint body parsing and pose estimation network and a new benchmark. In: CoRR (2018). arXiv:1804.01984
Gong, K., Liang, X., Shen, X., Lin, L.: Look into person: self-supervised structure-sensitive learning and a new benchmark for human parsing. In: CoRR (2017). arXiv:1703.05446
Lin, T., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. In: CoRR (2016). arXiv:1612.03144
Ghiasi, G., Fowlkes, C.C.: Laplacian reconstruction and refinement for semantic segmentation. In: CoRR (2016). arXiv:1605.02264
Chen, L., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. In: CoRR (2017). arXiv:1706.05587
Chen, L., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. In: CoRR (2016). arXiv:1606.00915
Wu, H., Wu, Y., Zhang, S., Li, P., Wen, Z.: In: 2016 IEEE International Conference on Signal and Image Processing (ICSIP), pp. 277–281. (2016). https://doi.org/10.1109/SIPROCESS.2016.7888267
Gong, K., Liang, X., Li, Y., Chen, Y., Yang, M., Lin, L.: Instance-level human parsing via part grouping network. In: CoRR (2018). arXiv:1808.00157
Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: CoRR (2015). arXiv:1506.01497
He, K., Gkioxari, G., Dollár, P., Girshick, R.B.: Mask R-CNN. In: CoRR (2017). arXiv:1703.06870
Zhu, Z., Xu, M., Bai, S., Huang, T., Bai, X.: Asymmetric non-local neural networks for semantic segmentation. In: International Conference on Computer Vision (2019). arXiv:1908.07678
Wang, X., Girshick, R.B., Gupta, A., He, K.: Non-local neural networks. In: CoRR (2017). arXiv:1711.07971
Buades, A., Coll, B., Morel, J.: A non-local algorithm for image denoising. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), vol. 2, pp. 60–65 (2005). https://doi.org/10.1109/CVPR.2005.38
He, K., Zhang, X., Ren, S., Sun, J.: Spatial pyramid pooling in deep convolutional networks for visual recognition. In: CoRR (2014). arXiv:1406.4729
Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: CVPR (2017)
Zhao, J., Li, J., Cheng, Y., Zhou, L., Sim, T., Yan, S., Feng, J.: Understanding humans in crowded scenes: deep nested adversarial learning and a new benchmark for multi-human parsing. In: CoRR (2018). arXiv:1804.03287
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3431–3440 (2015). https://doi.org/10.1109/CVPR.2015.7298965
Lin, G., Milan, A., Shen, C., Reid, I.D.: RefineNet: multi-path refinement networks for high-resolution semantic segmentation. In: CoRR (2016). arXiv:1611.06612
Matsui, Y., Ito, K., Aramaki, Y., Yamasaki, T., Aizawa, K.: Sketch-based manga retrieval using Manga109 dataset. In: CoRR (2015). arXiv:1510.04389
Zhou, Y., Jin, Y., Luo, A., Chan, S., Xiao, X., Yang, X.: In: Proceedings of the 16th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and Its Applications in Industry, VRCAI ’18, pp. 30:1–30:8. ACM, New York (2018). https://doi.org/10.1145/3284398.3284403
Dutta, A., Zisserman, A.: The VIA annotation software for images, audio and video. In: Proceedings of the 27th ACM International Conference on Multimedia, MM '19. ACM, New York (2019). https://doi.org/10.1145/3343031.3350535
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CoRR (2015). arXiv:1512.03385
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd International Conference on Machine Learning, ICML'15, vol. 37, pp. 448–456. JMLR.org (2015). http://dl.acm.org/citation.cfm?id=3045118.3045167
Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in convolutional network. In: CoRR (2015). arXiv:1505.00853
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: CoRR (2015). arXiv:1505.04597
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chintala, S.: PyTorch: an imperative style, high-performance deep learning library. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32, pp. 8024–8035. Curran Associates, Inc. (2019). http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR (2009)
Abdulla, W.: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow (2017). https://github.com/matterport/Mask_RCNN
Papandreou, G., Zhu, T., Chen, L., Gidaris, S., Tompson, J., Murphy, K.: PersonLab: person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model. In: CoRR (2018). arXiv:1803.08225
Li, Y., Bian, X., Chang, M., Wen, L., Lyu, S.: In: 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1–6 (2018). https://doi.org/10.1109/AVSS.2018.8639428
Ethics declarations
Funding
This work was partially supported by the National Key Research and Development Program of China (2018YFB1004902) and the National Natural Science Foundation of China (61772329, 61373085).
Conflict of interest
The authors declare that they have no conflict of interest.
Availability of data and material
The data that support the findings of this study will be openly available at https://github.com/Jer7/Dense-Feature-Pyramid-network
Code availability
The code that supports the findings of this study will be openly available at https://github.com/Jer7/Dense-Feature-Pyramid-network
About this article
Cite this article
Wan, J., Mougeot, G. & Yang, X. Dense feature pyramid network for cartoon dog parsing. Vis Comput 36, 2471–2483 (2020). https://doi.org/10.1007/s00371-020-01887-5