research-article

CartoonNet: Cartoon Parsing with Semantic Consistency and Structure Correlation

Authors:

Yu-Pei Song

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 729-737

https://doi.org/10.1145/3664647.3680879

Published: 28 October 2024

Abstract

Cartoon parsing, the task of segmenting cartoon characters into their body parts, is important for cartoon-centric applications. Because cartoon characters exhibit complex appearances, abstract drawing styles, and irregular structures, cartoon parsing remains challenging. In this paper, a novel approach named CartoonNet is proposed for cartoon parsing; it integrates semantic consistency and structure correlation to address the visual diversity and structural complexity of cartoon images. A memory-based semantic consistency module is designed to learn the diverse appearances exhibited by cartoon characters: a memory bank stores features of diverse samples and, for each new sample, retrieves the stored samples related to it, with the aim of improving the semantic reasoning capability of the network. A self-attention mechanism then enforces consistency between the body parts of the retrieved samples and those of the new sample. To capture the intricate structural information of cartoon images, a structure correlation module is proposed. By leveraging graph attention networks and a main body-aware mechanism, this module models structural correlations, allowing the network to parse cartoon images with complex structures. Experiments on cartoon parsing and human parsing datasets demonstrate the effectiveness of the proposed method, which outperforms state-of-the-art approaches for cartoon parsing and achieves competitive performance on human parsing.
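The abstract does not specify the internals of either module, so the following is only a minimal Python sketch of the generic building blocks it names, not CartoonNet's implementation. It assumes a fixed-capacity FIFO memory bank with cosine-similarity top-k retrieval, scaled dot-product attention (Vaswani et al., 2017) in which a new sample's part features query the retrieved features, and a single-head graph attention layer in the style of Velickovic et al. (2018) over a graph whose nodes are body parts. The capacity, eviction policy, similarity measure, graph layout, and all names (`MemoryBank`, `cross_attention`, `gat_layer`) are hypothetical, and the main body-aware mechanism is not modeled.

```python
# Hypothetical sketch of generic building blocks; not the CartoonNet implementation.
import math
from collections import deque


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class MemoryBank:
    """Hypothetical fixed-capacity FIFO store of per-sample feature vectors."""

    def __init__(self, capacity):
        self.items = deque(maxlen=capacity)  # oldest entry is evicted when full

    def write(self, feat):
        self.items.append(list(feat))

    def retrieve(self, query, k):
        # Return the k stored features with the highest cosine similarity to query.
        ranked = sorted(self.items, key=lambda m: cosine(query, m), reverse=True)
        return ranked[:k]


def cross_attention(queries, memory):
    """Scaled dot-product attention; retrieved features serve as keys and values."""
    d = len(queries[0])
    out = []
    for q in queries:
        w = softmax([dot(q, k) / math.sqrt(d) for k in memory])
        out.append([sum(wi * m[j] for wi, m in zip(w, memory)) for j in range(d)])
    return out


def gat_layer(h, adj, W, a, slope=0.2):
    """Single-head graph attention layer in the style of Velickovic et al. (2018).

    h: node features (e.g. one vector per body part)
    adj: neighbor index lists; each node must list itself (self-loop)
    W: weight matrix as a list of rows (len(W) = output dim)
    a: attention vector of length 2 * len(W)
    """
    wh = [[dot(row, x) for row in W] for x in h]  # W h_i for every node
    out = []
    for i, nbrs in enumerate(adj):
        scores = []
        for j in nbrs:
            s = dot(a, wh[i] + wh[j])  # a^T [W h_i || W h_j] (list concatenation)
            scores.append(s if s > 0 else slope * s)  # LeakyReLU
        alpha = softmax(scores)  # normalize over the neighbors of node i
        agg = [sum(al * wh[j][c] for al, j in zip(alpha, nbrs)) for c in range(len(W))]
        out.append([v if v > 0 else math.exp(v) - 1.0 for v in agg])  # ELU
    return out


bank = MemoryBank(capacity=4)
for f in ([1.0, 0.0], [0.0, 1.0], [0.9, 0.1]):
    bank.write(f)
retrieved = bank.retrieve([1.0, 0.0], k=2)
refined = cross_attention([[1.0, 0.0]], retrieved)
parts = gat_layer([[1.0, 0.0], [0.0, 1.0]], [[0, 1], [0, 1]],
                  W=[[1.0, 0.0], [0.0, 1.0]], a=[0.0] * 4)
```

In this toy run, `retrieve([1.0, 0.0], k=2)` returns `[1.0, 0.0]` and `[0.9, 0.1]` (the two most similar stored vectors), and `cross_attention` blends them with softmax weights. With a zero attention vector, `gat_layer` weights every neighbor equally, so both nodes become the mean of their neighbors' transformed features. The ELU output follows the original GAT formulation; a multi-head variant would run several such heads and concatenate or average them.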

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision problems → Image segmentation



Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN:9798400706868

DOI:10.1145/3664647

General Chairs:
Jianfei Cai (Monash University, Australia)
Mohan Kankanhalli (NUS, Singapore)
Balakrishnan Prabhakaran (UT Dallas, USA)
Susanne Boll (University of Oldenburg, Germany)

Program Chairs:
Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia)
Liang Zheng (Australian National University, Australia)
Vivek K. Singh (Rutgers University, USA)
Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands)
Lexing Xie (Australian National University, Australia)
Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Key R&D Program of Guangxi Zhuang Autonomous Region, China
National Natural Science Foundation of China

Conference

MM '24

Sponsor:

SIGMM

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate 1,150 of 4,385 submissions, 26%

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total citations: 0
Total downloads: 47 (last 12 months: 47; last 6 weeks: 5), as of 16 Feb 2025
