Abstract
Deep learning’s achievements in computer vision have positioned cartoon character detection (CCD) as a promising tool for intellectual property protection. However, owing to the lack of suitable cartoon character datasets, CCD remains a largely unexplored field, and many issues still need to be addressed to meet the demands of practical applications such as merchandise, advertising, and patent examination. In this paper, we introduce CCDaS, a comprehensive benchmark dataset comprising 55,608 images of 524 renowned cartoon characters from 227 works, including cartoons, games, and merchandise. To our knowledge, CCDaS is the most extensive CCD dataset tailored to real-world applications. Alongside the dataset, we provide a CCD algorithm, multi-path YOLO (MP-YOLO), that achieves accurate detection of cartoon characters in complex practical application scenarios. Experimental results show that MP-YOLO achieves better detection results on the CCDaS dataset, and comparative and ablation studies further validate the effectiveness of our CCD dataset and algorithm.
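The excerpt above does not reproduce MP-YOLO’s architecture or the authors’ evaluation code. As a rough illustration of how detections on a CCD benchmark such as CCDaS are typically scored, the minimal sketch below matches predicted character boxes to ground-truth boxes by IoU and reports precision and recall at a fixed threshold. All labels, boxes, and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: IoU-based matching of predicted cartoon-character
# boxes to ground truth, as commonly used to score detectors on CCD benchmarks.
# Character names, boxes, and the 0.5 IoU threshold are assumptions.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_recall(preds, gts, iou_thr=0.5):
    """Greedily match predictions (highest confidence first) to unmatched ground truth."""
    preds = sorted(preds, key=lambda p: -p["score"])
    matched, tp = set(), 0
    for p in preds:
        best, best_iou = None, iou_thr
        for i, g in enumerate(gts):
            if i in matched or g["label"] != p["label"]:
                continue
            v = iou(p["box"], g["box"])
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            matched.add(best)
            tp += 1
    return tp / max(len(preds), 1), tp / max(len(gts), 1)

# Toy example with a hypothetical character label.
gts = [{"label": "pikachu", "box": (10, 10, 60, 80)}]
preds = [{"label": "pikachu", "box": (12, 8, 58, 82), "score": 0.9}]
print(precision_recall(preds, gts))  # -> (1.0, 1.0)
```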
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Qi, Z., Pan, D., Niu, T., Ying, Z., Shi, P. (2024). CCDaS: A Benchmark Dataset for Cartoon Character Detection in Application Scenarios. In: Zhai, G., Zhou, J., Ye, L., Yang, H., An, P., Yang, X. (eds) Digital Multimedia Communications. IFTC 2023. Communications in Computer and Information Science, vol 2067. Springer, Singapore. https://doi.org/10.1007/978-981-97-3626-3_27
DOI: https://doi.org/10.1007/978-981-97-3626-3_27
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-3625-6
Online ISBN: 978-981-97-3626-3
eBook Packages: Behavioral Science and Psychology (R0)