Abstract
Deep learning’s achievements in computer vision have positioned cartoon character detection (CCD) as a promising tool for intellectual property protection. However, owing to the lack of suitable cartoon character datasets, CCD remains a largely unexplored field, and many issues still need to be addressed to meet the demands of practical applications such as merchandise, advertising, and patent examination. In this paper, we introduce CCDaS, a comprehensive benchmark dataset comprising 55,608 images of 524 renowned cartoon characters from 227 works, including cartoons, games, and merchandise. To our knowledge, CCDaS is the most extensive CCD dataset tailored to real-world applications. Alongside the dataset, we provide a CCD algorithm, multi-path YOLO (MP-YOLO), that achieves accurate detection of cartoon characters in complex practical application scenarios. Experimental results show that MP-YOLO achieves better detection results on the CCDaS dataset, and comparative and ablation studies further validate the effectiveness of our CCD dataset and algorithm.
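The excerpt above does not reproduce MP-YOLO’s architecture or the authors’ evaluation code. As a rough illustration of how detections on a CCD benchmark such as CCDaS are typically scored, the minimal sketch below matches predicted character boxes to ground-truth boxes by IoU and reports precision and recall at a fixed threshold. All labels, boxes, and the 0.5 threshold are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: IoU-based matching of predicted cartoon-character
# boxes to ground truth, as commonly used to score detectors on CCD benchmarks.
# Character names, boxes, and the 0.5 IoU threshold are assumptions.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_recall(preds, gts, iou_thr=0.5):
    """Greedily match predictions (highest confidence first) to unmatched ground truth."""
    preds = sorted(preds, key=lambda p: -p["score"])
    matched, tp = set(), 0
    for p in preds:
        best, best_iou = None, iou_thr
        for i, g in enumerate(gts):
            if i in matched or g["label"] != p["label"]:
                continue
            v = iou(p["box"], g["box"])
            if v >= best_iou:
                best, best_iou = i, v
        if best is not None:
            matched.add(best)
            tp += 1
    return tp / max(len(preds), 1), tp / max(len(gts), 1)

# Toy example with a hypothetical character label.
gts = [{"label": "pikachu", "box": (10, 10, 60, 80)}]
preds = [{"label": "pikachu", "box": (12, 8, 58, 82), "score": 0.9}]
print(precision_recall(preds, gts))  # -> (1.0, 1.0)
```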
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Qi, Z., Pan, D., Niu, T., Ying, Z., Shi, P. (2024). CCDaS: A Benchmark Dataset for Cartoon Character Detection in Application Scenarios. In: Zhai, G., Zhou, J., Ye, L., Yang, H., An, P., Yang, X. (eds) Digital Multimedia Communications. IFTC 2023. Communications in Computer and Information Science, vol 2067. Springer, Singapore. https://doi.org/10.1007/978-981-97-3626-3_27
DOI: https://doi.org/10.1007/978-981-97-3626-3_27
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-3625-6
Online ISBN: 978-981-97-3626-3
eBook Packages: Behavioral Science and Psychology (R0)