research-article

An End-to-End Quadrilateral Regression Network for Comic Panel Extraction

Authors:

Ling Cai

MM '18: Proceedings of the 26th ACM international conference on Multimedia

Pages 887 - 895

https://doi.org/10.1145/3240508.3240555

Published: 15 October 2018

Abstract

Comic panel extraction, i.e., decomposing a comic page image into panels, has become a fundamental technique for meeting many practical needs of mobile comic reading, such as comic content adaptation and comic animation. Most existing approaches are based on handcrafted low-level visual patterns and heuristic rules, and thus have limited ability to deal with irregular comic panels. Only one existing method is based on deep learning; it achieves better experimental results, but its architecture is redundant and its time efficiency is poor. To address these problems, we propose an end-to-end, two-stage quadrilateral regression network architecture for comic panel detection, which inherits the architecture of Faster R-CNN. In the first stage, we propose a quadrilateral region proposal network that generates panel proposals using a newly proposed quadrilateral regression method. In the second stage, we classify the proposals and refine their shapes by applying the same quadrilateral regression method again. Extensive experimental results demonstrate that the proposed method significantly outperforms existing comic panel detection methods on multiple datasets in terms of F1-score and page accuracy.
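
The abstract does not spell out the quadrilateral regression method, so the following is only an illustrative sketch of one common way to regress a four-corner shape from an axis-aligned anchor box: each corner of the quadrilateral is expressed as an offset from the matching anchor corner, normalised by the anchor's width and height. The function names (`box_corners`, `encode_quad`, `decode_quad`) and the corner ordering are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_corners(box: Box) -> List[Point]:
    """Corners of an axis-aligned box, clockwise starting at the top-left."""
    x0, y0, x1, y1 = box
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def encode_quad(anchor: Box, quad: List[Point]) -> List[float]:
    """Turn a quadrilateral into 8 regression targets relative to an anchor.

    Assumed parameterisation (hypothetical, not the paper's): per-corner
    offsets divided by the anchor width (x) and height (y).
    """
    w = anchor[2] - anchor[0]
    h = anchor[3] - anchor[1]
    targets: List[float] = []
    for (ax, ay), (qx, qy) in zip(box_corners(anchor), quad):
        targets.extend([(qx - ax) / w, (qy - ay) / h])
    return targets


def decode_quad(anchor: Box, targets: List[float]) -> List[Point]:
    """Invert encode_quad: recover the 4 corners from 8 predicted offsets."""
    w = anchor[2] - anchor[0]
    h = anchor[3] - anchor[1]
    quad: List[Point] = []
    for i, (ax, ay) in enumerate(box_corners(anchor)):
        quad.append((ax + targets[2 * i] * w, ay + targets[2 * i + 1] * h))
    return quad


# Example: a slightly skewed panel inside a 100 x 200 anchor.
anchor = (0.0, 0.0, 100.0, 200.0)
panel = [(10.0, 20.0), (90.0, 0.0), (100.0, 200.0), (0.0, 180.0)]
offsets = encode_quad(anchor, panel)
recovered = decode_quad(anchor, offsets)
```

In a two-stage detector of the kind the abstract describes, a parameterisation like this could in principle be used twice: once against anchors in the proposal stage and once against the proposals themselves in the refinement stage. Whether the paper uses these exact targets is not stated here.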

Index Terms

An End-to-End Quadrilateral Regression Network for Comic Panel Extraction
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Information & Contributors

Information

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia

October 2018

2167 pages

ISBN:9781450356657

DOI:10.1145/3240508

General Chairs:
Susanne Boll (University of Oldenburg, Germany),
Kyoung Mu Lee (Seoul National University, Korea),
Jiebo Luo (University of Rochester, USA),
Wenwu Zhu (Tsinghua University, China)

Program Chairs:
Hyeran Byun (Yonsei University, Korea),
Chang Wen Chen (State Univ. of New York at Buffalo, USA),
Rainer Lienhart (University of Augsburg, Germany),
Tao Mei (JD AI, China)

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

National Natural Science Foundation of China

Conference

MM '18

Sponsor:

SIGMM

MM '18: ACM Multimedia Conference

October 22 - 26, 2018

Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Bibliometrics

Total Citations: 19
Total Downloads: 354
Downloads (Last 12 months): 25
Downloads (Last 6 weeks): 3

Reflects downloads up to 25 Feb 2025

Cited By

Li X, Zeng C, Yao Y, Qian J, Zhang H, Zhang S, Yang S (2025). A Novel Quadrilateral Contour Disentangled Algorithm for Industrial Instrument Reading Detection. Entropy 27:2 (122). Online publication date: 24-Jan-2025.
https://doi.org/10.3390/e27020122
Sachdeva R, Zisserman A (2024). The Manga Whisperer: Automatically Generating Transcriptions for Comics. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12967-12976. Online publication date: 16-Jun-2024.
https://doi.org/10.1109/CVPR52733.2024.01232
Sharma R, Kukreja V (2024). Image segmentation, classification and recognition methods for comics: A decade systematic literature review. Engineering Applications of Artificial Intelligence 131 (107715). Online publication date: May-2024.
https://doi.org/10.1016/j.engappai.2023.107715
Rishu, Kukreja V (2024). Decoding comics: a systematic literature review on recognition, segmentation, and classification techniques with emphasis on computer vision and non-computer vision. Multimedia Tools and Applications. Online publication date: 1-Oct-2024.
https://doi.org/10.1007/s11042-024-20214-x
Sachdeva R, Shin G, Zisserman A (2024). Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names. Computer Vision – ACCV 2024, 63-80. Online publication date: 7-Dec-2024.
https://doi.org/10.1007/978-981-96-0908-6_4
Théodose R, Burie J (2024). VisEmoComic: Visual Emotion Recognition in Comics Image. Pattern Recognition, 281-296. Online publication date: 4-Dec-2024.
https://doi.org/10.1007/978-3-031-78495-8_18
Vivoli E, Campaioli I, Nardoni M, Biondi N, Bertini M, Karatzas D (2024). Comics Datasets Framework: Mix of Comics Datasets for Detection Benchmarking. Document Analysis and Recognition – ICDAR 2024 Workshops, 154-167. Online publication date: 30-Aug-2024.
https://dl.acm.org/doi/10.1007/978-3-031-70645-5_11
Sharma R, Kukreja V (2023). Detecting the Past: Advancements in Comic Panel Detection for Cultural Heritage Preservation. 2023 4th International Conference on Data Analytics for Business and Industry (ICDABI), 529-532. Online publication date: 25-Oct-2023.
https://doi.org/10.1109/ICDABI60145.2023.10629627
Sharma R, Kukreja V (2023). Fusing Deep Learning and Ensemble Techniques for Comic Character Emotion Recognition. 2023 4th International Conference on Data Analytics for Business and Industry (ICDABI), 160-164. Online publication date: 25-Oct-2023.
https://doi.org/10.1109/ICDABI60145.2023.10629621
Théodose R, Burie J (2023). KangaiSet: A Dataset for Visual Emotion Recognition on Manga. Document Analysis and Recognition – ICDAR 2023 Workshops, 120-134. Online publication date: 15-Aug-2023.
https://doi.org/10.1007/978-3-031-41498-5_9