DOI: 10.1145/3240508.3240555
research-article

An End-to-End Quadrilateral Regression Network for Comic Panel Extraction

Published: 15 October 2018

Abstract

Comic panel extraction, i.e., decomposing a comic page image into panels, is a fundamental technique for many practical applications in mobile comic reading, such as comic content adaptation and comic animation. Most existing approaches rely on handcrafted low-level visual patterns and heuristic rules, and therefore have limited ability to handle irregular comic panels. Only one existing method is based on deep learning; it achieves better experimental results, but its architecture is redundant and its runtime efficiency is poor. To address these problems, we propose an end-to-end, two-stage quadrilateral regression network for comic panel detection that inherits the architecture of Faster R-CNN. In the first stage, we propose a quadrilateral region proposal network that generates panel proposals based on a newly proposed quadrilateral regression method. In the second stage, we classify the proposals and refine their shapes with the same quadrilateral regression method. Extensive experiments on multiple datasets demonstrate that the proposed method significantly outperforms existing comic panel detection methods in terms of F1-score and page accuracy.
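The abstract names a quadrilateral regression method but does not state its parameterization. As an illustration only, the Python sketch below assumes one common choice for quadrilateral detectors: each of the four panel vertices is regressed as an offset from the matching corner of an axis-aligned anchor (or proposal) box, normalized by that box's width and height. The helper names `box_corners`, `order_vertices`, `encode_quad`, and `decode_quad` are hypothetical, as is the vertex-ordering rule; the paper's actual formulation may differ.

```python
# Sketch of ONE possible quadrilateral regression parameterization.
# Assumption: vertices are encoded as normalized offsets from the matching
# anchor corner. This is not necessarily the paper's exact method.
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # axis-aligned (x1, y1, x2, y2); image coords, y grows downward


def box_corners(box: Box) -> List[Point]:
    """Corners of an axis-aligned box, clockwise on screen from the top-left."""
    x1, y1, x2, y2 = box
    return [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]


def order_vertices(quad: List[Point]) -> List[Point]:
    """Canonical vertex order (hypothetical rule): clockwise on screen,
    starting from the vertex with the smallest x + y (roughly top-left)."""
    cx = sum(x for x, _ in quad) / 4.0
    cy = sum(y for _, y in quad) / 4.0
    # With y pointing down, increasing atan2 angle sweeps clockwise on screen.
    ordered = sorted(quad, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    start = min(range(4), key=lambda i: ordered[i][0] + ordered[i][1])
    return ordered[start:] + ordered[:start]


def encode_quad(anchor: Box, quad: List[Point]) -> List[float]:
    """Regression targets [dx1, dy1, ..., dx4, dy4]: each vertex's offset from
    the matching anchor corner, divided by anchor width / height."""
    w = anchor[2] - anchor[0]
    h = anchor[3] - anchor[1]
    targets: List[float] = []
    for (qx, qy), (ax, ay) in zip(order_vertices(quad), box_corners(anchor)):
        targets.extend([(qx - ax) / w, (qy - ay) / h])
    return targets


def decode_quad(anchor: Box, deltas: List[float]) -> List[Point]:
    """Inverse of encode_quad: turn eight predicted offsets back into vertices."""
    w = anchor[2] - anchor[0]
    h = anchor[3] - anchor[1]
    return [
        (ax + deltas[2 * i] * w, ay + deltas[2 * i + 1] * h)
        for i, (ax, ay) in enumerate(box_corners(anchor))
    ]


if __name__ == "__main__":
    anchor = (10.0, 20.0, 110.0, 70.0)  # a 100 x 50 anchor box
    panel = [(115.0, 72.0), (12.0, 18.0), (5.0, 65.0), (108.0, 25.0)]  # unordered skewed panel
    t = encode_quad(anchor, panel)
    print([round(v, 3) for v in t])
    print(decode_quad(anchor, t))
```

Under this assumption, a two-stage detector would apply the decoding step twice: once to anchors to obtain quadrilateral proposals, and once more, relative to each proposal's bounding box, to refine the panel shape. Canonicalizing the vertex order before encoding keeps the eight targets consistent regardless of the order in which annotators listed the corners.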

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia
October 2018
2167 pages
ISBN:9781450356657
DOI:10.1145/3240508
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 15 October 2018

Author Tags

  1. comics processing
  2. panel extraction
  3. quadrilateral object detection

Qualifiers

  • Research-article

Funding Sources

  • National Natural Science Foundation of China

Conference

MM '18
MM '18: ACM Multimedia Conference
October 22 - 26, 2018
Seoul, Republic of Korea

Acceptance Rates

MM '18 paper acceptance rate: 209 of 757 submissions (28%)
Overall acceptance rate: 2,145 of 8,556 submissions (25%)

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 25
  • Downloads (last 6 weeks): 3
Reflects downloads up to 25 Feb 2025.

Cited By

  • (2025) A Novel Quadrilateral Contour Disentangled Algorithm for Industrial Instrument Reading Detection. Entropy 27:2 (122). DOI: 10.3390/e27020122. Online publication date: 24-Jan-2025.
  • (2024) The Manga Whisperer: Automatically Generating Transcriptions for Comics. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 12967-12976. DOI: 10.1109/CVPR52733.2024.01232. Online publication date: 16-Jun-2024.
  • (2024) Image segmentation, classification and recognition methods for comics: A decade systematic literature review. Engineering Applications of Artificial Intelligence 131 (107715). DOI: 10.1016/j.engappai.2023.107715. Online publication date: May-2024.
  • (2024) Decoding comics: a systematic literature review on recognition, segmentation, and classification techniques with emphasis on computer vision and non-computer vision. Multimedia Tools and Applications. DOI: 10.1007/s11042-024-20214-x. Online publication date: 1-Oct-2024.
  • (2024) Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names. Computer Vision – ACCV 2024, pp. 63-80. DOI: 10.1007/978-981-96-0908-6_4. Online publication date: 7-Dec-2024.
  • (2024) VisEmoComic: Visual Emotion Recognition in Comics Image. Pattern Recognition, pp. 281-296. DOI: 10.1007/978-3-031-78495-8_18. Online publication date: 4-Dec-2024.
  • (2024) Comics Datasets Framework: Mix of Comics Datasets for Detection Benchmarking. Document Analysis and Recognition – ICDAR 2024 Workshops, pp. 154-167. DOI: 10.1007/978-3-031-70645-5_11. Online publication date: 30-Aug-2024.
  • (2023) Detecting the Past: Advancements in Comic Panel Detection for Cultural Heritage Preservation. 2023 4th International Conference on Data Analytics for Business and Industry (ICDABI), pp. 529-532. DOI: 10.1109/ICDABI60145.2023.10629627. Online publication date: 25-Oct-2023.
  • (2023) Fusing Deep Learning and Ensemble Techniques for Comic Character Emotion Recognition. 2023 4th International Conference on Data Analytics for Business and Industry (ICDABI), pp. 160-164. DOI: 10.1109/ICDABI60145.2023.10629621. Online publication date: 25-Oct-2023.
  • (2023) KangaiSet: A Dataset for Visual Emotion Recognition on Manga. Document Analysis and Recognition – ICDAR 2023 Workshops, pp. 120-134. DOI: 10.1007/978-3-031-41498-5_9. Online publication date: 15-Aug-2023.