
Comic MTL: optimized multi-task learning for comic book image analysis

  • Special Issue Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Comic book image analysis methods often rely on multiple algorithms or models, one per task, for tasks such as panel and character (body and face) detection, balloon segmentation, and text recognition. In this work, we aim to reduce the processing time of comic book image analysis by proposing a single model, called Comic MTL, that learns multiple tasks instead of using one model per task. In addition to the detection and segmentation tasks, we integrate a relation analysis task for balloons and characters into the Comic MTL model. Experiments are carried out on the DCM772 and eBDtheque public datasets, which contain annotations for panels, balloons, and characters, as well as the associations between balloons and characters. We show that the Comic MTL model can detect the associations between balloons and their speakers (comic characters) and handle the other tasks, panel and character detection and balloon segmentation, with promising results.
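The sketch below illustrates the multi-task idea summarized in the abstract: a single shared backbone feeding task-specific heads for detection, balloon mask prediction, and balloon-character relation scoring, so that one forward pass serves several tasks. It is a minimal PyTorch sketch under assumed names, layer sizes, and a simplified pair-feature representation; it does not reproduce the authors' Comic MTL architecture.

```python
# Minimal multi-task sketch: shared backbone + task-specific heads.
# All names, shapes, and head designs are illustrative assumptions,
# not the Comic MTL implementation described in the paper.
import torch
import torch.nn as nn


class ComicMTLSketch(nn.Module):
    def __init__(self, num_classes: int = 4):  # e.g. background, panel, balloon, character
        super().__init__()
        # Shared convolutional backbone (stand-in for a deeper feature extractor).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Detection head: per-location class scores and box offsets.
        self.det_cls = nn.Conv2d(64, num_classes, kernel_size=1)
        self.det_box = nn.Conv2d(64, 4, kernel_size=1)
        # Segmentation head: a balloon mask logit per location.
        self.seg_mask = nn.Conv2d(64, 1, kernel_size=1)
        # Relation head: scores a pooled (balloon, character) feature pair.
        self.rel_fc = nn.Sequential(
            nn.Linear(64 * 2, 128), nn.ReLU(), nn.Linear(128, 2),  # linked / not linked
        )

    def forward(self, images, pair_features=None):
        feats = self.backbone(images)
        out = {
            "cls_logits": self.det_cls(feats),
            "box_deltas": self.det_box(feats),
            "mask_logits": self.seg_mask(feats),
        }
        if pair_features is not None:  # concatenated balloon+character features, shape (P, 128)
            out["relation_logits"] = self.rel_fc(pair_features)
        return out


if __name__ == "__main__":
    model = ComicMTLSketch()
    images = torch.randn(2, 3, 256, 256)  # dummy comic page crops
    pairs = torch.randn(5, 128)           # dummy balloon-character pair features
    outputs = model(images, pairs)
    for name, tensor in outputs.items():
        print(name, tuple(tensor.shape))
```

In this kind of setup, the per-task losses are summed (possibly with learned or tuned weights) and backpropagated through the shared backbone, which is what lets one model serve detection, segmentation, and relation analysis at once.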


Notes

  1. http://digitalcomicmuseum.com.

  2. https://git.univ-lr.fr/crigau02/dcm_dataset/tree/master.


Acknowledgements

This work is supported by the French National Research Agency (ANR) in the framework of the 2017 LabCom program (ANR 17-LCV2-0006-01) and by the CPER NUMERIC program funded by the Region Nouvelle-Aquitaine, the Charente-Maritime French Department, the La Rochelle conurbation authority (CDA), and the European Union through FEDER funding.

Author information

Corresponding author

Correspondence to Nhu-Van Nguyen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Nguyen, NV., Rigaud, C. & Burie, JC. Comic MTL: optimized multi-task learning for comic book image analysis. IJDAR 22, 265–284 (2019). https://doi.org/10.1007/s10032-019-00330-3

