Abstract
Comic books enjoy great popularity around the world, and more and more people choose to read them on digital devices, especially mobile ones. However, the screens of most mobile devices are too small to display an entire comic page directly. As a consequence, without any reflow or adaptation of the original pages, users often find the text on comic pages hard to recognize when reading comics on mobile devices. Because the text on a comic page usually appears inside speech balloons, knowing the positions of these balloons makes further processing of the text, to improve its readability on mobile devices, much easier. It is therefore important to devise an effective method for localizing speech balloons in comics, yet only a few studies have addressed this problem. In this paper, we propose a method based on Regions with Convolutional Neural Network features (R-CNN) to localize speech balloons in comics. Experimental results demonstrate that the proposed method localizes speech balloons effectively and accurately.
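Detectors in the R-CNN family are commonly evaluated by the intersection-over-union (IoU) overlap between predicted and ground-truth bounding boxes. The following minimal Python sketch (illustrative only, not taken from the paper) shows how such an overlap score can be computed for axis-aligned balloon boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A predicted balloon is typically counted as correct when its IoU with some ground-truth balloon exceeds a fixed threshold (0.5 is a common choice in object-detection benchmarks).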
Acknowledgement
This work was supported by the National Natural Science Foundation of China (Grant 61300061) and the Beijing Natural Science Foundation (Grant 4132033).
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Wang, Y., Liu, X., Tang, Z. (2016). An R-CNN Based Method to Localize Speech Balloons in Comics. In: Tian, Q., Sebe, N., Qi, G.J., Huet, B., Hong, R., Liu, X. (eds.) MultiMedia Modeling. MMM 2016. Lecture Notes in Computer Science, vol. 9516. Springer, Cham. https://doi.org/10.1007/978-3-319-27671-7_37
DOI: https://doi.org/10.1007/978-3-319-27671-7_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27670-0
Online ISBN: 978-3-319-27671-7
eBook Packages: Computer Science