Automatic dewarping of camera-captured comic document images

Multimedia Tools and Applications

Abstract

People often capture document images with the cameras built into smartphones, and the resulting images suffer from several types of distortion. Warping is one of the major problems in such distorted document images. Most existing dewarping techniques rely on the text lines present in a document. Comic documents, however, contain few text lines, so existing methods do not dewarp comic document images accurately. Here, we propose a novel dewarping technique for warped comic document images. First, we propose a simple mathematical model of how warping arises in comic documents and show that warping depends on certain factors. We then estimate those factors from the boundaries of the panels present in a comic document image. Finally, we dewarp the document image based on those factors. Unlike existing methods, the proposed approach can rectify warping in both the horizontal and vertical directions. Moreover, it can dewarp document images with multiple folds. We also evaluate the proposed approach, and the results are quite encouraging.
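
The abstract does not specify the warping model or the factors it estimates, so the following is only an illustrative sketch of the general idea of using a panel boundary as a dewarping cue. It is not the authors' method. It assumes that a panel edge which should be a straight horizontal line has already been sampled as pixel coordinates, fits a polynomial to it, and shifts each image column so the fitted curve lands on a reference row. The function names `fit_boundary` and `dewarp_columns`, the polynomial degree, and the column-shift strategy are all hypothetical choices made for this example, and the sketch handles the vertical direction only.

```python
import numpy as np


def fit_boundary(xs, ys, degree=2):
    """Fit a polynomial y = p(x) to sampled points of a warped panel boundary.

    Assumption: a low-degree polynomial is an adequate stand-in for the
    boundary shape; the paper's actual warping model is not given here.
    """
    return np.polynomial.Polynomial.fit(xs, ys, degree)


def dewarp_columns(image, boundary, y_ref):
    """Shift every column vertically so the fitted boundary lies on row y_ref."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    rows = np.arange(h)
    for x in range(w):
        # How far the boundary sits below the reference row in this column.
        shift = int(round(float(boundary(x)) - y_ref))
        src = rows + shift                 # source row for each output row
        valid = (src >= 0) & (src < h)     # ignore rows that fall off the image
        out[rows[valid], x] = image[src[valid], x]
    return out


if __name__ == "__main__":
    # Synthetic example: a parabolic "panel edge" drawn on a blank image.
    xs = np.arange(9)
    ys = 10 + (xs - 4) ** 2
    img = np.zeros((40, 9), dtype=np.uint8)
    img[ys, xs] = 1

    curve = fit_boundary(xs, ys, degree=2)
    flat = dewarp_columns(img, curve, y_ref=10)
    print(flat[10])  # the edge pixels after straightening
```

A per-column shift like this cannot represent folds or horizontal warping, both of which the abstract says the proposed approach handles, so it should be read as a toy baseline rather than a reproduction.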

Author information

Corresponding author

Correspondence to Arpan Garai.

Ethics declarations

This work was not funded by any funding agency. To the best of our knowledge, this work does not have any financial or non-financial conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Garai, A., Dutta, A. & Biswas, S. Automatic dewarping of camera-captured comic document images. Multimed Tools Appl 82, 1537–1552 (2023). https://doi.org/10.1007/s11042-022-13234-y

  • DOI: https://doi.org/10.1007/s11042-022-13234-y
