Recognition and Link Prediction of Onomatopoeia Texts with Arbitrary Shapes

  • Conference paper
In: Document Analysis and Recognition - ICDAR 2024 (ICDAR 2024)

Abstract

Onomatopoeia texts in Japanese comics, with their arbitrary shapes, diverse backgrounds, and complex layouts, are a challenging and worthwhile subject of study. On the one hand, existing mainstream text recognition methods may fail to achieve the expected results on onomatopoeia text images, likely because they do not account for the unique characteristics of onomatopoeia words. On the other hand, a truncated text, i.e., a fragment of a complete onomatopoeia word that is not adjacent to the other fragments on a comic page, has no meaning on its own; the original meaning can be understood only when all truncated texts of a complete onomatopoeia word are linked together. A method named M4C-COO was previously proposed to predict such links, but it ignored the class imbalance between truncated and non-truncated texts. To solve these problems, this paper devises a new recognition method that exploits the characteristics of onomatopoeia texts, introduces focal loss (FL) for link prediction, and further proposes a novel loss function based on focal loss (FB). Experiments demonstrate the effectiveness of these contributions, achieving state-of-the-art performance.
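The class-imbalance remedy mentioned in the abstract, focal loss, down-weights the loss contribution of easy, confidently classified examples so that rare classes (here, truncated texts) are not drowned out during training. The following is a minimal sketch of the standard binary focal loss only; it is not the paper's FB variant, whose exact form is not given on this page, and the variable names are illustrative:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Standard binary focal loss for one prediction.

    p     : predicted probability of the positive class (e.g. truncated text)
    y     : ground-truth label, 1 for positive, 0 for negative
    alpha : class-balancing weight for the positive class
    gamma : focusing parameter; gamma = 0 reduces to weighted cross-entropy
    """
    p = min(max(p, 1e-7), 1 - 1e-7)           # avoid log(0)
    p_t = p if y == 1 else 1 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1 - alpha
    # The (1 - p_t)**gamma factor shrinks the loss of easy examples,
    # so hard examples dominate the gradient instead of the majority class.
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
```

With gamma = 2, a well-classified example (p_t = 0.9) contributes roughly 100 times less loss than under plain cross-entropy, while a misclassified one (p_t = 0.1) keeps most of its loss, which is the behavior that makes focal loss attractive for imbalanced link prediction.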



Acknowledgements

This study is supported by the Project for Science and Technology of Inner Mongolia Autonomous Region under Grant 2019GG281, the Natural Science Foundation of Inner Mongolia Autonomous Region under Grant 2019ZD14, the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region under Grant NJYT-20-A05, the fund of supporting the reform and development of local universities (Disciplinary Construction) and construction project of “Inner Mongolia Science and Technology Achievement Transfer and Transformation Demonstration Zone, University Collaborative Innovation Base, and University Entrepreneurship Training Base” (Supercomputing Power Project: 21300-231510).

Author information

Correspondence to Hongxi Wei.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ma, J., Wei, H., Wang, Y. (2024). Recognition and Link Prediction of Onomatopoeia Texts with Arbitrary Shapes. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14806. Springer, Cham. https://doi.org/10.1007/978-3-031-70543-4_9

  • DOI: https://doi.org/10.1007/978-3-031-70543-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70542-7

  • Online ISBN: 978-3-031-70543-4

  • eBook Packages: Computer Science (R0)
