Abstract
Onomatopoeia texts in Japanese comics, with their arbitrary shapes, diverse backgrounds, and complex layouts, are a challenging and worthwhile subject of study. On the one hand, existing mainstream text recognition methods often fail to achieve the expected results on onomatopoeia text images, because these methods do not take into account the unique characteristics of onomatopoeia words. On the other hand, a truncated text, i.e., a part of a complete onomatopoeia word that is not adjacent to the other parts on a comic page, carries no meaning on its own; the original meaning can be understood only when the truncated texts of a complete onomatopoeia word are linked together. A method named M4C-COO was previously proposed by researchers to predict such links, but it ignored the class imbalance between truncated and non-truncated texts. To address these problems, this paper devises a new recognition method that exploits the characteristics of onomatopoeia texts, introduces focal loss (FL) to predict the links, and further proposes a completely novel loss function based on focal loss (FB). Finally, experiments demonstrate the effectiveness of this work, achieving state-of-the-art performance.
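The paper's FB loss is not reproduced here, but the standard focal loss it builds on (Lin et al., "Focal loss for dense object detection", cited in the references) can be sketched for the binary case relevant to link prediction. This is an illustrative, minimal implementation, not the authors' code; the function name and default hyperparameters (gamma=2.0, alpha=0.25, the values from the original focal loss paper) are assumptions for the example.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single prediction (illustrative sketch).

    p:     predicted probability of the positive class, in (0, 1)
    y:     ground-truth label, 1 (positive, e.g. "linked") or 0 (negative)
    gamma: focusing parameter; gamma = 0 reduces to alpha-weighted cross-entropy
    alpha: class-balance weight assigned to the positive class
    """
    # p_t is the probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor down-weights easy, well-classified examples,
    # so abundant easy negatives (non-truncated texts) do not dominate the
    # gradient over the rare positives (truncated-text links).
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.95, 1)  # confident correct positive: small loss
hard = focal_loss(0.30, 1)  # misclassified positive: much larger loss
```

Compared with plain cross-entropy, the modulating factor shrinks the contribution of examples the model already classifies well, which is the property that makes it attractive when one class (here, truncated texts) is rare.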
References
Almazán, J., Gordo, A., Fornés, A., Valveny, E.: Word spotting and recognition with embedded attributes. IEEE Trans. Pattern Anal. Mach. Intell. 36(12), 2552–2566 (2014)
Baek, J., et al.: What is wrong with scene text recognition model comparisons? Dataset and model analysis. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4715–4723. IEEE (2019)
Baek, J., Matsui, Y., Aizawa, K.: COO: comic onomatopoeia dataset for recognizing arbitrary or truncated texts. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) ECCV 2022. LNCS, vol. 13688, pp. 267–283. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-19815-1_16
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
Bookstein, F.L.: Principal warps: thin-plate splines and the decomposition of deformations. IEEE Trans. Pattern Anal. Mach. Intell. 11(6), 567–585 (1989)
Cheng, Z., Bai, F., Xu, Y., Zheng, G., Pu, S., Zhou, S.: Focusing attention: towards accurate text recognition in natural images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5076–5084. IEEE (2017)
Dosovitskiy, A., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
Du, Y., et al.: SVTR: scene text recognition with a single visual model. arXiv preprint arXiv:2205.00159 (2022)
Fang, S., Xie, H., Wang, Y., Mao, Z., Zhang, Y.: Read like humans: autonomous, bidirectional and iterative language modeling for scene text recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7098–7107. IEEE (2021)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. IEEE (2016)
Hu, R., Singh, A., Darrell, T., Rohrbach, M.: Iterative answer prediction with pointer-augmented multimodal transformers for TextVQA. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9992–10002. IEEE (2020)
Huang, Y., Sun, Z., Jin, L., Luo, C.: EPAN: effective parts attention network for scene text recognition. Neurocomputing 376, 202–213 (2020)
Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. In: Advances in Neural Information Processing Systems, vol. 28, pp. 2017–2025 (2015)
Lee, J., Park, S., Baek, J., Oh, S.J., Kim, S., Lee, H.: On recognizing texts of arbitrary shapes with 2D self-attention. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 546–547 (2020)
Li, H., Wang, P., Shen, C., Zhang, G.: Show, attend and read: a simple and strong baseline for irregular text recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 8610–8617. AAAI (2019)
Liao, M., Pang, G., Huang, J., Hassner, T., Bai, X.: Mask TextSpotter v3: segmentation proposal network for robust scene text spotting. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 706–722. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58621-8_41
Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988. IEEE (2017)
Louis, J.B., Burie, J.C.: Detection of buried complex text. Case of onomatopoeia in comics books. In: Coustaty, M., Fornés, A. (eds.) ICDAR 2023. LNCS, vol. 14193, pp. 177–191. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-41498-5_13
Louis, J.B., Burie, J.C., Revel, A.: Can deep learning approaches detect complex text? Case of onomatopoeia in comics albums. In: Rousseau, J.J., Kapralos, B. (eds.) ICPR 2022. Lecture Notes in Computer Science, vol. 13644, pp. 48–60. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-37742-6_4
Lu, N., et al.: Master: multi-aspect non-local network for scene text recognition. Pattern Recogn. 117, 107980 (2021)
Matsui, Y., et al.: Sketch-based manga retrieval using manga109 dataset. Multimedia Tools Appl. 76, 21811–21838 (2017)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, vol. 28, pp. 91–99 (2015)
Shi, B., Bai, X., Yao, C.: An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans. Pattern Anal. Mach. Intell. 39(11), 2298–2304 (2016)
Shi, B., Wang, X., Lyu, P., Yao, C., Bai, X.: Robust scene text recognition with automatic rectification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4168–4176. IEEE (2016)
Shi, B., Yang, M., Wang, X., Lyu, P., Yao, C., Bai, X.: ASTER: an attentional scene text recognizer with flexible rectification. IEEE Trans. Pattern Anal. Mach. Intell. 41(9), 2035–2048 (2018)
Sutskever, I., Vinyals, O., Le, Q.V.: Sequence to sequence learning with neural networks. In: Advances in Neural Information Processing Systems, vol. 27, pp. 3104–3112 (2014)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30, pp. 5998–6008 (2017)
Vinyals, O., Fortunato, M., Jaitly, N.: Pointer networks. In: Advances in Neural Information Processing Systems, vol. 28, pp. 2692–2700 (2015)
Acknowledgements
This study is supported by the Project for Science and Technology of Inner Mongolia Autonomous Region under Grant 2019GG281, the Natural Science Foundation of Inner Mongolia Autonomous Region under Grant 2019ZD14, the Program for Young Talents of Science and Technology in Universities of Inner Mongolia Autonomous Region under Grant NJYT-20-A05, the fund of supporting the reform and development of local universities (Disciplinary Construction) and construction project of “Inner Mongolia Science and Technology Achievement Transfer and Transformation Demonstration Zone, University Collaborative Innovation Base, and University Entrepreneurship Training Base” (Supercomputing Power Project: 21300-231510).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ma, J., Wei, H., Wang, Y. (2024). Recognition and Link Prediction of Onomatopoeia Texts with Arbitrary Shapes. In: Barney Smith, E.H., Liwicki, M., Peng, L. (eds) Document Analysis and Recognition - ICDAR 2024. ICDAR 2024. Lecture Notes in Computer Science, vol 14806. Springer, Cham. https://doi.org/10.1007/978-3-031-70543-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70542-7
Online ISBN: 978-3-031-70543-4
eBook Packages: Computer Science (R0)