Abstract
Text detection in shaky and non-shaky videos is challenging because of the variations between day and night recordings. In addition, moving objects, vehicles, and humans make text detection in video more challenging than in normal natural scene images. Motivated by the capacity of the transformer, we propose a new transformer-based approach for detecting text in both shaky and non-shaky day-night videos. To reduce the effect of object movement, poor quality, and the other challenges mentioned above, the proposed work explores temporal frames to obtain activation frames based on similarity and dissimilarity measures. For estimating similarity and dissimilarity, our method extracts luminance, contrast, and structural features. The activation frames are fed to a transformer comprising an encoder, a decoder, and a feed-forward network for text detection in shaky and non-shaky day-night video. Since this is the first work of its kind, we create our own dataset for experimentation. To show the effectiveness of the proposed method, experiments are also conducted on a standard benchmark, the ICDAR-2015 video dataset. The results on both our dataset and the standard dataset show that the proposed model is superior to state-of-the-art methods in terms of recall, precision, and F-measure.
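The luminance, contrast, and structure comparison described above resembles the SSIM index. A minimal sketch of how consecutive frames could be scored and activation frames selected is shown below; this is an illustration under stated assumptions (grayscale frames as NumPy arrays, a global non-windowed SSIM, and a hypothetical dissimilarity threshold), not the authors' implementation.

```python
import numpy as np

def ssim_score(f1, f2, c1=6.5025, c2=58.5225):
    """Compare two grayscale frames via luminance and contrast/structure
    terms, as in a global (non-windowed) SSIM index. Returns a value in
    (-1, 1]; identical frames score 1.0."""
    f1 = f1.astype(np.float64)
    f2 = f2.astype(np.float64)
    mu1, mu2 = f1.mean(), f2.mean()
    var1, var2 = f1.var(), f2.var()
    cov = ((f1 - mu1) * (f2 - mu2)).mean()
    luminance = (2 * mu1 * mu2 + c1) / (mu1 ** 2 + mu2 ** 2 + c1)
    contrast_structure = (2 * cov + c2) / (var1 + var2 + c2)
    return luminance * contrast_structure

def select_activation_frames(frames, threshold=0.9):
    """Keep a frame only when it is sufficiently dissimilar from the
    previously kept frame (hypothetical selection rule)."""
    kept = [frames[0]]
    for f in frames[1:]:
        if ssim_score(kept[-1], f) < threshold:
            kept.append(f)
    return kept
```

Near-duplicate temporal frames score close to 1.0 and are skipped, so only frames carrying new appearance information (e.g. after camera shake or object motion) survive as activation frames.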
Acknowledgement
This work is partly funded by the Ministry of Higher Education of Malaysia under the Fundamental Research Grant Scheme (FRGS), grant number FRGS/1/2020/ICT02/UM/02/4, and partly by the Technology Innovation Hub, Indian Statistical Institute, Kolkata, India. It is also supported by the ISI-UTS joint research cluster.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Halder, A., Shivakumara, P., Pal, U., Lu, T., Blumenstein, M. (2023). A New Transformer-Based Approach for Text Detection in Shaky and Non-shaky Day-Night Video. In: Lu, H., Blumenstein, M., Cho, SB., Liu, CL., Yagi, Y., Kamiya, T. (eds) Pattern Recognition. ACPR 2023. Lecture Notes in Computer Science, vol 14407. Springer, Cham. https://doi.org/10.1007/978-3-031-47637-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47636-5
Online ISBN: 978-3-031-47637-2
eBook Packages: Computer Science, Computer Science (R0)