A New Transformer-Based Approach for Text Detection in Shaky and Non-shaky Day-Night Video

  • Conference paper
  • Pattern Recognition (ACPR 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14407)

Abstract

Text detection in shaky and non-shaky videos is challenging because of the variations between day and night footage. In addition, moving objects, vehicles, and humans in the video make text detection harder than in ordinary natural scene images. Motivated by the capacity of the transformer, we propose a new transformer-based approach for detecting text in both shaky and non-shaky day-night videos. To reduce the effect of object movement, poor quality, and the other challenges mentioned above, the proposed work explores temporal frames to obtain activation frames based on similarity and dissimilarity measures. To estimate similarity and dissimilarity, our method extracts luminance, contrast, and structural features. The activation frames are fed to a transformer comprising an encoder, a decoder, and a feed-forward network for text detection in shaky and non-shaky day-night video. Since this is the first work on the problem, we create our own dataset for experimentation. To show the effectiveness of the proposed method, experiments are also conducted on a standard benchmark, the ICDAR-2015 video dataset. The results on both our dataset and the standard dataset show that the proposed model is superior to state-of-the-art methods in terms of recall, precision, and F-measure.
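The luminance/contrast/structure features described in the abstract resemble the classic SSIM decomposition, and the activation frames can then be the temporal frames that differ enough from the last selected one. The sketch below illustrates this idea; the function names, the global-statistics simplification (no sliding window), and the selection threshold are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def ssim_components(f1, f2, c1=6.5025, c2=58.5225):
    # Luminance, contrast, and structure comparisons between two grayscale
    # frames, computed from global statistics. The constants are the
    # standard SSIM stabilizers for 8-bit images: c1 = (0.01*255)^2,
    # c2 = (0.03*255)^2.
    f1 = f1.astype(np.float64)
    f2 = f2.astype(np.float64)
    mu1, mu2 = f1.mean(), f2.mean()
    s1, s2 = f1.std(), f2.std()
    cov = ((f1 - mu1) * (f2 - mu2)).mean()
    luminance = (2 * mu1 * mu2 + c1) / (mu1 ** 2 + mu2 ** 2 + c1)
    contrast = (2 * s1 * s2 + c2) / (s1 ** 2 + s2 ** 2 + c2)
    c3 = c2 / 2
    structure = (cov + c3) / (s1 * s2 + c3)
    return luminance, contrast, structure

def select_activation_frames(frames, threshold=0.9):
    # Walk the temporal frames and keep a frame only when it is
    # sufficiently dissimilar from the last kept frame, so redundant
    # near-duplicate frames (static content) are skipped.
    kept = [0]
    for i in range(1, len(frames)):
        l, c, s = ssim_components(frames[kept[-1]], frames[i])
        if l * c * s < threshold:  # low similarity -> new activation frame
            kept.append(i)
    return kept
```

In this simplified form, two identical frames score a combined similarity of 1 and the second is dropped, while a frame with inverted structure scores near -1 and is kept as a new activation frame; the retained frames would then be passed to the transformer encoder-decoder for detection.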



Acknowledgement

This work is partly funded by the Ministry of Higher Education of Malaysia through the Fundamental Research Grant Scheme (FRGS), grant number FRGS/1/2020/ICT02/UM/02/4, and partly by the Technology Innovation Hub, Indian Statistical Institute, Kolkata, India. This work is also supported by the ISI-UTS joint research cluster.

Author information


Correspondence to Palaiahnakote Shivakumara.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Halder, A., Shivakumara, P., Pal, U., Lu, T., Blumenstein, M. (2023). A New Transformer-Based Approach for Text Detection in Shaky and Non-shaky Day-Night Video. In: Lu, H., Blumenstein, M., Cho, SB., Liu, CL., Yagi, Y., Kamiya, T. (eds) Pattern Recognition. ACPR 2023. Lecture Notes in Computer Science, vol 14407. Springer, Cham. https://doi.org/10.1007/978-3-031-47637-2_3

  • DOI: https://doi.org/10.1007/978-3-031-47637-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47636-5

  • Online ISBN: 978-3-031-47637-2

  • eBook Packages: Computer Science, Computer Science (R0)
