Abstract
Text detection in shaky and non-shaky videos is challenging because of the variations between day and night recordings. In addition, moving objects, vehicles, and humans make text detection in video more challenging than in normal natural scene images. Motivated by the capacity of the transformer, we propose a new transformer-based approach for detecting text in both shaky and non-shaky day-night videos. To reduce the effect of object movement, poor quality, and the other challenges mentioned above, the proposed work explores temporal frames to obtain activation frames based on similarity and dissimilarity measures. For estimating similarity and dissimilarity, our method extracts luminance, contrast, and structural features. The activation frames are fed to a transformer comprising an encoder, a decoder, and a feed-forward network for text detection in shaky and non-shaky day-night video. Since this is the first work of its kind, we create our own dataset for experimentation. To show the effectiveness of the proposed method, experiments are also conducted on a standard benchmark, the ICDAR-2015 video dataset. The results on both our dataset and the standard dataset show that the proposed model is superior to state-of-the-art methods in terms of recall, precision, and F-measure.
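The luminance, contrast, and structure comparison described above resembles the SSIM index. A minimal sketch of how consecutive frames could be scored and activation frames selected is shown below; this is an illustration under stated assumptions (grayscale frames as NumPy arrays, a global non-windowed SSIM, and a hypothetical dissimilarity threshold), not the authors' implementation.

```python
import numpy as np

def ssim_score(f1, f2, c1=6.5025, c2=58.5225):
    """Compare two grayscale frames via luminance and contrast/structure
    terms, as in a global (non-windowed) SSIM index. Returns a value in
    (-1, 1]; identical frames score 1.0."""
    f1 = f1.astype(np.float64)
    f2 = f2.astype(np.float64)
    mu1, mu2 = f1.mean(), f2.mean()
    var1, var2 = f1.var(), f2.var()
    cov = ((f1 - mu1) * (f2 - mu2)).mean()
    luminance = (2 * mu1 * mu2 + c1) / (mu1 ** 2 + mu2 ** 2 + c1)
    contrast_structure = (2 * cov + c2) / (var1 + var2 + c2)
    return luminance * contrast_structure

def select_activation_frames(frames, threshold=0.9):
    """Keep a frame only when it is sufficiently dissimilar from the
    previously kept frame (hypothetical selection rule)."""
    kept = [frames[0]]
    for f in frames[1:]:
        if ssim_score(kept[-1], f) < threshold:
            kept.append(f)
    return kept
```

Near-duplicate temporal frames score close to 1.0 and are skipped, so only frames carrying new appearance information (e.g. after camera shake or object motion) survive as activation frames.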
Acknowledgement
This work is partly funded by the Ministry of Higher Education of Malaysia under the Fundamental Research Grant Scheme (FRGS), grant number FRGS/1/2020/ICT02/UM/02/4, and partly by the Technology Innovation Hub, Indian Statistical Institute, Kolkata, India. It is also supported by the ISI-UTS joint research cluster.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Halder, A., Shivakumara, P., Pal, U., Lu, T., Blumenstein, M. (2023). A New Transformer-Based Approach for Text Detection in Shaky and Non-shaky Day-Night Video. In: Lu, H., Blumenstein, M., Cho, SB., Liu, CL., Yagi, Y., Kamiya, T. (eds) Pattern Recognition. ACPR 2023. Lecture Notes in Computer Science, vol 14407. Springer, Cham. https://doi.org/10.1007/978-3-031-47637-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47636-5
Online ISBN: 978-3-031-47637-2
eBook Packages: Computer Science, Computer Science (R0)