A Deep Convolutional Deblurring and Detection Neural Network for Localizing Text in Videos

Wang, Yang; Qian, Ye; Shi, Jiahao; Su, Feng

doi:10.1007/978-3-030-37734-2_10

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11962)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Scene text in video is often degraded by blur, such as that caused by camera or text motion, which makes it harder to extract the text reliably for content-based video applications. In this paper, we propose a novel fully convolutional deep neural network for deblurring and detecting text in video. Specifically, to cope with the blur of video text, we propose an effective deblurring subnetwork that generates a sharper image for the current frame by fusing multiple adjacent frames. The subnetwork is composed of multi-level convolutional blocks with both cross-block (long) and within-block (short) skip connections for progressively learning residual deblurred image details, together with a spatial attention mechanism that focuses on blurred regions. To further localize text in the frames, we enhance the EAST text detection model by introducing deformable convolution layers and deconvolution layers, which better capture the widely varied appearances of video text. Experiments on a public scene text video dataset demonstrate the state-of-the-art performance of the proposed video text deblurring and detection model.
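
The structure of the deblurring subnetwork described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify layer sizes, kernel shapes, or how the attention map is applied, so everything below is an assumption made for illustration. In particular, the sketch uses pointwise (1x1) convolutions with random weights in place of trained spatial convolutions, stacks adjacent frames along the channel axis as the fusion step, and applies the attention map as a gate on the predicted residual. All function and parameter names (`deblur_sketch`, `init_params`, `w_in`, `w_att`, and so on) are hypothetical.

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise (1x1) convolution: mixes channels independently at each pixel.
    x has shape (C_in, H, W); w has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", w, x)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_block(x, w1, w2):
    """Two pointwise convolutions with a within-block (short) skip connection."""
    return x + conv1x1(relu(conv1x1(x, w1)), w2)

def deblur_sketch(frames, center, params):
    """frames: (T, 3, H, W) adjacent frames; center: index of the current frame.
    Returns the (untrained) deblurred estimate and the spatial attention map."""
    t, c, h, w = frames.shape
    # Assumed fusion step: stack all adjacent frames along the channel axis.
    stacked = frames.reshape(t * c, h, w)
    feat0 = relu(conv1x1(stacked, params["w_in"]))
    feat = feat0
    for w1, w2 in params["blocks"]:
        feat = residual_block(feat, w1, w2)
    # Cross-block (long) skip connection from the first feature map.
    feat = feat + feat0
    # Predicted residual image details, shape (3, H, W).
    residual = conv1x1(feat, params["w_out"])
    # Spatial attention map in (0, 1), shape (1, H, W), one weight per pixel.
    attention = sigmoid(conv1x1(feat, params["w_att"]))
    # Assumed use of attention: gate the residual added to the current frame.
    return frames[center] + attention * residual, attention

def init_params(t, channels=8, n_blocks=2, seed=0):
    """Random weights standing in for trained parameters (illustration only)."""
    rng = np.random.default_rng(seed)
    s = 0.1
    return {
        "w_in": s * rng.standard_normal((channels, 3 * t)),
        "blocks": [
            (s * rng.standard_normal((channels, channels)),
             s * rng.standard_normal((channels, channels)))
            for _ in range(n_blocks)
        ],
        "w_out": s * rng.standard_normal((3, channels)),
        "w_att": s * rng.standard_normal((1, channels)),
    }

# Usage: five adjacent 16x16 RGB frames, deblurring the middle one.
frames = np.random.default_rng(1).random((5, 3, 16, 16))
out, att = deblur_sketch(frames, center=2, params=init_params(t=5))
print(out.shape, att.shape)
```

Because the network predicts a residual rather than the full image, a zero residual leaves the current frame unchanged, which is one common motivation for residual formulations in restoration tasks.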

Acknowledgments

Research supported by the Natural Science Foundation of Jiangsu Province of China under Grant No. BK20171345 and the National Natural Science Foundation of China under Grant Nos. 61003113, 61321491, 61672273.

Author information

Authors and Affiliations

State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China
Yang Wang, Ye Qian, Jiahao Shi & Feng Su

Corresponding author

Correspondence to Feng Su.

Editor information

Editors and Affiliations

Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Yong Man Ro
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Junmo Kim
National Cheng Kung University, Tainan City, Taiwan
Wei-Ta Chu
Tsinghua University, Beijing, China
Peng Cui
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Jung-Woo Choi
National Tsing Hua University, Hsinchu, Taiwan
Min-Chun Hu
Ghent University, Ghent, Belgium
Wesley De Neve

About this paper

Cite this paper

Wang, Y., Qian, Y., Shi, J., Su, F. (2020). A Deep Convolutional Deblurring and Detection Neural Network for Localizing Text in Videos. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11962. Springer, Cham. https://doi.org/10.1007/978-3-030-37734-2_10

DOI: https://doi.org/10.1007/978-3-030-37734-2_10
Published: 24 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37733-5
Online ISBN: 978-3-030-37734-2
eBook Packages: Computer Science, Computer Science (R0)
