Light Textspotter: An Extreme Light Scene Text Spotter

Guan, Jiazhi; Zhu, Anna

doi:10.1007/978-3-030-63820-7_50

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1332)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Scene text spotting is a challenging open problem in the computer vision community. Many insightful methods have been proposed, but most of them pursue better performance without considering the enormous computational burden this entails. In this work, an extremely light scene text spotter is proposed with a teacher-student (TS) structure. Specifically, a light convolutional neural network (CNN) architecture, the Shuffle Unit, is adopted together with a feature pyramid network (FPN) for feature extraction. Knowledge distillation and attention transfer are designed into the TS framework to boost text detection accuracy. Cascaded with a fully convolutional network (FCN) recognizer, the proposed method can be trained end-to-end. Because its resource consumption is halved, the method runs faster. Experimental results demonstrate that the method is more efficient and achieves state-of-the-art detection performance compared with other methods on benchmark datasets.
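
The abstract names three building blocks without giving formulas: knowledge distillation, attention transfer, and the Shuffle Unit. The sketch below illustrates the generic form of each in plain Python, following the commonly published definitions (softened-output KL divergence, normalised spatial attention maps, and group-wise channel shuffle). It is not the authors' implementation: the function names, the temperature value, and the flat-list feature layout are assumptions made only for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened outputs, scaled by T^2.

    The temperature default is an arbitrary illustrative value.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

def attention_map(feature):
    """Collapse C channels into one L2-normalised map of summed squares.

    `feature` is a list of channels; each channel is a flat list of
    activations over the same spatial positions (assumed layout).
    """
    positions = len(feature[0])
    summed = [sum(ch[i] ** 2 for ch in feature) for i in range(positions)]
    norm = math.sqrt(sum(v * v for v in summed)) or 1.0
    return [v / norm for v in summed]

def attention_transfer_loss(student_feature, teacher_feature):
    """Mean squared difference between normalised attention maps."""
    a_s = attention_map(student_feature)
    a_t = attention_map(teacher_feature)
    return sum((s - t) ** 2 for s, t in zip(a_s, a_t)) / len(a_s)

def channel_shuffle(channels, groups):
    """Group-wise shuffle: view as (groups, n), transpose, flatten."""
    n = len(channels) // groups
    return [channels[g * n + i] for i in range(n) for g in range(groups)]
```

Because the attention map sums over channels before comparison, the student and teacher features may have different channel counts as long as their spatial sizes match, which is what makes this kind of loss convenient when the student backbone is much narrower than the teacher.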

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 61703316.

Author information

Authors and Affiliations

School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China
Jiazhi Guan & Anna Zhu

Corresponding author

Correspondence to Anna Zhu.

Editor information

Editors and Affiliations

Department of AI, Ping An Life, Shenzhen, China
Haiqin Yang
Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
Kitsuchart Pasupa
City University of Hong Kong, Kowloon, Hong Kong
Andrew Chi-Sing Leung
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, Hong Kong
James T. Kwok
School of Information Technology, King Mongkut's University of Technology Thonburi, Bangkok, Thailand
Jonathan H. Chan
The Chinese University of Hong Kong, New Territories, Hong Kong
Irwin King

Cite this paper

Guan, J., Zhu, A. (2020). Light Textspotter: An Extreme Light Scene Text Spotter. In: Yang, H., Pasupa, K., Leung, A.CS., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science, vol 1332. Springer, Cham. https://doi.org/10.1007/978-3-030-63820-7_50

DOI: https://doi.org/10.1007/978-3-030-63820-7_50
Published: 17 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63819-1
Online ISBN: 978-3-030-63820-7
eBook Packages: Computer Science, Computer Science (R0)
