A Handwritten Text Detection Model Based on Cascade Feature Fusion Network Improved by FCOS

Feng, Ruiqi; Zhao, Fujia; Chen, Shanxiong; Zhang, Shixue; Wang, Dingwang

doi:10.1007/978-3-030-86159-9_4

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12917)

Included in the following conference series:

International Conference on Document Analysis and Recognition

Abstract

In this paper, we propose a method for detecting text in handwritten ancient documents. Detecting text in this type of data is challenging because of the complex layouts of handwritten ancient books, the wide range of text sizes, the interleaving of pictures and text, the large number of hand-drawn patterns and the heavy background noise. Unlike general scene text detection tasks (ICDAR, Total-Text, etc.), the text in images of ancient books is also more densely distributed. To address these characteristics of the dataset, we propose DFCOS, a detection model based on cascade feature fusion that aims to improve the fusion of localization information from lower layers. Specifically, we create bottom-up paths to exploit more localization signals from low-level features, incorporate skip connections to better extract information from the backbone, and further improve the model through parallel cascading. We verified the effectiveness of DFCOS on HWAD (Handwritten Ancient-Books Dataset), a dataset covering four languages (Yi, Chinese, Tibetan and Tangut) provided by the Institute of Yi of Guizhou University of Engineering Science and the National Digital Library of China. DFCOS outperformed most existing text detection models in precision, recall and F-measure.
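The abstract describes the fusion design only in words. To make the general idea concrete, here is a toy, non-learned sketch in plain Python of an FPN-style top-down pass followed by a bottom-up path (in the spirit of path-aggregation networks) with extra skip connections from the backbone. It is not the DFCOS implementation: the function names `fuse`, `upsample2x` and `downsample2x`, the use of element-wise addition, 2x2 max pooling in place of a strided convolution, and the exact placement of the skip connection are all illustrative assumptions, and the parallel cascading step is not modeled at all.

```python
# Toy, non-learned sketch of top-down + bottom-up feature fusion with
# backbone skip connections. NOT the DFCOS implementation.
# Assumptions: single-channel maps as nested lists, nearest-neighbour 2x
# upsampling, 2x2 max pooling instead of a stride-2 convolution, and
# element-wise addition for every fusion step.
from typing import List

Map = List[List[float]]


def add(a: Map, b: Map) -> Map:
    """Element-wise sum of two maps of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def upsample2x(m: Map) -> Map:
    """Nearest-neighbour upsampling: each cell becomes a 2x2 block."""
    out: Map = []
    for row in m:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def downsample2x(m: Map) -> Map:
    """2x2 max pooling with stride 2; assumes even height and width."""
    return [
        [max(m[i][j], m[i][j + 1], m[i + 1][j], m[i + 1][j + 1])
         for j in range(0, len(m[0]), 2)]
        for i in range(0, len(m), 2)
    ]


def fuse(backbone: List[Map]) -> List[Map]:
    """Fuse backbone maps ordered finest (index 0) to coarsest (index -1).

    Each level must be exactly half the height and width of the previous one.
    """
    # 1. Top-down pass (FPN style): P_i = C_i + up(P_{i+1}).
    top_down = [backbone[-1]]
    for c in reversed(backbone[:-1]):
        top_down.append(add(c, upsample2x(top_down[-1])))
    top_down.reverse()  # finest level first again

    # 2. Bottom-up pass with a hypothetical skip from the backbone:
    #    N_i = P_i + down(N_{i-1}) + C_i
    outputs = [add(top_down[0], backbone[0])]
    for i in range(1, len(backbone)):
        n = add(top_down[i], downsample2x(outputs[-1]))
        outputs.append(add(n, backbone[i]))
    return outputs


if __name__ == "__main__":
    c3 = [[0.0] * 4 for _ in range(4)]
    c3[3][3] = 5.0  # a strong, precisely located low-level response
    c4 = [[0.0, 0.0], [0.0, 0.0]]
    c5 = [[1.0]]
    outs = fuse([c3, c4, c5])
    print(outs[-1])  # [[14.0]]
```

With these inputs the coarsest top-down map is still `[[1.0]]`; the peak placed in the finest map reaches the coarsest output only through the bottom-up path. That is the kind of low-level localization signal the abstract says the bottom-up paths are meant to carry upward, although a real detector would learn the fusion weights instead of simply adding maps.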

Author information

Authors and Affiliations

Southwest University, Chongqing, China
Ruiqi Feng, Fujia Zhao, Shanxiong Chen & Dingwang Wang
Guizhou University of Engineering Science, Bijie, Guizhou, China
Shixue Zhang

Corresponding author

Correspondence to Shanxiong Chen.

Editor information

Editors and Affiliations

Boise State University, Boise, ID, USA
Elisa H. Barney Smith
Indian Statistical Institute, Kolkata, India
Umapada Pal

About this paper

Cite this paper

Feng, R., Zhao, F., Chen, S., Zhang, S., Wang, D. (2021). A Handwritten Text Detection Model Based on Cascade Feature Fusion Network Improved by FCOS. In: Barney Smith, E.H., Pal, U. (eds) Document Analysis and Recognition – ICDAR 2021 Workshops. ICDAR 2021. Lecture Notes in Computer Science, vol 12917. Springer, Cham. https://doi.org/10.1007/978-3-030-86159-9_4

DOI: https://doi.org/10.1007/978-3-030-86159-9_4
Published: 02 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86158-2
Online ISBN: 978-3-030-86159-9
eBook Packages: Computer Science, Computer Science (R0)

Societies and partnerships

The International Association for Pattern Recognition