HisDoc R-CNN: Robust Chinese Historical Document Text Line Detection with Dynamic Rotational Proposal Network and Iterative Attention Head

Jian, Cheng; Jin, Lianwen; Liang, Lingyu; Liu, Chongyu

doi:10.1007/978-3-031-41676-7_25

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14187))

Included in the following conference series:

International Conference on Document Analysis and Recognition

1150 Accesses

Abstract

Text line detection is an essential task in a historical document analysis system. Although many existing text detection methods have achieved remarkable performance on various scene text datasets, they cannot perform well because of the high density, multiple scales, and multiple orientations of text lines in complex historical documents. Thus, it is crucial and challenging to investigate effective text line detection methods for historical documents. In this paper, we propose a Dynamic Rotational Proposal Network (DRPN) and an Iterative Attention Head (IAH), which are incorporated into Mask R-CNN to detect text lines in historical documents. The DRPN can dynamically generate horizontal or rotational proposals to enhance the robustness of the model for multi-oriented text lines and alleviate the multi-scale problem in historical documents. The proposed IAH integrates a multi-dimensional attention mechanism that can better learn the features of dense historical document text lines while improving detection accuracy and reducing the model parameters via an iterative mechanism. Our HisDoc R-CNN achieves state-of-the-art performance on various historical document benchmarks including CHDAC (the IACC competition (http://iacc.pazhoulab-huangpu.com/shows/108/1.html) dataset), MTHv2, and ICDAR 2019 HDRC CHINESE, thereby demonstrating the robustness of our method. Furthermore, we present special tricks for historical document scenarios, which may provide useful insights for practical applications.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 189.00; Price excludes VAT (USA)

Softcover Book: USD 249.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
ODConv is a dynamic convolution. For details, please refer to the original paper [7].
2.
This paper uses the datasets provided by the organizers in the preliminaries and finals. The datasets voluntarily submitted by the teams in the final stage are not included.
3.
https://tc11.cvc.uab.es/datasets/ICDAR2019HDRC_1,
4.
EAST and OBD are not implemented by MMOCR.

References

Barakat, B., Droby, A., Kassis, M., El-Sana, J.: Text line segmentation for challenging handwritten document images using fully convolutional network. In: 2018 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 374–379. IEEE (2018)
Google Scholar
Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6154–6162 (2018)
Google Scholar
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Google Scholar
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 42(2), 386–397 (2018)
Article Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 770–778 (2016)
Google Scholar
Kuang, Z., et al.: MMOCR: a comprehensive toolbox for text detection, recognition and understanding. In: Proceedings of the 29th ACM International Conference on Multimedia, pp. 3791–3794 (2021)
Google Scholar
Li, C., Zhou, A., Yao, A.: Omni-dimensional dynamic convolution. In: International Conference on Learning Representations (2022)
Google Scholar
Liao, M., Shi, B., Bai, X.: Textboxes++: a single-shot oriented scene text detector. IEEE Trans. Image Process. Publ. IEEE Signal Process. Soc. 27(8), 3676–3690 (2018)
Article MathSciNet MATH Google Scholar
Liao, M., Shi, B., Bai, X., Wang, X., Liu, W.: Textboxes: a fast text detector with a single deep neural network. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)
Google Scholar
Liao, M., Wan, Z., Yao, C., Chen, K., Bai, X.: Real-time scene text detection with differentiable binarization. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 11474–11481 (2020)
Google Scholar
Liao, M., Zhu, Z., Shi, B., Xia, G.s., Bai, X.: Rotation-sensitive regression for oriented scene text detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5909–5918 (2018)
Google Scholar
Liao, M., Zou, Z., Wan, Z., Yao, C., Bai, X.: Real-time scene text detection with differentiable binarization and adaptive scale fusion. IEEE Trans. Pattern Anal. Mach. Intell. 45, 919–931 (2022)
Article Google Scholar
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
Google Scholar
Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Chapter Google Scholar
Liu, Y., et al.: Exploring the capacity of an orderless box discretization network for multi-orientation scene text detection. Int. J. Comput. Vision 129(6), 1972–1992 (2021)
Article Google Scholar
Liu, Y., Jin, L.: Deep matching prior network: toward tighter multi-oriented text detection. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3454–3461. IEEE Computer Society (2017)
Google Scholar
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Google Scholar
Long, S., Ruan, J., Zhang, W., He, X., Wu, W., Yao, C.: Textsnake: a flexible representation for detecting text of arbitrary shapes. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 20–36 (2018)
Google Scholar
Ma, J., et al.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimedia 20(11), 3111–3122 (2018)
Article MathSciNet Google Scholar
Ma, W., Zhang, H., Jin, L., Wu, S., Wang, J., Wang, Y.: Joint layout analysis, character detection and recognition for historical document digitization. In: 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 31–36. IEEE (2020)
Google Scholar
Mechi, O., Mehri, M., Ingold, R., Amara, N.E.B.: Text line segmentation in historical document images using an adaptive U-Net architecture. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 369–374. IEEE (2019)
Google Scholar
Mechi, O., Mehri, M., Ingold, R., Essoukri Ben Amara, N.: A two-step framework for text line segmentation in historical Arabic and Latin document images. Int. J. Doc. Anal. Recogn. (IJDAR) 24(3), 197–218 (2021)
Article Google Scholar
Prusty, A., Aitha, S., Trivedi, A., Sarvadevabhatla, R.K.: Indiscapes: instance segmentation networks for layout parsing of historical Indic manuscripts. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 999–1006. IEEE (2019)
Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 1, pp. 91–99 (2015)
Google Scholar
Renton, G., Soullard, Y., Chatelain, C., Adam, S., Kermorvant, C., Paquet, T.: Fully convolutional network with dilated convolutions for handwritten text line segmentation. Int. J. Doc. Anal. Recogn. (IJDAR) 21(3), 177–186 (2018). https://doi.org/10.1007/s10032-018-0304-3
Article Google Scholar
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Chapter Google Scholar
Saini, R., Dobson, D., Morrey, J., Liwicki, M., Liwicki, F.S.: ICDAR 2019 historical document reading challenge on large structured Chinese family records. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1499–1504. IEEE (2019)
Google Scholar
Sharan, S.P., Aitha, S., Kumar, A., Trivedi, A., Augustine, A., Sarvadevabhatla, R.K.: Palmira: a deep deformable network for instance segmentation of dense and uneven layouts in handwritten manuscripts. In: Lladós, J., Lopresti, D., Uchida, S. (eds.) ICDAR 2021. LNCS, vol. 12822, pp. 477–491. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86331-9_31
Chapter Google Scholar
Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9336–9345 (2019)
Google Scholar
Wang, W., et al.: Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8440–8449 (2019)
Google Scholar
Xie, X., Cheng, G., Wang, J., Yao, X., Han, J.: Oriented R-CNN for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3520–3529 (2021)
Google Scholar
Zhou, X., et al.: East: an efficient and accurate scene text detector. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5551–5560 (2017)
Google Scholar
Zhu, Y., Chen, J., Liang, L., Kuang, Z., Jin, L., Zhang, W.: Fourier contour embedding for arbitrary-shaped text detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3123–3131 (2021)
Google Scholar

Download references

Acknowledgement

This research is supported in part by NSFC (Grant No.: 61936003), Zhuhai Industry Core and Key Technology Research Project (No. 2220004002350), Science and Technology Foundation of Guangzhou Huangpu Development District (No. 2020GH17) and GD-NSF (No.2021A1515011870), and Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

South China University of Technology, Guangzhou, China
Cheng Jian, Lianwen Jin, Lingyu Liang & Chongyu Liu
SCUT-Zhuhai Institute of Modern Industrial Innovation, Zhuhai, China
Lianwen Jin, Lingyu Liang & Chongyu Liu

Authors

Cheng Jian
View author publications
You can also search for this author in PubMed Google Scholar
Lianwen Jin
View author publications
You can also search for this author in PubMed Google Scholar
Lingyu Liang
View author publications
You can also search for this author in PubMed Google Scholar
Chongyu Liu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Lianwen Jin .

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MN, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Jian, C., Jin, L., Liang, L., Liu, C. (2023). HisDoc R-CNN: Robust Chinese Historical Document Text Line Detection with Dynamic Rotational Proposal Network and Iterative Attention Head. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14187. Springer, Cham. https://doi.org/10.1007/978-3-031-41676-7_25

Download citation

DOI: https://doi.org/10.1007/978-3-031-41676-7_25
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41675-0
Online ISBN: 978-3-031-41676-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The International Association for Pattern Recognition (opens in a new tab)

HisDoc R-CNN: Robust Chinese Historical Document Text Line Detection with Dynamic Rotational Proposal Network and Iterative Attention Head