Memory-efficient document layout analysis method using LD-net

Zhao, Haoyu; Min, Weidong; Wang, Qi; Wei, Zitai

doi:10.1007/s11042-022-12497-9

Published: 26 July 2022

Volume 82, pages 4371–4386 (2023)

Multimedia Tools and Applications

Haoyu Zhao¹, Weidong Min¹,²,³ (ORCID: orcid.org/0000-0003-2526-2181), Qi Wang⁴ & Zitai Wei¹

Abstract

Document layout analysis is a critical step in optical character recognition. Traditional methods based on handcrafted features cannot handle diverse document formats with high accuracy. Although deep-learning-based methods achieve satisfactory accuracy, they are not memory-efficient enough for low-memory devices such as mobile phones. To alleviate these problems, this study proposes a memory-efficient layout analysis approach based on the Lightweight Dilated Network (LD-Net). The input document page image is first segmented into content blocks using the Otsu algorithm and the run-length smoothing algorithm (RLSA). Each block is then fed into LD-Net, which classifies it into one of four common classes: figure, table, text, and formula. LD-Net is built around a shallow network, which performs better than deeper networks on the layout analysis task. Each convolutional layer is composed of a depthwise separable convolution and a residual structure. In addition, dilated convolutions are employed in LD-Net to improve the accuracy of the detection results. Experimental results on benchmarks show that the proposed approach outperforms existing methods in both accuracy and memory footprint: on the ICDAR dataset, the model achieves an accuracy of 95.7% and occupies 39.7 MB of memory.
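The segmentation stage described above (Otsu binarization, RLSA smearing, then extraction of content blocks) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the smearing constants `c_h`, `c_v`, `c_a`, the assumption of dark ink on a light page, the use of the classic RLSA formulation of Wong, Casey and Wahl (1982) (horizontal and vertical smearing combined with a logical AND, followed by a small horizontal pass), and all function names are hypothetical choices. The paper's actual parameters are not given here.

```python
import numpy as np
from collections import deque


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the grey level t that maximizes Otsu's between-class variance.

    Pixels <= t form one class and pixels > t the other. `gray` must be uint8.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # weight of the class <= t
    mu = np.cumsum(prob * np.arange(256))    # cumulative first moment up to t
    mu_total = mu[-1]                        # global mean grey level
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


def rlsa_1d(line: np.ndarray, c: int) -> np.ndarray:
    """Fill background gaps (0s) of length <= c lying between two foreground pixels (1s)."""
    out = line.copy()
    fg = np.flatnonzero(line)
    for a, b in zip(fg[:-1], fg[1:]):
        if b - a - 1 <= c:          # gap length between consecutive foreground pixels
            out[a + 1:b] = 1
    return out


def rlsa(ink: np.ndarray, c_h: int, c_v: int, c_a: int) -> np.ndarray:
    """Horizontal and vertical smearing combined with AND, then a final horizontal pass."""
    horiz = np.array([rlsa_1d(row, c_h) for row in ink])
    vert = np.array([rlsa_1d(col, c_v) for col in ink.T]).T
    both = horiz & vert
    return np.array([rlsa_1d(row, c_a) for row in both])


def block_boxes(mask: np.ndarray) -> list:
    """Bounding boxes (top, left, bottom, right) of 4-connected foreground regions."""
    h, w = mask.shape
    seen = np.zeros(mask.shape, dtype=bool)
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or seen[y, x]:
                continue
            seen[y, x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:                     # breadth-first flood fill of one block
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def segment_page(gray: np.ndarray, c_h: int = 20, c_v: int = 10, c_a: int = 5) -> list:
    """Otsu binarization -> RLSA smearing -> bounding boxes of content blocks."""
    t = otsu_threshold(gray)
    ink = (gray <= t).astype(np.uint8)      # assumes dark ink on a light page
    return block_boxes(rlsa(ink, c_h, c_v, c_a))


# Toy page: two "words" on one line plus a separate dark region.
page = np.full((60, 80), 220, dtype=np.uint8)
page[5:8, 5:20] = 30
page[5:8, 24:40] = 30
page[35:50, 10:30] = 30
print(segment_page(page))  # [(5, 5, 7, 39), (35, 10, 49, 29)]
```

Smearing merges the two nearby "words" into a single line-level block while the distant region stays separate. In the described method, each resulting block would then be cropped and passed to the classifier.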
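The abstract states that each LD-Net convolutional layer combines a depthwise separable convolution with a residual structure, and that dilated convolutions are used. Below is a minimal NumPy forward-pass sketch of such a layer, together with the parameter arithmetic that explains the memory savings of depthwise separable convolutions. The layer order (depthwise, pointwise, ReLU, identity shortcut), the omission of batch normalization, the 3×3 kernel, dilation rate 2, the channel widths, and all function names are assumptions for illustration. The paper's actual LD-Net configuration is not given here.

```python
import numpy as np


def depthwise_conv2d(x: np.ndarray, w: np.ndarray, dilation: int = 1) -> np.ndarray:
    """Dilated depthwise convolution (cross-correlation) with 'same' zero padding.

    x: (C, H, W) feature map; w: (C, k, k) with odd k, one kernel per channel.
    """
    c, h, wd = x.shape
    k = w.shape[-1]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c, h, wd), dtype=np.float64)
    for i in range(k):
        for j in range(k):
            # kernel tap (i, j) reads input pixels spaced `dilation` apart
            patch = xp[:, i * dilation:i * dilation + h, j * dilation:j * dilation + wd]
            out += w[:, i, j][:, None, None] * patch
    return out


def ds_residual_block(x, w_dw, w_pw, dilation=2):
    """Depthwise (dilated) conv -> 1x1 pointwise conv -> ReLU -> add identity shortcut."""
    y = depthwise_conv2d(x, w_dw, dilation)
    y = np.einsum("oc,chw->ohw", w_pw, y)   # pointwise conv mixes channels
    y = np.maximum(y, 0.0)                  # ReLU
    return x + y                            # residual connection needs C_out == C_in


def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def ds_conv_params(c_in, c_out, k):
    """Weights in a depthwise separable convolution: k*k*C_in + C_in*C_out."""
    return k * k * c_in + c_in * c_out


def effective_kernel(k, dilation):
    """Spatial extent covered by a k x k kernel with the given dilation rate."""
    return k + (k - 1) * (dilation - 1)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))
y = ds_residual_block(x, rng.standard_normal((8, 3, 3)), rng.standard_normal((8, 8)))
print(y.shape)                                            # (8, 16, 16)
print(conv_params(64, 64, 3), ds_conv_params(64, 64, 3))  # 36864 4672
print(effective_kernel(3, 2))                             # 5
```

For a 3×3 layer with 64 input and 64 output channels, the separable form needs 4,672 weights instead of 36,864, roughly 7.9 times fewer. A dilation rate of 2 widens the 3×3 kernel's coverage to 5×5 without adding any weights.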

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant No. 62076117 and No. 62166026), the Natural Science Foundation of Jiangxi Province, China (Grant No. 20161ACB20004) and Jiangxi Key Laboratory of Smart City (Grant No. 20192BCD40002).

Author information

Authors and Affiliations

School of Mathematics and Computer Science, Nanchang University, Nanchang, 330031, China
Haoyu Zhao, Weidong Min & Zitai Wei
Institute of Metaverse, Nanchang University, Nanchang, 330031, China
Weidong Min
Jiangxi Key Laboratory of Smart City, Nanchang, 330031, China
Weidong Min
School of Software, Nanchang University, Nanchang, 330047, China
Qi Wang

Corresponding author

Correspondence to Weidong Min.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhao, H., Min, W., Wang, Q. et al. Memory-efficient document layout analysis method using LD-net. Multimed Tools Appl 82, 4371–4386 (2023). https://doi.org/10.1007/s11042-022-12497-9

Received: 15 January 2021
Revised: 27 April 2021
Accepted: 25 January 2022
Published: 26 July 2022
Issue Date: January 2023
DOI: https://doi.org/10.1007/s11042-022-12497-9
