Deep Layout Analysis of Multi-lingual and Composite Documents

Gader, Takwa Ben Aïcha; Echi, Afef Kacem

doi:10.1007/978-3-031-46335-8_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1940)

Included in the following conference series:

International Conference on Intelligent Systems and Pattern Recognition

Abstract

Accurate layout analysis is crucial for converting document images into high-quality text. With the emergence of large, publicly available ground-truth datasets, deep-learning models have proven effective at detecting and segmenting document layouts. This study presents a deep-learning technique for document structure analysis, an important stage of an optical character recognition (OCR) system. Our method employs YOLOv7 (You Only Look Once, version 7), an efficient and precise object detection model, trained on the DocLayNet dataset. The trained model quickly identifies and categorizes document components such as captions, list items, text, tables, section headers, and pictures. Our evaluation shows that the proposed method outperforms existing approaches in both accuracy and efficiency, and that it generalizes well across diverse document layouts, text styles, and scripts.
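
As an illustration of what training a YOLO-style detector on DocLayNet can involve, the sketch below converts a COCO-style bounding box (top-left x, y, width, height in pixels, the format in which DocLayNet annotations are distributed) into the normalized "class cx cy w h" label line that YOLO training code commonly reads. This is a minimal sketch and not the authors' pipeline: the abstract does not describe data preparation, the function names are hypothetical, and the class list and its index order are assumptions covering only the six categories named above.

```python
from typing import Sequence, Tuple

# Assumption: only the six categories named in the abstract, in an arbitrary
# (hypothetical) index order. The chapter's actual class mapping is not given.
CLASSES = ["caption", "list item", "text", "table", "section header", "picture"]


def coco_to_yolo(box: Sequence[float], img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert a COCO box [x, y, w, h] in pixels to normalized (cx, cy, w, h)."""
    x, y, w, h = box
    cx = (x + w / 2) / img_w  # box centre, as a fraction of image width
    cy = (y + h / 2) / img_h  # box centre, as a fraction of image height
    return cx, cy, w / img_w, h / img_h


def yolo_label_line(class_name: str, box: Sequence[float], img_w: int, img_h: int) -> str:
    """Format one annotation as a YOLO label line: 'class cx cy w h'."""
    cx, cy, w, h = coco_to_yolo(box, img_w, img_h)
    return f"{CLASSES.index(class_name)} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # A table occupying a 300x100 px region of a 1000x1000 px page image.
    print(yolo_label_line("table", (100, 200, 300, 100), 1000, 1000))
    # prints: 3 0.250000 0.250000 0.300000 0.100000
```

Normalizing by the image size keeps the labels valid when pages are resized before being fed to the detector, which is one reason this label format is widely used with YOLO models.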

Author information

Authors and Affiliations

National Superior School of Engineering, University of Tunis, LR: LATICE, Tunis, Tunisia
Takwa Ben Aïcha Gader & Afef Kacem Echi

Corresponding author

Correspondence to Afef Kacem Echi.

Editor information

Editors and Affiliations

Larbi Tebessi University, Tebessa, Algeria
Akram Bennour
Sharjah University, Sharjah, United Arab Emirates
Ahmed Bouridane
University of Toulouse, Toulouse, France
Lotfi Chaari

About this paper

Cite this paper

Gader, T.B.A., Echi, A.K. (2024). Deep Layout Analysis of Multi-lingual and Composite Documents. In: Bennour, A., Bouridane, A., Chaari, L. (eds) Intelligent Systems and Pattern Recognition. ISPR 2023. Communications in Computer and Information Science, vol 1940. Springer, Cham. https://doi.org/10.1007/978-3-031-46335-8_7

DOI: https://doi.org/10.1007/978-3-031-46335-8_7
Published: 05 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-46334-1
Online ISBN: 978-3-031-46335-8
eBook Packages: Computer Science, Computer Science (R0)
