Deep Layout Analysis of Multi-lingual and Composite Documents

  • Conference paper
Intelligent Systems and Pattern Recognition (ISPR 2023)

Abstract

Accurate layout analysis is crucial for converting document images into high-quality text. With the emergence of large, publicly available ground-truth datasets, deep learning models have proven effective at detecting and segmenting document layouts. This study presents a deep learning technique for document structure analysis, an important stage of an optical character recognition (OCR) system. Our method employs YOLOv7 (You Only Look Once, version 7), a highly efficient and precise object detection model, trained on the DocLayNet dataset. The trained YOLOv7 model quickly identifies and categorizes document components such as captions, list items, text, tables, section headers, and pictures. Our evaluation shows that the proposed method outperforms existing approaches in both accuracy and efficiency, and that it generalizes well across diverse document layouts, text styles, and scripts.
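
As a rough illustration of the data preparation such a setup typically involves, the sketch below converts a COCO-style bounding box (the format in which DocLayNet annotations are distributed) into the normalized label line that YOLO-style trainers usually expect. This is not the authors' pipeline: the function names and the class ordering are assumptions made for illustration, and the class list is limited to the six components named in the abstract.

```python
from typing import Sequence, Tuple

# Hypothetical class order (an assumption); only these six component names
# appear in the abstract. A real label file must follow the order used in training.
CLASSES = ["caption", "list item", "text", "table", "section header", "picture"]


def coco_to_yolo(bbox: Sequence[float], img_w: float, img_h: float) -> Tuple[float, float, float, float]:
    """Convert a COCO box [x_min, y_min, width, height] in pixels to
    YOLO (x_center, y_center, width, height), each normalized to [0, 1]."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


def yolo_label_line(class_name: str, bbox: Sequence[float], img_w: float, img_h: float) -> str:
    """Format one annotation as a YOLO label line: 'class_id cx cy w h'."""
    cx, cy, w, h = coco_to_yolo(bbox, img_w, img_h)
    return f"{CLASSES.index(class_name)} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


if __name__ == "__main__":
    # A table region on a hypothetical 1000 x 1000 pixel page image.
    print(yolo_label_line("table", [100, 200, 300, 400], 1000, 1000))
```

Each page image would get one such line per annotated region; the page size and box used in the example are made up.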

Corresponding author

Correspondence to Afef Kacem Echi.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gader, T.B.A., Echi, A.K. (2024). Deep Layout Analysis of Multi-lingual and Composite Documents. In: Bennour, A., Bouridane, A., Chaari, L. (eds) Intelligent Systems and Pattern Recognition. ISPR 2023. Communications in Computer and Information Science, vol 1940. Springer, Cham. https://doi.org/10.1007/978-3-031-46335-8_7

  • DOI: https://doi.org/10.1007/978-3-031-46335-8_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-46334-1

  • Online ISBN: 978-3-031-46335-8

  • eBook Packages: Computer Science, Computer Science (R0)
