A New Method “ProjectionP” for Table Structure Recognition

Marczak, Paweł; Omieljanowicz, Mirosław

doi:10.1007/978-3-031-71115-2_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14902)

Included in the following conference series:

International Conference on Computer Information Systems and Industrial Management

Abstract

This paper applies computer vision techniques to the problem of analyzing tabular data in digital documents. It details the research and experiments the authors conducted to automate the invoice registration process within the accounting system of a particular private company. Based on the results obtained, the authors assert that in certain contexts, pattern recognition techniques can still be used effectively for this problem, providing commercial benefits while circumventing some of the limitations inherent in newer, high-performance deep learning approaches. The article gives an overview of the latest advancements in the field and presents the authors’ proposed table structure recognition (TSR) algorithm, which is based on the typical profile-projection approach. It also features a comparison of the developed TSR algorithm with selected methods described in the literature, verifying its efficacy on a dataset of real documents from the accounting department of a private company.
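
For readers unfamiliar with the profile-projection idea mentioned above, the following is a minimal generic sketch, not the authors' ProjectionP algorithm, whose details are not given in this abstract. It assumes a binarized image stored as a list of rows (1 = ink, 0 = background). The function names `projection_profile` and `find_gaps`, and the parameters `max_ink` and `min_len`, are hypothetical and chosen only for illustration. The sketch sums ink pixels along rows and columns and treats runs of empty positions as candidate separators between table rows or columns.

```python
from typing import List, Tuple


def projection_profile(image: List[List[int]], axis: int) -> List[int]:
    """Count ink pixels per row (axis=0) or per column (axis=1)."""
    if axis == 0:
        return [sum(row) for row in image]
    return [sum(col) for col in zip(*image)]


def find_gaps(profile: List[int], max_ink: int = 0,
              min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) ranges, end exclusive, where the profile stays
    at or below max_ink for at least min_len consecutive positions."""
    gaps = []
    start = None
    for i, value in enumerate(profile):
        if value <= max_ink:
            if start is None:
                start = i  # a new low-ink run begins here
        elif start is not None:
            if i - start >= min_len:
                gaps.append((start, i))
            start = None
    # Close a run that extends to the end of the profile.
    if start is not None and len(profile) - start >= min_len:
        gaps.append((start, len(profile)))
    return gaps


# Toy binary image (an assumption for illustration): two text blocks
# separated by an empty row and two empty columns.
image = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 1, 1],
]

row_profile = projection_profile(image, axis=0)
col_profile = projection_profile(image, axis=1)
row_gaps = find_gaps(row_profile)             # candidate row separators
col_gaps = find_gaps(col_profile, min_len=2)  # candidate column separators
print(row_gaps, col_gaps)
```

In practice the thresholds would need tuning for scan noise and ruling lines; the values here are placeholders.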

Supported by grant WZ//I-IIT/5/2023.

Author information

Authors and Affiliations

Broniewskiego 5/75, 15-748, Bialystok, Poland
Paweł Marczak
Bialystok University of Technology, Wiejska 45A, 15-385, Bialystok, Poland
Mirosław Omieljanowicz

Corresponding author

Correspondence to Mirosław Omieljanowicz.

Editor information

Editors and Affiliations

Bialystok University of Technology, Białystok, Poland
Khalid Saeed
VSB - Technical University of Ostrava, Ostrava, Czech Republic
Jiří Dvorský

About this paper

Cite this paper

Marczak, P., Omieljanowicz, M. (2024). A New Method “ProjectionP” for Table Structure Recognition. In: Saeed, K., Dvorský, J. (eds) Computer Information Systems and Industrial Management. CISIM 2024. Lecture Notes in Computer Science, vol 14902. Springer, Cham. https://doi.org/10.1007/978-3-031-71115-2_6

DOI: https://doi.org/10.1007/978-3-031-71115-2_6
Published: 30 August 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71114-5
Online ISBN: 978-3-031-71115-2
eBook Packages: Computer Science, Computer Science (R0)
