A New Method “ProjectionP” for Table Structure Recognition

  • Conference paper
Computer Information Systems and Industrial Management (CISIM 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14902))

Abstract

This paper applies computer vision techniques to the problem of analyzing tabular data in digital documents. It describes the research and experiments the authors conducted to automate invoice registration in the accounting system of a private company. Based on the results obtained, the authors assert that in certain contexts classical pattern recognition techniques can still be used effectively for this problem, providing commercial benefits while avoiding some of the limitations inherent in newer, high-performance deep learning approaches. The article gives an overview of the latest advancements in the field and presents the authors’ proposed table structure recognition (TSR) algorithm, which is based on the typical profile-projection approach. It also compares the developed TSR algorithm with selected methods described in the literature and verifies its efficacy on a dataset of real documents from the accounting department of a private company.
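The abstract names the profile-projection approach without describing it. As a rough illustration only, the generic idea can be sketched as follows: count the ink pixels in every row and every column of a binarized table image, then treat runs of empty rows or columns as candidate cell separators. This is not the authors' ProjectionP algorithm, whose details are not given on this page; the function names `zero_runs` and `separators`, the `min_gap` parameter, and the toy image are all assumptions made for the example.

```python
from typing import List, Tuple

# Generic projection-profile sketch (NOT the paper's ProjectionP method).
# The image is a binarized table crop: 1 = ink pixel, 0 = background.


def zero_runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of runs of zeros
    in the profile that are at least min_gap long."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value == 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_gap:
                runs.append((start, i))
            start = None
    # Close a run that reaches the end of the profile.
    if start is not None and len(profile) - start >= min_gap:
        runs.append((start, len(profile)))
    return runs


def separators(profile: List[int], min_gap: int = 1) -> List[int]:
    """Midpoints of interior empty runs, used as candidate cell borders.
    Runs touching the image edge are margins, not separators."""
    n = len(profile)
    return [(s + e) // 2 for s, e in zero_runs(profile, min_gap) if s > 0 and e < n]


# Hypothetical toy image: two text rows and two text columns.
image = [
    [1, 1, 1, 0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 0, 1, 1, 0],
]

row_profile = [sum(row) for row in image]        # horizontal projection
col_profile = [sum(col) for col in zip(*image)]  # vertical projection

row_separators = separators(row_profile, min_gap=1)
column_separators = separators(col_profile, min_gap=2)
```

In this sketch `min_gap` keeps narrow gaps between characters from being mistaken for column borders; on real scans a threshold on the profile value, rather than an exact zero, would likely be needed to tolerate noise.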

Supported by grant WZ//I-IIT/5/2023.

Author information

Corresponding author

Correspondence to Mirosław Omieljanowicz.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Marczak, P., Omieljanowicz, M. (2024). A New Method “ProjectionP” for Table Structure Recognition. In: Saeed, K., Dvorský, J. (eds) Computer Information Systems and Industrial Management. CISIM 2024. Lecture Notes in Computer Science, vol 14902. Springer, Cham. https://doi.org/10.1007/978-3-031-71115-2_6

  • DOI: https://doi.org/10.1007/978-3-031-71115-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-71114-5

  • Online ISBN: 978-3-031-71115-2

  • eBook Packages: Computer Science, Computer Science (R0)
