Abstract
Table structure recognition is a key component of document understanding. Many prior methods address this problem in three sequential steps: table detection, table component extraction, and structure analysis based on pairwise relations. However, they struggle with tables that have complex structures and with practical scenarios such as scanned documents. In this paper, we propose a novel graph-based table structure recognition framework. To handle complex tables, we formulate tables as planar graphs whose faces are cell regions. We then compute vertex (junction) confidence maps and line fields using lightweight heatmap regression networks (about 1M parameters) and reconstruct tables by solving a constrained optimization problem. We demonstrate the robustness of the proposed system through experiments on the ICDAR 2019 dataset and on challenging table images. Experimental results show that the proposed method outperforms the conventional method across a range of scenarios and generalizes well.
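To make the planar-graph formulation concrete, the following minimal sketch models a ruled table as a graph whose vertices are junctions and whose edges are ruling-line segments, then counts cells as bounded faces using Euler's formula (V - E + F = 2 for a connected planar graph). It is only an illustration of the representation, not the authors' implementation; the function names `grid_table_graph` and `count_cells` are hypothetical, and the sketch assumes a clean, connected grid rather than network outputs.

```python
from itertools import product


def grid_table_graph(rows, cols):
    """Build junctions (vertices) and ruling-line segments (edges) of a rows x cols grid table."""
    vertices = set(product(range(rows + 1), range(cols + 1)))
    edges = set()
    for r, c in vertices:
        if c < cols:
            edges.add(((r, c), (r, c + 1)))  # horizontal segment to the right
        if r < rows:
            edges.add(((r, c), (r + 1, c)))  # vertical segment downward
    return vertices, edges


def count_cells(vertices, edges):
    """Count bounded faces of a connected planar graph: F_inner = E - V + 1 (Euler's formula)."""
    return len(edges) - len(vertices) + 1


vertices, edges = grid_table_graph(rows=2, cols=3)
print(count_cells(vertices, edges))  # 6 cells in a plain 2 x 3 table

# Merge the two leftmost cells of the top row by removing the segment between them.
edges.discard(((0, 1), (1, 1)))
print(count_cells(vertices, edges))  # 5 cells after the merge
```

Removing a single segment merges two faces into one, which is how a face-based representation can express spanning cells without a separate row/column model.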
Acknowledgements
This work was supported in part by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-01062, Development of personal information processing technology for collection/utilization of high-quality and trusted training data for autonomous driving), and in part by LG AI Research.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Lee, E., Park, J., Koo, H.I. et al. Deep-learning and graph-based approach to table structure recognition. Multimed Tools Appl 81, 5827–5848 (2022). https://doi.org/10.1007/s11042-021-11819-7
DOI: https://doi.org/10.1007/s11042-021-11819-7