Scene table structure recognition with segmentation collaboration and alignment☆
Introduction
As high-density representations of information, tables are used extensively in everyday documents. Tables often contain key information that must be extracted for downstream data management tasks, such as financial statement analysis and electronic medical record retrieval. Although humans can easily understand tables with various styles and layouts, automatically recognizing the structures of such diverse tables remains very challenging for machines.
Automatic table structure recognition requires a machine to detect the physical location of every cell in a table and to obtain its logical location (i.e., its row–column relationships). In previous studies [3], [5], [14], [20], [21], [23], [25], [26], [27], [30], [33], [34], [35], table structure recognition was generally performed on electronic documents (e.g., PDF) or scanned documents. The tables in these scenarios are usually well aligned, with clean and well-structured backgrounds. These characteristics can be clearly observed in the datasets [5], [11], [14], [35] used in the aforementioned studies. We refer to table structure recognition under these conditions as standard scenario table structure recognition (SS-TSR).
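The distinction between physical and logical location can be made concrete with a small data structure. The following is a minimal sketch, not taken from the paper: the `Cell` class, its field names, and the `grid_shape` helper are hypothetical and serve only to illustrate what a table structure recognizer is expected to output for each cell.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Cell:
    """One table cell (hypothetical structure, for illustration only)."""

    # Physical location: corner points in image coordinates. A polygon is
    # used instead of an axis-aligned box so that inclined or bent cells
    # can be described as well.
    polygon: List[Point]
    # Logical location: inclusive row and column index ranges.
    start_row: int
    end_row: int
    start_col: int
    end_col: int

    @property
    def row_span(self) -> int:
        return self.end_row - self.start_row + 1

    @property
    def col_span(self) -> int:
        return self.end_col - self.start_col + 1


def grid_shape(cells: List[Cell]) -> Tuple[int, int]:
    """Return (number of rows, number of columns) implied by the cells."""
    rows = max(c.end_row for c in cells) + 1
    cols = max(c.end_col for c in cells) + 1
    return rows, cols


# A header cell spanning two columns, followed by two ordinary cells.
header = Cell([(0, 0), (100, 0), (100, 20), (0, 20)], 0, 0, 0, 1)
left = Cell([(0, 20), (50, 20), (50, 40), (0, 40)], 1, 1, 0, 0)
right = Cell([(50, 20), (100, 20), (100, 40), (50, 40)], 1, 1, 1, 1)
table = [header, left, right]
```

In this representation, a spanning cell is simply one whose start and end indices differ, which is why both ends of each range are stored.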
However, tables exist not only in standard scenarios, but also in natural scenarios. Natural scenario tables appear in various forms, such as paper documents, advertisements, street announcements, and merchandise packaging. With the advancement of handheld photographic equipment, it is very convenient to capture table images in a natural environment. However, such captured tables often suffer from uncontrolled geometric distortions (i.e., inclination, rotation, and bending). Common image quality issues such as blurring and lighting changes may also occur. Fig. 1(a) shows an example of the difference between a standard scenario table and a natural scenario table. The table image in the natural scenario is clearly not aligned, which causes many state-of-the-art methods [23], [24], [33] to fail to localize the cells in the table effectively (as illustrated in Fig. 1(b)). Therefore, table structure recognition in natural scenarios remains a challenging task in table understanding. We refer to table structure recognition in natural scenarios as scene table structure recognition (S-TSR). To address the cell detection challenge caused by geometric distortions in S-TSR, a recent study [18] provided the community with the S-TSR dataset Wired Table in the Wild (WTW) and a baseline solution. Although that method can be partially adapted to natural scene tables, its performance leaves room for improvement in complex scenes involving curves, grids, occlusion, and blurring.
We propose a novel segmentation collaboration and alignment network (SCAN) to solve the problem of table structure recognition in natural environments. This method simultaneously focuses on the location and structural information of the cells through different segmentation branches. By combining different types of information, high-quality cell segmentation maps can be obtained that can address the cell adhesion issues caused by using single cell segmentation (see Fig. 1(c)). These maps also enable our method to localize cells of arbitrary shapes accurately (see Fig. 1(d)). Finally, the cell alignment module aligns the cell segmentation maps to restore the distorted table structure, thereby facilitating post-processing to parse the table structure quickly and robustly.
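The cell adhesion problem, and the reason additional structural information helps, can be illustrated on toy binary masks. The following is a minimal sketch and not the authors' segmentation collaboration module: the `combine` rule (keep a pixel only if it is predicted as cell, lies inside the table, and is not predicted as border) and the helper `count_regions` are assumptions chosen for illustration.

```python
from collections import deque
from typing import List

Mask = List[List[int]]


def count_regions(mask: Mask) -> int:
    """Count 4-connected foreground regions with a breadth-first search."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                regions += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return regions


def combine(cell: Mask, border: Mask, table: Mask) -> Mask:
    """Illustrative rule only: cell AND table AND NOT border, per pixel."""
    return [
        [int(c and t and not b) for c, b, t in zip(c_row, b_row, t_row)]
        for c_row, b_row, t_row in zip(cell, border, table)
    ]


# Two neighbouring cells that a single cell map has merged into one blob.
cell_mask = [[1] * 7 for _ in range(3)]
# A border map that marks the line between them (column 3).
border_mask = [[1 if x == 3 else 0 for x in range(7)] for _ in range(3)]
table_mask = [[1] * 7 for _ in range(3)]

merged = count_regions(cell_mask)
separated = count_regions(combine(cell_mask, border_mask, table_mask))
```

Here the cell map alone yields a single connected region, whereas the combined map splits it along the predicted border into two regions.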
The contributions of this study are summarized as follows:
- The novel SCAN is proposed to address the cell detection challenge in scene table structure recognition.
- The segmentation collaboration module is developed to combine the positional and structural information of cells.
- The cell alignment module is proposed to align the cell segmentation maps to restore the distorted table structure.
- The proposed SCAN achieves state-of-the-art results on the S-TSR datasets WTW and TAL_OCR_TABLE.
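Once cells have been aligned, recovering their logical locations becomes a comparatively simple post-processing step. The following is a minimal sketch of one generic way to do this, not the procedure used in the paper: it assumes that each aligned cell is an axis-aligned box `(x0, y0, x1, y1)`, and the helper names and the tolerance `tol` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), axis-aligned


def cluster(values: List[float], tol: float) -> List[float]:
    """Merge coordinates that lie within tol of their neighbour into one grid line."""
    groups: List[List[float]] = []
    for v in sorted(values):
        if not groups or v - groups[-1][-1] > tol:
            groups.append([v])
        else:
            groups[-1].append(v)
    return [sum(g) / len(g) for g in groups]


def nearest(lines: List[float], v: float) -> int:
    """Index of the grid line closest to coordinate v."""
    return min(range(len(lines)), key=lambda i: abs(lines[i] - v))


def logical_positions(boxes: List[Box], tol: float = 3.0) -> List[Tuple[int, int, int, int]]:
    """Map each box to (start_row, end_row, start_col, end_col), ends inclusive."""
    xs = cluster([x for b in boxes for x in (b[0], b[2])], tol)
    ys = cluster([y for b in boxes for y in (b[1], b[3])], tol)
    result = []
    for x0, y0, x1, y1 in boxes:
        result.append(
            (nearest(ys, y0), nearest(ys, y1) - 1, nearest(xs, x0), nearest(xs, x1) - 1)
        )
    return result


# A header spanning two columns above two cells; one edge is off by a pixel.
boxes = [(0, 0, 100, 20), (0, 20, 50, 40), (50, 21, 100, 40)]
positions = logical_positions(boxes)
```

The tolerance absorbs small residual misalignments between neighbouring cell edges. This kind of edge clustering presumes rectified, axis-aligned cells, which is why an alignment step comes before it.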
Related works
Early table structure recognition [9], [12], [13], [16], [17], [28], [29] was mainly based on handcrafted features and heuristic rules. Most research applications are limited to special data formats, such as PDF, and it is difficult to parse complicated tables. With the development of deep neural networks, image-based table structure recognition methods have gradually become mainstream. We divide the methods used in the deep learning era into four categories: object detection-based methods,
Overall pipeline
The pipeline of the proposed SCAN is illustrated in Fig. 2. Inspired by DeepLabV3+ [4], our method uses an encoder–decoder structure for segmentation. The encoder combines multiscale information through atrous spatial pyramid pooling (ASPP) [3], whereas the decoder combines the encoder output with low-level features to generate a feature map. This feature map is projected onto three branches to produce the semantic segmentation results of the cell, cell border, and table regions.
Experiments
We tested our method on the S-TSR datasets WTW and TAL_OCR_TABLE, and on the SS-TSR dataset ICDAR2013. We compared SCAN with several state-of-the-art methods. We also conducted ablation experiments to explore the robustness of SCAN in sub-scenarios and the effectiveness of the segmentation collaboration module (SCM).
Conclusions and future work
We have proposed the novel SCAN to tackle the natural scene TSR problem. Our method combines the position and structure information of the cells in the table and addresses the main weaknesses of existing methods, including inaccurate geometric prediction for curved tables and the handling of unaligned table structures. The experiments revealed that our method is robust in many challenging table scenarios, thereby providing a new concept for natural scene table structure recognition.
There are
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research is supported in part by GD-NSF (No. 2021A1515011870), NSFC (Grant No. 61771199), Zhuhai Industry Core and Key Technology Research Project (No. 2220004002350), the Major Key Project of PCL (PCL2021A12), and the Science and Technology Foundation of Guangzhou Huangpu Development District (Grant 2020GH17).
References (35)
- et al., Master: multi-aspect non-local network for scene text recognition, Pattern Recognit. (2021)
- et al., Table structure understanding and its performance evaluation, Pattern Recognit. (2004)
- et al., Global table extractor (GTE): a framework for joint table identification and cell structure recognition using visual context, Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2021 (2021)
- Principal warps: thin-plate splines and the decomposition of deformations, IEEE Trans. Pattern Anal. Mach. Intell. (1989)
- et al., A cylindrical surface model to rectify the bound document image, Proceedings Ninth IEEE International Conference on Computer Vision (2003)
- et al., DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Trans. Pattern Anal. Mach. Intell. (2017)
- et al., Encoder-decoder with atrous separable convolution for semantic image segmentation, Proceedings of the European Conference on Computer Vision, ECCV 2018 (2018)
- Z. Chi, H. Huang, H.-D. Xu, H. Yu, W. Yin, X.-L. Mao, Complicated table structure recognition, arXiv preprint...
- M. Contributors, MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark, 2020,...
- A tutorial on the cross-entropy method, Ann. Oper. Res.
- Detecting and recognizing tables in spreadsheets, Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, DAS 2010
- A methodology for evaluating algorithms for table understanding in PDF documents, Proceedings of the 2012 ACM Symposium on Document Engineering, DocEng 2012
- ICDAR 2013 table competition, 12th International Conference on Document Analysis and Recognition, ICDAR 2013
- Table structure recognition based on textblock arrangement and ruled line position, Proceedings of 2nd International Conference on Document Analysis and Recognition, ICDAR 1993
- Table structure recognition based on robust block segmentation, Document Recognition V
☆ Editor: Sudeep Sarkar