Pattern Recognition Letters

Volume 165, January 2023, Pages 146-153

Scene table structure recognition with segmentation collaboration and alignment

https://doi.org/10.1016/j.patrec.2022.12.014

Highlights

  • The novel SCAN is proposed to address the cell detection challenge in scene table structure recognition.

  • A segmentation collaboration module is developed to combine the positional and structural information of cells.

  • An alignment module is proposed to align the cell segmentation maps to restore the distorted table structure.

  • The proposed SCAN achieves state-of-the-art results on the S-TSR datasets WTW and TAL_OCR_TABLE.

Abstract

This study addresses the challenge of scene table structure recognition (S-TSR). In contrast to the well-aligned tables in PDF documents or screenshots, natural scene tables tend to be inclined, rotated, or curved, which distorts the table structure. Consequently, many state-of-the-art methods cannot operate effectively because the assumption of well-aligned tables no longer holds. We propose a novel segmentation collaboration and alignment network (SCAN) to address table structure recognition in natural scenarios. SCAN combines the positional and logical information of table cells through a segmentation collaboration module to segment cell regions accurately. The proposed cell alignment module then aligns the segmentation results to restore the distorted table structure. Experimental results demonstrate that our method is robust to different challenging S-TSR sub-scenarios and achieves new state-of-the-art performance, with TEDS scores of 90.7 and 98.5 on the S-TSR benchmarks WTW and TAL_OCR_TABLE, respectively.

Introduction

As a dense form of information expression, tables are used extensively in documents encountered in daily life. Tables often contain key information that must be extracted for further data management, such as financial statement analysis and electronic medical record retrieval. Although humans can easily understand tables with various styles and layouts, it remains very challenging for machines to recognize the structures of different tables automatically.

Automatic table structure recognition requires a machine to detect the physical location of all cells in a table and to obtain their logical location (i.e., the row–column relationship). In previous studies [3], [5], [14], [20], [21], [23], [25], [26], [27], [30], [33], [34], [35], table structure recognition was generally performed on electronic documents (e.g., PDF) or scanned documents. The tables in these scenarios are usually well aligned, with clean and well-structured backgrounds, as can be clearly observed in the datasets [5], [11], [14], [35] used in the aforementioned studies. We refer to table structure recognition under these conditions as standard scenario table structure recognition (SS-TSR).
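To make the two kinds of output concrete, the following minimal Python sketch shows one possible per-cell record combining physical and logical locations; the class and field names are illustrative assumptions, not notation from the paper.

    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class TableCell:
        """One recognized cell: physical location plus logical (row/column) location."""
        polygon: List[Tuple[float, float]]  # physical location: corner points in image coordinates
        start_row: int                      # logical location: row/column indices occupied by the cell
        end_row: int                        # (end index > start index models a row or column span)
        start_col: int
        end_col: int

    # Example: a header cell spanning the first two columns of the first row.
    header = TableCell(
        polygon=[(12.0, 8.0), (210.0, 8.0), (210.0, 40.0), (12.0, 40.0)],
        start_row=0, end_row=0, start_col=0, end_col=1,
    )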

However, tables exist not only in standard scenarios but also in natural scenarios. Natural scenario tables appear in various forms, such as paper documents, advertisements, street announcements, and merchandise packaging. With the prevalence of handheld photographic equipment, it is convenient to capture table images in natural environments. However, such captured tables often suffer from uncontrolled geometric distortions (i.e., inclination, rotation, and bending), as well as common image quality issues such as blurring and lighting changes. Fig. 1(a) shows an example of the difference between a standard scenario table and a natural scenario table. The table image in the natural scenario is not aligned, which prevents many state-of-the-art methods [23], [24], [33] from positioning cells in the table effectively (as illustrated in Fig. 1(b)). Therefore, table structure recognition in natural scenarios remains a challenging task in table understanding. We refer to table structure recognition in natural scenarios as scene table structure recognition (S-TSR). To address the cell detection challenge caused by geometric distortions in S-TSR, a recent study [18] provided the S-TSR dataset Wired Table in the Wild (WTW) and a baseline solution to the community. Although this method can be partially adapted to natural scene tables, its performance needs to be improved for complex scenes such as curved tables, dense grids, occlusion, and blurring.

We propose a novel segmentation collaboration and alignment network (SCAN) to solve the problem of table structure recognition in natural environments. The method simultaneously focuses on the positional and structural information of cells through different segmentation branches. By combining these types of information, high-quality cell segmentation maps can be obtained, which resolves the cell adhesion issues caused by relying on a single cell segmentation map (see Fig. 1(c)) and enables our method to localize cells of arbitrary shapes accurately (see Fig. 1(d)). Finally, the cell alignment module aligns the cell segmentation maps to restore the distorted table structure, allowing the post-processing to parse the table structure quickly and robustly.
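As a rough illustration of the collaboration idea (not the authors' implementation), the sketch below combines hypothetical cell and cell-border probability maps, suppressing likely border pixels before connected-component labelling so that adjacent cells are not merged; the function name and thresholds are assumptions.

    import numpy as np
    from scipy import ndimage

    def split_adhering_cells(cell_prob: np.ndarray,
                             border_prob: np.ndarray,
                             cell_thr: float = 0.5,
                             border_thr: float = 0.5) -> np.ndarray:
        """Combine cell and cell-border probability maps into separated cell instances.

        Removing likely border pixels before connected-component labelling keeps
        neighbouring cells from merging into a single region. Returns an integer
        label map (0 = background, 1..N = cell instances).
        """
        cell_mask = cell_prob > cell_thr
        border_mask = border_prob > border_thr
        interiors = cell_mask & ~border_mask        # keep cell interiors only
        labels, _num_cells = ndimage.label(interiors)
        return labels

In a full pipeline, the labelled interiors would then be grown back toward the borders and passed to an alignment step before the table structure is parsed.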

The contributions of this study are summarized as follows:

  • The novel SCAN is proposed to address the cell detection challenge in scene table structure recognition.

  • The segmentation collaboration module is developed to combine the positional and structural information of cells.

  • The cell alignment module is proposed to align the cell segmentation maps to restore the distorted table structure.

  • The proposed SCAN achieves state-of-the-art results on the S-TSR datasets WTW and TAL_OCR_TABLE.

Section snippets

Related works

Early table structure recognition [9], [12], [13], [16], [17], [28], [29] was mainly based on handcrafted features and heuristic rules. Most of these approaches are limited to specific data formats, such as PDF, and have difficulty parsing complicated tables. With the development of deep neural networks, image-based table structure recognition methods have gradually become mainstream. We divide the methods of the deep learning era into four categories: object detection-based methods,

Overall pipeline

The pipeline of the proposed SCAN is illustrated in Fig. 2. Inspired by DeeplabV3+ [4], our method uses an encoder–decoder structure for segmentation. The encoder combines multiscale information through ASPP [3], and the decoder combines the encoder output with low-level features to generate the feature map S_map. S_map is projected onto three branches to produce the semantic segmentation results for the cell, cell border, and table regions, represented by S_cell, S_border, and S_table, respectively.
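The following is a minimal PyTorch-style sketch of such an encoder–decoder head arrangement, assuming a generic ASPP block and three 1×1 convolutional branches; the module names, channel widths, and dilation rates are illustrative and do not reproduce the authors' exact configuration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ASPP(nn.Module):
        """Atrous spatial pyramid pooling: parallel dilated 3x3 convolutions fused by a 1x1 projection."""
        def __init__(self, in_ch: int, out_ch: int, rates=(1, 6, 12, 18)):
            super().__init__()
            self.branches = nn.ModuleList(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r) for r in rates
            )
            self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

        def forward(self, x):
            feats = [F.relu(branch(x)) for branch in self.branches]
            return F.relu(self.project(torch.cat(feats, dim=1)))

    class ThreeBranchHead(nn.Module):
        """Projects the decoder feature map onto cell, cell-border, and table-region probability maps."""
        def __init__(self, in_ch: int):
            super().__init__()
            self.cell_branch = nn.Conv2d(in_ch, 1, kernel_size=1)
            self.border_branch = nn.Conv2d(in_ch, 1, kernel_size=1)
            self.table_branch = nn.Conv2d(in_ch, 1, kernel_size=1)

        def forward(self, s_map):
            s_cell = torch.sigmoid(self.cell_branch(s_map))
            s_border = torch.sigmoid(self.border_branch(s_map))
            s_table = torch.sigmoid(self.table_branch(s_map))
            return s_cell, s_border, s_table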

Experiments

We tested our method on the S-TSR datasets WTW and TAL_OCR_TABLE and on the SS-TSR dataset ICDAR2013, and compared our SCAN with several state-of-the-art methods. We also conducted ablation experiments to examine the robustness of SCAN in the different sub-scenarios and the effectiveness of the segmentation collaboration module (SCM).

Conclusions and future work

We have proposed the novel SCAN to tackle the natural scene TSR problem. Our method combines the positional and structural information of the cells in a table and addresses the main weaknesses of existing methods, namely inaccurate geometric prediction for curved tables and the handling of unaligned table structures. The experiments revealed that our method is robust in many challenging table scenarios, thereby providing a new approach to natural scene table structure recognition.

There are

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research is supported in part by GD-NSF (No. 2021A1515011870), NSFC (Grant No. 61771199), the Zhuhai Industry Core and Key Technology Research Project (No. 2220004002350), the Major Key Project of PCL (PCL2021A12), and the Science and Technology Foundation of Guangzhou Huangpu Development District (Grant 2020GH17).

References (35)

  • T. Contributors, TAL_OCR_TABLE: a scene table structure recognition benchmark, 2021, ...

  • P.-T. De Boer et al., A tutorial on the cross-entropy method, Ann. Oper. Res. (2005)

  • I.A. Doush et al., Detecting and recognizing tables in spreadsheets, Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, DAS 2010 (2010)

  • M. Göbel et al., A methodology for evaluating algorithms for table understanding in PDF documents, Proceedings of the 2012 ACM Symposium on Document Engineering, DocEng 2012 (2012)

  • M. Göbel et al., ICDAR 2013 table competition, 12th International Conference on Document Analysis and Recognition, ICDAR 2013 (2013)

  • K. Itonori, Table structure recognition based on textblock arrangement and ruled line position, Proceedings of the 2nd International Conference on Document Analysis and Recognition, ICDAR 1993 (1993)

  • T.G. Kieninger, Table structure recognition based on robust block segmentation, Document Recognition V (1998)