TDeLTA: A Light-Weight and Robust Table Detection Method Based on Learning Text Arrangement

Authors

  • Yang Fan, Harbin Institute of Technology (Shenzhen)
  • Xiangping Wu, Harbin Institute of Technology (Shenzhen); The Hong Kong Polytechnic University
  • Qingcai Chen, Harbin Institute of Technology (Shenzhen); Peng Cheng Laboratory
  • Heng Li, Harbin Institute of Technology (Shenzhen)
  • Yan Huang, China Mobile Information Technology Co., Ltd.
  • Zhixiang Cai, China Mobile Information Technology Co., Ltd.
  • Qitian Wu, China Mobile Information Technology Co., Ltd.

DOI:

https://doi.org/10.1609/aaai.v38i2.27934

Keywords:

CV: Applications, CV: Representation Learning for Vision

Abstract

The diversity of tables makes table detection a great challenge and has led existing models to become increasingly tedious and complex. Despite achieving high performance, these models often overfit to the table style in the training set and suffer significant performance degradation when they encounter out-of-distribution tables in other domains. To tackle this problem, we start from the essence of a table, which is a set of text arranged in rows and columns. Based on this, we propose a novel, lightweight and robust Table Detection method based on Learning Text Arrangement, namely TDeLTA. TDeLTA takes text blocks as input and models their arrangement with a sequential encoder and an attention module. To locate tables precisely, we design a text-classification task that classifies the text blocks into four categories according to their semantic roles in the tables. Experiments are conducted on text blocks parsed from PDF files and on text blocks extracted by open-source OCR tools. Compared to several state-of-the-art methods, TDeLTA achieves competitive results on large-scale public datasets with only 3.1M model parameters. Moreover, on cross-domain data under the zero-shot setting, TDeLTA outperforms the baselines by a large margin of nearly 7%, which shows the strong robustness and transferability of the proposed model.
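
As an illustration of how per-block classification can yield a table location, the following minimal sketch takes text blocks that already carry a predicted category and returns the union of the boxes of all table-related blocks. It is not the authors' implementation: the abstract does not name the four categories, so the single `NON_TABLE` label, the `TextBlock` type, the `table_bbox` helper, and the one-table-per-page simplification are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Bounding box as (x0, y0, x1, y1) in page coordinates.
BBox = Tuple[float, float, float, float]

# Placeholder label (assumption): the abstract mentions four categories
# but does not name them, so only "not part of a table" is modelled here.
NON_TABLE = "non_table"


@dataclass
class TextBlock:
    text: str
    bbox: BBox
    label: str  # hypothetical category predicted by a block classifier


def table_bbox(blocks: List[TextBlock]) -> Optional[BBox]:
    """Return the union box of all blocks not labelled NON_TABLE.

    Assumes at most one table per page; returns None if no block
    is labelled as part of a table.
    """
    boxes = [b.bbox for b in blocks if b.label != NON_TABLE]
    if not boxes:
        return None
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[2] for b in boxes)
    y1 = max(b[3] for b in boxes)
    return (x0, y0, x1, y1)


# Toy page: one paragraph line followed by a 2x2 table.
blocks = [
    TextBlock("Results are shown below.", (50, 40, 400, 60), NON_TABLE),
    TextBlock("Model", (50, 100, 120, 115), "table_cell"),
    TextBlock("Params", (200, 100, 300, 115), "table_cell"),
    TextBlock("TDeLTA", (50, 125, 120, 140), "table_cell"),
    TextBlock("3.1M", (200, 125, 300, 140), "table_cell"),
]
region = table_bbox(blocks)
```

Pages with several tables would need an additional grouping step before taking the union, which this sketch deliberately leaves out.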

Published

2024-03-24

How to Cite

Fan, Y., Wu, X., Chen, Q., Li, H., Huang, Y., Cai, Z., & Wu, Q. (2024). TDeLTA: A Light-Weight and Robust Table Detection Method Based on Learning Text Arrangement. Proceedings of the AAAI Conference on Artificial Intelligence, 38(2), 1670-1678. https://doi.org/10.1609/aaai.v38i2.27934

Issue

Section

AAAI Technical Track on Computer Vision I