Abstract
Table structure recognition is a challenging task due to complex backgrounds and diverse table styles. Existing methods address this challenge through adjacency relationship prediction, image-to-text generation, logical position prediction, and other formulations. However, these methods adopt either Graph Convolutional Network (GCN) structures, which mainly capture local context, or Multi-Head Attention (MHA) structures, which mainly capture global context; neither models the correlation between local and global features. In this paper, we propose a Local-Relationship-Aware Transformer Network (LRATNet) for table structure recognition. LRATNet uses the LRAT module to build a robust correlation between local and global information. The LRAT module is instantiated in three variants, Row-LRAT, Col-LRAT, and Spa-LRAT, which emphasize row, column, and spatial information, respectively, by exploiting different adjacency relationships. This design improves the performance of logical location prediction. Additionally, we develop a new loss function, L_stage, designed to improve the accuracy of logical position prediction. Experimental results demonstrate that our method outperforms existing approaches on three public datasets.
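The abstract does not say how LRAT couples local and global context. As a rough illustration only, the Python sketch below assumes that cells are given as axis-aligned boxes, that the row, column, and spatial relations can be approximated by interval-overlap and gap thresholds, and that "local" attention means attention masked to adjacent cells, blended with unmasked global attention by a weight `alpha`. The function names (`build_adjacency`, `local_global_attention`), the thresholds `ratio` and `margin`, and the blending scheme are hypothetical and are not taken from LRATNet; learned projections, multiple heads, and the L_stage loss are omitted.

```python
import math
from typing import List, Tuple

# Hypothetical cell representation (not from the paper): box (x0, y0, x1, y1).
Box = Tuple[float, float, float, float]


def _overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the intersection of intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def build_adjacency(boxes: List[Box], kind: str,
                    ratio: float = 0.5, margin: float = 2.0) -> List[List[bool]]:
    """Hypothetical adjacency for a 'row', 'col' or 'spa' relation.

    row: vertical extents overlap by at least `ratio` of the shorter cell.
    col: horizontal extents overlap by at least `ratio` of the narrower cell.
    spa: gaps along both axes are at most `margin` (8-neighbourhood).
    Every cell is adjacent to itself.
    """
    if kind not in ("row", "col", "spa"):
        raise ValueError(f"unknown relation: {kind}")
    n = len(boxes)
    adj = [[i == j for j in range(n)] for i in range(n)]
    for i, (ax0, ay0, ax1, ay1) in enumerate(boxes):
        for j, (bx0, by0, bx1, by1) in enumerate(boxes):
            if i == j:
                continue
            if kind == "row":
                h = min(ay1 - ay0, by1 - by0)
                adj[i][j] = _overlap(ay0, ay1, by0, by1) >= ratio * h
            elif kind == "col":
                w = min(ax1 - ax0, bx1 - bx0)
                adj[i][j] = _overlap(ax0, ax1, bx0, bx1) >= ratio * w
            else:
                gap_x = max(0.0, max(ax0, bx0) - min(ax1, bx1))
                gap_y = max(0.0, max(ay0, by0) - min(ay1, by1))
                adj[i][j] = gap_x <= margin and gap_y <= margin
    return adj


def _softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # finite because each cell always attends to itself
    exps = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0 masks non-neighbours
    total = sum(exps)
    return [e / total for e in exps]


def local_global_attention(feats: List[List[float]], adj: List[List[bool]],
                           alpha: float = 0.5) -> List[List[float]]:
    """Single-head self-attention blending a local (adjacency-masked) and a
    global (unmasked) attention distribution with weight `alpha`.

    Learned Q/K/V projections are omitted (identity) to keep the sketch short;
    this blending is an assumed stand-in, not the LRAT module itself.
    """
    n, d = len(feats), len(feats[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        scores = [scale * sum(p * q for p, q in zip(feats[i], feats[j]))
                  for j in range(n)]
        global_w = _softmax(scores)
        local_w = _softmax([s if adj[i][j] else -math.inf
                            for j, s in enumerate(scores)])
        w = [alpha * lw + (1.0 - alpha) * gw for lw, gw in zip(local_w, global_w)]
        out.append([sum(w[j] * feats[j][k] for j in range(n)) for k in range(d)])
    return out


# A hypothetical 2x3 grid of detected cells with 2-d toy features.
boxes = [(0, 0, 10, 5), (12, 0, 22, 5), (24, 0, 34, 5),
         (0, 6, 10, 11), (12, 6, 22, 11), (24, 6, 34, 11)]
feats = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
for kind in ("row", "col", "spa"):  # Row-, Col- and Spa-style relations
    adj = build_adjacency(boxes, kind)
    print(kind, local_global_attention(feats, adj)[0])
```

Masking the softmax to the adjacency graph gives a GCN-like local aggregation, while the unmasked term keeps MHA-style global context; blending the two distributions is just one simple way to correlate them, chosen here for brevity.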
References
Chi, Z., Huang, H., Xu, H.D., Yu, H., Yin, W., Mao, X.L.: Complicated table structure recognition. arXiv preprint arXiv:1908.04729 (2019)
Clinchant, S., Déjean, H., Meunier, J.L., Lang, E.M., Kleber, F.: Comparing machine learning approaches for table recognition in historical register books. In: IAPR International Workshop on Document Analysis Systems (DAS), pp. 133–138 (2018)
Deng, Y., Rosenberg, D., Mann, G.: Challenges in end-to-end neural scientific table recognition. In: International Conference on Document Analysis and Recognition (ICDAR), pp. 894–901 (2019)
Göbel, M., Hassan, T., Oro, E., Orsi, G.: ICDAR 2013 table competition. In: International Conference on Document Analysis and Recognition (ICDAR), pp. 1449–1453 (2013)
Hirayama, Y.: A method for table structure analysis using DP matching. In: International Conference on Document Analysis and Recognition (ICDAR), pp. 583–586 (1995)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Huang, Y., et al.: Improving table structure recognition with visual-alignment sequential coordinate modeling. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11134–11143 (2023)
Kieninger, T., Dengel, A.: The T-Recs table recognition and analysis system. In: Lee, S.-W., Nakano, Y. (eds.) DAS 1998. LNCS, vol. 1655, pp. 255–270. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48172-9_21
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)
Li, M., Cui, L., Huang, S., Wei, F., Zhou, M., Li, Z.: TableBank: table benchmark for image-based table detection and recognition. In: Proceedings of the Twelfth Language Resources and Evaluation Conference, pp. 1918–1925 (2020)
Li, Y., Huang, Z., Yan, J., Zhou, Y., Ye, F., Liu, X.: GFTE: graph-based financial table extraction. In: Del Bimbo, A., Cucchiara, R., Sclaroff, S., Farinella, G.M., Mei, T., Bertini, M., Escalante, H.J., Vezzani, R. (eds.) ICPR 2021, Part II. LNCS, vol. 12662, pp. 644–658. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-68790-8_50
Liu, H., Li, X., Liu, B., Jiang, D., Liu, Y., Ren, B.: Neural collaborative graph machines for table structure recognition. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4533–4542 (2022)
Liu, H., et al.: Show, read and reason: table structure recognition with flexible context aggregator. In: ACM International Conference on Multimedia (ACM MM), pp. 1084–1092 (2021)
Long, R., et al.: Parsing table structures in the wild. In: International Conference on Computer Vision (ICCV), pp. 944–952 (2021)
Qasim, S.R., Kieseler, J., Iiyama, Y., Pierini, M.: Learning representations of irregular particle-detector geometry with distance-weighted graph networks. Eur. Phys. J. C 79(7), 1–11 (2019)
Qasim, S.R., Mahmood, H., Shafait, F.: Rethinking table recognition using graph neural networks. In: International Conference on Document Analysis and Recognition (ICDAR), pp. 142–147 (2019)
Raja, S., Mondal, A., Jawahar, C.V.: Table structure recognition using top-down and bottom-up cues. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12373, pp. 70–86. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58604-1_5
Tupaj, S., Shi, Z., Chang, C.H., Alam, H.: Extracting tabular information from text files. EECS Department, Tufts University, Medford, USA (1996)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, vol. 30 (2017)
Xing, H., et al.: LORE: logical location regression network for table structure recognition. In: AAAI Conference on Artificial Intelligence (AAAI), pp. 2992–3000 (2023)
Xue, W., Li, Q., Tao, D.: ReS2TIM: reconstruct syntactic structures from table images. In: International Conference on Document Analysis and Recognition (ICDAR), pp. 749–755 (2019)
Xue, W., Yu, B., Wang, W., Tao, D., Li, Q.: TGRNet: a table graph reconstruction network for table structure recognition. In: International Conference on Computer Vision (ICCV), pp. 1295–1304 (2021)
Ying, C., et al.: Do transformers really perform bad for graph representation? arXiv preprint arXiv:2106.05234 (2021)
Yu, F., Wang, D., Shelhamer, E., Darrell, T.: Deep layer aggregation. In: Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2403–2412 (2018)
Zhong, X., ShafieiBavani, E., Jimeno Yepes, A.: Image-based table recognition: data, model, and evaluation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 564–580. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58589-1_34
Zhou, X., Wang, D., Krähenbühl, P.: Objects as points. arXiv preprint arXiv:1904.07850 (2019)
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, G., Zhong, D., Xiong, Yj., Zhan, H. (2024). LRATNet: Local-Relationship-Aware Transformer Network for Table Structure Recognition. In: Rudinac, S., et al. MultiMedia Modeling. MMM 2024. Lecture Notes in Computer Science, vol 14555. Springer, Cham. https://doi.org/10.1007/978-3-031-53308-2_37
DOI: https://doi.org/10.1007/978-3-031-53308-2_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53307-5
Online ISBN: 978-3-031-53308-2
eBook Packages: Computer Science, Computer Science (R0)