An End-to-End Local Attention Based Model for Table Recognition

  • Conference paper
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Abstract

With the rapid development of deep learning, and of the Transformer in particular, many Transformer-based methods have been studied and shown to be very powerful for table recognition. However, Transformer-based models usually struggle to process large tables because of the limitations of their global attention mechanism. In this paper, we propose a local attention mechanism to address this limitation. We also present an end-to-end local attention-based model that recognizes both the table structure and the table cell content from a table image. The proposed model consists of four main components: an encoder for feature extraction and three decoders, one for each of the three sub-tasks of the table recognition problem. In the experiments, we evaluate the performance of the proposed model and the effectiveness of the local attention mechanism on two large-scale datasets: PubTabNet and FinTabNet. The experimental results show that the proposed model outperforms the state-of-the-art methods on all benchmark datasets. Furthermore, we demonstrate the effectiveness of the local attention mechanism for table recognition, especially for large tables.
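
The abstract does not define the local attention mechanism in detail. As a rough illustration of the general idea only, the sketch below assumes a fixed-width sliding window over a 1D sequence, so that each query position attends only to nearby key positions instead of to all of them. The function names and the `window` parameter are hypothetical and are not taken from the paper.

```python
import math

def local_attention_mask(seq_len, window):
    """mask[i][j] is True when query i may attend to key j, i.e. |i - j| <= window."""
    return [[abs(i - j) <= window for j in range(seq_len)] for i in range(seq_len)]

def masked_softmax(scores, allowed):
    """Softmax over the allowed positions only; masked positions get weight 0."""
    kept = [s for s, a in zip(scores, allowed) if a]
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if a else 0.0 for s, a in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, mask):
    """Scaled dot-product attention restricted by a boolean mask."""
    dim = len(queries[0])
    outputs = []
    for q, allowed in zip(queries, mask):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = masked_softmax(scores, allowed)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs

def attended_pairs(mask):
    """Number of (query, key) pairs the mask lets through."""
    return sum(sum(row) for row in mask)
```

Under this assumption, a window that covers the whole sequence reduces to ordinary global attention, while a small window keeps the number of attended pairs roughly linear in the sequence length rather than quadratic. That is one plausible reason a local scheme could help on large tables, whose output sequences are long.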

Acknowledgments

This work was supported by the Cross-ministerial Strategic Innovation Promotion Program (SIP) Second Phase and “Big-data and AI-enabled Cyberspace Technologies” by New Energy and Industrial Technology Development Organization (NEDO) and partially supported by ROIS NII Open Collaborative Research 2023 (23S0102).

Corresponding author

Correspondence to Nam Tuan Ly.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ly, N.T., Takasu, A. (2023). An End-to-End Local Attention Based Model for Table Recognition. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14188. Springer, Cham. https://doi.org/10.1007/978-3-031-41679-8_2

  • DOI: https://doi.org/10.1007/978-3-031-41679-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41678-1

  • Online ISBN: 978-3-031-41679-8

  • eBook Packages: Computer Science, Computer Science (R0)
