DOI: 10.1145/3503161.3548038
research-article

TSRFormer: Table Structure Recognition with Transformers

Published: 10 October 2022

ABSTRACT

We present a new table structure recognition (TSR) approach, called TSRFormer, to robustly recognize the structures of complex tables with geometrical distortions from various table images. Unlike previous methods, we formulate table separation line prediction as a line regression problem instead of an image segmentation problem and propose a new two-stage DETR-based separator prediction approach, dubbed Separator REgression TRansformer (SepRETR), to predict separation lines from table images directly. To make the two-stage DETR framework work efficiently and effectively for the separation line prediction task, we propose two improvements: 1) a prior-enhanced matching strategy to solve the slow convergence issue of DETR; 2) a new cross-attention module that samples features directly from a high-resolution convolutional feature map, so that high localization accuracy is achieved at low computational cost. After separation line prediction, a simple relation-network-based cell merging module is used to recover spanning cells. With these new techniques, TSRFormer achieves state-of-the-art performance on several benchmark datasets, including SciTSR, PubTabNet and WTW. Furthermore, we have validated the robustness of our approach to tables with complex structures, borderless cells, large blank spaces, empty or spanning cells, as well as distorted or even curved shapes, on a more challenging real-world in-house dataset.
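To make the "line regression" framing concrete, the sketch below shows one way a row separator could be expressed as a regression target rather than a pixel mask. It is an illustration only, not the method from the paper: the even sampling scheme, the L1 loss, and the names `sample_positions` and `l1_line_loss` are all assumptions introduced here.

```python
from typing import List


def sample_positions(width: float, num_points: int) -> List[float]:
    """Return evenly spaced x-positions across the table width.

    Assumption for illustration: a row separator is described by its
    y-coordinate at each of these fixed x-positions.
    """
    step = width / (num_points + 1)
    return [step * (i + 1) for i in range(num_points)]


def l1_line_loss(pred_ys: List[float], target_ys: List[float]) -> float:
    """Mean absolute error between predicted and target y-coordinates.

    Assumption for illustration: a plain L1 loss over the sampled points.
    """
    if len(pred_ys) != len(target_ys):
        raise ValueError("prediction and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred_ys, target_ys)) / len(pred_ys)


# A slightly curved separator in a table image 100 pixels wide.
xs = sample_positions(width=100.0, num_points=4)
target = [50.0, 52.0, 54.0, 52.0]  # ground-truth y at each sampled x
pred = [51.0, 52.0, 53.0, 54.0]    # hypothetical model output
loss = l1_line_loss(pred, target)
```

Under this representation a curved or distorted separator is simply a different sequence of coordinates, so no mask post-processing is needed to recover the line.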

Published in

MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN: 9781450392037
DOI: 10.1145/3503161

Copyright © 2022 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 995 of 4,171 submissions, 24%
