research-article

TSRFormer: Table Structure Recognition with Transformers

Authors:
Weihong Lin

Microsoft Research Aisa, Beijing, China

Microsoft Research Aisa, Beijing, China
View Profile

,
Zheng Sun

University of Chinese Academy of Sciences & CASIA, Beijing, China

University of Chinese Academy of Sciences & CASIA, Beijing, China
View Profile

,
Chixiang Ma

University of Science and Technology of China, Hefei, China

University of Science and Technology of China, Hefei, China
View Profile

,
Mingze Li

Shanghai Jiao Tong University, Shanghai, China

Shanghai Jiao Tong University, Shanghai, China
View Profile

,
Jiawei Wang

University of Science and Technology of China, Hefei, China

University of Science and Technology of China, Hefei, China
View Profile

,
Lei Sun

Microsoft Research Asia, Beijing, China

Microsoft Research Asia, Beijing, China
View Profile

,
Qiang Huo

Microsoft Research Asia, Beijing, China

Microsoft Research Asia, Beijing, China
View Profile

MM '22: Proceedings of the 30th ACM International Conference on MultimediaOctober 2022Pages 6473–6482https://doi.org/10.1145/3503161.3548038

Published:10 October 2022Publication History

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Pages 6473–6482

ABSTRACT

We present a new table structure recognition (TSR) approach, called TSRFormer, to robustly recognizing the structures of complex tables with geometrical distortions from various table images. Unlike previous methods, we formulate table separation line prediction as a line regression problem instead of an image segmentation problem and propose a new two-stage DETR based separator prediction approach, dubbed Sep arator RE gression TR ansformer (SepRETR), to predict separation lines from table images directly. To make the two-stage DETR framework work efficiently and effectively for the separation line prediction task, we propose two improvements: 1) A prior-enhanced matching strategy to solve the slow convergence issue of DETR; 2) A new cross attention module to sample features from a high-resolution convolutional feature map directly so that high localization accuracy is achieved with low computational cost. After separation line prediction, a simple relation network based cell merging module is used to recover spanning cells. With these new techniques, our TSRFormer achieves state-of-the-art performance on several benchmark datasets, including SciTSR, PubTabNet and WTW. Furthermore, we have validated the robustness of our approach to tables with complex structures, borderless cells, large blank spaces, empty or spanning cells as well as distorted or even curved shapes on a more challenging real-world in-house dataset.

Supplemental Material

MM22-fp1207.mp4

mp4

222.8 MB

Download

References

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. 2020. End-to-end object detection with transformers. In European conference on computer vision. Springer, 213--229.Google ScholarDigital Library
Zewen Chi, Heyan Huang, Heng-Da Xu, Houjin Yu, Wanxuan Yin, and Xian- Ling Mao. 2019. Complicated table structure recognition. arXiv preprint arXiv:1908.04729 (2019).Google Scholar
Yuntian Deng, David Rosenberg, and Gideon Mann. 2019. Challenges in endto- end neural scientific table recognition. In 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 894--901.Google Scholar
Liangcai Gao, Yilun Huang, Hervé Déjean, Jean-Luc Meunier, Qinqin Yan, Yu Fang, Florian Kleber, and Eva Lang. 2019. ICDAR 2019 competition on table detection and recognition (cTDaR). In 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 1510--1515.Google Scholar
Peng Gao, Minghang Zheng, XiaogangWang, Jifeng Dai, and Hongsheng Li. 2021. Fast convergence of detr with spatially modulated co-attention. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3621--3630.Google ScholarCross Ref
Max Göbel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi. 2012. A methodology for evaluating algorithms for table understanding in PDF documents. In Proceedings of the 2012 ACM symposium on Document engineering. 45--48.Google ScholarDigital Library
Max Göbel, Tamir Hassan, Ermelinda Oro, and Giorgio Orsi. 2013. ICDAR 2013 table competition. In 2013 12th International Conference on Document Analysis and Recognition. IEEE, 1449--1453.Google Scholar
Khurram Azeem Hashmi, Didier Stricker, Marcus Liwicki, Muhammad Noman Afzal, and Muhammad Zeshan Afzal. 2021. Guided table structure recognition through anchor optimization. IEEE Access 9 (2021), 113521--113534.Google ScholarCross Ref
Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision. 2961--2969.Google ScholarCross Ref
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770--778. https://doi.org/10.1109/CVPR.2016.90Google Scholar
Yelin He, Xianbiao Qi, Jiaquan Ye, Peng Gao, Yihao Chen, Bingcong Li, Xin Tang, and Rong Xiao. 2021. PingAn-VCGroup's Solution for ICDAR 2021 Competition on Scientific Table Image Recognition to Latex. arXiv preprint arXiv:2105.01846 (2021).Google Scholar
Katsuhiko Itonori. 1993. Table structure recognition based on textblock arrangement and ruled line position. In ICDAR. 765--768.Google Scholar
Saqib Ali Khan, Syed Muhammad Daniyal Khalid, Muhammad Ali Shahzad, and Faisal Shafait. 2019. Table structure extraction with bi-directional gated recurrent unit networks. In ICDAR. 1366--1371.Google Scholar
Thomas Kieninger and Andreas Dengel. 1998. The t-recs table recognition and analysis system. In International Workshop on Document Analysis Systems. Springer, 255--270.Google Scholar
A Laurentini and P Viada. 1992. Identifying and understanding tabular material in compound documents. In International Conference on Pattern Recognition. IEEE COMPUTER SOCIETY PRESS, 405--405.Google ScholarCross Ref
Hei Law and Jia Deng. 2018. Cornernet: Detecting objects as paired keypoints. In Proceedings of the European conference on computer vision (ECCV). 734--750.Google ScholarDigital Library
Feng Li, Hao Zhang, Shilong Liu, Jian Guo, Lionel M Ni, and Lei Zhang. 2022. DN-DETR: Accelerate DETR Training by Introducing Query DeNoising. arXiv preprint arXiv:2203.01305 (2022).Google Scholar
Minghao Li, Lei Cui, Shaohan Huang, FuruWei, Ming Zhou, and Zhoujun Li. 2020. Tablebank: Table benchmark for image-based table detection and recognition. In Proceedings of The 12th language resources and evaluation conference. 1918--1925.Google Scholar
Xiao-Hui Li, Fei Yin, Xu-Yao Zhang, and Cheng-Lin Liu. 2021. Adaptive Scaling for Archival Table Structure Recognition. In International Conference on Document Analysis and Recognition. Springer, 80--95.Google Scholar
Yiren Li, Zheng Huang, Junchi Yan, Yi Zhou, Fan Ye, and Xianhui Liu. 2021. GFTE: graph-based financial table extraction. In International Conference on Pattern Recognition. Springer, 644--658.Google ScholarDigital Library
Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. 2017. Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition. 2117--2125.Google ScholarCross Ref
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision. 2980--2988.Google ScholarCross Ref
Hao Liu, Xin Li, Bing Liu, Deqiang Jiang, Yinsong Liu, Bo Ren, and Rongrong Ji. 2021. Show, Read and Reason: Table Structure Recognition with Flexible Context Aggregator. In Proceedings of the 29th ACM International Conference on Multimedia. 1084--1092.Google ScholarDigital Library
Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, and Lei Zhang. 2022. DAB-DETR: Dynamic anchor boxes are better queries for DETR. arXiv preprint arXiv:2201.12329 (2022).Google Scholar
Jonathan Long, Evan Shelhamer, and Trevor Darrell. 2015. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition. 3431--3440.Google ScholarCross Ref
Rujiao Long, Wen Wang, Nan Xue, Feiyu Gao, Zhibo Yang, Yongpan Wang, and Gui-Song Xia. 2021. Parsing Table Structures in the Wild. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 944--952.Google ScholarCross Ref
Ilya Loshchilov and Frank Hutter. 2017. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017).Google Scholar
Chixiang Ma,Weihong Lin, Lei Sun, and Qiang Huo. 2022. Robust Table Detection and Structure Recognition from Heterogeneous Document Images. arXiv preprint arXiv:2203.09056 (2022).Google Scholar
Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, and Jingdong Wang. 2021. Conditional detr for fast training convergence. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3651--3660.Google ScholarCross Ref
Hwee Tou Ng, Chung Yong Lim, and Jessica Li Teng Koo. 1999. Learning to recognize tables in free text. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics. 443--450.Google ScholarDigital Library
Shubham Singh Paliwal, D Vishwanath, Rohit Rahul, Monika Sharma, and Lovekesh Vig. 2019. Tablenet: Deep learning model for end-to-end table detection and tabular data extraction from scanned document images. In 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 128--133.Google ScholarCross Ref
Xingang Pan, Jianping Shi, Ping Luo, Xiaogang Wang, and Xiaoou Tang. 2018. Spatial as deep: Spatial cnn for traffic scene understanding. In Proceedings of the AAAI Conference on Artificial Intelligence.Google ScholarCross Ref
Devashish Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, and Kavita Sultanpure. 2020. CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. 572--573.Google ScholarCross Ref
Shah Rukh Qasim, Hassan Mahmood, and Faisal Shafait. 2019. Rethinking table recognition using graph neural networks. In ICDAR. 142--147.Google Scholar
Liang Qiao, Zaisheng Li, Zhanzhan Cheng, Peng Zhang, Shiliang Pu, Yi Niu, Wenqi Ren, Wenming Tan, and Fei Wu. 2021. LGPMA: Complicated Table Structure Recognition with Local and Global Pyramid Mask Alignment. In ICDAR.Google Scholar
Sachin Raja, Ajoy Mondal, and CV Jawahar. 2020. Table structure recognition using top-down and bottom-up cues. In European Conference on Computer Vision. 70--86.Google ScholarDigital Library
Sachin Raja, Ajoy Mondal, and CV Jawahar. 2022. Visual Understanding of Complex Table Structures from Document Images. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2299--2308.Google ScholarCross Ref
Roya Rastan, Hye-Young Paik, and John Shepherd. 2019. Texus: A unified framework for extracting and understanding tables in pdf documents. Information Processing & Management 56, 3 (2019), 895--918.Google ScholarDigital Library
Sebastian Schreiber, Stefan Agne, Ivo Wolf, Andreas Dengel, and Sheraz Ahmed. 2017. Deepdesrt: Deep learning for detection and structure recognition of tables in document images. In ICDAR, Vol. 1. 1162--1167.Google Scholar
Alexey Shigarov, Andrey Mikhailov, and Andrey Altaev. 2016. Configurable table structure recognition in untagged PDF documents. In Proceedings of the 2016 ACM symposium on document engineering. 119--122.Google ScholarDigital Library
Abhinav Shrivastava, Abhinav Gupta, and Ross Girshick. 2016. Training regionbased object detectors with online hard example mining. In Proceedings of the IEEE conference on computer vision and pattern recognition. 761--769.Google ScholarCross Ref
Shoaib Ahmed Siddiqui, Imran Ali Fateh, Syed Tahseen Raza Rizvi, Andreas Dengel, and Sheraz Ahmed. 2019. DeepTabStR: deep learning based table structure recognition. In ICDAR. 1403--1409.Google Scholar
Shoaib Ahmed Siddiqui, Pervaiz Iqbal Khan, Andreas Dengel, and Sheraz Ahmed. 2019. Rethinking semantic segmentation for table structure recognition in documents. In ICDAR. 1397--1402.Google Scholar
Zhiqing Sun, Shengcao Cao, Yiming Yang, and Kris M Kitani. 2021. Rethinking transformer-based set prediction for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3611--3620.Google ScholarCross Ref
Chris Tensmeyer, Vlad I. Morariu, Brian Price, Scott Cohen, and Tony Martinez. 2019. Deep Splitting and Merging for Table Structure Decomposition. In 2019 International Conference on Document Analysis and Recognition (ICDAR). 114--121. https://doi.org/10.1109/ICDAR.2019.00027Google Scholar
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in neural information processing systems 30 (2017).Google Scholar
Yalin Wang, Ihsin T Phillips, and Robert M Haralick. 2004. Table structure understanding and its performance evaluation. Pattern recognition 37, 7 (2004), 1479--1497.Google Scholar
Yingming Wang, Xiangyu Zhang, Tong Yang, and Jian Sun. 2021. Anchor detr: Query design for transformer-based detector. arXiv preprint arXiv:2109.07107 (2021).Google Scholar
Wenyuan Xue, Qingyong Li, and Dacheng Tao. 2019. ReS2TIM: Reconstruct syntactic structures from table images. In 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE, 749--755.Google ScholarCross Ref
Wenyuan Xue, Baosheng Yu, Wen Wang, Dacheng Tao, and Qingyong Li. 2021. TGRNet: A Table Graph Reconstruction Network for Table Structure Recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1295--1304.Google ScholarCross Ref
Zhuyu Yao, Jiangbo Ai, Boxun Li, and Chi Zhang. 2021. Efficient detr: improving end-to-end object detector with dense prior. arXiv preprint arXiv:2104.01318 (2021).Google Scholar
Ji Zhang, Mohamed Elhoseiny, Scott Cohen,Walter Chang, and Ahmed Elgammal. 2017. Relationship proposal networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 5678--5686.Google ScholarCross Ref
Zhenrong Zhang, Jianshu Zhang, Jun Du, and Fengren Wang. 2022. Split, embed and merge: An accurate table structure recognizer. Pattern Recognition (2022), 108565.Google Scholar
Xinyi Zheng, Douglas Burdick, Lucian Popa, Xu Zhong, and Nancy Xin RuWang. 2021. Global table extractor (gte): A framework for joint table identification and cell structure recognition using visual context. In Proceedings of the IEEE/CVF winter conference on applications of computer vision. 697--706.Google ScholarCross Ref
Xu Zhong, Elaheh ShafieiBavani, and Antonio Jimeno Yepes. 2020. Image-based table recognition: data, model, and evaluation. In European Conference on Computer Vision. Springer, 564--580.Google ScholarDigital Library
Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, and Jifeng Dai. 2021. Deformable detr: Deformable transformers for end-to-end object detection. In International Conference on Learning Representations.Google Scholar
Yajun Zou and Jinwen Ma. 2020. A deep semantic segmentation model for imagebased table structure recognition. In 2020 15th IEEE International Conference on Signal Processing (ICSP), Vol. 1. IEEE, 274--280.Google ScholarCross Ref

Index Terms

TSRFormer: Table Structure Recognition with Transformers
1. Applied computing
  1. Document management and text processing
    1. Document capture
      1. Document analysis
2. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Recommendations

Robust table structure recognition with dynamic queries enhanced detection transformer
Highlights
- We are the first to formulate separation line prediction as a line regression problem. To demonstrate the effectiveness of this new formulation, we propose a new DETR based separation line prediction model, dubbed DQ-DETR, to predict the ...
Abstract
We present a new table structure recognition (TSR) approach, called TSRFormer, to robustly recognize the structures of complex tables with geometrical distortions from various table images. Unlike previous methods, we formulate table separation ...
Read More
Adaptive Scaling for Archival Table Structure Recognition
Document Analysis and Recognition – ICDAR 2021
Abstract
Table detection and structure recognition from archival document images remain challenging due to diverse table structures, complex document layouts, degraded image qualities and inconsistent table scales. In this paper, we propose an instance ...
Read More
Table Structure Recognition Using CoDec Encoder-Decoder
Document Analysis and Recognition – ICDAR 2021 Workshops
Abstract
Automated document analysis and parsing has been the focus of research since a long time. An important component of document parsing revolves around understanding tabular regions with respect to their structure identification, followed by precise ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN:9781450392037
DOI:10.1145/3503161
General Chairs:
João Magalhães
NOVA University of Lisbon, Portugal
,
Alberto del Bimbo
University of Florence, Italy
,
Shin'ichi Satoh
National Institute of Informatics, Japan
,
Nicu Sebe
University of Trento, Italy
,
Program Chairs:
Xavier Alameda-Pineda
Inria, Grenoble, France
,
Qin Jin
Renmin University of China, China
,
Vincent Oria
New Jersey Institute of Technology, USA
,
Laura Toni
University College London, UK
Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 10 October 2022
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
separation line regression
table structure recognition
two-stage detr
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate995of4,171submissions,24%
Upcoming Conference
MM '24

Sponsor:

sigmm

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne , VIC , Australia
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 2
  Total Citations
  View Citations
- 258
  Total Downloads
- Downloads (Last 12 months)165
- Downloads (Last 6 weeks)23
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

TSRFormer: Table Structure Recognition with Transformers

MM '22: Proceedings of the 30th ACM International Conference on Multimedia

ABSTRACT

Supplemental Material

References

Cited By

Index Terms

Recommendations

Robust table structure recognition with dynamic queries enhanced detection transformer

Adaptive Scaling for Archival Table Structure Recognition

Table Structure Recognition Using CoDec Encoder-Decoder