
RCAM-Transformer: A Novel Approach to Table Reconstruction Using Row-Column Attention Mechanism

  • Conference paper
  • Document Analysis Systems (DAS 2024)

Abstract

Table reconstruction, a critical task in the field of table structure recognition (TSR), plays a vital role in various domains, such as data mining, machine learning, and information retrieval. While many existing TSR methods employ transformer-based models with generally impressive performance, a gap remains in transformer models specifically designed to handle the distinct attributes of table rows and columns. Moreover, there is a lack of robust table reconstruction strategies based on object detection models. To address these issues, we introduce the Row-Column Attention Mechanism (RCAM). Combined with a transformer model and integrated with partial global attention, it forms the RCAM-Transformer, a model tailored to effectively process the unique properties of tabular data. In addition, we have developed a novel table reconstruction strategy that leverages object detection models, improving the recognition and handling of tabular data. Our experiments, conducted on the PubTables-1M and FinTabNet datasets along with our self-constructed Annual Report TableSet, not only validate the effectiveness of the RCAM but also demonstrate the improved accuracy of table reconstruction achieved with the RCAM-Transformer. These outcomes highlight the potential of the RCAM-Transformer to advance table extraction in various fields.


Notes

  1. When the i-th patch and the j-th patch are in the same row, the values of i and j, when divided by w and rounded down, are identical. Conversely, when the i-th patch and the j-th patch are in the same column, dividing i and j by w yields congruent remainders.
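In code, this membership test reduces to an integer division and a modulo. The following is a minimal sketch only, assuming the patches are numbered row-major on an h × w grid; the function name and the NumPy formulation are illustrative, not taken from the paper.

```python
import numpy as np

def row_column_mask(h, w):
    """Boolean mask over h*w row-major patches: entry (i, j) is True
    iff patches i and j share a row (same floor(i / w)) or a column
    (same remainder i % w) -- the pairs a row-column attention
    mechanism would connect."""
    idx = np.arange(h * w)
    same_row = (idx[:, None] // w) == (idx[None, :] // w)
    same_col = (idx[:, None] % w) == (idx[None, :] % w)
    return same_row | same_col
```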


Author information

Correspondence to Zezhong Guo.


A Appendix

This appendix presents the details of table reconstruction. In general, we require the output of the object detection model, namely accurate coordinates and attributes of the cells, and then proceed to the post-processing steps for table reconstruction. Several details warrant further discussion.

First, the object detection model often outputs overlapping rows or columns, even when the confidence threshold of the output is raised. Addressing this requires a suitable IoU criterion to filter the overlaps; failing to do so leaves redundant table lines in the final reconstruction results (see the sketch after this section).

Second, the Hough line transform frequently encounters interference, such as small illustrations, colored backgrounds, or closely connected characters, all of which generate noise. This calls for more careful image morphological operations, followed by a filtering step that removes detected horizontal and vertical lines accounting for less than a certain proportion of the table's total width and height.

Third, blank rows may appear after the model's recognition results are aligned with the Hough line transform. Although these do not affect the table structure, they can influence downstream tasks. To remove blank rows, we therefore fix the lines confirmed by the Hough line transform and remove the horizontal lines below them. All examples shown in Fig. 7 have undergone this blank-row processing.

Finally, when merging spanning cells, headers or non-headers can be merged selectively according to the cell attributes. However, we cannot reliably merge areas without text, such as the top-left corners of Figs. 4d, 7a, 7c, 7e, 7g, 7i, and 7k: if the model were made to recognize a blank area as a spanning cell, this category of recognition would become unstable.
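The first two filtering steps can be made concrete with a minimal sketch, not the authors' implementation. Rows and columns are treated here as 1-D intervals along their own axis, a greedy non-maximum suppression drops overlapping detections, and Hough lines shorter than a fraction of the table's extent are discarded. The function names, the 1-D IoU formulation, and the 0.5 and 0.6 thresholds are illustrative assumptions.

```python
def iou_1d(a, b):
    """IoU of two 1-D intervals (start, end). Rows overlap along y and
    columns along x, so a 1-D IoU on the relevant axis covers both cases."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def suppress_overlapping(intervals, scores, iou_thresh=0.5):
    """Greedy NMS over detected rows (or columns): keep the most
    confident interval, drop later ones overlapping a kept one."""
    order = sorted(range(len(intervals)), key=lambda k: -scores[k])
    kept = []
    for k in order:
        if all(iou_1d(intervals[k], intervals[j]) < iou_thresh for j in kept):
            kept.append(k)
    return sorted(kept)

def filter_short_lines(lines, table_w, table_h, min_ratio=0.6):
    """Drop axis-aligned Hough lines (x1, y1, x2, y2) shorter than
    min_ratio of the table's width (horizontal lines) or height
    (vertical lines); such fragments are usually noise from
    illustrations, backgrounds, or text strokes."""
    kept = []
    for x1, y1, x2, y2 in lines:
        if y1 == y2 and abs(x2 - x1) >= min_ratio * table_w:
            kept.append((x1, y1, x2, y2))
        elif x1 == x2 and abs(y2 - y1) >= min_ratio * table_h:
            kept.append((x1, y1, x2, y2))
    return kept
```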

Fig. 7. Table reconstruction results.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Guo, Z., Zhang, Y., Chen, S., Wei, C. (2024). RCAM-Transformer: A Novel Approach to Table Reconstruction Using Row-Column Attention Mechanism. In: Sfikas, G., Retsinas, G. (eds) Document Analysis Systems. DAS 2024. Lecture Notes in Computer Science, vol 14994. Springer, Cham. https://doi.org/10.1007/978-3-031-70442-0_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-70442-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70441-3

  • Online ISBN: 978-3-031-70442-0

  • eBook Packages: Computer Science, Computer Science (R0)
