Formerge: Recover Spanning Cells in Complex Table Structure Using Transformer Network

Nguyen, Nam Quan; Le, Anh Duy; Lu, Anh Khoa; Mai, Xuan Toan; Tran, Tuan Anh

doi:10.1007/978-3-031-41734-4_32

Nam Quan Nguyen¹¹,
Anh Duy Le¹¹,
Anh Khoa Lu¹¹,
Xuan Toan Mai¹² &
…
Tuan Anh Tran¹²

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14191))

Included in the following conference series:

International Conference on Document Analysis and Recognition

1098 Accesses
2 Citations

Abstract

Table structure recognition (TSR) task is indispensable in a robust document analysis system. Recently, the split-and-merge-based approach has attracted many researchers to develop the TSR problem. It is a two-stage method: firstly, split table region into row/column separation and obtain grid cells of the table; then recover spanning cells by merging some grid cells and complete the table structure. Most recent proposals focus on the first stage, with few solutions for the merge task. Therefore, this paper proposes a novel method to recover spanning cells using Transformer networks called Formerge. This model contains a Transformer encoder and two parallel left-right/top-down decoders. With grid structure output from a split branch, Formerge extracts cell features with RoIAlign and passes them into the encoder to enhance features before decoding to detect spanning cells. Our technique outperforms other methods on two benchmark datasets, including SciTSR and ICDAR19-cTDaR modern.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 119.00; Price excludes VAT (USA)

Softcover Book: USD 159.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

DTSM: Toward Dense Table Structure Recognition with Text Query Encoder and Adjacent Feature Aggregator

Deep-learning and graph-based approach to table structure recognition

Article 30 December 2021

Multi-Type-TD-TSR – Extracting Tables from Document Images Using a Multi-stage Pipeline for Table Detection and Table Structure Recognition: From OCR to Structured Table Representations

Notes

References

Tran, T.A., Tran, H.T., Na, I.S., Lee, G.S., Yang, H.J., Kim, S.H.: A mixture model using Random Rotation Bounding Box to detect table region in document image. J. Vis. Commun. Image Represent. 39, 196–208 (2016)
Article Google Scholar
Chi, Z., Huang, H., Xu, H.D., Yu, H., Yin, W., Mao, X.: Complicated table structure recognition. arXiv preprint arXiv:1908.04729 (2019)
Khan, S.A., Khalid, S.M.D., Shahzad, M.A., Shafait, F.: Table structure extraction with bi-directional gated recurrent unit networks. In: 2019 International Conference on Document Analysis and Recognition, ICDAR 2019, pp. 1366–1371 (2019)
Google Scholar
Kieninger, T., Dengel, A.: Table recognition and labeling using intrinsic layout features. In: International Conference on Advances in Pattern Recognition, pp. 307–316 (1999)
Google Scholar
Tensmeyer, C., Morariu, V.I., Price, B.L., Cohen, S., Martinez, T.R.: Deep splitting and merging for table structure decomposition. In: 2019 International Conference on Document Analysis and Recognition, ICDAR 2019, pp. 114–121 (2019)
Google Scholar
Li, Y., et al.: Rethinking table structure recognition using sequence labeling methods. In: Document Analysis and Recognition-ICDAR 2021: 16th International Conference, Lausanne, Switzerland, 5–10 September 2021, Proceedings, Part II 16, pp. 541–553 (2021)
Google Scholar
Zhang, Z., Zhang, J., Du, J., Wang, F.: Split, embed and merge: an accurate table structure recognizer. Pattern Recogn. 126, 108565 (2022)
Article Google Scholar
Ma, C., Lin, W., Sun, L., Huo, Q.: Robust table detection and structure recognition from heterogeneous document images. Pattern Recogn. 133, 109006 (2023)
Article Google Scholar
Lin, W., et al.: TSRFormer: Table Structure Recognition with Transformers. In: Proceedings of the 30th ACM International Conference on Multimedia, pp. 6473–6482 (2022)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016, pp. 770–778 (2016)
Google Scholar
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long and Short Papers), pp. 4171–4186 (2019). https://doi.org/10.18653/v1/N19-1423
Zhang, J., Elhoseiny, M., Cohen, S., Chang, W., Elgammal, A.: Relationship proposal networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5678–5686 (2017)
Google Scholar
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn. In: Proceedings of the IEEE international Conference on Computer Vision, pp. 2961–2969 (2017)
Google Scholar
Chi, Z., Huang, H., Xu, H.D., Yu, H., Yin, W., Mao, X.L.: Complicated table structure recognition. arXiv preprint arXiv:1908.04729 (2019)
Zhong, X., ShafieiBavani, E., Jimeno Yepes, A.: Image-based table recognition: data, model, and evaluation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 564–580. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58589-1_34
Chapter Google Scholar
Chen, J., Lopresti, D.: Model-based tabular structure detection and recognition in noisy handwritten documents. In: 2012 International Conference on Frontiers in Handwriting Recognition, pp. 75–80 (2012)
Google Scholar
Gao, L., et al.: ICDAR 2019 competition on table detection and recognition (cTDaR). In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1510–1515 (2019)
Google Scholar
Xue, W., Yu, B., Wang, W., Tao, D., Li, Q.: Tgrnet: A table graph reconstruction network for table structure recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1295–1304 (2021)
Google Scholar
Lin, T. Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
Google Scholar
Vaswani, A., et al.: Attention is all you need. In: Advances In Neural Information Processing Systems, vol. 30, pp. 5998–6008 (2017)
Google Scholar
Itonori, K.: Table structure recognition based on textblock arrangement and ruled line position. In: Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR 1993), pp. 765–768 (1993)
Google Scholar
Wang, Y., Phillips, I.T., Haralick, R.M.: Table structure understanding and its performance evaluation. Pattern Recogn. 37(7), 1479–1497 (2004)
Article Google Scholar
Tran, T.A., Tran, H.T., Na, Kim, S.H.: A hybrid method for table detection from document image. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR), pp. 131–135 (2015)
Google Scholar
Pan, X., Shi, J., Luo, P., Wang, X., Tang, X.: Spatial as deep: Spatial cnn for traffic scene understanding. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)
Google Scholar
Prasad, D., Gadpal, A., Kapadni, K., Visave, M., Sultanpure, K.: CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 572–573 (2020)
Google Scholar
Loshchilov, I., Hutter, F.: Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101 (2017)
Li, M., Cui, L., Huang, S., Wei, F., Zhou, M., Li, Z.: Tablebank: Table benchmark for image-based table detection and recognition. In: Proceedings of the Twelfth Language Resources and Evaluation Conference, pp. 1918–1925 (2020)
Google Scholar
Siddiqui, S.A., Fateh, I.A., Rizvi, S.T.R., Dengel, A., Ahmed, S.: Deeptabstr: Deep learning based table structure recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1403–1409 (2019)
Google Scholar
Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: Deepdesrt: Deep learning for detection and structure recognition of tables in document images. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), pp. 1162–1167 (2017)
Google Scholar
Smock, B., Pesala, R., Abraham, R.: PubTables-1M: towards comprehensive table extraction from unstructured documents. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4634–4642 (2022)
Google Scholar
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9, 62–66 (1979)
Article Google Scholar
Siddiqui, S.A., Khan, P.I., Dengel, A., Ahmed, S.: Rethinking semantic segmentation for table structure recognition in document. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 1397–1402 (2019)
Google Scholar
Qiao, L., et al.: Lgpma: complicated table structure recognition with local and global pyramid mask alignment. In: Document Analysis and Recognition-ICDAR 2021: 16th International Conference, pp. 99–114 (2021)
Google Scholar
Deng, Y., Rosenberg, D., Mann, G.: Challenges in end-to-end neural scientific table recognition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 894–901 (2019)
Google Scholar
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12346, pp. 213–229. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58452-8_13
Chapter Google Scholar
Zhang, J., et al.: Watch, attend and parse: an end-to-end neural network based approach to handwritten mathematical expression recognition. Pattern Recogn. 71, 196–206 (2017)
Article Google Scholar

Download references

Acknowledgments

We acknowledge Ho Chi Minh City University of Technology (HCMUT), VNU-HCM for supporting this study.

Author information

Authors and Affiliations

Viettel Cyberspace Center, Viettel Group, Vietnam Lot D26 Cau Giay New Urban Area, Yen Hoa Ward, Hanoi, Cau Giay District, Vietnam
Nam Quan Nguyen, Anh Duy Le & Anh Khoa Lu
Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology (HCMUT), VNU-HCM, Ho Chi Minh City, Vietnam
Xuan Toan Mai & Tuan Anh Tran

Authors

Nam Quan Nguyen
View author publications
You can also search for this author in PubMed Google Scholar
Anh Duy Le
View author publications
You can also search for this author in PubMed Google Scholar
Anh Khoa Lu
View author publications
You can also search for this author in PubMed Google Scholar
Xuan Toan Mai
View author publications
You can also search for this author in PubMed Google Scholar
Tuan Anh Tran
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Nam Quan Nguyen or Tuan Anh Tran .

Editor information

Editors and Affiliations

TU Dortmund University, Dortmund, Germany
Gernot A. Fink
Adobe, College Park, MN, USA
Rajiv Jain
Osaka Metropolitan University, Osaka, Japan
Koichi Kise
Rochester Institute of Technology, Rochester, NY, USA
Richard Zanibbi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Nguyen, N.Q., Le, A.D., Lu, A.K., Mai, X.T., Tran, T.A. (2023). Formerge: Recover Spanning Cells in Complex Table Structure Using Transformer Network. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_32

Download citation

DOI: https://doi.org/10.1007/978-3-031-41734-4_32
Published: 19 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41733-7
Online ISBN: 978-3-031-41734-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The International Association for Pattern Recognition (opens in a new tab)

Formerge: Recover Spanning Cells in Complex Table Structure Using Transformer Network