Formerge: Recover Spanning Cells in Complex Table Structure Using Transformer Network

  • Conference paper
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14191)

Abstract

The table structure recognition (TSR) task is an indispensable part of a robust document analysis system. Recently, the split-and-merge approach has attracted many researchers working on the TSR problem. It is a two-stage method. First, the table region is split along row/column separators to obtain the grid cells of the table. Then, spanning cells are recovered by merging some of these grid cells, which completes the table structure. Most recent proposals focus on the first stage, and few address the merge task. This paper therefore proposes Formerge, a novel method that recovers spanning cells using Transformer networks. The model contains a Transformer encoder and two parallel decoders, one left-right and one top-down. Given the grid structure output by a split branch, Formerge extracts cell features with RoIAlign and passes them through the encoder to enhance them before decoding to detect spanning cells. Our technique outperforms other methods on two benchmark datasets: SciTSR and ICDAR19-cTDaR modern.
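
The merge stage described in the abstract can be illustrated with a small post-processing sketch. It rests on several assumptions, none stated in the abstract:

- The left-right decoder emits one boolean per grid cell, saying whether that cell merges with its right neighbor.
- The top-down decoder emits one boolean per grid cell, saying whether that cell merges with the cell below it.
- The abstract does not specify Formerge's actual output format or post-processing.
- The names `recover_cells`, `merge_right`, `merge_down`, and `SpanningCell` are hypothetical.
- Union-find grouping is a generic technique chosen here for illustration, not necessarily the authors' procedure.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True, order=True)
class SpanningCell:
    """A table cell in grid coordinates (inclusive row/column ranges)."""
    row_start: int
    row_end: int
    col_start: int
    col_end: int


def recover_cells(
    merge_right: Sequence[Sequence[bool]],
    merge_down: Sequence[Sequence[bool]],
) -> List[SpanningCell]:
    """Group R x C grid cells into cells using hypothetical merge flags.

    merge_right[r][c]: grid cell (r, c) merges with its right neighbor (r, c + 1).
    merge_down[r][c]: grid cell (r, c) merges with the cell below it (r + 1, c).
    Flags on the last column / last row point outside the grid and are ignored.
    """
    rows = len(merge_right)
    cols = len(merge_right[0]) if rows else 0
    parent = list(range(rows * cols))  # union-find over flattened indices

    def find(i: int) -> int:
        # Follow parents to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for r in range(rows):
        for c in range(cols):
            idx = r * cols + c
            if c + 1 < cols and merge_right[r][c]:
                union(idx, idx + 1)
            if r + 1 < rows and merge_down[r][c]:
                union(idx, idx + cols)

    # Take the bounding row/column range of each connected group. If the
    # flags are inconsistent and a group is not rectangular, this still
    # returns its bounding range.
    extents: Dict[int, Tuple[int, int, int, int]] = {}
    for r in range(rows):
        for c in range(cols):
            root = find(r * cols + c)
            r0, r1, c0, c1 = extents.get(root, (r, r, c, c))
            extents[root] = (min(r0, r), max(r1, r), min(c0, c), max(c1, c))

    return sorted(SpanningCell(*e) for e in extents.values())


if __name__ == "__main__":
    # 2 x 3 grid: (0, 0) merges right, (0, 2) merges down.
    right = [[True, False, False], [False, False, False]]
    down = [[False, False, True], [False, False, False]]
    for cell in recover_cells(right, down):
        print(cell)
```

On the 2 x 3 example, the function returns four cells:

- one spanning columns 0 to 1 of row 0;
- one spanning rows 0 to 1 of column 2;
- two single grid cells in row 1.

Union-find is used so that chains of pairwise merges resolve transitively. For example, a header spanning three columns needs two left-right flags but ends up as a single cell.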

Notes

  1. https://github.com/Academic-Hammer/SciTSR.

  2. https://github.com/cndplab-founder/ctdar_measurement_tool.

Acknowledgments

We acknowledge Ho Chi Minh City University of Technology (HCMUT), VNU-HCM for supporting this study.

Author information

Corresponding authors

Correspondence to Nam Quan Nguyen or Tuan Anh Tran.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nguyen, N.Q., Le, A.D., Lu, A.K., Mai, X.T., Tran, T.A. (2023). Formerge: Recover Spanning Cells in Complex Table Structure Using Transformer Network. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_32

  • DOI: https://doi.org/10.1007/978-3-031-41734-4_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41733-7

  • Online ISBN: 978-3-031-41734-4

  • eBook Packages: Computer Science, Computer Science (R0)