
PyramidTabNet: Transformer-Based Table Recognition in Image-Based Documents

  • Conference paper
Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14191)


Abstract

Table detection and structure recognition are important components of document analysis systems. Deep learning-based transformer models have recently demonstrated significant success in various computer vision and document analysis tasks. In this paper, we introduce PyramidTabNet (PTN), a method that builds upon the convolution-less Pyramid Vision Transformer to detect tables in document images. Furthermore, we present a generative augmentation technique for tabular images to train the architecture effectively. The proposed augmentation process consists of three steps, namely clustering, fusion, and patching, which together generate new document images containing tables. Our proposed pipeline demonstrates significant performance improvements for table detection on several standard datasets. Additionally, it achieves performance comparable to state-of-the-art methods on structure recognition tasks.
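
The abstract names the augmentation steps but does not describe how they work. As a rough illustration only, the sketch below assumes that "patching" means pasting a cropped table image onto a page and recording the resulting bounding box as a new detection label. The helper `patch_table`, the nested-list image representation, and the box convention are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max exclusive


def patch_table(page: Image, table: Image, top: int, left: int) -> Tuple[Image, Box]:
    """Paste `table` onto a copy of `page` with its top-left corner at (top, left).

    Hypothetical helper (an assumption, not the paper's method): returns the
    patched page and the bounding box of the pasted table.
    """
    height, width = len(table), len(table[0])
    if top < 0 or left < 0 or top + height > len(page) or left + width > len(page[0]):
        raise ValueError("table crop does not fit inside the page at this position")

    patched = [row[:] for row in page]  # copy so the source page is untouched
    for dy, table_row in enumerate(table):
        # Overwrite one row segment of the page with the matching table row.
        patched[top + dy][left:left + width] = table_row
    return patched, (left, top, left + width, top + height)


# Usage: a blank 6x8 "page" (255 = white) and a 2x3 "table" crop (0 = black).
page = [[255] * 8 for _ in range(6)]
table = [[0] * 3 for _ in range(2)]
patched, box = patch_table(page, table, top=1, left=2)
```

Returning the box alongside the image keeps the generated page and its annotation in sync, which is what a detector would need in order to train on the synthetic page.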



Author information

Corresponding author

Correspondence to Muhammad Ahmed Mohsin.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Umer, M., Mohsin, M.A., Ul-Hasan, A., Shafait, F. (2023). PyramidTabNet: Transformer-Based Table Recognition in Image-Based Documents. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14191. Springer, Cham. https://doi.org/10.1007/978-3-031-41734-4_26

  • DOI: https://doi.org/10.1007/978-3-031-41734-4_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41733-7

  • Online ISBN: 978-3-031-41734-4

  • eBook Packages: Computer Science, Computer Science (R0)
