
TableSegNet: a fully convolutional network for table detection and segmentation in document images

  • Original Paper
  • Published in: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Advances in image object detection have led to the application of deep convolutional neural networks in the document image analysis domain. Unlike general colorful, pattern-rich objects, tables in document images have properties that limit the capacity of deep learning architectures. Large variation in size and aspect ratio and the local similarity among document components are the main challenges, requiring both global features for detection and local features for separating nearby objects. To address these challenges, we present TableSegNet, a compact fully convolutional network that detects and separates tables simultaneously. TableSegNet consists of a deep convolutional path that detects table regions at low resolution and a shallower path that locates tables at high resolution and splits the detected regions into individual tables. To improve detection and separation capacity, TableSegNet uses convolution blocks with wide kernels in the feature extraction process and an additional table-border class in the main output. With only 8.1 million parameters and trained purely on document images from the start, TableSegNet achieves a state-of-the-art F1 score at the IoU threshold of 0.9 on the ICDAR2019 dataset and the highest number of correctly detected tables on the ICDAR2013 table detection dataset.
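The F1 scores reported above are computed at a fixed IoU threshold, so a detection only counts if it overlaps a ground-truth table tightly enough. A minimal sketch of that scoring protocol, assuming greedy one-to-one matching of axis-aligned boxes (the function names are illustrative, not from the paper's code):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def f1_at_threshold(predictions, ground_truths, threshold=0.9):
    """A predicted box is a true positive if it matches a still-unmatched
    ground-truth table with IoU at or above the threshold."""
    matched = set()
    tp = 0
    for pred in predictions:
        for i, gt in enumerate(ground_truths):
            if i not in matched and iou(pred, gt) >= threshold:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truths) if ground_truths else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A box that covers a table but clips one column can pass at a loose threshold like 0.6 yet fail at 0.9, which is why high-threshold scores reward the precise separation of adjacent tables that TableSegNet targets.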


Notes

  1. Source code is available at: https://github.com/DungDuc/TableSegNet

  2. https://github.com/sgrpanchal31/table-detection-dataset

  3. The '?' mark in Table 4 indicates an ambiguous number in the cited paper: the reported F1 score at the 0.9 IoU threshold was 0.901, which is equal to the weighted average F1 value.


Acknowledgements

We would like to thank the anonymous reviewers for their valuable comments and suggestions on the submitted version of this paper. This work was supported in part by the Vietnam Academy of Science and Technology under grants NCVCC02.03/21-21 and VAST01.02/19-20, “Development of deep learning methods for document analysis and recognition.”

Author information

Corresponding author

Correspondence to Duc-Dung Nguyen.


About this article


Cite this article

Nguyen, DD. TableSegNet: a fully convolutional network for table detection and segmentation in document images. IJDAR 25, 1–14 (2022). https://doi.org/10.1007/s10032-021-00390-4

