
TableSegNet: a fully convolutional network for table detection and segmentation in document images

  • Original Paper
  • Published in: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Advances in image object detection have led to the application of deep convolutional neural networks in the document image analysis domain. Unlike general colorful, pattern-rich objects, tables in document images have properties that limit the capacity of deep learning architectures. Large variation in size and aspect ratio and the local similarity among document components are the main challenges, requiring both global features for detection and local features for separating nearby objects. To address these challenges, we present TableSegNet, a compact fully convolutional network that detects and separates tables simultaneously. TableSegNet consists of a deep convolutional path that detects table regions at low resolution and a shallower path that locates tables at high resolution and splits the detected regions into individual tables. To improve detection and separation capacity, TableSegNet uses convolution blocks with wide kernels in the feature extraction process and an additional table-border class in the main output. With only 8.1 million parameters and trained purely on document images from the start, TableSegNet achieves a state-of-the-art F1 score at the IoU threshold of 0.9 on the ICDAR2019 dataset and the highest number of correctly detected tables on the ICDAR2013 table detection dataset.
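The F1 scores reported above are computed at a fixed IoU threshold, so a detection only counts if it overlaps a ground-truth table tightly enough. A minimal sketch of that scoring protocol, assuming greedy one-to-one matching of axis-aligned boxes (the function names are illustrative, not from the paper's code):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def f1_at_threshold(predictions, ground_truths, threshold=0.9):
    """A predicted box is a true positive if it matches a still-unmatched
    ground-truth table with IoU at or above the threshold."""
    matched = set()
    tp = 0
    for pred in predictions:
        for i, gt in enumerate(ground_truths):
            if i not in matched and iou(pred, gt) >= threshold:
                matched.add(i)
                tp += 1
                break
    precision = tp / len(predictions) if predictions else 0.0
    recall = tp / len(ground_truths) if ground_truths else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A box that covers a table but clips one column can pass at a loose threshold like 0.6 yet fail at 0.9, which is why high-threshold scores reward the precise separation of adjacent tables that TableSegNet targets.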


Notes

  1. Source code is available at: https://github.com/DungDuc/TableSegNet

  2. https://github.com/sgrpanchal31/table-detection-dataset

  3. The '?' mark in Table 4 indicates an ambiguous number in the cited paper: the reported F1 score at the 0.9 IoU threshold was 0.901, which is equal to the weighted average F1 value.


Acknowledgements

We would like to thank the anonymous reviewers for their valuable comments and suggestions on the submitted version of this paper. This work was supported in part by the Vietnam Academy of Science and Technology under grants NCVCC02.03/21-21 and VAST01.02/19-20, “Development of deep learning methods for document analysis and recognition.”

Author information

Corresponding author

Correspondence to Duc-Dung Nguyen.


About this article


Cite this article

Nguyen, DD. TableSegNet: a fully convolutional network for table detection and segmentation in document images. IJDAR 25, 1–14 (2022). https://doi.org/10.1007/s10032-021-00390-4

